This is awesome, I'm happy that Cloudflare is paying more attention to running Python via WebAssembly at the Edge.
I'll try to summarize how they got it running and what drawbacks their current approach has (note: I have deep context on running Python with WebAssembly at the Edge as part of my work at Wasmer).
Cloudflare Workers are enabling Python at the Edge by using Pyodide [1] (Python compiled to WebAssembly via Emscripten).
They bundled Pyodide into Workerd [2], and then use V8 snapshots [3] to try to accelerate startup times.
In the best case, cold starts of Python in Cloudflare Workers are about 1 second.
While this release is great, as it allows them to measure the interest in running Python at the Edge, it has some drawbacks. So, what are those?
* Being tied to a single version of Python/Pyodide (the one that Workerd embeds)
* Package resolution is quite hacky and tied to Workerd. Only precompiled "native packages" can be used at runtime (e.g. using a specific version of numpy will be challenging)
* Architecturally tied to the JS/v8 world, which may show some challenges as they aim to reduce cold start times (in my opinion, it will be quite hard for them to achieve <100ms startup time with their current architecture).
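To make the package drawback concrete, here's what a typical pinned dependency file looks like in ordinary Python workflows (hypothetical contents, not something the Workers runtime accepts as-is today; it only exposes the precompiled builds it ships with):

```
# requirements.txt (illustrative)
numpy==1.26.4        # only usable if the platform's precompiled bundle matches
langchain==0.1.11    # pure-Python deps are easier; native wheels are the hard part
```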
In any case, I welcome this initiative with open arms and look forward to all the cool apps that people will now build with this!
I believe your summary misunderstands how we will handle versioning. The Pyodide/package versions will be controlled by the compatibility date, and we will be able to support multiple in production at once. For packages like langchain (or numpy, as you mentioned) the plan is to update quite frequently.
Could you expand on why you believe V8 will be a limiting factor? It is quite a powerful Wasm runtime, and most of the optimizations we have planned don’t really depend on the underlying engine.
Edit: Also just want to clarify that this is not a POC, it is a Beta that we will continue improving on and eventually GA.
> pyodide /package versions will be controlled by the compatibility date
That's exactly the issue I'm pointing at. Ideally you should be able to pin any Python version you want in your app: 2.7, 3.8, or 3.9, regardless of a Workerd compatibility date. Some packages might work in Python 3.11 but not in 3.12, for example.
Unfortunately, Python doesn't have the full transpiler architecture that the JS ecosystem has, so "packaging" Python applications into different "compatibility" bundles will prove much more challenging (the webpack factor).
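A concrete instance of that kind of breakage, as a small runnable sketch (the version tuple is passed in explicitly so the check works anywhere):

```python
def can_import_distutils(version_info):
    """distutils was removed from the stdlib in Python 3.12 (PEP 632).
    A package importing it unconditionally runs fine on 3.11 but crashes
    on 3.12 -- exactly the kind of boundary that a platform-chosen
    interpreter version can silently move an app across."""
    return version_info[:2] < (3, 12)

assert can_import_distutils((3, 11, 0)) is True
assert can_import_distutils((3, 12, 0)) is False
```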
> Could you expand on why you believe V8 will be a limiting factor?
Sure thing! I think we can all agree that V8 is a fantastic runtime. However, the trade-offs that make V8 great for the browser use case make the runtime more challenging for Edge environments (where servers can run more specialized workloads in trusted environments).
Namely, those are:
* Cold starts: V8 Isolates are a bit heavy to initialize. In its current form, V8 can add ~2-5ms of startup just by initializing an Isolate
* Snapshots can be quite heavy to save and restore
* Not architected with the Edge use case in mind: there are many tricks you can do if you skip the JS middleware and go all-in on a Wasm runtime that are hard to do with the current V8/Workerd architecture.
In any case, I would love to be proven wrong in the long term, and I cheer for <100ms cold starts when running Python in Cloudflare Workers. Keep up the good work!
We discussed a separate configuration field for Python version. It’s not technically challenging, this was a design choice we made to simplify configuration for users and encourage more efficiencies in terms of shared dependencies.
Your concerns about V8 would impact JavaScript Workers as well and do not match what we see in production. It is also definitely possible to invoke C++ host functions directly from Wasm with V8.
> Your concerns about V8 would impact JavaScript Workers as well and do not match what we see in production
Interesting! I thought V8 snapshots were mainly used in the Pyodide context, as I could not find any other usage in Workerd (other than promise tagging and jsg::MemoryTracker).
Are you using V8 snapshots as well for improving cold starts in JS applications?
I was responding to your point about isolates and cold starts. Snapshots are unique to Python, but V8 does not seem relevant here: all this is doing is initializing the linear buffer that backs Wasm memory for a particular instance. We have a lot of ideas here, some of which are mentioned in the blog post.
I disagree about V8 not being optimized for edge environments. The needs of a browser are actually very much aligned with the needs of edge, namely secure sandboxing, extremely fast startup, and an extreme commitment to backwards compatibility (important so that all apps can always run on a single runtime version).
Additionally, V8 is just much better at running JavaScript than you can hope to achieve in a Wasm-based JS implementation. And JavaScript is the most popular web development language (even server-side).
> On it's current form it can add up from ~2-5ms in startup just by initializing an Isolate
So, you and I seemingly have a disagreement on what "cold start" means. Wasmer advertises its own "cold start" time to be 50ns. This is only remotely possible if the application is already loaded in memory and ready to go before the request arrives. In my mind, this is not a "cold start". If the application is already loaded, then it's a "warm start". I haven't spent the time to benchmark our warm start time (TBH I'm a little unclear on what, exactly, is counted in this measurement), but if the app is already loaded, we can complete whole requests in a matter of microseconds, so the 5ms number isn't the correct comparison.
To me, "cold start" time is the time to load an application, without prior knowledge of what application will be needed. That means it includes the time to fetch the application code from storage. For a small application, we get around 5ms.
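The distinction being debated can be made mechanical by timing the phases separately. A minimal sketch with stand-in callables (none of these names come from Workers or Wasmer; "cold start" under the definition above is fetch + init, while a "warm start" skips both):

```python
import time

def timed_phases(fetch_code, init_runtime, handle_request):
    """Break one request's latency into the phases debated above.
    All three callables are hypothetical stand-ins for real platform
    steps: reading app code from storage, creating an isolate or
    instantiating a Wasm module, and serving the first request."""
    t0 = time.perf_counter()
    code = fetch_code()              # storage read: on the cold-start path
    t1 = time.perf_counter()
    instance = init_runtime(code)    # isolate/Wasm instantiation
    t2 = time.perf_counter()
    handle_request(instance)         # first request: warm-path work
    t3 = time.perf_counter()
    return {
        "fetch_ms": (t1 - t0) * 1e3,
        "init_ms": (t2 - t1) * 1e3,
        "request_ms": (t3 - t2) * 1e3,
    }
```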
Note that the time to initialize an isolate isn't actually on the critical path to cold start, since we can pre-initialize isolates and have them ready to go before knowing what application they will run. That said, we haven't implemented this optimization historically, since the benefit would be relatively small.
However, with Pyodide this changes a bit. We can pre-initialize Pyodide isolates, before we know which Python app needs to run. Again, this isn't implemented yet, but we expect the benefits to be much larger than with plain JS isolates, so we plan to do so.
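The pre-initialization idea described above is roughly a pool of already-booted runtimes. A toy sketch (make_runtime is a hypothetical factory, e.g. "boot Pyodide in a fresh isolate"; real schedulers would replenish asynchronously):

```python
import queue

class PrewarmedPool:
    """Keep runtimes booted before knowing which app they'll serve, so
    only app loading remains on the request's critical path."""

    def __init__(self, make_runtime, size=4):
        self._make = make_runtime
        self._ready = queue.SimpleQueue()
        for _ in range(size):
            self._ready.put(make_runtime())  # init cost paid up front

    def run(self, app):
        runtime = self._ready.get()          # pre-warmed: ~zero init here
        try:
            return runtime(app)              # only the app itself loads now
        finally:
            self._ready.put(self._make())    # replenish (simplified: inline)
```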
> Ideally you should be able to pin any Python version that you want to use in your app:
Minimizing application size is really essential to making edge compute inexpensive -- to run every one of two million developers' applications in every one of our hundreds of locations at a reasonable price, we need to be able to run thousands of apps simultaneously on each machine. If each one bundles its entire language runtime, that's not gonna fit. That does mean that many applications have to agree to use the same versions of common runtime libraries, so that they can share the same copies of that code. The goal is to keep most updates to Pyodide backwards-compatible so that we can just keep everyone on the latest version. When incompatible changes must be made, we'll have to load multiple versions per machine, but that's still better than one copy per app.
Hey Kenton, great to see you chiming in here as well!
> Additionally, V8 is just much better at running JavaScript than you can hope to achieve in a Wasm-based JS implementation. And JavaScript is the most popular web development language (even server-side).
I agree with this statement as of today. Stay tuned, because very cool things are coming in Wasm land (SpiderMonkey will soon support JITted workloads inside of Wasm, bringing the speed much closer to V8!)
> Note that the time to initialize an isolate isn't actually on the critical path to cold start, since we can pre-initialize isolates and have them ready to go before knowing what application they will run
That's a good point. Although you are now sort of optimizing the critical path to cold start by knowing what the app is running (if it's Python, restore it from a snapshot). So even though isolate initialization is not on the critical path, there are other things on the critical path that account for the extra second of latency in Python cold starts, I would assume.
> Minimizing application size is really essential to making edge compute inexpensive
By leveraging properly defined dependencies, you only need to compile and load the dependency module (let's say Python) into memory once, and you have "infinite" capacity for instantiating it. Basically, if you take Python out of the picture and treat it as a dependency of an app, you can suddenly scale apps there as much as you want!
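The "compile once, instantiate per app" idea sketched in code (compile is a hypothetical stand-in for Wasm module compilation, and the per-instance dict stands in for a fresh linear memory):

```python
class ModuleCache:
    """Keep one compiled copy of each pinned runtime version and hand
    out cheap per-app instances that share it."""

    def __init__(self, compile):
        self._compile = compile
        self._compiled = {}

    def instantiate(self, version):
        if version not in self._compiled:      # compile once per version
            self._compiled[version] = self._compile(version)
        module = self._compiled[version]
        # Each app gets its own mutable state over the shared module.
        return {"module": module, "memory": bytearray()}
```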
For example: having 10 Python versions (running thousands of apps) would have an overhead of 5 MB (average Python binary size) × 10 versions ≈ 50 MB (plus custom memory for each instantiation of an app, which is required under either strategy). So the overhead of pinning a specific Python version should be truly minimal on the server (at least when fully leveraging a Wasm runtime).
As a side note, Wasmer offers an Edge product that has none of the drawbacks discussed here for running Python in Cloudflare Workers, providing incredibly fast cold-start times:
Wasmer claims to cold start in 50ns. This is obviously impossible: That's 1000x faster than an NVMe read, which is about the fastest cold storage you can get.
At least make your claims credible before posting them in competitors' HN threads.
I believe this statement was based on the Wasmer Edge product page, which was mainly measuring instantiation time when the module is already loaded on the Edge node (which can be assumed true for most common programs, such as WinterJS, static-web-server, and more). As the 50ns timing was evidently not well understood, we updated our product page to reflect a more accurate timing, thanks to your feedback.
We can get to sub-millisecond cold starts (loading a module from disk, instantiating it, and serving the first request) in the best-case scenario, a timing now properly reflected on the Wasmer Edge product page.
In any case, and to be clear, we can achieve much faster cold starts than the full second that Cloudflare Workers currently offers for Python. Wasmer Edge should be at least 10x faster than Cloudflare Workers for Python cold starts, with a cold-start time of less than 100ms. You can also expect to see much faster cold start times in upcoming releases :)
> measuring instantiation time if the module is already loaded in the Edge Node (which can be assumed true for most common programs such as WinterJS, static-web-server and more)
It sounds like you're saying that your hosting never incurs cold starts in the first place, because you always preload all customers' applications into memory before serving any traffic.
That's a fine optimization, if you can fit everything. Cloudflare Workers also preloads the most popular apps before serving traffic -- but we can't load all applications this way since they wouldn't all fit into memory. We only consider it a "cold start" when a request arrives for an application that wasn't preloaded.
Pyodide uses its own event loop which just subscribes to the JavaScript event loop. My suspicion is that this will be more efficient than using uvloop since v8's event loop is quite well optimized. It also allows us to await JavaScript thenables from Python and Python awaitables from JavaScript, whereas I would be worried about how this behaves with separate event loops. Also, porting uvloop would probably be hard.
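The cross-language awaiting described here can be sketched in plain asyncio: wrap a JS-style thenable (anything with .then(on_ok, on_err)) into a Future on the one shared loop. This is an illustrative shape only, not Pyodide's actual implementation; FakeThenable stands in for a JS promise:

```python
import asyncio

def thenable_to_future(thenable, loop):
    """Bridge a JS-style thenable into an asyncio Future so Python code
    can simply `await` it. With a single shared event loop, no cross-loop
    handoff is needed; call_soon keeps resolution on that loop."""
    fut = loop.create_future()
    thenable.then(
        lambda value: loop.call_soon(fut.set_result, value),
        lambda err: loop.call_soon(fut.set_exception, RuntimeError(str(err))),
    )
    return fut

class FakeThenable:
    """Stand-in for a JS promise: resolves immediately with a value."""
    def __init__(self, value):
        self.value = value

    def then(self, on_ok, on_err):
        on_ok(self.value)

async def main():
    loop = asyncio.get_running_loop()
    return await thenable_to_future(FakeThenable(42), loop)
```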
[1] https://pyodide.org/
[2] https://github.com/cloudflare/workerd/blob/main/docs/pyodide...
[3] https://github.com/cloudflare/workerd/pull/1875
Edit: updated wording from "proof of concept" to "release" to reflect the clarification from the Cloudflare team