This is awesome, I'm happy that Cloudflare is paying more attention to running Python via WebAssembly at the Edge.
I'll try to summarize how they got it running and what drawbacks their current approach has (note: I have deep context on running Python with WebAssembly at the Edge as part of my work at Wasmer).
Cloudflare Workers are enabling Python at the Edge by using Pyodide [1] (Python compiled to WebAssembly via Emscripten).
They bundled Pyodide into Workerd [2], and then use V8 snapshots [3] to try to accelerate startup times.
In the best case, cold starts of Python in Cloudflare Workers are about 1 second.
While this release is great, as it allows them to measure the interest in running Python at the Edge, it has some drawbacks. So, what are those?
* Being tied to a single version of Python/Pyodide (the one that Workerd embeds)
* Package resolution is quite hacky and tied to Workerd. Only precompiled "native packages" can be used at runtime (e.g. using a specific version of numpy will be challenging)
* Architecturally tied to the JS/v8 world, which may show some challenges as they aim to reduce cold start times (in my opinion, it will be quite hard for them to achieve <100ms startup time with their current architecture).
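To make the package drawback concrete, here's what a typical pinned dependency file looks like in ordinary Python workflows (hypothetical contents, not something the Workers runtime accepts as-is today; it only exposes the precompiled builds it ships with):

```
# requirements.txt (illustrative)
numpy==1.26.4        # only usable if the platform's precompiled bundle matches
langchain==0.1.11    # pure-Python deps are easier; native wheels are the hard part
```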
In any case, I welcome this initiative with open arms and look forward to all the cool apps that people will now build with this!
I believe your summary misunderstands how we will handle versioning. The Pyodide/package versions will be controlled by the compatibility date, and we will be able to support multiple in production at once. For packages like langchain (or numpy, as you mentioned) the plan is to update quite frequently.
Could you expand on why you believe V8 will be a limiting factor? It is quite a powerful Wasm runtime, and most of the optimizations we have planned don’t really depend on the underlying engine.
Edit: Also just want to clarify that this is not a POC, it is a Beta that we will continue improving on and eventually GA.
> pyodide /package versions will be controlled by the compatibility date
That's exactly the issue I'm pointing at. Ideally you should be able to pin any Python version you want in your app: 2.7, 3.8, or 3.9, regardless of a Workerd compatibility date. Some packages might work in Python 3.11 but not in 3.12, for example.
Unfortunately, Python doesn't have the full transpiler architecture that the JS ecosystem has, so "packaging" Python applications into different "compatibility" bundles will prove much more challenging (the webpack factor).
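A concrete instance of that kind of breakage, as a small runnable sketch (the version tuple is passed in explicitly so the check works anywhere):

```python
def can_import_distutils(version_info):
    """distutils was removed from the stdlib in Python 3.12 (PEP 632).
    A package importing it unconditionally runs fine on 3.11 but crashes
    on 3.12 -- exactly the kind of boundary that a platform-chosen
    interpreter version can silently move an app across."""
    return version_info[:2] < (3, 12)

assert can_import_distutils((3, 11, 0)) is True
assert can_import_distutils((3, 12, 0)) is False
```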
> Could you expand on why you believe V8 will be a limiting factor?
Sure thing! I think we can all agree that V8 is a fantastic runtime. However, the trade-offs that make V8 great for the browser use case make the runtime more challenging for Edge environments (where servers can run more specialized workloads in trusted environments).
Namely, those are:
* Cold starts: V8 Isolates are a bit heavy to initialize. In its current form, V8 can add ~2-5ms of startup just by initializing an Isolate
* Snapshots can be quite heavy to save and restore
* Not architected with the Edge use case in mind: there are many tricks you can do if you skip the JS middleware and go all-in on a Wasm runtime that are hard to do with the current V8/Workerd architecture.
In any case, I would love to be proven wrong in the long term, and I cheer for <100ms cold starts when running Python in Cloudflare Workers. Keep up the good work!
We discussed a separate configuration field for Python version. It’s not technically challenging, this was a design choice we made to simplify configuration for users and encourage more efficiencies in terms of shared dependencies.
Your concerns about V8 would impact JavaScript Workers as well and do not match what we see in production. It is also definitely possible to invoke C++ host functions directly from Wasm with V8.
> Your concerns about V8 would impact JavaScript Workers as well and do not match what we see in production
Interesting! I thought V8 snapshots were mainly used in the Pyodide context, as I could not find any other usage in Workerd (other than promise tagging and jsg::MemoryTracker).
Are you using V8 snapshots as well for improving cold starts in JS applications?
I was responding to your point about isolates and cold starts. Snapshots are unique to Python, but V8 does not seem relevant here: all this is doing is initializing the linear buffer that backs Wasm memory for a particular instance. We have a lot of ideas here, some of which are mentioned in the blog post.
I disagree about V8 not being optimized for edge environments. The needs of a browser are actually very much aligned with the needs of edge, namely secure sandboxing, extremely fast startup, and an extreme commitment to backwards compatibility (important so that all apps can always run on a single runtime version).
Additionally, V8 is just much better at running JavaScript than you can hope to achieve in a Wasm-based JS implementation. And JavaScript is the most popular web development language (even server-side).
> On it's current form it can add up from ~2-5ms in startup just by initializing an Isolate
So, you and I seemingly have a disagreement on what "cold start" means. Wasmer advertises its own "cold start" time to be 50ns. This is only remotely possible if the application is already loaded in memory and ready to go before the request arrives. In my mind, this is not a "cold start". If the application is already loaded, then it's a "warm start". I haven't spent the time to benchmark our warm start time (TBH I'm a little unclear on what, exactly, is counted in this measurement), but if the app is already loaded, we can complete whole requests in a matter of microseconds, so the 5ms number isn't the correct comparison.
To me, "cold start" time is the time to load an application, without prior knowledge of what application will be needed. That means it includes the time to fetch the application code from storage. For a small application, we get around 5ms.
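The distinction being debated can be made mechanical by timing the phases separately. A minimal sketch with stand-in callables (none of these names come from Workers or Wasmer; "cold start" under the definition above is fetch + init, while a "warm start" skips both):

```python
import time

def timed_phases(fetch_code, init_runtime, handle_request):
    """Break one request's latency into the phases debated above.
    All three callables are hypothetical stand-ins for real platform
    steps: reading app code from storage, creating an isolate or
    instantiating a Wasm module, and serving the first request."""
    t0 = time.perf_counter()
    code = fetch_code()              # storage read: on the cold-start path
    t1 = time.perf_counter()
    instance = init_runtime(code)    # isolate/Wasm instantiation
    t2 = time.perf_counter()
    handle_request(instance)         # first request: warm-path work
    t3 = time.perf_counter()
    return {
        "fetch_ms": (t1 - t0) * 1e3,
        "init_ms": (t2 - t1) * 1e3,
        "request_ms": (t3 - t2) * 1e3,
    }
```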
Note that the time to initialize an isolate isn't actually on the critical path to cold start, since we can pre-initialize isolates and have them ready to go before knowing what application they will run. That said, we haven't implemented this optimization historically, since the benefit would be relatively small.
However, with Pyodide this changes a bit. We can pre-initialize Pyodide isolates, before we know which Python app needs to run. Again, this isn't implemented yet, but we expect the benefits to be much larger than with plain JS isolates, so we plan to do so.
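The pre-initialization idea described above is roughly a pool of already-booted runtimes. A toy sketch (make_runtime is a hypothetical factory, e.g. "boot Pyodide in a fresh isolate"; real schedulers would replenish asynchronously):

```python
import queue

class PrewarmedPool:
    """Keep runtimes booted before knowing which app they'll serve, so
    only app loading remains on the request's critical path."""

    def __init__(self, make_runtime, size=4):
        self._make = make_runtime
        self._ready = queue.SimpleQueue()
        for _ in range(size):
            self._ready.put(make_runtime())  # init cost paid up front

    def run(self, app):
        runtime = self._ready.get()          # pre-warmed: ~zero init here
        try:
            return runtime(app)              # only the app itself loads now
        finally:
            self._ready.put(self._make())    # replenish (simplified: inline)
```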
> Ideally you should be able to pin any Python version that you want to use in your app:
Minimizing application size is really essential to making edge compute inexpensive -- to run every one of two million developers' applications in every one of our hundreds of locations at a reasonable price, we need to be able to run thousands of apps simultaneously on each machine. If each one bundles its entire language runtime, that's not gonna fit. That does mean that many applications have to agree to use the same versions of common runtime libraries, so that they can share the same copies of that code. The goal is to keep most updates to Pyodide backwards-compatible so that we can just keep everyone on the latest version. When incompatible changes must be made, we'll have to load multiple versions per machine, but that's still better than one copy per app.
Hey Kenton, great to see you chiming in here as well!
> Additionally, V8 is just much better at running JavaScript than you can hope to achieve in a Wasm-based JS implementation. And JavaScript is the most popular web development language (even server-side).
I agree with this statement as of today. Stay tuned, because very cool things are coming in Wasm land (SpiderMonkey will soon support JITted workloads inside of Wasm, bringing the speed much closer to V8!)
> Note that the time to initialize an isolate isn't actually on the critical path to cold start, since we can pre-initialize isolates and have them ready to go before knowing what application they will run
That's a good point. Although you are now sort of optimizing the critical path to cold start by knowing what the app is running (if it's Python, restore it from a snapshot). So even though isolate initialization is not on the critical path, there are other things on the critical path that account for the extra second of latency in Python cold starts, I would assume.
> Minimizing application size is really essential to making edge compute inexpensive
By leveraging properly defined dependencies, you only need to compile and load the dependency module (let's say Python) into memory once, and you have "infinite" capacity for instantiating it. Basically, if you take Python out of the picture and treat it as a dependency of an app, you can suddenly scale apps there as much as you want!
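The "compile once, instantiate per app" idea sketched in code (compile is a hypothetical stand-in for Wasm module compilation, and the per-instance dict stands in for a fresh linear memory):

```python
class ModuleCache:
    """Keep one compiled copy of each pinned runtime version and hand
    out cheap per-app instances that share it."""

    def __init__(self, compile):
        self._compile = compile
        self._compiled = {}

    def instantiate(self, version):
        if version not in self._compiled:      # compile once per version
            self._compiled[version] = self._compile(version)
        module = self._compiled[version]
        # Each app gets its own mutable state over the shared module.
        return {"module": module, "memory": bytearray()}
```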
For example: having 10 Python versions (running thousands of apps) would have an overhead of 5 MB (average Python binary size) × 10 versions ≈ 50 MB (plus custom memory for each instantiation of an app, which is required under either strategy). So the overhead of pinning a specific Python version should be truly minimal on the server (at least when fully leveraging a Wasm runtime).
As a side note, Wasmer offers an Edge product that has none of the drawbacks discussed here for running Python in Cloudflare Workers, providing incredibly fast cold-start times:
Wasmer claims to cold start in 50ns. This is obviously impossible: That's 1000x faster than an NVMe read, which is about the fastest cold storage you can get.
At least make your claims credible before posting them in competitors' HN threads.
I believe this statement was based on the Wasmer Edge product page, which was mainly measuring instantiation time when the module is already loaded on the Edge node (which can be assumed true for most common programs, such as WinterJS, static-web-server, and more). As the 50ns timing was evidently not well understood, we updated our product page to reflect a more accurate timing, thanks to your feedback.
We can get to sub-millisecond cold starts (loading a module from disk, instantiating it, and serving the first request) in the best-case scenario, a timing now properly reflected on the Wasmer Edge product page.
In any case, and to be clear, we can achieve much faster cold starts than the full second that Cloudflare Workers currently offers for Python. Wasmer Edge should be at least 10x faster than Cloudflare Workers for Python cold starts, with a cold-start time of less than 100ms. You can also expect to see much faster cold start times in upcoming releases :)
> measuring instantiation time if the module is already loaded in the Edge Node (which can be assumed true for most common programs such as WinterJS, static-web-server and more)
It sounds like you're saying that your hosting never incurs cold starts in the first place, because you always preload all customers' applications into memory before serving any traffic.
That's a fine optimization, if you can fit everything. Cloudflare Workers also preloads the most popular apps before serving traffic -- but we can't load all applications this way since they wouldn't all fit into memory. We only consider it a "cold start" when a request arrives for an application that wasn't preloaded.
Pyodide uses its own event loop which just subscribes to the JavaScript event loop. My suspicion is that this will be more efficient than using uvloop since v8's event loop is quite well optimized. It also allows us to await JavaScript thenables from Python and Python awaitables from JavaScript, whereas I would be worried about how this behaves with separate event loops. Also, porting uvloop would probably be hard.
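The cross-language awaiting described here can be sketched in plain asyncio: wrap a JS-style thenable (anything with .then(on_ok, on_err)) into a Future on the one shared loop. This is an illustrative shape only, not Pyodide's actual implementation; FakeThenable stands in for a JS promise:

```python
import asyncio

def thenable_to_future(thenable, loop):
    """Bridge a JS-style thenable into an asyncio Future so Python code
    can simply `await` it. With a single shared event loop, no cross-loop
    handoff is needed; call_soon keeps resolution on that loop."""
    fut = loop.create_future()
    thenable.then(
        lambda value: loop.call_soon(fut.set_result, value),
        lambda err: loop.call_soon(fut.set_exception, RuntimeError(str(err))),
    )
    return fut

class FakeThenable:
    """Stand-in for a JS promise: resolves immediately with a value."""
    def __init__(self, value):
        self.value = value

    def then(self, on_ok, on_err):
        on_ok(self.value)

async def main():
    loop = asyncio.get_running_loop()
    return await thenable_to_future(FakeThenable(42), loop)
```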
[1] https://pyodide.org/
[2] https://github.com/cloudflare/workerd/blob/main/docs/pyodide...
[3] https://github.com/cloudflare/workerd/pull/1875
Edit: updated wording from "proof of concept" to "release" to reflect the clarification from the Cloudflare team