To be clear, Popen is very different from all the other options. That's for running other programs.
multiprocessing.Process is low-level and is almost never what you want. multiprocessing.Pool is "mid-level" and usually isn't what you want either. ProcessPoolExecutor is usually what you want; it is the "one obvious way to do it". That's not at all clear from the docs, though.
The one obvious way to do it, in general, is: subprocess.run for running external processes, subprocess.Popen for async interaction with external processes, and concurrent.futures.ProcessPoolExecutor for Python multiprocessing.
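For what it's worth, here is a rough sketch of those three tools side by side. The helper names (square, run_external, talk_to_external, parallel_squares) are made up for illustration, and the child programs are just tiny Python one-liners launched via sys.executable so the example doesn't depend on any particular external command being installed.

```python
import subprocess
import sys
from concurrent.futures import ProcessPoolExecutor


def square(n):
    # Stand-in for CPU-bound work. Defined at module level so it can be
    # pickled and sent to worker processes.
    return n * n


def run_external():
    # subprocess.run: start a program, wait for it to finish, collect output.
    result = subprocess.run(
        [sys.executable, "-c", "print('hello')"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


def talk_to_external():
    # subprocess.Popen: hold a handle to a running program and interact
    # with it through pipes.
    with subprocess.Popen(
        [sys.executable, "-c", "print(input().upper())"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        out, _ = proc.communicate("ping\n")
    return out.strip()


def parallel_squares(values):
    # ProcessPoolExecutor: run a Python function across several OS processes.
    with ProcessPoolExecutor(max_workers=2) as pool:
        return list(pool.map(square, values))


if __name__ == "__main__":
    # The guard matters: with the "spawn" start method, worker processes
    # re-import this module, and unguarded pool creation would recurse.
    print(run_external())
    print(talk_to_external())
    print(parallel_squares(range(5)))
```

The split is deliberate: the first two functions are about other programs, the last one is about spreading your own Python code over multiple cores.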
Your other complaints about actually using the multiprocessing stuff are 100% valid. Error handling, cancellation, etc. are all very difficult. Passing data back and forth between the main process and subprocesses is not trivial.
But I do want to emphasize that there is a somewhat-well-defined gradient of lower- and higher-level tools in the standard library, and your "obvious way to do it" should usually start at the higher end of that gradient.
You might also want to look into the third-party Joblib library, which makes process parallelism a lot less painful for the straightforward use case of "run a function on a large amount of data, using multiple OS processes."
You're saying ProcessPoolExecutor is the "one obvious way to do it" but mention how the docs don't make this clear... That makes it not obvious. And since Python has built-in async/await keywords for asyncio now, shouldn't that be the one obvious correct way of doing concurrency?
Imagining I'm a newbie to Python concurrency, I Googled "concurrency in Python" and picked the first result from the official docs. https://docs.python.org/3/library/concurrency.html It's a list of everything except asyncio, and the first item on the list is the low-level `threading` :S At least that page mentions ThreadPoolExecutor, queue, and asyncio as alternatives, but I'm still lost on what is the correct way.
I would say that criticizing the documentation is distinct from criticizing the language itself. The Python standard library has had documentation problems for a while now, but realistically so does pretty much every other programming language. If you want to learn how to do things, you need a book.
If you're still interested in the topic, async/await is intended to be single-threaded by default, but has some support for pushing jobs off to threads or processes, using a concurrent.futures Executor internally. Normally, however, if I want process parallelism, I don't bother with async/await and go for the more explicit solution.
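In case it helps, this is roughly what "pushing jobs off to threads or processes" looks like from inside async code, using loop.run_in_executor. Treat it as a minimal sketch: cpu_bound and offload are hypothetical names, and the workload is a toy.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n):
    # Toy blocking workload; module-level so a process pool can pickle it.
    return sum(i * i for i in range(n))


async def offload(executor, sizes):
    # run_in_executor submits the blocking call to the given executor and
    # returns an awaitable, so the event loop stays free in the meantime.
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(executor, cpu_bound, n) for n in sizes]
    return await asyncio.gather(*futures)


if __name__ == "__main__":
    sizes = [10, 100, 1000]
    # Same coroutine, two kinds of executor: threads, then processes.
    with ThreadPoolExecutor(max_workers=2) as threads:
        print(asyncio.run(offload(threads, sizes)))
    with ProcessPoolExecutor(max_workers=2) as procs:
        print(asyncio.run(offload(procs, sizes)))
```

Note that the coroutine doesn't care which executor it is handed. That is also why, when all I want is process parallelism, the asyncio layer adds little over using ProcessPoolExecutor directly.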
Again, I think there is a very clear sense of the one obvious way to do it in the minds of many Python programmers, but it might not be expressed well in the official documentation. This would be a great opportunity to write a book, for example.
The language itself has the issue of there being many separate ways to do equivalent things here. And async/await wasn't in the language until recently, so people got used to the old ways.
I didn't need a book to deal with JavaScript concurrency, for example. JS has had its event loop as far back as I can remember, and users get concurrency from it without really having to understand it. It got promises a while back. Async/await is just syntactic sugar on top of promises. There's hardly any other way to do things. Node.js has modules for subprocesses and worker threads, but you don't end up there unless you're looking for a way to do parallelism, and even then you can get by with small Stack Overflow examples.