IMO the big win for Elixir/Nx/Bumblebee/etc. is that you can do batched, distributed inference out of the box, without deploying anything separate from your app or hitting an external API. Massive complexity reduction, and you can scale up or down more easily. https://hexdocs.pm/nx/Nx.Serving.html#content
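Something like the sketch below is roughly all it takes (the model choice, serving name, and batch settings are just example values, not anything canonical):

    # Load a model and tokenizer from Hugging Face (any Bumblebee-supported model works).
    {:ok, model_info} = Bumblebee.load_model({:hf, "distilbert-base-uncased-finetuned-sst-2-english"})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "distilbert-base-uncased"})

    serving =
      Bumblebee.Text.text_classification(model_info, tokenizer,
        compile: [batch_size: 16, sequence_length: 128],
        defn_options: [compiler: EXLA]
      )

    # In the application's supervision tree: one serving process that batches
    # concurrent requests coming from any process (or any connected node).
    children = [
      {Nx.Serving, serving: serving, name: MyApp.Serving, batch_timeout: 100}
    ]

    # Anywhere in the cluster:
    Nx.Serving.batched_run(MyApp.Serving, "the new release is fantastic")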
I've been really curious about BEAM languages but never made the leap. How well does it manage heterogeneous compute? I'm used to other languages making me define what happens on the CPU vs. the GPU and structure cross-machine communication around those considerations.
What parts of that does Elixir (and company) let me skip writing? Is there a good balance in the abstractions for when I still want control over what goes where (heterogeneity)?
Super curious and kinda looking for an excuse here :)
The BEAM is pretty high level, and it's REALLY good at managing distributed compute at the thread or device level.
If you have a parallelizable workflow, it's very easy to make it (properly!) parallel locally, where by "properly" I mean having supervision trees, sane restart behavior, etc.
And once you have that you can extend that parallelism to different nodes in a network (with the same sanity around supervision and discovery) basically for free. Like, one-line-of-code for free.
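To make that concrete, here's a sketch (the supervisor name, node name, heavy_work/1, and inputs are all made up; it assumes a Task.Supervisor named MyApp.TaskSup is already in your supervision tree):

    # Supervised, parallel work on the local node.
    results =
      inputs
      |> Enum.map(fn input ->
        Task.Supervisor.async(MyApp.TaskSup, fn -> heavy_work(input) end)
      end)
      |> Enum.map(&Task.await(&1, 60_000))

    # The "one line" change: run the same supervised task on another node in the cluster.
    Task.Supervisor.async({MyApp.TaskSup, :"worker@gpu-box"}, fn -> heavy_work(input) end)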
Nonetheless, it's all message-passing, and so pretty high level. AFAIK it's not designed for parallelizing compute at GPU scale.
That being said, if you have multiple GPUs and multiple machines that have to coordinate between them, Elixir/Erlang is pretty much perfect.
If you have a multistage workflow with concurrent and possibly heterogeneous requests arriving all the time, you can very easily batch and distribute the work across your compute resources in "the most natural way possible", without resorting to grace periods and the like, which introduce latency. I think that would be much harder to accomplish in Java.
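Concretely, the shape is roughly this (the stage names and batch settings are invented, and embedding_serving/reranker_serving stand in for whatever Nx.Serving each stage uses):

    # One named serving per stage, each with its own batching policy; Nx.Serving
    # accumulates concurrent callers into batches for you.
    children = [
      {Nx.Serving, serving: embedding_serving, name: MyApp.Embedder, batch_size: 32, batch_timeout: 50},
      {Nx.Serving, serving: reranker_serving, name: MyApp.Reranker, batch_size: 8, batch_timeout: 20}
    ]

    # A request flows through both stages; concurrent requests get batched
    # together at each stage without an extra queueing layer in front.
    embedding = Nx.Serving.batched_run(MyApp.Embedder, text)
    result = Nx.Serving.batched_run(MyApp.Reranker, embedding)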
You might want to look at the Java Aparapi project:
Aparapi allows Java developers to take advantage of the compute power of GPU and APU devices by executing data-parallel code fragments on the GPU rather than being confined to the local CPU. It does this by converting Java bytecode to OpenCL at runtime and executing it on the GPU; if for any reason Aparapi can't execute on the GPU, it will execute in a Java thread pool.
And there's also a scale-to-zero story for when you're not using that GPU at all: https://github.com/phoenixframework/flame
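Roughly (the pool name, limits, and my_serving are placeholders; FLAME boots ephemeral runners on demand and tears them down when idle):

    # In the supervision tree: a pool that starts with zero runners and scales
    # up to a handful of ephemeral machines as work arrives.
    children = [
      {FLAME.Pool,
       name: MyApp.GpuRunner,
       min: 0,
       max: 4,
       max_concurrency: 2,
       idle_shutdown_after: :timer.minutes(2)}
    ]

    # Later, run GPU work on a remote runner as if it were a local function call;
    # the pool boots a machine if none is available.
    FLAME.call(MyApp.GpuRunner, fn ->
      Nx.Serving.run(my_serving, input)
    end)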
1 language/toolchain. 1 deployable app. Real-time and distributed machine learning baked in. 1 dev can go really far.