The thing I find infuriating about Docker is that once you try to use it for serious machine learning, you absolutely need to mount your data. At small scale we're talking S3 mounts; at large scale it's Ceph mounts or (please don't) NFS.
Oh, and it must be an actual mount, because speed. Or to put it differently: if you don't use kernel mounts, your AI engineers need to design their training loop 10x better. It can work, but they'll need to predict well in advance what data they need, fetch it on separate threads, and maintain a proper internal filesystem. That mostly doesn't happen, and I don't like giving it as an answer anyway.
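For what "predict in advance and fetch on separate threads" could look like, here is a minimal sketch. It assumes the training loop knows its keys up front; `prefetch`, `fetch`, `depth` and `workers` are hypothetical names for illustration, and `fetch` stands in for whatever actually pulls an object from S3 or Ceph.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def prefetch(keys, fetch, depth=8, workers=4):
    """Yield fetch(key) for each key, in order, with up to `depth` fetches in flight.

    `fetch` is a placeholder for the real object-store read (an assumption,
    not something from the original comment).
    """
    it = iter(keys)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Start the first `depth` fetches before the consumer asks for anything.
        pending = deque(pool.submit(fetch, k) for k in islice(it, depth))
        while pending:
            # Blocks only if the oldest outstanding fetch has not finished yet.
            result = pending.popleft().result()
            # Top the window back up with the next key, if there is one.
            for k in islice(it, 1):
                pending.append(pool.submit(fetch, k))
            yield result
```

The training loop would then iterate over `prefetch(keys, fetch)` instead of reading files directly. The catch is exactly the one above: it only helps if the keys are known far enough ahead of time.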
And you can't mount from inside a default, unprivileged Docker container! Not even FUSE.
So we replace the distribution's mount command with a fake that sends a gRPC message "out" of the container, where the real kernel (I know, containerization, there's only one kernel) performs the mount so that it becomes visible in the container at the requested mount point. It's not a great solution.
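Roughly, the fake mount command could be shaped like the sketch below. This is an assumption-heavy illustration, not our actual shim: the field names are made up, only `-t` and `-o` are handled, and a plain `send` callable stands in for the gRPC client so the example stays self-contained.

```python
import argparse
import json


def build_mount_request(argv):
    """Parse mount-style arguments into a plain dict (hypothetical field names)."""
    parser = argparse.ArgumentParser(prog="mount")
    parser.add_argument("-t", dest="fstype", default="auto")
    parser.add_argument("-o", dest="options", default="")
    parser.add_argument("source")
    parser.add_argument("target")
    args = parser.parse_args(argv)
    return {
        "fstype": args.fstype,
        "options": [o for o in args.options.split(",") if o],
        "source": args.source,
        "target": args.target,
    }


def fake_mount(argv, send):
    """Forward the request via `send` instead of calling mount(2) in the container.

    `send` is a stand-in for the real transport (gRPC in the setup described
    above); here it is any callable that accepts a JSON string.
    """
    return send(json.dumps(build_mount_request(argv)))
```

Something on the host side would then receive the request and perform the real mount so that it shows up at `target` inside the container.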
I should have switched to running Docker containers in Firecracker looooong ago. Loooooong ago.
I interpreted it to mean, it was much harder than something else -- than another approach. But I couldn't figure out what other approach they had in mind.