
It'll mostly help for debugging and lowering RAM (not VRAM) usage. Otherwise it won't impact ML much.


Pretty universally, I've seen performance improve when complexity is reduced, and this could drop complexity considerably. I wouldn't be surprised to see a double-digit percentage improvement in tokens per second once an optimized PyTorch eventually ships with this. There may even be hidden gains in GPU memory usage as people clean up code and start implementing better tricks because of it.


Yeah, one of the dumbest things about DataLoader workers running in separate processes is that you end up logging into the void.
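
A minimal sketch of the problem, assuming the standard torch.utils.data.DataLoader API (the dataset and logger name here are made up for illustration): records logged inside worker processes can silently disappear because logging was only configured in the main process.

    import logging

    import torch
    from torch.utils.data import DataLoader, Dataset

    log = logging.getLogger("train")


    class NoisyDataset(Dataset):
        def __len__(self):
            return 8

        def __getitem__(self, idx):
            # With num_workers > 0 this runs in a worker process. Under the
            # "spawn" start method the worker never executes basicConfig()
            # below, so the root logger stays at its WARNING default and
            # this INFO record is dropped without a trace.
            log.info("loading sample %d", idx)
            return torch.zeros(1)


    if __name__ == "__main__":
        logging.basicConfig(level=logging.INFO)  # configures the main process only

        # num_workers=0: the messages appear. num_workers=2 under "spawn":
        # they vanish, because no handler or level was set in the workers.
        for _batch in DataLoader(NoisyDataset(), batch_size=4, num_workers=2):
            pass

Whether anything survives depends on the start method (fork inherits the main process's handlers, spawn does not), which is exactly the kind of cross-process guesswork that in-process workers would avoid.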



