That's true. Most ML algorithms follow an iterate-until-convergence pattern. But what about tasks like hyperparameter tuning, or trying out different algorithms against the same data set? Those runs are independent of each other, so they can be run in parallel.
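To make that concrete, here's a rough sketch of what I mean. Each trial is a sequential iterate-until-done loop, but the trials don't depend on each other, so they can be farmed out to a pool. Everything here is made up for illustration: `train_and_score` is a toy gradient descent on (w - 3)^2 standing in for a real training run, and `parallel_search` is a hypothetical helper, not any library's API. I'm using a thread pool to keep it simple; for CPU-bound training in Python you'd more likely want separate processes or separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def train_and_score(learning_rate, steps=50):
    """Toy stand-in for a training run: gradient descent on (w - 3)^2.

    The loop itself is inherently sequential, since each step
    depends on the result of the previous one.
    """
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w -= learning_rate * grad
    return (w - 3.0) ** 2  # final loss, lower is better


def parallel_search(learning_rates, max_workers=4):
    """Run one independent trial per learning rate and return the best.

    Trials share no state, so they can run concurrently even though
    each individual trial cannot be parallelized internally.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        losses = list(pool.map(train_and_score, learning_rates))
    # Pair each loss with its learning rate and pick the smallest loss.
    return min(zip(losses, learning_rates))


best_loss, best_lr = parallel_search([0.001, 0.01, 0.1, 0.5])
print(f"best learning rate: {best_lr}, final loss: {best_loss}")
```

The same shape works for comparing different algorithms on one data set: swap the list of learning rates for a list of model constructors.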
ML research should aim to produce more parallel algorithms.
I realize this isn't always entirely decoupled in certain online learning approaches. I don't work in ML and am certainly not an expert, but I'm genuinely curious where the space stands now in terms of hardware requirements for SOTA methods, especially inference-phase hardware requirements for just running what's already out there.