You're not wrong on the "physics easy culture hard" call, just late. That was Andrej Karpathy's stated reason for betting on the Tesla approach over the Waymo approach back in 2017: he identified that the limiting factor would be collecting data on real-world driving interactions in diverse environments, enough to learn theories of mind for all actors across all settings and cultures. Putting cameras on millions of cars in every corner of the world was the way to win that game -- simulations wouldn't cut it, and "NPC behavior" would be their downfall.
This bet aged well: videos of FSD performing very well in wildly different settings -- from crowded Guangzhou markets to French traffic circles to countries that drive on the left -- seem to indicate that this approach is working. It's nailing interactions that it didn't learn from suburban America and that require inferring intent from complex contextual clues. It's not done until it's done, but the god of the gaps retreats ever further into the march of nines, and you don't get credit for predicting something once it has already happened.