So how come they are able to do this in real time, on a headset, over the internet, yet next-gen gaming consoles don’t even get close to that level of detail?
Games do a lot more than just "render a high-quality bust of a person": you have whole environments and entire systems that are interactive. Most tech demos can afford higher fidelity because they don't have to handle any of that, and by the time you see the same techniques implemented in actual games, they've been scaled back a lot.
I’ve had the pleasure of sitting on a network that was, in practice, not bandwidth-limited, and it has led me to conclude that the terrible experience most people get is caused by retail ISPs being absolute dogshit. If you can get on a really well-run ISP like Fiber7 in Switzerland, or a $BigCorp network, things are much better and demos like this are no problem.
Game consoles have lots of other things to worry about, like the background (this demo is just an empty black background), NPCs and everything they need to do, game logic, physics, etc.
The latest consoles could; it's more about the software. It's also easier if you have nothing to render other than a face. https://www.unrealengine.com/en-US/metahuman looks pretty good, but not many games are using UE5 yet.