The math of NN training isn't complex at all. Designing the software stack for a new PyTorch backend is very doable with the budgets these AI companies have.
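To illustrate how small the mathematical core is: here's a minimal sketch of training by gradient descent, fitting one linear layer with a mean-squared-error loss in plain NumPy. This is an illustrative toy, not anyone's actual backend code; the forward pass, gradient, and update step below are the loop that frameworks like PyTorch generalize to deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))             # 64 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                           # targets from a known linear map

w = np.zeros(3)                          # parameters to learn
lr = 0.1                                 # learning rate
for _ in range(200):
    pred = X @ w                         # forward pass
    grad = 2 * X.T @ (pred - y) / len(y) # gradient of MSE w.r.t. w
    w -= lr * grad                       # gradient descent step

print(np.round(w, 3))                    # recovers [2.0, -1.0, 0.5]
```

The hard part of a new backend isn't this math; it's compiling and scheduling it efficiently on real hardware, which is where the engineering budget goes.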
I suspect that whenever you look like you're making good progress on this front, Nvidia gives you a lot of chips for free on condition that you shelve the effort, though!
The latest example is Tesla, which was designing its own hardware and software stack for NN training, then suspiciously got huge numbers of H100s ahead of other customers and cancelled the Dojo effort.
I doubt that's what happened. They had designs that were massively expensive to fab/package, had much worse performance than the latest Nvidia hardware, and still needed massive amounts of custom in-house development.
On top of all that, they were competing with Nvidia (and losing) for access to leading-edge nodes, whose prices kept rising. Their personnel costs kept climbing too: as the company became more politicized, people left for other companies (e.g. DensityAI), and Tesla got dragged into salary wars to replace them.
My suspicion is that Musk told them to just buy Nvidia instead of waiting around for years of slow iteration to get something competitive.
The custom silicon project I was involved with ran into similar issues. It was too expensive and too slow to compete with Nvidia, and no one could stomach the costs of trying.