What’s baffling about all this is that both AMD and Intel have competing offerings, and those offerings see next to no traction despite being much more attractive from a cost standpoint. I understand why they aren’t taking off on the training side: fragmentation is very counterproductive there. But why not deploy them in large quantities for inference, at least? The effort of porting transformer-based models is trivial for both vendors, and the performance is very competitive.
It really doesn’t matter whether you have CUDA or not if you’re going to run inference at scale. As I said above (speaking from experience), porting models for inference is not a technically difficult problem. Indeed, with both Intel’s Gaudi and AMD’s MI-series accelerators, a lot of the popular architectures and their derivatives are supported either out of the box or with minimal tweaks.
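As a rough illustration of how little changes in practice (assuming a ROCm build of PyTorch and the Hugging Face transformers library; the model name below is just a placeholder), the ROCm build exposes the familiar torch.cuda API, so a plain inference script can run on an MI-series card without modification:

```python
# Minimal sketch: standard PyTorch inference code, unchanged for AMD hardware.
# A ROCm build of PyTorch maps the torch.cuda API to HIP, so device="cuda"
# targets an MI-series accelerator just as it would an NVIDIA GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any supported causal LM works the same way
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
model.eval()

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```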
The question is what the performance per watt of AMD's and Intel's products looks like. My guess is that both are significantly worse. Energy and cooling are huge data center expenses, and paying less up front for a product that draws more power and needs more cooling is not a good deal, because it ends up costing more overall.
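A back-of-the-envelope sketch of that trade-off, with entirely hypothetical prices, power draws, and electricity rates, just to show how worse performance per watt can wipe out a purchase-price advantage:

```python
# All numbers are made up for illustration; the point is only the shape of the math.
def lifetime_cost(purchase_usd, power_kw, years=5, usd_per_kwh=0.12, pue=1.5):
    """Purchase price plus electricity; PUE stands in for cooling overhead."""
    hours = years * 365 * 24
    return purchase_usd + power_kw * pue * usd_per_kwh * hours

# Hypothetical option A: one efficient accelerator meets the throughput target.
option_a = lifetime_cost(purchase_usd=30_000, power_kw=1.0)

# Hypothetical option B: two cheaper accelerators with half the performance per
# watt are needed for the same throughput, so the power draw doubles.
option_b = lifetime_cost(purchase_usd=2 * 13_000, power_kw=2.0)

print(f"efficient option: ${option_a:,.0f}")  # ~ $37,884
print(f"cheaper option:   ${option_b:,.0f}")  # ~ $41,768
```

With these made-up figures, the $4,000 saved on hardware is more than erased by the extra electricity and cooling over the accelerator's service life.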