It really doesn’t matter whether you have CUDA if you’re going to run inference at scale. As I said above (speaking from experience), porting models for inference is not a technically difficult problem. Indeed, with both Intel Gaudi and AMD’s MI series of accelerators, many of the popular architectures and their derivatives are supported either out of the box or with minimal tweaks.
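To give a sense of how small those tweaks can be: ROCm builds of PyTorch expose AMD GPUs under the same `"cuda"` device name, and Intel Gaudi shows up as a separate `"hpu"` device via `habana_frameworks.torch`. A minimal sketch of the kind of device-selection shim involved (the `pick_device` helper and its boolean-flag interface are hypothetical, for illustration only):

```python
def pick_device(has_cuda: bool, has_hpu: bool) -> str:
    """Return the PyTorch device string for the available accelerator.

    Hypothetical helper: in real code the flags would come from
    torch.cuda.is_available() and a check for habana_frameworks.torch.
    """
    # ROCm builds of PyTorch reuse the "cuda" device name, so AMD
    # MI-series GPUs typically require no source changes at all.
    if has_cuda:
        return "cuda"
    # Intel Gaudi is addressed as a distinct "hpu" device.
    if has_hpu:
        return "hpu"
    # Fall back to CPU inference if no accelerator is present.
    return "cpu"


# Example: model = model.to(pick_device(has_cuda, has_hpu))
print(pick_device(False, True))
```

The point is that the inference code itself rarely changes; only the device string (and occasionally a backend-specific package import) differs across vendors.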