It really doesn’t matter whether you have CUDA if you’re going to run inference at scale. As I said above (speaking from experience), porting models for inference is not a technically difficult problem. Indeed, with both Intel Gaudi and AMD’s MI series of accelerators, many of the popular architectures and their derivatives are supported either out of the box or with minimal tweaks.
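To give a sense of how small those tweaks can be: ROCm builds of PyTorch expose AMD GPUs under the same `"cuda"` device name, and Intel Gaudi shows up as a separate `"hpu"` device via `habana_frameworks.torch`. A minimal sketch of the kind of device-selection shim involved (the `pick_device` helper and its boolean-flag interface are hypothetical, for illustration only):

```python
def pick_device(has_cuda: bool, has_hpu: bool) -> str:
    """Return the PyTorch device string for the available accelerator.

    Hypothetical helper: in real code the flags would come from
    torch.cuda.is_available() and a check for habana_frameworks.torch.
    """
    # ROCm builds of PyTorch reuse the "cuda" device name, so AMD
    # MI-series GPUs typically require no source changes at all.
    if has_cuda:
        return "cuda"
    # Intel Gaudi is addressed as a distinct "hpu" device.
    if has_hpu:
        return "hpu"
    # Fall back to CPU inference if no accelerator is present.
    return "cpu"


# Example: model = model.to(pick_device(has_cuda, has_hpu))
print(pick_device(False, True))
```

The point is that the inference code itself rarely changes; only the device string (and occasionally a backend-specific package import) differs across vendors.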