While we can't be sure of most of those answers, OpenAI has stated that it runs in Azure.
We can also probably assume that the pricing is roughly in proportion to the cost of running the model. It may be subsidised to gain market share, but they are unlikely to be taking a giant, unsustainable loss per query here, particularly as they seem to announce price decreases when they increase model performance.
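To make the "price roughly tracks cost" reasoning concrete, here is a back-of-envelope sketch. Every input is a placeholder assumption, not a figure from OpenAI: GPT-4's parameter count, serving precision, throughput and GPU pricing are not public. The 175 is GPT-3's published parameter count (in billions), used purely as a stand-in, and the function names are made up for illustration.

```python
import math


def gpus_needed(params_billion: float, bytes_per_param: int, gpu_mem_gb: float) -> int:
    """Minimum number of GPUs needed just to hold the weights.

    Ignores the KV cache, activations and framework overhead, so the
    real number would be higher.
    """
    # 1e9 params * bytes per param / 1e9 bytes per GB = GB of weights
    weights_gb = params_billion * bytes_per_param
    return math.ceil(weights_gb / gpu_mem_gb)


def cost_per_1k_tokens(gpu_hour_usd: float, num_gpus: int, tokens_per_sec: float) -> float:
    """Hardware cost in USD per 1,000 generated tokens.

    Assumes the group of GPUs is fully utilised and that tokens_per_sec
    is the aggregate throughput across all users batched onto it.
    """
    usd_per_sec = gpu_hour_usd * num_gpus / 3600
    return usd_per_sec / tokens_per_sec * 1000


# Placeholder inputs (assumptions, not real GPT-4 numbers):
#   175B parameters, 2 bytes each (fp16), 80 GB of memory per GPU,
#   $3 per GPU-hour, 500 tokens/s aggregate throughput.
n = gpus_needed(params_billion=175, bytes_per_param=2, gpu_mem_gb=80)
price = cost_per_1k_tokens(gpu_hour_usd=3.0, num_gpus=n, tokens_per_sec=500)
print(f"GPUs to hold the weights: {n}")
print(f"Hardware cost per 1k tokens: ${price:.4f}")
```

Changing the throughput or GPU price assumptions moves the result by orders of magnitude, so this only shows how the pieces relate. It also suggests that a model of that size would not fit in a single GPU's memory and would have to be split across several.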
Does anyone know what kind of infrastructure something like gpt4-32k actually runs on?
I mean, when I type something into the prompt, what happens behind the scenes?
Is the answer computed on a single NVIDIA GPU?
Or is it dedicated hardware not known to the general public?
How big is that GPU?
How much RAM does it have?
Is my conversation run by a single GPU instance that is dedicated to me, or is that GPU shared by multiple users?
If the latter, how many queries per second can a single GPU handle?
Where is that GPU?
Does it run in an Azure data center?
Is the API usage cost actually reflective of the hardware cost, or is it heavily subsidized?
Is the RAM size of a single GPU the bottleneck for how large a model can be?
Is any of that info public?