
This raises the question:

Does anyone know what kind of infrastructure something like gpt4-32k actually runs on?

I mean, when I type something into the prompt, what happens behind the scenes?

Is the answer computed on a single NVidia GPU?

Or is it dedicated H/W not known to the general public?

How big is that GPU?

How much RAM does it have?

Is my conversation run by a single GPU instance that is dedicated to me or is that GPU shared by multiple users?

If the latter, how many queries per second can a single GPU handle?

Where is that GPU?

Does it run in an Azure data center?

Is the API usage cost reflective of the hardware cost, or is it heavily subsidized?

Is the RAM size of a single GPU the bottleneck for how large a model can be?

Is any of that info public?



Mark Russinovich shares some of it in this recent Ignite session: https://ignite.microsoft.com/en-US/sessions/49347847-9ae4-43... *I work at Microsoft but have nothing to do with the datacenter engineering and have no other insight into the details behind it.


So 14,400 H100s for GPT-4, but that's just a fraction of the new system that Azure is building for OpenAI.

FWIW, I most enjoyed the 29TB machine demo at the end.


While we can't be sure of most of those answers, they have stated it is running in Azure.

Also, we can probably assume the pricing is somewhat in proportion to the cost to run (possibly subsidised to gain market share, but they are unlikely to be taking a giant/unsustainable loss per query here, particularly as they seem to announce price decreases when they increase model performance).


Azure VMSS (uniform orchestration) + 2000 to 3000 GPU-enabled servers. I'm not sure what kind of GPU is in these servers.


> Is the answer computed on a single NVidia GPU?

Most likely, given that one of their open positions for a GPU programmer asks for

> high technical competence for writing custom CUDA kernels and pushing GPUs to their limits.

Edit: that only narrows it down to NVidia hardware; IDK whether it's a single GPU or not.
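
On the single-GPU question, a back-of-envelope sketch may help. It only counts the memory needed to hold the weights, and every number in it is an assumption: 2 bytes per parameter (fp16/bf16), 80 GB of memory per GPU, and purely hypothetical parameter counts, since GPT-4's size is not public. The function name is made up for illustration.

```python
import math

def gpus_needed_for_weights(n_params, bytes_per_param=2, gpu_mem_gb=80):
    """Lower bound on GPUs needed just to hold the model weights.

    Ignores KV cache, activations, and framework overhead, so real
    deployments would need more memory than this suggests.
    """
    weight_bytes = n_params * bytes_per_param
    gpu_bytes = gpu_mem_gb * 10**9  # assumed 80 GB per GPU
    return math.ceil(weight_bytes / gpu_bytes)

# Hypothetical model sizes, NOT known figures for gpt4-32k
for n_params in (7_000_000_000, 70_000_000_000, 1_000_000_000_000):
    n_gpus = gpus_needed_for_weights(n_params)
    print(f"{n_params / 1e9:.0f}B params -> at least {n_gpus} GPU(s)")
```

Under those assumptions, anything much past 40B parameters no longer fits on one 80 GB card even before counting the KV cache for a 32k context, so a large model would have to be split across several GPUs.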



