>~If they're running on, say, two RTX 6000s for a total draw of ~600 watts, that would be a response time of 1.44 seconds. So obviously the median prompt doesn't go to some high-end thinking model users have to pay for.

You're not accounting for batching, which is how providers get optimal GPU utilization. Maybe a batch takes 30 seconds, but it completes 30 requests in that time.
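A rough sketch of the arithmetic, assuming the quote's 600 W draw is constant and shared evenly by every request in a batch (both simplifications). The quoted 1.44 s at 600 W works out to about 864 J per prompt. Under batching, that figure is the GPU time charged to each request, not the latency a user sees. The batch sizes and timings below are hypothetical, and the function names are made up for illustration.

```python
POWER_W = 600.0  # total draw for two GPUs, as assumed in the quote


def per_request_energy_joules(power_watts: float, batch_seconds: float, batch_size: int) -> float:
    """Energy charged to each request when the whole batch shares the GPUs."""
    return power_watts * batch_seconds / batch_size


def batch_latency_seconds(energy_per_request_j: float, power_watts: float, batch_size: int) -> float:
    """Wall-clock time a batch can run while staying within a per-request energy budget."""
    return energy_per_request_j * batch_size / power_watts


# The quote's figure: 1.44 s at 600 W implies roughly 864 J per prompt.
energy_per_prompt = POWER_W * 1.44
print(f"energy per prompt: {energy_per_prompt:.1f} J")

# Hypothetical: a batch of 30 requests that takes 30 s charges each request 600 J,
# i.e. about 1 s of GPU time per request despite the 30 s latency.
print(f"per request, 30-request batch over 30 s: {per_request_energy_joules(POWER_W, 30.0, 30):.1f} J")

# Hypothetical: with the same ~864 J budget per prompt, a 30-request batch could run ~43 s.
print(f"batch latency at 864 J/prompt, batch of 30: {batch_latency_seconds(energy_per_prompt, POWER_W, 30):.1f} s")
```

So a low energy-per-prompt number doesn't imply a fast, small model. It can equally mean a slower model whose GPU time is amortized across many concurrent requests.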