
> If they're running on, say, two RTX 6000s for a total draw of ~600 watts, that would be a response time of 1.44 seconds. So obviously the median prompt doesn't go to some high-end thinking model users have to pay for.

You're not accounting for batching, which is how you get optimal GPU utilization. A batch may take 30 seconds, but it completes 30 requests in that time.
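To put rough numbers on that, here is a minimal sketch in Python. The helper name `energy_per_request_wh` is made up for illustration, and it assumes the ~600 W draw stays constant regardless of batch size, which is a simplification. The 600 W, 1.44 s, 30 s, and 30 requests figures are the ones from this thread, not measurements.

```python
def energy_per_request_wh(power_watts, wall_seconds, batch_size=1):
    """Energy attributed to one request, in watt-hours.

    Assumes constant power draw for the whole wall-clock time and
    splits the total energy evenly across the requests in the batch.
    """
    joules = power_watts * wall_seconds / batch_size
    return joules / 3600.0  # 1 Wh = 3600 J


# One request at a time: 600 W for 1.44 s.
unbatched = energy_per_request_wh(600, 1.44)

# Batched: 600 W for 30 s, but 30 requests share that energy.
batched = energy_per_request_wh(600, 30, batch_size=30)

print(f"unbatched: {unbatched:.3f} Wh per request")
print(f"batched:   {batched:.3f} Wh per request")
```

Under those assumptions the unbatched case works out to about 0.24 Wh per request and the batched case to about 0.17 Wh, even though each user waits 30 seconds. So per-request energy alone doesn't tell you the response time.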


