
Again I just tap the sign.

All of your benchmarks mean nothing to me until you include Claude Sonnet on them.

In my experience, GPT hasn’t been able to compete with Claude in years for the daily “economically valuable” tasks I work on.





Since, per Anthropic's own benchmarks, Sonnet 4.5 is beaten by Opus 4.5, would it not suffice to infer the rest?

https://x.com/OpenAI/status/1999182104362668275


Claude is pretty trash for anything besides coding

Yeah, but that is the whole point of Claude. And that's why we are interested in the comparison.

What are you basing that on? Between Sonnet and Opus I don't think I'm reaching for Gemini 3 at all.

That hasn't been my experience at all. I've always wondered if we just get used to how to prompt a given model, and that makes it hard to transition to another.


