Hacker News

I agree 100%.

For some reason Gemini seems to be worse at this than Claude lately. Since mostly moving to Gemini 3, I've regularly had it go back and change the tests rather than fix the bug. It's as if it's gotten smart enough to "cheat" more. You really do still have to verify that the tests are valid.



Yep. It's incredibly annoying that these AI companies are apparently turning the "IQ knob" on these models up and down without warning or recourse. First OpenAI, then Anthropic, and now Google. I'm guessing it's a cost optimization. OpenAI even said that part out loud.

Of course, for customers it's just one more reason to review every AI output. Just because they did something perfectly yesterday doesn't mean they won't totally screw up the exact same thing today. Or you could say it's one more advantage of local models: you control the knobs.




