Hacker News | XCSme's comments

Just a few more credits and it will finally fix that bug without introducing new ones, exactly how I asked

I can stop any time I want, and in fact I am going to stop. Just one more (bug)fix.

This joke is getting kind of old. Opus 4.5 handles all the bugs in one go and doesn't introduce new ones, at least for me. I very rarely get stuck with it like I did with past generations of AI.

How long is the usual self-debugging cycle? It seems to be around 10 minutes for me (untyped language).

It's also laggy for me, with a 5900x + 3090...

The demo is not loading for me: https://i.snipboard.io/gkNxDO.jpg Probably because of: https://i.snipboard.io/XDQUWG.jpg

Contracting work doing n8n AI automations.

My main personal project is a self-hosted Hotjar alternative: https://www.uxwizz.com


But most benchmarks are not about that...

Are there even any "hallucination" public benchmarks?


"Benchmarks" for LLMs are a total hoax, since you can train them on the benchmarks themselves.

I would assume a good benchmark has hidden tests, or something randomly generated that is harder to game
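As a rough sketch of the "randomly generated" idea (not how any real benchmark works): the test items could be generated from a seed that stays private, so there is no fixed question list to train on. Everything here is made up for illustration, including the arithmetic task and the `model` callable, which is assumed to take a question string and return an answer string.

```python
import random


def make_item(rng):
    """Generate one question/answer pair from the given random generator."""
    a = rng.randint(100, 999)
    b = rng.randint(100, 999)
    op = rng.choice(["+", "-", "*"])
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return f"What is {a} {op} {b}?", str(answer)


def run_benchmark(model, seed, n_items=50):
    """Score `model` (hypothetical: question str -> answer str) on fresh items.

    The seed plays the role of the hidden test set: the same seed gives the
    same items, a new seed gives items nobody has seen before.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_items):
        question, expected = make_item(rng)
        if model(question).strip() == expected:
            correct += 1
    return correct / n_items
```

A model can still learn the task itself, which is arguably the point, but it cannot memorize the answers.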

Maybe they are just more consistent, which is a bit hard to notice immediately.

Neurons are huge

Oh, tensorflow. When AI actually meant creating your own networks...

I am curious if there are any local-model grammar tools out there? There are many good-enough tiny models that could technically run in the browser.

In my experience building AI Agents, having more specialized agents works better than one big agent. Simply because LLMs are dumb and make A LOT of mistakes. The longer the system prompt and the more instructions you give it, the more likely it is to miss some of them.

For example, in one architecture, I used to have a support agent reply directly in the customer's language, but the translations were very poor. Now I have one agent thinking of the answer in English, and then one dedicated agent for translation. The translations are A LOT better now.
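A minimal sketch of that two-agent split, assuming a hypothetical `llm(system_prompt, user_message)` callable that wraps whatever model API you use. The prompts and function names are illustrative, not the ones from my actual setup.

```python
from typing import Callable

# Hypothetical LLM interface: (system_prompt, user_message) -> reply text.
LLM = Callable[[str, str], str]

# Illustrative prompts; each agent gets one short, focused instruction set.
ANSWER_PROMPT = "You are a support agent. Answer the customer's question in English."
TRANSLATE_PROMPT = (
    "Translate the following text into {language}. Output only the translation."
)


def answer_then_translate(llm: LLM, question: str, language: str) -> str:
    """Run the support agent in English, then a dedicated translation agent."""
    # Agent 1: only works out the answer, always in English.
    english_answer = llm(ANSWER_PROMPT, question)
    if language.lower() == "english":
        return english_answer
    # Agent 2: only translates; it never sees the support instructions.
    return llm(TRANSLATE_PROMPT.format(language=language), english_answer)
```

It costs one extra LLM call per reply, but each call has a much shorter system prompt to follow.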

