
XCSme · 2025-12-14T22:21:52 1765750912

Just a few more credits and it will finally fix that bug without introducing new ones, exactly how I asked

baobabKoodaa · 2025-12-14T22:31:28 1765751488

I can stop any time I want, and in fact I am going to stop. Just one more (bug)fix.

michelsedgh · 2025-12-14T23:00:01 1765753201

This joke is getting kind of old. Opus 4.5 handles all the bugs in one go and also doesn’t introduce new ones, at least for me. I very rarely get stuck with it like I did with past generations of AI.

agumonkey · 2025-12-14T23:17:42 1765754262

How long is the usual self-debugging cycle? It seems to be around 10 minutes for me (untyped language).

XCSme · 2025-12-13T21:37:50 1765661870

It's also laggy for me, with a 5900x + 3090...

XCSme · 2025-12-13T21:35:52 1765661752

The demo is not loading for me: https://i.snipboard.io/gkNxDO.jpg Probably because of: https://i.snipboard.io/XDQUWG.jpg

XCSme · 2025-12-13T21:31:58 1765661518

Contracting work doing n8n AI automations.

My main personal project is a self-hosted Hotjar alternative: https://www.uxwizz.com

XCSme · 2025-12-12T10:37:49 1765535869

But most benchmarks are not about that...

Are there even any public "hallucination" benchmarks?

andrepd · 2025-12-12T11:13:25 1765538005

"Benchmarks" for LLMs are a total hoax, since you can train them on the benchmarks themselves.

XCSme · 2025-12-12T19:40:02 1765568402

I would assume a good benchmark has hidden tests, or something randomly generated that is harder to game.
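
To make the "randomly generated" idea concrete, here is a rough sketch, not how any real benchmark works. It assumes a toy task (three-digit multiplication) and a hypothetical `model` callable that takes a question string and returns an answer string. The point is only that the items come from a private seed, so they cannot be memorized ahead of time.

```python
import random

def make_items(seed, n=5):
    """Generate n (question, expected_answer) pairs from a private seed."""
    rng = random.Random(seed)  # local RNG, so the seed fully determines the items
    items = []
    for _ in range(n):
        a, b = rng.randint(100, 999), rng.randint(100, 999)
        items.append((f"What is {a} * {b}?", str(a * b)))
    return items

def score(model, items):
    """Fraction of items where the model's answer matches exactly.

    `model` is a hypothetical callable: question string in, answer string out.
    """
    correct = sum(1 for question, expected in items if model(question).strip() == expected)
    return correct / len(items)
```

Keeping the seed secret and rotating it per evaluation run plays the same role as hidden tests: the published part is the generator, not the questions.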

XCSme · 2025-12-11T23:53:31 1765497211

Maybe they are just more consistent, which is a bit hard to notice immediately.

XCSme · 2025-12-11T13:58:06 1765461486

Neurons are huge

XCSme · 2025-12-10T13:25:58 1765373158

Oh, tensorflow. When AI actually meant creating your own networks...

XCSme · 2025-12-05T23:41:09 1764978069

I am curious if there are any local-model grammar tools out there. There are many good-enough tiny models that could technically run in the browser.

XCSme · 2025-12-05T23:18:28 1764976708

In my experience building AI agents, having several specialized agents works better than one big agent, simply because LLMs are dumb and make A LOT of mistakes. The longer the system prompt and the more instructions you give it, the more likely it is to miss some of them.

For example, in one architecture, I used to have a support agent reply directly in the customer's language, but the translations were very poor. Now I have one agent thinking of the answer in English, and then one dedicated agent for translation. The translations are A LOT better now.
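
The answer-then-translate split can be sketched roughly like this. It is only an illustration, not the actual setup: `llm` is a hypothetical callable taking a system prompt and a user message, and both prompts are made up for the example.

```python
from typing import Callable

# Hypothetical interface: (system_prompt, user_message) -> model output text.
LLM = Callable[[str, str], str]

# Illustrative prompts only; each agent gets one short, focused instruction.
ANSWER_PROMPT = "You are a support agent. Answer the customer's question in English."
TRANSLATE_PROMPT = "Translate the following text into {language}. Output only the translation."

def support_reply(question: str, language: str, llm: LLM) -> str:
    # Stage 1: the support agent works out the answer in English only.
    english_answer = llm(ANSWER_PROMPT, question)
    if language.lower() == "english":
        return english_answer
    # Stage 2: a dedicated agent whose only job is translation.
    return llm(TRANSLATE_PROMPT.format(language=language), english_answer)
```

Each call carries a single narrow instruction, so there is less for the model to miss than in one long prompt that asks it to both solve the problem and write in another language.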