I've been saying for a while that given a proper harness, small local models can...

zambelli · 2026-05-19T22:12:18 1779228738

Lol, I love that framing. Yeah, the small models have impressed me a lot during this work. The reasoning can be quite good, and definitely sufficient for a lot of cases. Just gotta nudge em back on track Every now and then and they'll figure it out.

Aurornis · 2026-05-20T05:38:51 1779255531

The problem is that you get similar quality as if you gave a junior unlimited time to work on a problem and told them to keep trying different things until the goal is reached.

Even the SOTA models have this problem when the work is complicated enough. The problem is amplified more with the small models.

coip · 2026-05-21T16:08:43 1779379723

One important facet of this is it’s not far from “giving unlimited juniors unlimited time…”

Where the limits are set by hardware for agentic execution (compute/network/storage) && inference speed

Zetaphor · 2026-05-20T12:52:00 1779281520

There's a lot of valuable things that can be done in that range, especially when token costs aren't a concern. Not every problem requires SOTA

Aurornis · 2026-05-20T13:42:59 1779284579

> especially when token costs aren't a concern. Not every problem requires SOTA

If token costs aren’t a concern I’m using SOTA for everything.

Even SOTA gets it wrong and hallucinates, but at a lower rate. I don’t want to waste my time.

lixquid · 2026-05-20T16:43:26 1779295406

I believe they mean token costs aren't a concern when you're not paying for a SOTA model via API, and are instead running local models.

Infinite monkeys on infinite typewriters, and all that.

Zetaphor · 2026-05-20T18:42:42 1779302562

Correct, I have local hardware, not infinite money.

cornholio · 2026-05-19T22:28:36 1779229716

If I understood correctly, the model will get it right because it knows when it isn't right.

zambelli · 2026-05-19T22:30:27 1779229827

Essentially, yes that's right! There's some subtlety in how to let it know it was wrong (returning things as tool errors because it trained on that), but that's the gist of it - sort of a self-correcting architecture.

tomjakubowski · 2026-05-20T00:52:18 1779238338

https://en.wikipedia.org/wiki/Apophatic_theology

jon_richards · 2026-05-20T02:26:12 1779243972

I was expecting this https://knowyourmeme.com/memes/the-missile-knows-where-it-is

forlorn_mammoth · 2026-05-20T14:11:57 1779286317

the missile knows where it is because it knows where it isn't

andai · 2026-05-20T10:49:38 1779274178

Prior art: https://ghuntley.com/ralph/

koolba · 2026-05-19T22:59:57 1779231597

A thousand monkeys on a thousand typewriters…

zambelli · 2026-05-19T23:40:45 1779234045

That is the whole challenge, actually! A new metric I'm going to dogfood into forge is ETTWS - estimated time to working solution.

A simple retry loop around your whole workflow could, in some cases, be all you need. But it could mean many blind attempts to get through a workflow successfully. And hopefully there isn't a payment step partway through!

The fewer hard errors nix the whole workflow, the lower your ETTWS.

killing_time · 2026-05-20T06:28:56 1779258536

Is it strange that I immediately interpreted ETTWS to be Estimated Time To William Shakespeare?

Mithriil · 2026-05-20T15:32:51 1779291171

It's relevant to the "thousand monkeys on a thousand typewriters".

jononor · 2026-05-21T08:42:50 1779352970

The one true AGI metric!

beacon294 · 2026-05-20T05:22:37 1779254557

Have you read the MAKER/MDAP paper? 1 million sequential tasks.

zambelli · 2026-05-20T05:48:24 1779256104

No, I haven't - hadn't heard of it. I'll try to squeeze in a quick read in the coming weeks!

DiogenesKynikos · 2026-05-20T02:30:16 1779244216

This is a thousand unusually smart monkeys who speak every major human language fluently and are proficient in every major programming language, but sometimes still make bizarre mistakes and need to be put back on track.

jplusequalt · 2026-05-20T03:01:51 1779246111

This is fun for you?

bratbag · 2026-05-20T05:45:11 1779255911

I found it fun to read.