Escapade5160 · 2026-05-27T03:26:04 1779852364

Ben Jordan did a fantastic piece on how harmful data centers are to the people living near them.

Escapade5160 · 2026-05-25T01:08:03 1779671283

And four-fifths the cost of a consumer PC build.

Escapade5160 · 2026-05-19T22:09:42 1779228582

I've been saying for a while that given a proper harness, small local models can perform incredibly well. When you have a system that can try everything, it will eventually get it right as long as you can prevent it from getting it wrong in the meantime.

zambelli · 2026-05-19T22:12:18 1779228738

Lol, I love that framing. Yeah, the small models have impressed me a lot during this work. The reasoning can be quite good, and definitely sufficient for a lot of cases. Just gotta nudge 'em back on track every now and then and they'll figure it out.

Aurornis · 2026-05-20T05:38:51 1779255531

The problem is that you get similar quality as if you gave a junior unlimited time to work on a problem and told them to keep trying different things until the goal is reached.

Even the SOTA models have this problem when the work is complicated enough. The problem is amplified more with the small models.

coip · 2026-05-21T16:08:43 1779379723

One important facet of this is it’s not far from “giving unlimited juniors unlimited time…”

Where the limits are set by hardware for agentic execution (compute/network/storage) && inference speed

Zetaphor · 2026-05-20T12:52:00 1779281520

There's a lot of valuable things that can be done in that range, especially when token costs aren't a concern. Not every problem requires SOTA

Aurornis · 2026-05-20T13:42:59 1779284579

> especially when token costs aren't a concern. Not every problem requires SOTA

If token costs aren’t a concern I’m using SOTA for everything.

Even SOTA gets it wrong and hallucinates, but at a lower rate. I don’t want to waste my time.

lixquid · 2026-05-20T16:43:26 1779295406

I believe they mean token costs aren't a concern when you're not paying for a SOTA model via API, and are instead running local models.

Infinite monkeys on infinite typewriters, and all that.

Zetaphor · 2026-05-20T18:42:42 1779302562

Correct, I have local hardware, not infinite money.

cornholio · 2026-05-19T22:28:36 1779229716

If I understood correctly, the model will get it right because it knows when it isn't right.

zambelli · 2026-05-19T22:30:27 1779229827

Essentially, yes that's right! There's some subtlety in how to let it know it was wrong (returning things as tool errors because it trained on that), but that's the gist of it - sort of a self-correcting architecture.
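For readers who want something concrete, here is a minimal sketch of that idea: a bad tool call is not treated as a crash but is handed back to the model as a tool error, so the next turn can correct it. Everything in it is an assumption made for illustration (the `run_with_tool_errors` name, the message shape, the scripted stand-in model); it is not forge's actual code or API.

```python
from typing import Callable, Dict, List, Optional

def run_with_tool_errors(model: Callable[[List[dict]], dict],
                         tools: Dict[str, Callable[..., str]],
                         max_turns: int = 5) -> Optional[str]:
    """Hypothetical loop: failures are fed back to the model as tool errors."""
    messages: List[dict] = []
    for _ in range(max_turns):
        # Assumed reply shape: {"tool": name, "args": {...}} or {"final": text}
        call = model(messages)
        if "final" in call:
            return call["final"]
        name = call.get("tool")
        try:
            if name not in tools:
                raise ValueError(f"unknown tool: {name}")
            result = tools[name](**call.get("args", {}))
            messages.append({"role": "tool", "name": name, "content": result})
        except Exception as exc:
            # The key step: report the problem in the tool channel and keep going.
            messages.append({"role": "tool", "name": name, "content": f"ERROR: {exc}"})
    return None  # gave up after max_turns

def scripted_model(messages: List[dict]) -> dict:
    """Stand-in for a real model: misspells the tool once, then corrects itself."""
    if not messages:
        return {"tool": "ad", "args": {"a": 2, "b": 3}}
    last = messages[-1]
    if last["content"].startswith("ERROR"):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"final": last["content"]}

tools = {"add": lambda a, b: str(a + b)}
answer = run_with_tool_errors(scripted_model, tools)
```

The scripted model exists only to make the sketch runnable; with a real model the interesting part is the `except` branch, which turns a hard failure into feedback.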

tomjakubowski · 2026-05-20T00:52:18 1779238338

https://en.wikipedia.org/wiki/Apophatic_theology

jon_richards · 2026-05-20T02:26:12 1779243972

I was expecting this https://knowyourmeme.com/memes/the-missile-knows-where-it-is

forlorn_mammoth · 2026-05-20T14:11:57 1779286317

the missile knows where it is because it knows where it isn't

andai · 2026-05-20T10:49:38 1779274178

Prior art: https://ghuntley.com/ralph/

koolba · 2026-05-19T22:59:57 1779231597

A thousand monkeys on a thousand typewriters…

zambelli · 2026-05-19T23:40:45 1779234045

That is the whole challenge, actually! A new metric I'm going to dogfood into forge is ETTWS - estimated time to working solution.

A simple retry loop around your whole workflow could, in some cases, be all you need. But it could mean many blind attempts to get through a workflow successfully. And hopefully there isn't a payment step partway through!

The fewer hard errors that nix the whole workflow, the lower your ETTWS.
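A rough back-of-the-envelope sketch of why that holds. The formulas are an assumption for illustration, not how forge defines ETTWS: each step has a fixed duration and an independent success probability, and a failed attempt still costs the full step time. Function names and the example numbers are hypothetical.

```python
from typing import List, Tuple

Step = Tuple[float, float]  # (seconds per attempt, probability the attempt succeeds)

def ettws_restart_all(steps: List[Step]) -> float:
    """Expected time when any failure restarts the whole workflow."""
    expected_attempt_time = 0.0
    reach_prob = 1.0  # probability that execution reaches the current step
    for seconds, p_success in steps:
        expected_attempt_time += reach_prob * seconds
        reach_prob *= p_success
    # reach_prob is now the chance that one full pass succeeds, so the
    # expected number of passes is 1 / reach_prob.
    return expected_attempt_time / reach_prob

def ettws_retry_step(steps: List[Step]) -> float:
    """Expected time when a failed step is retried in place."""
    return sum(seconds / p_success for seconds, p_success in steps)

# Hypothetical three-step workflow, 10 seconds per step.
workflow = [(10.0, 0.9), (10.0, 0.8), (10.0, 0.9)]
restart_all = ettws_restart_all(workflow)
retry_step = ettws_retry_step(workflow)
print(f"restart whole workflow: {restart_all:.1f}s, retry failed step: {retry_step:.1f}s")
```

Under these assumptions, retrying in place never does worse than restarting from scratch, and the gap widens as workflows get longer or steps get flakier.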

killing_time · 2026-05-20T06:28:56 1779258536

Is it strange that I immediately interpreted ETTWS to be Estimated Time To William Shakespeare?

Mithriil · 2026-05-20T15:32:51 1779291171

It's relevant to the "thousand monkeys on a thousand typewriters".

jononor · 2026-05-21T08:42:50 1779352970

The one true AGI metric!

beacon294 · 2026-05-20T05:22:37 1779254557

Have you read the MAKER/MDAP paper? 1 million sequential tasks.

zambelli · 2026-05-20T05:48:24 1779256104

No, I haven't - hadn't heard of it. I'll try to squeeze in a quick read in the coming weeks!

DiogenesKynikos · 2026-05-20T02:30:16 1779244216

This is a thousand unusually smart monkeys who speak every major human language fluently and are proficient in every major programming language, but sometimes still make bizarre mistakes and need to be put back on track.

jplusequalt · 2026-05-20T03:01:51 1779246111

This is fun for you?

bratbag · 2026-05-20T05:45:11 1779255911

I found it fun to read.

Escapade5160 · 2026-05-17T22:45:16 1779057916

Set up hooks. Hooks are how your harness forces compliance with your own rules.
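A minimal sketch of what that can look like, assuming a harness that routes every tool call through a list of pre-call hooks. The `Harness` class, the hook signature, and the `no_force_push` rule are all hypothetical and not taken from any particular tool.

```python
from typing import Callable, List, Optional

# A hook inspects a pending tool call and returns a rejection reason,
# or None to let the call through.
Hook = Callable[[str, dict], Optional[str]]

class Harness:
    def __init__(self) -> None:
        self.pre_hooks: List[Hook] = []

    def add_pre_hook(self, hook: Hook) -> None:
        self.pre_hooks.append(hook)

    def call_tool(self, name: str, args: dict, tool: Callable[..., str]) -> str:
        # Every hook runs before the tool does; the first objection wins.
        for hook in self.pre_hooks:
            reason = hook(name, args)
            if reason is not None:
                return f"BLOCKED: {reason}"
        return tool(**args)

def no_force_push(name: str, args: dict) -> Optional[str]:
    """Example rule: the agent may use the shell, but never force-push."""
    if name == "shell" and "push --force" in args.get("command", ""):
        return "force pushes are not allowed"
    return None

def fake_shell(command: str) -> str:
    """Stand-in for a real shell tool so the sketch runs without side effects."""
    return f"ran: {command}"

harness = Harness()
harness.add_pre_hook(no_force_push)
blocked = harness.call_tool("shell", {"command": "git push --force"}, fake_shell)
allowed = harness.call_tool("shell", {"command": "git status"}, fake_shell)
```

The point of the design is that the rule lives in the harness rather than in the prompt, so the model cannot talk its way around it.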

Escapade5160 · 2026-05-08T03:25:22 1778210722

Am I correct in my understanding that they are not actually able to 100% know what Claude is thinking? They have trained a new model to make a guess about what Claude is thinking, but we cannot validate that the guess is 100% valid, right? They are basically saying "we have trained a model to reaffirm what we believe Claude is thinking" ? Hoping I'm wrong in my understanding of this because this does not appear to be good research to me.

kovek · 2026-05-08T04:21:15 1778214075

Maybe you can't 100% know what every layer "thinks", but if you go through all the layers, you might see a cohesive "thinking" story. So, if there is any information you lose at layer N, you might learn some of it in layer N+1. The masking in the layers is not deterministic, so the model can't really consistently lie throughout the layers. It doesn't choose what information we get to inspect. There might be a game of whack-a-mole, but you might get a general sentiment. I think the more layers there are, the more the model itself can hide very nuanced lies (but by that time we'd have a better mind-reading model).

However, I haven't read about it yet. I'm really excited to look into it!

red75prime · 2026-05-08T04:09:57 1778213397

> "we have trained a model to reaffirm what we believe Claude is thinking" ?

It's more like "We have trained a model to produce a text that allows reconstruction of activations and the text happened to coincide with the results of other interpretability methods even after extensive training, while we expected it to devolve into unintelligible mess."

They found something unexpected and useful. They report it, while outlining limitations and ways to improve. It looks like fine research to me.

Escapade5160 · 2026-03-26T01:37:24 1774489044

I am in the same boat. Reading is a transaction and lately everyone wants to put 60 seconds of effort into writing an article and expect me to put 10 minutes into reading it, and I just can't. The writing feels dead, soulless even. Every sentence or phrase is structured like a mongering, clickbaity headline and it's insufferable.

Escapade5160 · 2026-03-24T22:03:33 1774389813

At this point markdown is going to be the foundation of the entire AI web. Someone the other day showed off Markdown as a responsive frontend protocol. Now we've got email. How long until we're writing classes in markdown? We can only abstract this so far before we confuse AI more than help it.

whattheheckheck · 2026-03-25T01:32:23 1774402343

Look up the Configuration Complexity Clock.

Escapade5160 · 2026-03-20T02:29:14 1773973754

The means of production are just files with special extensions.

Escapade5160 · 2026-03-20T02:06:56 1773972416

Theoretically it only requires it for birth. One can argue that once we achieve the singularity, it could immediately scale on its own as it decides.

tw1984 · 2026-03-20T09:18:32 1773998312

> One can argue that once we achieve the singularity, it could immediately scale on its own as it decides.

even if this is true, someone needs to build the platform and the software required to get to the singularity.

one can also argue that lots of $ is required to get to the singularity, taking control of how the world builds, deploys and operates the digital world is a proven avenue to get such $.

Escapade5160 · 2026-03-11T03:08:46 1773198526

I recently tried to learn it and found it frustrating. A lot of docs are for 0.15 but the latest is (or was) 0.16, which changed a lot of std, so none of the existing write-ups were valid anymore. I plan to revisit once it gets more stable because I do like it when I get it to work.

Cloudef · 2026-03-11T03:16:09 1773198969

0.16 is the development version. 0.15.2 is latest release.