I've been saying for a while that given a proper harness, small local models can perform incredibly well. When you have a system that can try everything, it will eventually get it right as long as you can prevent it from getting it wrong in the meantime.
Lol, I love that framing. Yeah, the small models have impressed me a lot during this work. The reasoning can be quite good, and definitely sufficient for a lot of cases. Just gotta nudge em back on track Every now and then and they'll figure it out.
The problem is that you get similar quality as if you gave a junior unlimited time to work on a problem and told them to keep trying different things until the goal is reached.
Even the SOTA models have this problem when the work is complicated enough. The problem is amplified more with the small models.
Essentially, yes that's right! There's some subtlety in how to let it know it was wrong (returning things as tool errors because it trained on that), but that's the gist of it - sort of a self-correcting architecture.
That is the whole challenge, actually! A new metric I'm going to dogfood into forge is ETTWS - estimated time to working solution.
A simple retry loop around your whole workflow could, in some cases, be all you need. But it could mean many blind attempts to get through a workflow successfully. And hopefully there isn't a payment step partway through!
The fewer hard errors nix the whole workflow, the lower your ETTWS.
This is a thousand unusually smart monkeys who speak every major human language fluently and are proficient in every major programming language, but sometimes still make bizarre mistakes and need to be put back on track.
Am I correct in my understanding that they are not actually able to 100% know what Claude is thinking? They have trained a new model to make a guess about what Claude is thinking, but we cannot validate that the guess is 100% valid, right? They are basically saying "we have trained a model to reaffirm what we believe Claude is thinking" ? Hoping I'm wrong in my understanding of this because this does not appear to be good research to me.
Maybe you can't 100% know what every layer "thinks", if you go through all the layers, you might see a cohesive "thinking" story. So, if there is any information you lose at layer N, you might learn some of it in layer N+1. The masking in the layers is not deterministic so the model can't really consistently lie throughout the layers. It doesn't chose what information we get to inspect. There might be a game of whack-a-mole, but you might get a general sentiment. I think the more layers there are, the more the model itself can hide very nuanced lies (But by that time we'd have a better mind-reading model).
However, I haven't read about it yet. I'm really excited to look into it!
> "we have trained a model to reaffirm what we believe Claude is thinking" ?
It's more like "We have trained a model to produce a text that allows reconstruction of activations and the text happened to coincide with the results of other interpretability methods even after extensive training, while we expected it to devolve into unintelligible mess."
They found something unexpected and useful. They report it, while outlining limitations and ways to improve. It looks like a fine research to me.
I am in the same boat. Reading is a transaction and lately everyone wants to put 60 seconds of effort into writing an article and expect me to put 10 minutes into reading it, and I just can't. The writing feels dead, soulless even. Every sentence or phrase is structured like a mongering, click baity headline and it's insufferable.
At this point markdown is going to be the foundation of the entire AI web. Someone the other day showed off Markdown as a responsive frontend protocol. Now we've got email. How long until we're writing classes in markdown? We can only abstract this so far before we confuse AI more than help it.
> One can argue that once we achieve the singularity, it could immediately scale on its own as it decides.
even if this is true, someone needs to build the platform and the software required to get to the singularity.
one can also argue that lots of $ is required to get to the singularity, taking control of how the world builds, deploys and operates the digital world is a proven avenue to get such $.
I recently tried to learn it and found it frustrating. A lot of docs are for 0.15 but the latest is (or was) 0.16 which changed a lot of std so none of the existing write ups were valid anymore. I plan to revisit once it gets more stable because I do like it when I get it to work.
reply