This is truly tremendous to watch. Eleven years from TPP, and we're watching the current best-in-class AI try its best at the same. Who'll get there first, the historical gestalt of Twitch users or the just-shy-of-10^26 FLOP [0] AI model?
Now here's a concept for anyone with more money than sense: ClaudePlaysTwitchPlaysPokemon, where it's TPP but every participant is Claude. Would hivemind AI consensus perform better than a single AI? Anthropic's certainly looking into it! [1]
A few years ago there was another AI that tried to beat Pokemon. It wasn't an LLM. I think it was an LSTM trained with reinforcement learning. It got stuck in Mt Moon.
Right now, Claude has been stuck in Mt Moon for nearly a day. It keeps forgetting where it has been. It also almost always runs from battles instead of changing Pokemon or fighting.
At one point it got stuck in a Pokemon center when it mistook the character's red hat for the red carpet around the exit. It kept pressing down and wondering why it wasn't working. It only broke out of that when it mistakenly concluded it had successfully exited the Pokemon center. Then it wandered around a bit and only realized it was still in the Pokemon center after talking to Nurse Joy.
> It also almost always runs from battles instead of changing Pokemon or fighting.
I believe this is because all of its Pokemon are on the verge of fainting, so it's trying to conserve them while it tries to find its way out.
> It keeps forgetting where it has been.
I'm wondering if this could be solved with a better harness; on one hand, that hurts the elegance of having one model dedicated to playing the game, but their existing harness is already cheating a little (they have a second LLM for verification). They're frequently compacting what's in context, which means its visual memory is quite poor - that could potentially be a point of improvement?
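To make the harness idea concrete, here's a rough sketch of the kind of thing I mean: keep a record of visited tiles outside the model's context, and paste a one-line summary back in after every compaction. Everything here is hypothetical (the `VisitedMemory` class, the map names, and the assumption that the harness can read the player's map and coordinates from the emulator); I have no idea whether the real harness works anything like this.

```python
from collections import defaultdict


class VisitedMemory:
    """Hypothetical out-of-context memory of tiles the player has stood on."""

    def __init__(self):
        # map name -> {(x, y): number of times the player stood there}
        self._visits = defaultdict(lambda: defaultdict(int))

    def record(self, map_name, x, y):
        """Call once per step with the player's current position."""
        self._visits[map_name][(x, y)] += 1

    def times_visited(self, map_name, x, y):
        """How often this tile was visited (0 if never)."""
        return self._visits.get(map_name, {}).get((x, y), 0)

    def summary(self, map_name, top=3):
        """Short text to re-insert into the prompt after compaction."""
        tiles = self._visits.get(map_name, {})
        if not tiles:
            return f"{map_name}: no tiles visited yet"
        # Most-revisited tiles first; ties broken by coordinate.
        most = sorted(tiles.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
        hot = ", ".join(f"({x},{y}) x{n}" for (x, y), n in most)
        return f"{map_name}: {len(tiles)} tiles visited; most revisited: {hot}"
```

The point is that the summary survives compaction because it never lived in the context in the first place. Whether that counts as more cheating than the existing second LLM is a matter of taste.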
[0]: https://www.oneusefulthing.org/p/a-new-generation-of-ais-cla...
[1]: https://www.anthropic.com/news/visible-extended-thinking