Hi Boris, did Claude Code itself author this change? I am curious as you said that all of your recent PRs were authored by Claude Code. If that's the case, just wondering what objective did you ask it to optimize for? Was it something like: make the UI simpler?
I suspect this was released by Anthropic as a DDoS attack on other AI companies. I prompted 'how do we solve this challenge?' into gemini cli in a cloned repo and it's been running non-stop for 20 minutes :)
Lately with Gemini CLI / Jules it doesn't seem like time spent is a good proxy for difficulty. It has a big problem with getting into loops of "I am preparing the response for the user. I am done. I will output the answer. I am confident. Etc etc".
I see this directly in Gemini CLI as the harness detects loops and bails the reasoning. But I've also just occasionally seen it take 15m+ to do trivial stuff and I suspect that's a symptom of a similar issue.
I also noticed that, and I've noticed it starts to struggle when the workspace "tab" you're working in gets longer - it basically gets stuck at "Starting agent ...". I initially thought it must be a very big context that the model is struggling with, but since restarting the "app" and kill -9 fixes it, it suggests that it's a local issue. Strange.
I saw this too. Sometimes it "thinks" inside of the actual output, and it's much more likely to end up in the loop of "I am ready to answer" while it's doing that.
There are some other failure modes that all feel kinda vaguely related that probably help with building a hypothesis about what's going wrong:
Sometimes Gemini tools will just randomly stop and pass the buck back to you. The last thing will be like "I will read the <blah> code to understand <blah>" and then it waits for another prompt. So I just type "continue" and it starts work again.
And, sometimes it will spit out the internal CoT directly instead of the text that's actually supposed to be user-visible. So sometimes I'll see a bunch of paragraphs starting with "Wait, " as it works stuff out and then at the end it says "I understand the issue" or whatever, then it waits for a prompt. I type "summarise" and it gives me the bit I actually wanted.
It feels like all these things are related and probably have to do with the higher-level orchestration of the product. Like I assume there are a whole bunch of models feeding data back and forth to each other to form the user-visible behaviour, and something is wrong at that level.
Ah yeah I've seen that too. Definitely seems related.
I suspect this is also something like the "inverse" of a prompt hijacking situation. Basically it's losing track of where its output is flowing to (whereas prompt injection is when it loses track of where its input is flowing from).
/model: Auto (Gemini 3) Let Gemini CLI decide the best model for the task: gemini-3-pro, gemini-3-flash
After ~40 minutes, it got to:
The final result is 2799 cycles, a 52x speedup over the baseline. I successfully implemented Register Residency, Loop Unrolling, and optimized Index Updates to achieve this, passing all correctness and baseline speedup tests. While I didn't beat the Opus benchmarks due to the complexity of Broadcast Optimization hazards, the performance gain is substantial.
It's impressive as I definitely won't be able to do what it did. I don't know most of the optimization techniques it listed there.
I think it's over. I can't compete with coding agents now. Fortunately I've saved enough to buy some 10 acre farm in Oregon and start learning to grow some veggies and raise chickens.
Keep in mind that the boat on competing with machines to generate assembly sailed for 99% of programmers half a century ago. It is not surprising that this is an area where AI is strong.
The human body has been optimized for a very complex objective function, and in a very different environment from a robot's. If you specify what the robots are doing, and the set of constraints like power source, size, weight, etc., the optimal design is unlikely to be humanoid.