Hacker News | pvalue005's comments

Hi Boris, did Claude Code itself author this change? I'm curious since you said all of your recent PRs were authored by Claude Code. If so, what objective did you ask it to optimize for? Was it something like: make the UI simpler?


I suspect this was released by Anthropic as a DDoS attack on other AI companies. I prompted 'how do we solve this challenge?' into Gemini CLI in a cloned repo and it's been running non-stop for 20 minutes :)


Lately with Gemini CLI / Jules, time spent doesn't seem like a good proxy for difficulty. It has a big problem with getting into loops of "I am preparing the response for the user. I am done. I will output the answer. I am confident." etc.

I see this directly in Gemini CLI, where the harness detects the loop and bails out of the reasoning. But I've also occasionally seen it take 15+ minutes to do trivial stuff, and I suspect that's a symptom of the same issue.
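None of us can see how the CLI's loop detector actually works, but the general technique is simple: compare recent output chunks and flag verbatim repetition. A toy sketch (purely illustrative; `looks_like_loop`, `window`, and `repeats` are names made up here, not anything from Gemini CLI):

```python
def looks_like_loop(chunks, window=4, repeats=3):
    """Heuristic: flag a loop if the last `window` output chunks
    repeat verbatim `repeats` times in a row."""
    needed = window * repeats
    if len(chunks) < needed:
        return False
    tail = chunks[-needed:]
    pattern = tail[:window]
    # True only if every window-sized slice of the tail matches the pattern
    return all(tail[i:i + window] == pattern for i in range(0, needed, window))

stream = ["I am preparing the response.", "I am done.",
          "I will output the answer.", "I am confident."] * 3
assert looks_like_loop(stream)
assert not looks_like_loop(["step 1", "step 2", "step 3"])
```

A real harness presumably works on token n-grams with fuzzier matching, which would explain why it sometimes fires late or not at all.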


I've noticed that in Antigravity and VS Code, Gemini 3 Pro often comes back with "model too busy" or something like that and basically 500s.

Seems like a capacity issue, because it works a lot better late at night.

I don't see the same with the Claude models in Antigravity.


I also noticed that, and I also noticed that it starts to struggle when the workspace "tab" you're working in gets longer - it basically gets stuck at "Starting agent ...". I initially thought it was a very large context that the model was struggling with, but since restarting the "app" and kill -9 fixes it, it suggests a local issue. Strange.


Anecdotally, I notice better performance and output quality across most providers outside of 8a-5p ET.


Yeah that's a separate issue though, it predates the time when the looping issues got really common, for me at least.


I saw this too. Sometimes it "thinks" inside the actual output, and it's much more likely to end up in the "I am ready to answer" loop while it's doing that.


I feel like sometimes it just loops those messages when it isn't actually generating new tokens. But I might be wrong.


There are some other failure modes that all feel vaguely related, which probably helps with building a hypothesis about what's going wrong:

Sometimes Gemini tools will just randomly stop and pass the buck back to you. The last thing will be like "I will read the <blah> code to understand <blah>" and then it waits for another prompt. So I just type "continue" and it starts work again.

And, sometimes it will spit out the internal CoT directly instead of the text that's actually supposed to be user-visible. So sometimes I'll see a bunch of paragraphs starting with "Wait, " as it works stuff out and then at the end it says "I understand the issue" or whatever, then it waits for a prompt. I type "summarise" and it gives me the bit I actually wanted.

It feels like all these things are related and probably have to do with the higher-level orchestration of the product. Like I assume there are a whole bunch of models feeding data back and forth to each other to form the user-visible behaviour, and something is wrong at that level.


At one point it started spitting out its CoT in the comments of the code it’s supposed to be changing.


Ah yeah I've seen that too. Definitely seems related.

I suspect this is also something like the "inverse" of a prompt hijacking situation. Basically it's losing track of where its output is flowing to (whereas prompt injection is when it loses track of where its input is flowing from).


Which Gemini model did you use? My experience since launch of G3Pro has been that it absolutely sucks dog crap through a coffee straw.


/model: Auto (Gemini 3) Let Gemini CLI decide the best model for the task: gemini-3-pro, gemini-3-flash

After ~40 minutes, it got to:

The final result is 2799 cycles, a 52x speedup over the baseline. I successfully implemented Register Residency, Loop Unrolling, and optimized Index Updates to achieve this, passing all correctness and baseline speedup tests. While I didn't beat the Opus benchmarks due to the complexity of Broadcast Optimization hazards, the performance gain is substantial.

It's impressive, as I definitely wouldn't be able to do what it did. I don't know most of the optimization techniques it listed there.

I think it's over. I can't compete with coding agents now. Fortunately I've saved enough to buy a 10-acre farm in Oregon and start learning to grow veggies and raise chickens.


Keep in mind that the boat on competing with machines at generating assembly sailed for 99% of programmers half a century ago. It's not surprising that this is an area where AI is strong.


Did you check that it did the things it claims it did?


> grow some veggies and raise chickens.

Maybe Claude will be able to do that soon, too.


After an hour and a few prompts, the first working version got to 3529 cycles (41x speedup) for me. I was using Gemini 3 Pro Preview.


we've lost the plot.

you can't compete with an AI on doing an AI performance benchmark?


This is not an AI performance benchmark, this is an actual exercise given to potential human employees during a recruitment process.


Hilarious that this got a downvote, hello Satya!


> sucks dog crap through a coffee straw.

That would be impressive.


Only if the dog didn't get too much human food the night before.


New LLM benchmark incoming? I bet once it's done, people will still say it's not AGI.


When they get the hardware capable of that, a different industry will be threatened by AI. The oldest industry.


Song of Solomon I guess


Textile?


The emperor's (empress's?) new textile.


The human body has been optimized for a very complex objective function, and in a very different environment from a robot's. If you specify what the robots are supposed to do, along with constraints like power source, size, weight, etc., the optimal design is unlikely to be humanoid.


that's ok, they are bayesian

