I resent your implication that I am baselessly hyping. I've open sourced a few O...

Denzel · 2026-02-12T01:15:15 1770858915

First, very cool! Thank you for sharing some actual projects with the prompts logged.

I think you and I have different definitions of “one-shotting”. If the model has to be steered, I don’t consider that a one-shot.

And you clearly “broke” the model a few times based on your prompt log where the model was unable to solve the problem given with the spec.

Honestly, your experience in these repos matches my daily experience with these models almost exactly.

I want to see good/interesting work where the model is going off and doing its thing for multiple hours without supervision.

Dylan16807 · 2026-02-12T03:56:53 1770868613

> I want to see good/interesting work where the model is going off and doing its thing for multiple hours without supervision.

I'd be hesitant to use that as a way to evaluate things. Different systems run at different speeds. I want to see how much it can get done before it breaks, in different scenarios.

minimaxir · 2026-02-12T01:38:12 1770860292

I never claimed Opus 4.5 can one-shot things? Even human-written software takes a few iterations to add/polish new features as they come to mind.

> And you clearly “broke” the model a few times based on your prompt log where the model was unable to solve the problem given with the spec.

That's less due to the model being wrong and more due to me not knowing what I wanted, because I am definitely not a UI/UX person. See my reply in the sibling thread.

Denzel · 2026-02-12T13:59:28 1770904768

Apologies, I may have misinterpreted the passage below from your repo:

> This crate was developed with the assistance of Claude Opus 4.5 initially to answer the shower thought "would the Braille Unicode trick work to visually simulate complex ball physics in a terminal?" Opus 4.5 one-shot the problem, so I decided to further experiment to make it more fun and colorful.

Also, yes, I don’t dispute that human-written software takes iteration as well. My point is that the significance of autonomous agentic coding feels exaggerated if I’m holding the LLM’s hand more than I have to hold a senior engineer’s hand.

That doesn’t mean the tech isn’t valuable. The claims just feel overexaggerated.

minimaxir · 2026-02-12T18:00:25 1770919225

If you click the video that line links to, it one-shot the original problem, which was very explicitly defined as a PoC, not the entire project. The final project shipped is substantially different, and that's the difference between YOLO vibecoding and creating something useful.

There are also the embarrassing corner physics bugs present in that video, which required a fix in the first few prompts.