All my information about this is based on feel, because debugging isn't really feasible. Verbose mode is a mess, and there's no alternative.
It still does what I need so I'm okay with it, but I'm also on the $20 plan so it's not that big of a worry for me.
I did sense that the big wave of companies is hitting Anthropic's wallet. If you hadn't realized, a LOT of companies switched to Claude. No idea why, and this is coming from someone who loves Claude Code.
Anyway, getting some transparency on this would be nice.
> If you hadn't realized, a LOT of companies switched to Claude. No idea why, and this is coming from someone who loves Claude Code.
It is entirely due to Opus 4.5 being an inflection point codingwise over previous LLMs. Most of the buzz there has been organic word of mouth due to how strong it is.
Opus 4.5 is expensive, to put it mildly, which makes Claude Code more compelling. But even now, token providers like OpenRouter have Opus 4.5 as one of their most popular models despite the price.
The really annoying thing about Opus 4.5 is that it's impossible to publicly say "Opus 4.5 is an order of magnitude better than coding LLMs released just months before it" without sounding like an AI hype booster clickbaiting, but it's the counterintuitive truth, to my personal frustration.
I have been trying to break this damn model since its November release by giving it complex and seemingly impossible coding tasks but this asshole keeps doing them correctly. GPT-5.3-Codex has been the same relative to GPT-5.2-Codex, which just makes me even more frustrated.
Weird, I broke Opus 4.5 pretty easily by giving it some code, a build system, and integration tests that demonstrate the bug.
CC confidently iterated until it discovered the issue. CC confidently communicated exactly what the bug was, a detailed step-by-step deep dive into all the sections of the code that contributed to it. CC confidently suggested a fix that it then implemented. CC declared victory after 10 minutes!
The bug was still there.
I’m willing to admit I might be “holding it wrong”. I’ve had some successes and failures.
It’s all very impressive, but I still have yet to see how people are consistently getting CC to work for hours on end to produce good work. That still feels far fetched to me.
I don't know how to say this, but either you haven't written any complex code, your definition of complex and impossible is not the same as mine, or you are "AI hype booster clickbaiting" (your words).
It strains belief that anyone working on a moderate-to-large project would not have hit the edge cases and issues. Every other day I discover and have to fix a bug that was introduced by Claude/Codex previously (something implemented just slightly incorrectly or with a slightly wrong expectation).
Every engineer I know working "mid-to-hard" problems (FANG and FANG-adjacent) has broken every LLM, including Opus 4.6, Gemini 3 Pro, and GPT-5.2-Codex, on routine tasks. Granted, the models have a very high success rate nowadays, but they fail in strange ways, and if you're well versed in your domain, these are easy to spot.
Granted I guess if you're just saying "build this" and using "it runs and looks fine" as the benchmark then OK.
All this is not to say Opus 4.5/6 are bad, not by a long shot, but your statement is difficult to parse as someone who's been coding a very long time and uses these agents daily. They're awesome but myopic.
I resent your implication that I am baselessly hyping. I've open-sourced a few Opus 4.5-coded projects (https://news.ycombinator.com/item?id=46543359) (https://news.ycombinator.com/item?id=46682115) that, while not moderate-to-large projects, are very niche and novel, without much if any prior art. The prompts I used are included with each of those projects: they did not "run and look fine" on first run, and were refined just as with a normal software engineering pipeline.
You might argue I'm No True Engineer because these aren't serious projects but I'd argue most successful uses of agentic coding aren't by FANG coders.
> I want to see good/interesting work where the model is going off and doing its thing for multiple hours without supervision.
I'd be hesitant to use that as a way to evaluate things. Different systems run at different speeds. I want to see how much it can get done before it breaks, in different scenarios.
I never claimed Opus 4.5 can one-shot things? Even human-written software takes a few iterations to add/polish new features as they come to mind.
> And you clearly “broke” the model a few times based on your prompt log where the model was unable to solve the problem given with the spec.
That's less due to the model being wrong and more due to myself not knowing what I wanted because I am definitely not a UI/UX person. See my reply in the sibling thread.
Wait, are you really saying you have never had Opus 4.5 fail at a programming task you've given it? That strains credulity somewhat... and would certainly contribute to people believing you're exaggerating/hyping up Opus 4.5 beyond what can be reasonably supported.
Also, "order of magnitude better" is such plainly obvious exaggeration it does call your objectivity into question about Opus 4.5 vs. previous models and/or the competition.
Opus 4.5 does make mistakes, but I've found that's more due to ambiguous/imprecise functional requirements on my end than an inherent flaw of the agent pipeline. Giving it clearer instructions to reduce said ambiguity almost always fixes it, so I don't consider that Opus failing. One of the very few times Opus 4.5 got completely stuck turned out, after tracing, to be an issue in a dependency's library, which inherently can't be fixed on my end.
I am someone who has spent a lot of time with Sonnet 4.5 before that and was a very outspoken skeptic of agentic coding (https://news.ycombinator.com/item?id=43897320) until I gave Opus 4.5 a fair shake.
It still cannot solve a synchronization issue in my fairly simple online game, completely wrong analysis back to back and solutions that actually make the problem worse. Most training data is probably react slop so it struggles with this type of stuff.
But I have to give it to Amodei and his goons in the media, their marketing is top notch. Fear-mongering targeted at normies about the model knowing it is being evaluated, and other sorts of preaching to the developers.
Yes, as all of modern politics illustrates, once one has staked out a position on an issue it is far more important to stick to one's guns regardless of observations rather than update based on evidence.
Not hype. Opus 4.5 is actually useful to one-shot things from detailed prompts for documentation creation, it's actually functional for generating code in a meaningful way. Unfortunately it's been nerfed, and Opus 4.6 is clearly worse from my few days of working with it since release.
The use of inflection point in the entire software industry is so annoying and cringy. It's never used correctly, it's not even used correctly in the Claude post everyone is referencing.
All the ad blockers I used to use stopped working, and it became an annoying game of cat and mouse that I didn't have time for. Luckily, most of the time I can "skip" the ad in like five seconds, and it gives me a moment to catch up on incoming Slack messages.
One day I visited DistroWatch.com. The site deliberately tweaked its images so ad blockers would block some "good" images. It took me a while to figure out what was going on. The site freely admitted what it was doing. Its point was: you're looking at my site, which I provide for free, yet you block the thing that lets me pay for it?
I stopped using ad blockers after that. If a site has content worth paying for, I pay. If it is a horrible ad-infested hole, I don't visit it at all. Otherwise, I load ads.
Which overall means I pay for more things, visit fewer crap things, and just visit fewer things, period. Which is good.
Moreover you don’t even need a 0-day to fall for phishing. All you need is to be a little tired or somehow not paying attention (inb4 “it will never happen to ME, I am too smart for that”)
At $JOB, IT actually bundles uBlock into all the browsers available to us; as per CIA (or one of those 3-letter agencies, might've even been the NSA) guidelines, it's a very important security tool. I work in banking.
> FWIW I think LLMs are a dead end for software development
Thanks for that, and it's worth nothing FYI.
LLMs are probably the most impressive machine made in recorded human existence. Will there be a better machine? I'm 100% confident there will be, but this is without a doubt extremely valuable for a wide array of fields, including software development. Anyone claiming otherwise is just pretending at this point, maybe out of fear and/or hope, but it's a distorted view of reality.
> FWIW I think LLMs are a dead end for software development, and that the people who think otherwise are exceptionally gullible.
By this do you mean there isn't much more room for future improvement, or that you feel it's not useful in its current form for software development? I think the latter is a hard position to defend, speaking as a user of it. I am definitely more productive with it now, although I'm not sure I enjoy software development as much anymore (but that is a different topic).
> By this do you mean there isn't much more room for future improvement
I don't expect that LLM technology will improve in a way that makes it significantly better. I think the training pool is poisoned, and I suspect the large AI labs have been cooking the benchmark data for years to suggest that their models are improving more quickly than they are in reality.
That being said, I'm sure some company will figure out new strategies for deploying LLMs that will cause a significant improvement.
But I don't expect that improvements are going to come from increased training.
> [Do] you feel it is not useful in its current form for software development?
IME using LLMs for software development corrodes my intuitive understanding of an enterprise codebase.
Since the advent of LLMs, I've been asked to review many sloppy 500+/1000+ line spam PRs written by arrogant Kool-Aid drinking coworkers. If someone is convinced that Claude Code is AGI, they won't hesitate to drop a slop bomb on you.
Basically I feel that coding using LLMs degrades my understanding of what I'm working on and enables coworkers to dominate my day with spam code review requests.
> IME using LLMs for software development corrodes my intuitive understanding of an enterprise codebase.
I feel you there, I definitely notice that. I find I can output high quality software with it (if I control the design and planning, and iterate), but I lack that intuitive feel I get about how it all works in practice. Especially noticeable when debugging; I have fewer "Oh! I bet I know what is going on!" eureka moments.
I don’t understand how you can conclude that LLMs are a dead end: I’ve already seen so much useful software generated by LLMs that there’s no denying they are a useful tool. They may not replace senior developers, and they have their limitations, but it’s quite amazing what they already achieve.
I notice and think about the astroturfing from time to time.
It seems so gross.
But I guess with all of the trillions of investor dollars being dumped into these businesses, it would be irresponsible not to run guerrilla PR campaigns.
> FWIW I think LLMs are a dead end for software development, and that the people who think otherwise are exceptionally gullible.
I think this takes away from the main thrust of your argument which is the marketing campaign and to me makes you seem conspiratorial minded. LLMs can be both useful and also mass astroturfing can be happening.
Personally I have witnessed non coders (people who can code a little but have not done any professional software building) like my spouse do some pretty amazing things. So I don’t think it’s useless.
It can be all of:
1. It’s useful for coding
2. There’s mass social media astroturfing happening
3. There’s a massive social overhype train that should be viewed skeptically
4. There's some genuine word of mouth and developer demand to try the latest models out of curiosity, some of it driven by the hype train and irrational exuberance and some by fear for their livelihoods.
This is a very naive mindset, getting the last 10% is going to be 90% of the work. Learn from other projects that have tried and failed. I can guarantee you LibreOffice was not built with "our own and customer docs" as a test harness.
Yes, the world is overpopulated. However, a lot of developed societies based their budget frameworks on permanent growth, for example the idea that the tax-paying young will always pay for the benefits of the elderly. With the birth rate decreasing, there is suddenly a large older population expecting things like healthcare and pensions to be paid for them, and fewer and fewer young people to cover the cost in tax, so there is a looming deficit which is very worrying to a lot of economists.
I don't care enough about the drama to deep-dive, but as far as I can tell both parties are at fault. At least Bazzite did not write a "post mortem" blog post about a project that is still active. A bit petty, if you ask me.
I'm still of the opinion that programming should be enjoyed more as a hobby/skill now. Just as a builder cannot build an entire house by himself, programmers now cannot build without an orchestrator.
Nonetheless, you can build a cabinet just for funsies and to feel like you've accomplished something.
I've been pretty bummed out by Rainbow 6 Siege X announcing they will never support Linux due to a lack of kernel-level anti-cheat support. While I can use NVIDIA Shield to play from my Windows PC, I'd rather play something native with friends (for context, we usually play 3v3s for funsies).
My goal is not to make an exact clone, but to make a smaller map version for 3v3 that is a bit more quick paced.
For context, it's a bomb defusal game where the main goal is intel and gadgets. You need to make the other side waste their gadgets so it comes down to a gun v gun fight.
Let's say you're a one-of-a-kind kid who is already making useful contributions, but $1 is a lot of money for you. Does your work suddenly become useless?
It feels weird to pay to contribute work anyway. Even if it's LLM gunk, you're paying to work (let alone paying for your LLM).
It is a privileged solution, and a stupid one, too, because $1 is worth a lot more to someone in India than to someone in the USA. If you wanted to implement this more fairly, you'd be looking at something like per-capita GDP or purchasing power, plus a geolock. Streaming services have perfected this mechanism already.
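The purchasing-power idea above can be sketched in a few lines. The conversion factors here are illustrative placeholders I made up for the example (real ones would come from published PPP data), not actual figures:

```python
# Sketch of a purchasing-power-adjusted fee. The factors below are
# hypothetical illustrations, not real PPP data.
PPP_FACTOR = {
    "US": 1.0,   # baseline
    "IN": 0.30,  # hypothetical: $0.30 in India ~ $1.00 of US purchasing power
    "DE": 0.95,  # hypothetical
}

BASE_FEE_USD = 1.00  # the flat fee being debated


def adjusted_fee(country_code: str) -> float:
    """Scale the fee so it represents comparable purchasing power everywhere.

    Unknown countries fall back to the US baseline rate.
    """
    factor = PPP_FACTOR.get(country_code, 1.0)
    return round(BASE_FEE_USD * factor, 2)
```

Under these made-up factors, `adjusted_fee("IN")` charges an Indian contributor $0.30 for the same relative burden a US contributor bears at $1.00; the geolock part (verifying which table row applies) is the genuinely hard bit, which is why streaming services pair regional pricing with region enforcement.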
This might be by design. Almost anyone writing software professionally at a level beyond junior is getting paid enough that $1 isn't a significant expense, whether in India or elsewhere. Some projects will be willing to throw collaboration and inclusivity out the window if it means cutting their PR spam by 90% and only reducing their pool of available professional contributors by 5%.
Indian here. You are correct. Expecting any employed Indian software developer to not be able to spare $1 is stupid. Like, how poor exactly do you think we are?!
You misunderstood the point. The point isn't that you are poor. The point is that the burden of the money falls, on average, more heavily on you than on someone from the USA. This creates an uneven playing field.
I like to compare it with donations. If you get a USD donated, that is the same USD regardless of who gave it. Right? Right?!? Either way you don't know how heavy the burden is on the person who donated. You probably don't care. But it matters to the person who donated.
A $1 fee is fine for Indian software developers and it kills the spam. If it's a greater burden for people in India than the US, well, not all solutions are perfect, but some are useful.
Because it discriminates against a marginalized group that is by tradition very important to the FOSS community: students.
Also, no, it wouldn't kill spam. The spam would move to pwned machines, where the owner would suddenly have a (financial) incentive to fix the system, if they knew about it.
What remains is people so rich that $1 means nothing to them, i.e. white-collar criminals who are already rich enough not to care.
I think the point was that if an aspirational minimum wage worker on a borrowed computer wants to put up a PR then it would cost them less than ten minutes of wages to afford $1USD in the US, while the same worker in India would need to put up about half a day's wages.
This is very noble in theory, but in practice you're not going to get many high-quality PRs from someone who's never been paid to write software and has no financial support.
So we continue to make the rich richer while broke students struggle even more to get valuable experience. It will be very easy to point out, in 10-20 years under the coming "engineer crisis", why we 'suddenly' can't support the systems we built.
Students don't have a lot of money to burn here. They're borrowing money to study. You'll miss out on them. However, you're unlikely to notice. I mean, there is no control group in such experiment.
I think the open source ecosystem would definitely notice long-term. Most people who become regular contributors start out in university or earlier; that's when you have the most time to spend on hobbies like OSS.
>contributing to an open source project that you're likely already benefiting from.
Yes, but many people benefit for free. Do you see the backwards incentive of making the most interested people (i.e. the ones who may provide the most work to your project) pay?
And none of that even guarantees support. Meanwhile, the more you donate, the more you get to tell people what to build. It's all out of whack.
4. Do not refund + Auto-send discouragement response.
5. Do not refund + Block.
6. Do not refund + Block + Report SPAM (Boom!)
And typically use a $1 fee to discourage spam.
And a $10 fee for important, open, but high-frequency addresses, as that covers the cost of reviewing high-throughput email, so useful email does get identified and reviewed. (With the low-quality communication subsidizing the high-quality communication.)
The latter would be very useful in enabling in-demand contact doors to remain completely open without being overwhelmed. Think of a CEO or other well-known person who ideally does want an open channel of feedback from anyone, but is going to have to have someone vet that feedback for the most impactful comments and summarize any important trend in the rest. $10 strongly disincentivizes low-quality communication and covers the cost of getting value out of it (for everyone).
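The tiered scheme above can be sketched briefly. The tier names, amounts, and verdict labels are my own illustrative choices for the example, not a description of any real system:

```python
# Illustrative sketch of the tiered contact-fee idea: senders attach a fee,
# a reviewer issues a verdict, and only useful mail gets its fee refunded.
TIER_FEE = {
    "standard": 1.00,       # $1: enough to discourage casual spam
    "high_traffic": 10.00,  # $10: covers the cost of vetting a busy inbox
}


def required_fee(tier: str) -> float:
    """Fee a sender must attach to reach an address in the given tier."""
    return TIER_FEE[tier]


def settle(fee_paid: float, verdict: str) -> float:
    """Amount refunded after review.

    Useful mail gets its fee back; everything else keeps subsidizing
    the review of the inbox (including blocking/reporting actions).
    """
    return fee_paid if verdict == "useful" else 0.0
```

Under this sketch, a useful $10 message ultimately costs its sender nothing, while spam pays for its own review, which is the cross-subsidy the comment describes.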