ramon156's comments

All my information about this is based on feel, because debugging isn't really feasible. Verbose mode is a mess, and there's no alternative.

It still does what I need so I'm okay with it, but I'm also on the $20 plan so it's not that big of a worry for me.

I did sense that the big wave of companies is hitting Anthropic's wallet. If you hadn't realized, a LOT of companies switched to Claude. No idea why, and this is coming from someone who loves Claude Code.

Anyway, getting some transparency on this would be nice.


> If you hadn't realized, a LOT of companies switched to Claude. No idea why, and this is coming from someone who loves Claude Code.

It is entirely due to Opus 4.5 being an inflection point coding-wise over previous LLMs. Most of the buzz there has been organic word of mouth due to how strong it is.

Opus 4.5 is expensive, to put it mildly, which makes Claude Code more compelling. But even now, token providers like OpenRouter have Opus 4.5 as one of their most popular models despite the price.


Everyone, and I mean everyone, keeps parroting this "inflection point" marketing hype, which is so damn tiring.

Believe me, I wish it was just parroting.

The really annoying thing about Opus 4.5 is that it's impossible to publicly say "Opus 4.5 is an order of magnitude better than coding LLMs released just months before it" without sounding like an AI hype booster clickbaiting, but it's the counterintuitive truth, to my personal frustration.

I have been trying to break this damn model since its November release by giving it complex and seemingly impossible coding tasks but this asshole keeps doing them correctly. GPT-5.3-Codex has been the same relative to GPT-5.2-Codex, which just makes me even more frustrated.


Weird, I broke Opus 4.5 pretty easily by giving some code, a build system, and integration tests that demonstrate the bug.

CC confidently iterated until it discovered the issue. CC confidently communicated exactly what the bug was, a detailed step-by-step deep dive into all the sections of the code that contributed to it. CC confidently suggested a fix that it then implemented. CC declared victory after 10 minutes!

The bug was still there.

I’m willing to admit I might be “holding it wrong”. I’ve had some successes and failures.

It’s all very impressive, but I still have yet to see how people are consistently getting CC to work for hours on end to produce good work. That still feels far fetched to me.


I don't know how to say this but either you haven't written any complex code or your definition of complex and impossible is not the same as mine, or you are "ai hyper booster clickbaiting" (your words).

It strains belief that anyone working on a moderate to large project would not have hit the edge cases and issues. Every other day I discover and have to fix a bug that was introduced by Claude/Codex previously (something implemented just slightly incorrectly or with just a slightly wrong expectation).

Every engineer I know working "mid-to-hard" problems (FANG and FANG adjacent) has broken every LLM including Opus 4.6, Gemini 3 Pro, and GPT-5.2-Codex on routine tasks. Granted the models have a very high success rate nowadays but they fail in strange ways and if you're well versed in your domain, these are easy to spot.

Granted I guess if you're just saying "build this" and using "it runs and looks fine" as the benchmark then OK.

All this is not to say Opus 4.5/6 are bad, not by a long shot, but your statement is difficult to parse as someone who's been coding a very long time and uses these agents daily. They're awesome but myopic.


I resent your implication that I am baselessly hyping. I've open sourced a few Opus 4.5-coded projects (https://news.ycombinator.com/item?id=46543359) (https://news.ycombinator.com/item?id=46682115) that, while not moderate-to-large projects, are very niche and novel without much if any prior art. The prompts I used are included with each of those projects: they did not "run and look fine" on first run, and were refined just as with normal software engineering pipelines.

You might argue I'm No True Engineer because these aren't serious projects but I'd argue most successful uses of agentic coding aren't by FANG coders.


First, very cool! Thank you for sharing some actual projects with the prompts logged.

I think you and I have different definitions of “one-shotting”. If the model has to be steered, I don’t consider that a one-shot.

And you clearly “broke” the model a few times based on your prompt log where the model was unable to solve the problem given with the spec.

Honestly, your experience in these repos matches my daily experience with these models almost exactly.

I want to see good/interesting work where the model is going off and doing its thing for multiple hours without supervision.


> I want to see good/interesting work where the model is going off and doing its thing for multiple hours without supervision.

I'd be hesitant to use that as a way to evaluate things. Different systems run at different speeds. I want to see how much it can get done before it breaks, in different scenarios.


I never claimed Opus 4.5 can one-shot things? Even human-written software takes a few iterations to add/polish new features as they come to mind.

> And you clearly “broke” the model a few times based on your prompt log where the model was unable to solve the problem given with the spec.

That's less due to the model being wrong and more due to myself not knowing what I wanted because I am definitely not a UI/UX person. See my reply in the sibling thread.


Wait, are you really saying you have never had Opus 4.5 fail at a programming task you've given it? That strains credulity somewhat... and would certainly contribute to people believing you're exaggerating/hyping up Opus 4.5 beyond what can be reasonably supported.

Also, "order of magnitude better" is such plainly obvious exaggeration it does call your objectivity into question about Opus 4.5 vs. previous models and/or the competition.


Opus 4.5 does make mistakes, but I've found that's more due to ambiguous/imprecise functional requirements on my end rather than an inherent flaw of the agent pipeline. Giving it clearer instructions to reduce said ambiguity almost always fixes it, so I do not consider that Opus failing. One of the very few times Opus 4.5 got completely stuck was, after tracing, an issue in a dependency's library which inherently can't be fixed on my end.

I am someone who has spent a lot of time with Sonnet 4.5 before that and was a very outspoken skeptic of agentic coding (https://news.ycombinator.com/item?id=43897320) until I gave Opus 4.5 a fair shake.


It still cannot solve a synchronization issue in my fairly simple online game, completely wrong analysis back to back and solutions that actually make the problem worse. Most training data is probably react slop so it struggles with this type of stuff.

But I have to give it to Amodei and his goons in the media, their marketing is top notch. Fear-mongering targeted to normies about the model knowing it is being evaluated and other sort of preaching to the developers.


But I used to be a skeptic but now in the last month

Yes, as all of modern politics illustrates, once one has staked out a position on an issue it is far more important to stick to one's guns regardless of observations rather than update based on evidence.

I will change my mind on this in the next month.

Not hype. Opus 4.5 is actually useful to one-shot things from detailed prompts for documentation creation, it's actually functional for generating code in a meaningful way. Unfortunately it's been nerfed, and Opus 4.6 is clearly worse from my few days of working with it since release.

The use of inflection point in the entire software industry is so annoying and cringy. It's never used correctly, it's not even used correctly in the Claude post everyone is referencing.

What euphemism better describes the trend?

If it's a trend, there's not an inflection point. The inflection point would be a point where the trend breaks.

step function

No, I just think that timing wise it finally made it through everyone’s procurement process.

I can't watch a YouTube video without seeing a Claude ad. Same for friends. Save for non-programmer friends.

The below remark is unrelated to the main topic of this thread.

Why would you even watch a YouTube video with ads?

There are ad blockers, sponsor segment blockers, etc. If you use them, it will block almost every kind of YouTube ad.


All the ad blockers I used to use stopped working, and it became an annoying game of cat and mouse that I didn't have time for. Luckily, most of the time I can "skip" the ad in like five seconds, and it gives me a moment to catch up on incoming Slack messages.

There are ad extensions that just turn those 5 second ads into like 200 ms ads. They just speed them up, it's great. Looks like a random flicker.

I used to use ad blockers.

One day I visited DistroWatch.com. The site deliberately tweaked its images so ad blockers would block some "good" images. It took me a while to figure out what was going on. The site freely admitted what it was doing. The site's point was: you're looking at my site, which I provide for free, yet you block the thing that lets me pay for the site?

I stopped using ad blockers after that. If a site has content worth paying for, I pay. If it is a horrible ad-infested hole, I don't visit it at all. Otherwise, I load ads.

Which overall means I pay for more things, visit fewer crappy things, and just visit fewer things, period. Which is good.


Not safe. Before even knowing if a site has the content you want, you can be redirected to malware through ad networks.

not even joking


On an up to date Safari on Mac, not a realistic concern, and if it were, I’d use security software, not an ad blocker.

0 days exist and they are exploited in the wild sometimes

An ad-blocker /is/ security software. You don’t have to take it from me, you can read from the Cybersecurity and Infrastructure Security Agency

> AT-A-GLANCE RECOMMENDATIONS

> Standardize and Secure Web Browsers

> Deploy Advertisement Blocking Software

> Isolate Web Browsers from Operating Systems

> Implement Protective Domain Name System Technologies

Literally their second recommendation on this pamphlet about securing web browsers: https://www.cisa.gov/sites/default/files/publications/Capaci...

Moreover you don’t even need a 0-day to fall for phishing. All you need is to be a little tired or somehow not paying attention (inb4 “it will never happen to ME, I am too smart for that”)


At $JOB IT actually bundles uBlock in all the browsers available to us, as per CIA (or one of those 3-letter agencies, might've even been the NSA) guidelines it's a very important security tool. I work in banking.

Modern advertisement is malware.


They have insane marketing push, across HN and reddit too btw.

NFT moment :) Where did it end btw?

I can. I use brave

> and there's no alternative.

Use the pi coding agent. Bare-bones context, easy to hack.


[flagged]


This has to be a bot account, right? 2 days old.

Yesterday "I don't know about you, but I benefit so much from using Claude at work that I would gladly pay $1,500-$2,000 per month to keep using it."


Agreed, those comments are all over the map, and so many comments in 2 days!

Agreed, those comments are all over the map, and 22 comments in 2 days!

Bots don't write like me

> FWIW I think LLMs are a dead end for software development

Thanks for that, and it's worth nothing FYI.

LLMs are probably the most impressive machine made in recorded human existence. Will there be a better machine? I'm 100% confident there will be, but this is without a doubt extremely valuable for a wide array of fields, including software development. Anyone claiming otherwise is just pretending at this point, maybe out of fear and/or hope, but it's a distorted view of reality.


> FWIW I think LLMs are a dead end for software development, and that the people who think otherwise are exceptionally gullible.

By this do you mean there isn't much more room for future improvement, or that you feel it is not useful in its current form for software development? I think the latter is a hard position to defend, speaking as a user of it. I am definitely more productive with it now, although I'm not sure I enjoy software development as much anymore (but that is a different topic).


> By this do you mean there isn't much more room for future improvement

I don't expect that LLM technology will improve in a way that makes it significantly better. I think the training pool is poisoned, and I suspect that the large AI labs have been cooking the benchmark data for years to suggest that their models are improving more quickly than they are in reality.

That being said, I'm sure some company will figure out new strategies for deploying LLMs that will cause a significant improvement.

But I don't expect that improvements are going to come from increased training.

> [Do] you feel it is not useful in its current form for software development?

IME using LLMs for software development corrodes my intuitive understanding of an enterprise codebase.

Since the advent of LLMs, I've been asked to review many sloppy 500+/1000+ line spam PRs written by arrogant Kool-Aid drinking coworkers. If someone is convinced that Claude Code is AGI, they won't hesitate to drop a slop bomb on you.

Basically I feel that coding using LLMs degrades my understanding of what I'm working on and enables coworkers to dominate my day with spam code review requests.


> IME using LLMs for software development corrodes my intuitive understanding of an enterprise codebase.

I feel you there, I definitely notice that. I find I can output high quality software with it (if I control the design and planning, and iterate), but I lack that intuitive feel I get about how it all works in practice. Especially noticeable when debugging; I have fewer "Oh! I bet I know what is going on!" eureka moments.


This is a bot.

I don’t understand how you can conclude that LLMs are a dead end: I’ve already seen so much useful software generated by LLMs, there’s no denying that they are a useful tool. They may not replace senior developers, and they have their limitations, but it’s quite amazing what they already achieve.

Have you seen all the dogshit software generated by LLMs?

I notice and think about the astroturfing from time to time.

It seems so gross.

But I guess with all of the trillions of investor dollars being dumped into the businesses, it would be irresponsible to not run guerrilla PR campaigns


> FWIW I think LLMs are a dead end for software development, and that the people who think otherwise are exceptionally gullible.

I think this takes away from the main thrust of your argument, which is the marketing campaign, and to me makes you seem conspiratorially minded. LLMs can be useful and mass astroturfing can be happening, both at the same time.

Personally I have witnessed non coders (people who can code a little but have not done any professional software building) like my spouse do some pretty amazing things. So I don’t think it’s useless.

It can be all of:

1. It’s useful for coding

2. There’s mass social media astroturfing happening

3. There’s a massive social overhype train that should be viewed skeptically

4. There’s some genuine word of mouth and developer demand to try the latest models out of curiosity, with some driven by the hype train and irrational exuberance and some by fear for their livelihoods.


I'm not trying to be rhetorically effective, I'm stating my true belief

IN MY GENUINELY HELD OPINION, LLMs generate shit code and the people who disagree don't know what good code looks like.


LLMs are super efficient at generating boilerplate for lots of APIs, which is a time consuming and tedious part of programming.

> LLMs are super efficient at generating boilerplate for lots of APIs

Yes they are. This is true.

> which is a time consuming and tedious part of programming.

In my experience, this is a tedious part of programming which I do not spend very much time on.

In my experience LLM generated API boilerplate is acceptable, yet still sloppier than anything I would write by hand.

In my experience LLMs are quite bad at generating essentially every other type of code, especially if you are not generating JS/TS or HTML/CSS.


> They are aggressively manipulating social media with astroturfed accounts, in particular this site and Reddit.

This is a very naive mindset, getting the last 10% is going to be 90% of the work. Learn from other projects that have tried and failed. I can guarantee you LibreOffice was not built with "our own and customer docs" as a test harness.

At this point I agree, this is brogramming and it's getting boring.

> brogramming

Is it back? I remember it had its heyday in 2011 at an ad agency I was working at back then :-D


Is this a stupid question? Why do we want high fertility rates anyway? Isn't the world overpopulated?

Yes, the world is overpopulated. However, a lot of developed societies based their budget frameworks on permanent growth, for example the idea that the young paying tax will always pay for the benefits of the elderly. With the population growth rate decreasing, suddenly there are lots of older people expecting things like healthcare and pensions to be paid for them. With fewer and fewer young people to cover the cost in tax, there is a looming deficit, which is very worrying to a lot of economists.

I don't care enough about the drama to deep-dive, but as far as I can tell both parties are at fault. At least Bazzite did not make a "post mortem" blog post on a project that is still active. Bit petty if you ask me

I'm still of the opinion that programming should be enjoyed more as a hobby/skill now. Just like a builder cannot build an entire house by himself, programmers now cannot build without an orchestrator.

Nonetheless, you can build a cabinet just for funsies and to feel like you've accomplished something.


Should we blame the majority of the users, or should we accept that times change?

Codeberg is close to what I need.

Finally trying out Godot on a real project.

I've been pretty bummed out by Rainbow 6 Siege X announcing they will never support Linux due to a lack of kernel-level anti-cheat support. While I can use NVIDIA Shield to play from my Windows PC, I'd rather play something natively with friends (for context, we usually play 3v3s for funsies).

My goal is not to make an exact clone, but to make a smaller map version for 3v3 that is a bit more quick paced.

For context, it's a bomb defusal game where the main goal is intel and gadgets. You need to make the other side waste their gadgets so it comes down to a gun v gun fight.


Sorry, but this seems like a privileged solution.

Let's say you're a one-of-a-kind kid that already is making useful contributions, but $1 is a lot of money for you, then suddenly your work becomes useless?

It feels weird to pay for providing work anyway. Even if it's LLM gunk, you're paying to work (let alone paying for your LLM).


It is a privileged solution. And a stupid one, too. Because $1 is worth a lot more to someone in India than to someone in the USA. If you want to implement this more fairly, you'd be looking at something like GDP or BBP plus geolock. Streaming services perfected this mechanism already.

This might be by design. Almost anyone writing software professionally at a level beyond junior is getting paid enough that $1 isn't a significant expense, whether in India or elsewhere. Some projects will be willing to throw collaboration and inclusivity out the window if it means cutting their PR spam by 90% and only reducing their pool of available professional contributors by 5%.

Indian here. You are correct. Expecting any employed Indian software developer to not be able to spare $1 is stupid. Like how exactly poor do you think we are?!

It's not that outrageous. Apparently, 90% of India is living on less than $10 per day (https://ourworldindata.org/grapher/share-living-with-less-th...)

I suspect most of these people are not software engineers with a computer?

>Like how exactly poor do you think we are?!

I get laid off and suddenly I'm poor and am weighing options. And I'm American.


You misunderstood the point. The point isn't that you are poor. The point is that the burden of the money weighs heavier on you, on average, than on someone from the USA. This creates an uneven playing field.

I like to compare it with donations. If you get a USD donated, that is the same USD regardless of who gave it. Right? Right?!? Either way you don't know how heavy the burden is on the person who donated. You probably don't care. But it matters to the person who donated.


Why let the perfect be the enemy of the good?

A $1 fee is fine for Indian software developers and it kills the spam. If it's a greater burden for people in India than the US, well, not all solutions are perfect, but some are useful.


Because it discriminates against a marginalized group which is by tradition very important to the FOSS community: students.

Also, no it wouldn't kill spam. The spam would be moved to pwned machines where the owner would suddenly have an incentive (financial) to fix the system, if they know.

What remains is people who would be so rich that $1 means nothing to them. Ie. white collar criminals who are already rich enough to not care.


I think the point was that if an aspirational minimum wage worker on a borrowed computer wants to put up a PR then it would cost them less than ten minutes of wages to afford $1USD in the US, while the same worker in India would need to put up about half a day's wages.

This is very noble in theory, but in practice you're not going to get many high-quality PRs from someone who's never been paid to write software and has no financial support.


So we continue to make the rich richer and make broke students struggle more to get valuable experience. It will be very easy to point out, in 10-20 years during the coming "engineer crisis", why we 'suddenly' can't support the systems we built.

So only employed software developers are allowed to make PRs?

I've contributed almost full time to free software as a student. When I became a professional software developer, suddenly I lost the time to do it.

Students don't have a lot of money to burn here. They're borrowing money to study. You'll miss out on them. However, you're unlikely to notice. I mean, there is no control group in such an experiment.

I think the open source ecosystem would definitely notice long-term. Most people who become regular contributors start out in university or earlier - that's when you have the most time to spend on hobbies like OSS.

Not that weird, in the context of contributing to an open source project that you're likely already benefiting from.

I.e., if you want to contribute code, you must also contribute financially.


>contributing to an open source project that you're likely already benefiting from.

Yes, but many people benefit for free. You see the backwards incentives of making the most interested (i.e. the ones who may provide the most work to your project) pay?

And none of that even guarantees support. Meanwhile, you donate more and you get to tell people what to build. It's all out of whack.


It is simple: we simply add a whitelist for the child prodigies.

You get it refunded

The default should be to refund.

That would make not-refunding culturally crass unless it was warranted.

With manual options for:

0. (Default, refund)

1. (Default refund) + Auto-send discouragement response. (But allow it.)

2. (Default refund) + Block.

3. Do not refund

4. Do not refund + Auto-send discouragement response.

5. Do not refund + Block.

6. Do not refund + Block + Report SPAM (Boom!)

And typically use $1 fee, to discourage spam.

And a $10 fee for important, open, but high-frequency addresses, as that covers the cost of reviewing high-throughput email, so useful email does get identified and reviewed. (With the low-quality communication subsidizing the high-quality communication.)

The latter would be very useful in enabling in-demand contact doors to remain completely open, without being overwhelmed. Think of a CEO or other well known person, who does want an open channel of feedback from anyone, ideally, but is going to have to have someone vet feedback for the most impactful comments, and summarize any important trend in the rest. $10 strongly disincentivizes low-quality communication, and covers the cost of getting value out of communication (for everyone).
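
For anyone who wants to poke at it, here is a rough sketch of how the seven manual options above could be modeled. Everything in it is hypothetical (`Disposition`, `OPTIONS`, `settle`, and the fee amounts in cents are names and values made up for illustration); it only encodes the refund / discourage / block / report combinations from the list and does not touch any real email or payment system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disposition:
    """What the receiver decides to do with a paid message (hypothetical model)."""
    refund: bool               # return the fee to the sender?
    discourage: bool = False   # auto-send a discouragement response?
    block: bool = False        # block the sender from now on?
    report_spam: bool = False  # also report the message as spam?


# Keys mirror option numbers 0-6 from the list above; 0 is the default.
OPTIONS = {
    0: Disposition(refund=True),
    1: Disposition(refund=True, discourage=True),
    2: Disposition(refund=True, block=True),
    3: Disposition(refund=False),
    4: Disposition(refund=False, discourage=True),
    5: Disposition(refund=False, block=True),
    6: Disposition(refund=False, block=True, report_spam=True),
}


def settle(fee_cents: int, option: int = 0) -> dict:
    """Return where the fee goes and which side effects the option triggers."""
    d = OPTIONS[option]
    return {
        "refunded_cents": fee_cents if d.refund else 0,
        "kept_cents": 0 if d.refund else fee_cents,
        "discourage": d.discourage,
        "block": d.block,
        "report_spam": d.report_spam,
    }


if __name__ == "__main__":
    print(settle(100))             # $1 fee, default option: refunded
    print(settle(1000, option=6))  # $10 fee, kept + block + spam report
```

Making refund the default argument keeps "not refunding" an explicit choice by the receiver, which is the cultural nudge the comment is after.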


$10 will be a honeypot for scammers.

I don't think most people are going to pay $10 to get an email through without checking.

Might be worth strongly suggesting a check, at permission time.

But I am sure you are right.

Maybe receivers don't get the money. They just get to burn whoever is sending them email they don't want? A thought anyway.

