Writing detailed specs and then giving them to an AI is not the optimal way to work with AI.
That's vibecoding with an extra documentation step.
Also, Sonnet is not the model you'd want to use if you want to minimize cleanup. Use the best available model at the time if you want to attempt this, but even those won't vibecode everything perfectly for you. This is the reality of AI, but at least try to use the right model for the job.
> Therefore I need more time and effort with Gen AI than I needed before
Stop trying to use it as all-or-nothing. You can still make the decisions, call the shots, write code where AI doesn't help and then use AI to speed up parts where it does help.
That's how most non-junior engineers settle into using AI.
Ignore all of the LinkedIn and social media hype about prompting apps into existence.
EDIT: Replaced a reference to Opus and GPT-5.5 with "best available model at the time" because it was drawing a lot of low-effort arguments
> Writing detailed specs and then giving them to an AI is not the optimal way to work with AI.
It is NOT the way to work with humans, basically because most software engineers I worked with in my career were incredibly smart and damn good at identifying edge cases and weird scenarios, even when they were not told and the domain wasn't theirs to begin with. You didn't need to write lengthy, multi-page Jira tickets. A brief paragraph was enough.
With AI, you need to spell everything out in detail. But that's NO guarantee either, because these models are NOT deterministic in their output. Same prompt, different output each time. That's why every chat box has that "Regenerate" button. So even a correct and detailed prompt might not lead to correct output. You're literally rolling dice with a random number generator.
Lastly - no matter how smart and expensive the model is, the underlying working principles are the same as GPT-2. Same transformers with RL on top, same random seed, same list of token probabilities, and same temperature used to randomly select one token, which is appended to the output and fed back in again for the next token.
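For anyone who wants to see the "same prompt, different output" mechanism concretely, here is a minimal sketch of temperature sampling. It is a toy, not how any particular model or vendor implements decoding: the vocabulary, the scores in `LOGITS`, and the helper names `sample_token` and `generate` are all made up for illustration, and the scores stay fixed at every step, whereas a real model recomputes them from the tokens generated so far.

```python
import math
import random

# Hypothetical vocabulary and raw scores, invented for this illustration.
VOCAB = ["return", "raise", "pass", "yield"]
LOGITS = [2.0, 1.5, 0.5, 0.1]


def sample_token(logits, temperature, rng):
    """Pick one token index from raw scores using temperature sampling."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Divide scores by the temperature, then apply a softmax.
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(score - peak) for score in scaled]
    # random.choices normalizes the weights and draws one index.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(seed, temperature, steps=5):
    """Sample `steps` tokens one at a time with a seeded random generator."""
    rng = random.Random(seed)
    return [VOCAB[sample_token(LOGITS, temperature, rng)] for _ in range(steps)]


if __name__ == "__main__":
    print(generate(seed=1, temperature=1.0))
    print(generate(seed=2, temperature=1.0))  # a different seed may give a different sequence
    print(generate(seed=1, temperature=0.0))  # greedy: the top-scoring token every time
```

In this toy, the randomness comes entirely from the seed and the temperature: fixing the seed reproduces a run, and a temperature of 0 removes the sampling step altogether. Whether hosted models expose those controls, and whether they are fully reproducible even then, depends on the provider.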
This is not true in my experience at all. I never write such a detailed spec for AI - and that is my value as the human in the loop - to be iterative, to steer and make decisions. The AI in fact catches more edge cases than I do, and can point me to things that I never considered myself. Our productivity has increased manyfold, and code quality has increased significantly because writing tests is no longer a chore or an afterthought, or, the biggest one for us, blocked by "test setup is too complicated". All of that is gone. And it is showing in a decrease in customer-reported issues.
> It is NOT the way to work with humans basically because most software engineers I worked with in my career were incredibly smart and were damn good at identifying edge cases and weird scenarios even when they were not told and the domain wasn't theirs to begin with.
I have no clue what AI you're using, but with both Claude and Codex you just explain the outcome, and they are pretty smart at figuring stuff out in complex codebases. You don't even need a paragraph, just say "doing this I got an error".
> NO guarantee either because these models are NOT deterministic in their output. Same prompt different output each time.
So, exactly like humans. But a bit more predictable and way more reliable.
> That's why every chat box has that "Regenerate" button.
If you're using the chat box to write code, that's a human error, not an LLM one. Don't blame "AI" for your ignorance.
> no matter how smart and expensive the model is, the underlying working principles are the same as GPT-2.
Sure. Every machine is a smoke machine if operated wrong enough. This tells me you should not get your insight from random YT videos. As a little nugget, some of the underlying working principles of the chat system also powered search engines; and their engineers also drank water, like Hitler.
> the underlying working principles are the same as GPT-2
I don't think anyone was claiming otherwise. Sonnet is still better at writing code than GPT-2, and worse than Opus. Workflows that work with Opus won't always work with Sonnet, just as you can't use GPT-2 in place of Sonnet to do code autocomplete.
> That's why every chat box has that "Regenerate" button.
Wait, are you doing this in the web chat interface?!
That's definitely not a good way. You need to be using a harness (like Claude Code) where the agent can plan its work, explore the codebase, execute code, run tests, etc. With this sort of setup, your prompts can be short (like 1 to 5 sentences) and still get great results.
I use the Claude CLI or OpenCode. The "Regenerate" example is just to illustrate that the same prompt would produce different output each time. You're rolling dice.
But that's also basically true for humans. It's harder to "prove" humans are random, but wouldn't you think a person would do things slightly differently when given the same tasks but on different days? People change their minds a lot, it's just that there's no "reconsider" button for people so you feel a bit of social friction if you pester somebody to rethink an issue. But it's no different.
I'd be really surprised if your point is that humans, unlike AI, are super deterministic and that's why they are so much more trustworthy and smarter than AI...
> Opus or GPT-5.5 are the only ways to even attempt this.
It’s pretty funny to claim that a model released 22 hours ago is the bare minimum requirement for AI-assisted programming. Of course the newest models are best at writing code, but GPT-* and Claude have written pretty decent systems for six months or so, and they’ve been good at individual snippets/edits for years.
Is it actually the case that 5.5 is that much better at implementing specs than its very capable predecessor released a month ago? Just seems like a baseless and silly claim about a model that has barely been out long enough for anyone to do serious work with it.
> Stop trying to use it as all-or-nothing. You can still make the decisions, call the shots, write code where AI doesn't help and then use AI to speed up parts where it does help.
You're assuming that finding the places where AI needs help isn't already a larger task than just writing it yourself. AI can be helpful in development in very limited scenarios but the main thrust of the comment above yours is that it takes longer to read and understand code than to write it and AI tooling is currently focused on writing code.
We're optimizing the easy part at the expense of the difficult part - in many cases it simply isn't worth the trouble (cases where it is helpful, imo, exist when AI is helping with code comprehension but not new code production).
> You're assuming that finding the places where AI needs help isn't already a larger task than just writing it yourself.
Not assuming anything, I'm well versed in how to do this.
Anyone who defers to having AI write massive blocks of code they don't understand is going to run into this.
You have to understand what you want and guide the AI to write it.
The AI types faster than me. I can have the idea and understand and then tell the LLM to rearrange the code or do the boring work faster than I can type it.
If you are trying to sell it, you are doing a poor job and effectively siding with OP while desperately trying to write the opposite.
Juniors mostly behave better than what you describe; I certainly never had to correct any junior's work as much as OP describes. If you have 'boring code' in your codebase, maybe it signals a not-so-great architecture (and I presume we aren't talking about code generators, which have existed since the 90s at least).
Also, any senior worth their salt wants to intimately understand their code, which is the only way you can guarantee correctness at all. Man, I could go on and on and pick your statements apart one by one, but that would take long.
This isn't about touch typing or IDE tricks. I'm an IDE power user and - reasoning aside - I used to run circles around my peers when it comes to raw code editing efficiency. This is increasingly an obsolete workflow. LLMs can execute codebase-wide refactors in seconds. You can use them as a (foot-)shotgun, or as a surgical tool.
Same with debuggers. I run into people with 10 years of experience who are still trying to printf debug complex problems that would be easy with 5 minutes in a debugger.
I think we're seeing something similar with AI: There are devs who spend a couple days trying to get AI to magically write all of their code for them and then swear it off forever, thinking they're the only people who see the reality of AI and everyone else is wrong.
At the same time - there are devs that spend two days setting up a debugger for a simple problem that would be easy with five minutes and printf. AI is a tool and it's a useful tool - it's not always the best tool for the job and the real skill is in knowing when to use it and when not to.
It's a sort of constant of life that the easy problems are solved - those where an extreme answer is always correct are things we no longer even consider problems... most of the options that remain have their advantages and disadvantages, so the true answer is somewhere in the middle.
Right, but then the AI doesn't have a positive ROI. In all fairness, it never has a positive ROI, but now it's much more negative, to the point that the accountants will put an end to the experiment after year end reveals how negative it really is.
The problem I have with this take is it's focused on solving the right now problem.
Yes, it's quicker to do it yourself this time, but if we build out the artifacts to do a good enough job this time, next time it'll have all the context it needs to take a good shot at it, and if you get overtaken by AI in the meantime you've got an insane head start.
I don't believe that investing more of my time in a slower process now would result in an advantage if that other process was refined. I've toyed around with these tools and know enough to get an environment up and running so what would I gain from using them more right now if those tools may significantly change before they're adapted to more efficient usage?
I'm okay not being at the bleeding edge - I can see the remains of the companies that aggressively switch to the new best thing. Sometimes it'll pay off and sometimes it won't. I am comfortable being a person that waits until something hits a 2.0 and the advantages and disadvantages are clear before seriously considering a migration.
If you don't do it yourself and you don't get overtaken by AI, you've lost the head start to be better next time - humans learn, and they atrophy as well.
> Writing detailed specs and then giving them to an AI is not the optimal way to work with AI.
> That's vibecoding with an extra documentation step.
Read uncharitably, yeah. But you're making a big assumption that the writing of the spec wasn't driven by the developer, checked by the developer, adjusted by the developer. Rewritten when incorrect, etc.
> You can still make the decisions, call the shots
One way to do this is to do the thinking yourself, tell it what you want it to do specifically and... get it to write a spec. You get to read what it thinks it needs to do, and then adjust or rewrite parts manually before handing off to an agent to implement. It depends on task size of course - if small or simple enough, no spec necessary.
It's a common pattern to hand off to a good instruction following model - and a fast one if possible. Gemini 3 Flash is very good at following a decent spec for example. But Sonnet is also fine.
> Stop trying to use it as all-or-nothing
Agree. Some things just aren't worth chasing at the moment. For example, in native mobile app development, it's still almost impossible to get accurate, idiomatic UI that makes proper use of native components and adheres to the HIG, etc.
This is my workflow: converse with it to write a spec. I review the spec myself. Ask it to trace out how it would implement it. I know the codebase because it was originally written mostly by hand. Correct it with my best practices. Have it challenge my assumptions and read the code to do so. Then it's usually good enough to go on its own. The beauty of having a well-defined spec is that once the implementation is done, I can have another agent review it, and it generates good feedback if the code deviates from the spec at all.
I'm unsure if this is actually faster than me writing it myself, but it certainly expends less mental energy for me personally.
The real gains I'm getting are with debugging prod systems. Where normally I would have to touch five different interfaces to track down an issue, I've just encompassed it all within an MCP and direct my agent on the debugging steps (check these logs, check this in the db, etc.).
Sure, Opus is a level above Sonnet, but it still doesn't free OP from these handcuffs - it is reading the code, understanding it and building a mental model that's way more labour intensive.
The OP's problem was treating the situation as two extremes: Either write everything myself, or defer entirely to the AI and be forced to read it later.
I was trying to explain that this isn't how successful engineers use AI. There is a way to understand the code and what the AI is doing as you're working with it.
Writing a spec, submitting it to the AI (a second-tier model at that) and then being disappointed when it didn't do exactly what you wanted in a perfect way is a tired argument.
Is doing that faster than just writing it by hand? Remember to include the time you need to review the code afterwards. The research so far says it isn't faster. Yet people keep doubling down on it and thinking winning an Internet argument is going to matter when it hits the fan in the near future.
To be clear, I'm not saying that they can do this.
I'm saying that if you're trying to have AI write code for you and you want to do as little cleanup as possible, you have to use the best model available.
"Writing detailed specs and then giving them to an AI is not the optimal way to work with AI." Perfect. I loosely define things, and then correct it, and tell it to make the corrections, and it gets trained, but you have to constantly watch it. It's like a glorified auto-typer.
"Ignore all of the LinkedIn and social media hype about prompting apps into existence." Absolutely, it's not hype, it's pure marketing bullshitzen.