It is like needing a prediction (e.g. about market behavior) and knowing that somewhere out there is a person who will make the perfect one. Your problem is no longer making the prediction; it is finding and identifying that expert. But is the problem you have converted yours into any less hard?
I too have had some minor successes; the current products are definitely a great step forward. However, every time I start anything more complex, I never know in advance whether I will end up with something usable or with utterly unusable code, even after corrections (with the "AI" always confidently claiming that it has now definitely fixed the problem).
All examples such as yours suffer from one big problem: they are selected after the fact.
To be useful, you would have to predict in advance how useful the "AI" will be, then run it and have that prediction verified.
Selecting positive examples after the work is done is not very helpful. All it does is prove that at least sometimes somebody gets something useful out of using an LLM for a complex problem. Okay? I think most people understand that by now.
PS/Edit: Success stories we only hear about but cannot follow and reproduce may have been somewhat useful early on, but by now most people are past that point: they are willing to give it a try and would like a link to a working, reproducible example. I understand that work can rarely be shared, but that makes those stories far less useful. What would add real value for readers of these discussions now is for people who report success to post the full, working, reproducible example.
EDIT 2: Another thing: I see comments from people who say they tweaked CLAUDE.md and got it to work. But the point is predictability and consistency! If you have one project where you fiddled with the file, adding random sentences you hoped would get the LLM to do what you need, that is not very useful. We already know that trying many things sometimes yields results; what we need is predictability and consistency.
We are used to being able to try things out, and once we got something working we could almost always confidently say we had found the solution and share it. But LLMs are not that consistent.
My point is that these are not minor successes, and not occasional. Not every attempt is equally successful, but a significant majority of my attempts are. Otherwise I wouldn't be letting it run for longer and longer without intervention.
For me this isn't one project where I've "twiddled around with the file and added random sentences". It's an increasingly systematic method: giving it a process for making changes, giving it regression tests, and making it make small, testable changes.
I do that because, at this point, I can predict with a high success rate that it will make progress for me.
There are failures, but they are few, and they are usually fixed simply by restarting it from the last successful change when it goes too long without passing more tests. Occasionally it requires me to turn off --dangerously-skip-permissions and guide it through a tricky part, but that is getting rarer and rarer.
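The restart-from-the-last-successful-change loop above can be sketched in miniature. This is an illustration only, not the actual tooling: the dict "workspace" and `run_tests` callable are stand-ins for a git working tree and a real test suite.

```python
# Sketch of a checkpoint-and-revert loop: apply a proposed change to a copy
# of the workspace, keep it only if the tests pass, otherwise fall back to
# the last known-good state. (In practice this would be git commits/resets
# around an LLM's edits; the dict here is a toy stand-in.)
import copy

def apply_with_checkpoint(workspace, change, run_tests):
    """Apply `change` to a copy of `workspace`; keep it only if tests pass."""
    candidate = copy.deepcopy(workspace)
    change(candidate)                   # the proposed edit
    if run_tests(candidate):
        return candidate, True          # checkpoint: tests green, keep it
    return workspace, False             # revert: last successful state wins

# A change that breaks the (toy) invariant gets rolled back:
ws = {"answer": 42}
ws, ok = apply_with_checkpoint(ws, lambda w: w.update(answer=41),
                               lambda w: w["answer"] == 42)
```

The point of the pattern is that a failed attempt costs only one retry from the checkpoint, not a corrupted working tree.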
No, I haven't formally documented it, so it's reasonable to be skeptical. (I have, however, started packaging up the hooks, agents, and instructions that consistently work for me across multiple projects. For now it's just for a specific client, but I might do a writeup at some point.) At the same time, it's equally warranted to wonder whether the vast difference in reported results comes down to what you suggest, or to something you're doing differently in how you use these tools.
New hires perform consistently. Even if you can't predict beforehand how well they'll work, after a short observation period you can predict very well how they will continue to work.
But that is exactly the problem, no?