I'd assume the person giving the praise is at least a bit of all 3.
> It’s a weird catch-22 giving praise like that to LLMs.
It's a bit asymmetrical though isn't it -- judging quality is in fact much easier than producing it.
> you might be able to intuit and fill in the gaps left by the LLM and not even know it
Just because you are able to fill gaps with it doesn't mean it's not good. With all of these tools you basically have to fill gaps. There are still differences between Cline vs Cursor vs Aider vs Codebuff.
Personally I've found Cline to be the best to date, followed by Cursor.
> There’s still a skill floor required to accurately judge something.
Sure but it's not high at all.
Your typical sysadmin is doing a lot of Googling. If perplexity can tell you exactly what to do 90% of the time without error, that's a pretty good sysadmin.
Your typical programmer is doing a lot of googling and write-eval loops. If you are doing many flawless write-eval loops with the help of cline, cline is a pretty good programmer.
A lot of things AI is helping with also have good, easy to observe / generate, real-time metrics you can use to judge excellence.
It depends. For a sysadmin maybe not, but for data scientists, the bar would be pretty high just to understand the math jargon.
> If perplexity can tell you exactly what to do 90% of the time without error
That “if” is carrying a lot of weight. Anecdotally I haven’t seen any LLM be correct 90% of the time. IIRC SOTA on SWE-bench (which tbf isn’t a great benchmark) is around 30%.
> flawless write-eval loops with the help of cline, cline is a pretty good programmer.
I’m not really sure what you mean by “flawless” but having a rubber duck is always more helpful than harmful.
> A lot of things AI is helping with also have good, easy to observe / generate, real-time metrics you can use to judge excellence.
Exactly what I illustrated earlier: your developer productivity metrics. If you're turning code around faster, setting up your network better, turning around insights faster, the AI is working.
> It depends. For a sysadmin maybe not, but for data scientists, the bar would be pretty high just to understand the math jargon.
Why does an AI coding agent need to understand math jargon? It just helps you write better code. Are you even familiar with what data scientists do? Seems not, because if you were, you'd see clearly where the tool would be applied and whether it does a good or bad job.
Reminder: we're talking about evaluating whether Codebuff / alternatives are "pretty good" at X. Just go play with the tools.
tgtweak expressed their opinion on how good the tool rates at some tasks {sysadmin, data engineering, cloud architecture} and your response was to question how someone could have an opinion about it. The obvious answer is that they used the tools and found it useful for those tasks. It may only be _subjectively_ good at what they're using for but it's also a rando's opinion on the internet. As another rando I very much agree with what the person you responded to is saying. You're not going to get more rigor from this discourse - go form a real opinion of your own.
I would consider myself adept at all three: not top 1% in any one of them, but easily top 1% at the intersection of all 3.
Context: I have hired hundreds of engineers and built many engineering teams from scratch to 50+, and have been doing systems administration, solutions architecture, infrastructure design, devops, cloud orchestration and data platform design for 25 years.
I'm not bluffing when I say Claude's latest Sonnet model and Cline in VS Code have really been 99th percentile good on everything I've thrown at them (with some direction, as needed) and have done more productive, quality work than a team of 10 engineers in the last week alone.
If you haven't tried it I can understand your pessimism.
I haven’t built engineering teams, but I’ve been in the server programming field for 15 years.
I have tried Claude (with aider) for programming tasks and have been impressed that it could do anything (with handholding) but haven’t been convinced that it’s something that will change how I write code forever.
It’s nice that I can describe how to graph some data in a csv and get 80% of the way there in python after a few rounds of clarification. Claude refused to use seaborn for some reason, but that’s no big deal.
Every time I’ve tried using it for work, though, I was sorely disappointed.
I recently convinced myself that it was pretty helpful in building a Yjs-backed text editor, but last week realized that it led me down an incorrect path with regard to the ProseMirror plugin and I had to rewrite a good chunk of the code.
It’s a weird catch-22 giving praise like that to LLMs.
If you are, then you might be able to intuit and fill in the gaps left by the LLM and not even know it.
And if you’re not, then how could you judge?
Not really much to do with what you were saying, really, just a thought I had.