
It's been kind of enlightening seeing leadership at $BIGCORP push AI coding solutions like they're guaranteed to be a 10x increase in velocity in every context. Feedback from ICs isn't wholly negative - there are definitely situations where it can be useful, like quickly grokking common applications of common tools, or semi-intelligently applying a diff pattern that is more than just a regex - but there's a complete unwillingness to hear any feedback that isn't "this tech is a total paradigm shift that allows us to finally get rid of all these pesky and expensive developers". Reports of, for instance, the introduction of subtle bugs that take extended amounts of time to understand and fix are met with outright hostility and accusations of incompetence. When a complex defect or escalation drags on, a common question is "why haven't you asked AI to fix it yet", betraying a total misunderstanding of the sorts of tasks the tool is applicable to. The kool-aid is not so much drunk as rectally infused. If valuations are based on this sort of outlook, whew, this market is totally fucked.


I think a weird irony is that the model's inability to know when its response is good is both the reason the output is often not useful and the reason that, when it is very useful, they can't capture the value efficiently.

Like, I was encouraged to use AI assistants more after a colleague saved a bunch of time debugging some issue where Copilot (IIRC) immediately identified an obscure issue. Probably in that case, we should have been willing to pay a decent amount for that one valuable response -- it may have saved a significant amount of engineer time. But I've also had Copilot give me stuff that isn't even syntactically correct, or had Copilot chat make up a newer version of a language and tell me to use it. Cases where it's a waste of time are worth negative dollars.


Sounds like a good ol’ fashioned case of confirmation bias. ‘Look at this one good suggestion the AI made! Wow!’… all while ignoring the many unhelpful outputs.


I don't think it's just confirmation bias where we ignore some bad results (which presumes we know up front that they're bad) -- I think because these models are specifically RLHFed to learn what we think looks good, you can't judge quality just by looking at the outputs and deciding whether they seem plausible. You actually have to do the follow-up of seeing whether they're correct/useful, which may be much more involved.

E.g. to judge the quality of a particular coding example, one may need to have/create a project in which that code would be used, install the actual libraries it invokes, create data for it to operate on, etc. In cases where the assistant was basically giving me wrong information about Scala 3 metaprogramming capabilities, I could only determine it was BS by actually trying to compile the program (in the context of a project with sbt config that pulls in some relevant libraries, sets appropriate flags, etc).

But of course the model doesn't do this, the high-level exec doesn't do this, and so "these examples look great!" can be an honest evaluation, based on the inability to actually meaningfully validate.



Love the example of the guy using an LLM all day to make a simple CRUD app. Basic auto-generated CRUD apps have existed forever. I still remember showing my boss my Django admin built in a day back in 2005. He told me to tell no one about this because he was afraid he would have to lay off devs.



