Keep in mind that this is the stupidest the LLM will ever be and we can expect major improvements every few months. On the other hand, junior devs will always be junior devs. At some point Python and C++ will be like assembly is now: something that’s always out there but not something the vast majority of developers will ever need to read or write.
> Keep in mind that this is the stupidest the LLM will ever be and we can expect major improvements every few months.
We have seen no noticeable improvements (at usable prices) in the 7 months since the original Sonnet 3.5 came out.
Maybe specialized hardware for LLM inference will improve so rapidly that o1 (full) will be quick and cheap enough a year from now, but it seems extremely unlikely. For the end user, the top models hadn't gotten cheaper for more than a year until the release of Deepseek v3 a few weeks ago. Even that is currently very slow at non-Deepseek providers, and who knows just how subsidized the pricing and speed at Deepseek itself are, given political interests.
For my caveat "at usable prices", no, there haven't been any. o1 (full) and now o3 have been advancements, but are hardly available for real-world use given limitations and pricing.
> we can expect major improvements every few months.
I'm not sure this is grounded in reality. We've already seen articles related to how OpenAI is behind schedule with GPT-5. I do believe things will improve over time, mainly due to advancements in hardware. With better hardware, we can better brute force correct answers.
> junior devs will always be junior devs
Junior developers turn into senior developers over time.
> I'm not sure this is grounded in reality. We've already seen articles related to how OpenAI is behind schedule with GPT-5.
Progress by Google, Meta, Microsoft, Qwen and Deepseek is unhampered by OpenAI’s schedule. Their latest (including Gemini 2.0, Llama 3.3, Phi 4) and the coding fine-tunes that follow are all pretty good.
Sure, but if those advancements are just catching up to OpenAI, then major improvements by other vendors are nice and all, but I don't believe that was what the commenter was implying. Right now the leaders, in my opinion, are OpenAI and Anthropic, and unless they are making major improvements every few months, the industry as a whole is not making major improvements.
OpenAI and Anthropic are definitely among the leaders. Playing catch-up to these leaders' mind-share and technology is some of the motivation for others. Calling the progress being made in the space by Google (Gemini), MSFT (Phi), Meta (Llama), Alibaba (Qwen) "nice and all" is a position you might be pleasantly surprised to reconsider if this technology interests you. And don't sleep on Apple and AMZ -
In the space covered by Tabby, Copilot, aider, Continue and others, capabilities continue to improve considerably month-over-month.
In the segments of the industry I care most about, I agree 100% with what the commenter said w/r/t expecting major improvements every few months. Pay even passing attention to Hugging Face and GitHub and you'll see work by indies as well as corporate behemoths happening at breakneck pace. Some work is pushing the SOTA. Some is making the SOTA more widely available. Lots of it is different approaches to solving similar challenges. Most of it benefits consumers and creators looking to use and learn from all of this.
I wish this were true. As a shitty programmer who is old, I would benefit from this as much as anyone here, but I think it is delusional.
From my experience I wouldn't even say LLMs are stupid. The LLM is a carrier and the intelligence is in the training data. Unfortunately, the training data is not going to get smarter.
If any of this had anything to do with reality, then we should already have an awesome programming-specific model trained only on CS and math textbooks. Of course, that doesn't work, because the LLM is not abstracting the concepts in the way we normally have in mind when we call something stupid or intelligent.
It is hardly shocking that next-token prediction on math and CS textbooks is of limited use. You hardly have to think about it to see how flawed the whole idea is.