Base model performance has pretty much plateaued since GPT-4; it's mostly tooling and integration now. The target is also AGI, so no matter what your product is, you get measured on your progress towards it. And with new "SOTA" models popping up left and right, you have no good way to retain users, because the user mostly cares about the model's performance, not the funny meme generator you added. Looking at you, OpenAI...
"They called me bubble boy..." - some dude at Deutsche.
So, how do you feel about the recent IMO stuff? Doesn't that cause a consistency problem for your view that we've plateaued? To me at least, it felt like we were something like two years away from this kind of thing.
Probably very expensive to run, of course, maybe ridiculously so, but they were able to solve really difficult maths problems.
What is the training cost of such a human? Reliability is another concern. There is no manufacturer you can pay 10 billion and get a few thousand trained processors from.
>Base model performance has pretty much plateaued since GPT-4.
Reasoning models didn't even exist back then, and LLMs were struggling a lot with math. It's completely different now with SOTA models; there have been massive improvements since GPT-4.
The transformer paper was published in 2017, and within 8 years (less, if I'm being honest) we have bots that pass the Turing test. For anyone with a short memory: passing the Turing test used to be a big deal.
My point is that even if things are plateauing, a lot of these advancements happen in step changes. All it takes is one or two good insights to make massive leaps, so the fact that things are plateauing now is a bad predictor of how things will look in the future.
"They called me bubble boy..." - some dude at Deutsche.