As if anyone riding this wave and making billions is not sitting on top of thousands of papers and millions of lines of open source code. And as if releasing llama is one of the main reasons we got here in AI…
I’m almost shocked this spooked the market as much as it did, as if the market were so blind to past technological innovation as to not see this coming.
Innovation ALWAYS follows this path. Something is invented in a research capacity. Someone implements it for the ultra rich. The price comes down and it becomes commoditized. It was inevitable that “good enough” models would become ultra cheap to run as they were refined and made efficient. Anybody looking at LLMs could see they were a brute-forced result, wasting untold power, because they “worked” despite how much overkill they were to get to the end result. Them becoming lean was the obvious next step, now that they had gotten good to the point of diminishing returns.
Sure, but what nobody expected was how QUICKLY the efficiency progress would come - aviation took about 30 years to go from "the rich" to "everybody", personal computers about 20 years (from the 1980s to the 2000s). I think the market expected at least 10 years of "rich premium" - not 2 years and getting taken to the cleaners by the economic archenemy, China.
The Google transformer paper was 2017. ChatGPT was the “we can give a version of this away for free” moment. Llama was “we can afford to give the whole product away for free to even the playing field.” Every tech giant comes out with a comparable product simultaneously. And now a hedge fund, not even a megacap company, can churn out a clone by hiring a small or medium-size engineering team.
Really this should be an indictment of corporate bloat: companies with hundreds of thousands of employees distracted by performance reviews, shareholders, marketing, and rebuilding the same product they launched two years ago under a new name.
> Really this should be an indictment of corporate bloat: companies with hundreds of thousands of employees distracted by performance reviews, shareholders, marketing, and rebuilding the same product they launched two years ago under a new name.
Yeah.
There are some shorter words or acronyms for it, though, roughly equivalent to your ~30-word paragraph above:
IBM
DEC
Novell
Oracle
MS
Sun
HP
...
MBA
, all in their worst days or incarnations or ...
The notion I now believe more fully is that the money people - managers, executives, investors, and shareholders - like to hear about things in units they understand (i.e., money). They don't understand the science or the maths, and insofar as they might acknowledge it exists, it's an ambient concern: those things happen anyway (as far as they can tell), so they don't know how to value them (or don't value them).
Because we saw, what, a week ago, the leading indicator that the money people were now feeling happy they were in charge: that weird not-quite-government US$500 billion AI investment announcement. And we saw the same breathless reporting when Elon Musk founded xAI and had "built the largest AI computer cluster!"... as though that statement actually meant anything.
There was a whole heavily implied analogy going on of "more money (via GPUs) === more powerful AIs!" - ignoring any reality of how those systems worked, their scaling rules, or the fact that inference tended to run on exactly 1 GPU.
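The single-GPU point is really just back-of-envelope arithmetic: the VRAM needed to hold a model's weights is parameter count times bytes per parameter. A rough sketch (the 7B parameter count and byte widths below are illustrative assumptions, not figures from this thread):

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Rough VRAM needed just to hold a model's weights, in GB.

    Ignores activation memory and KV cache, so this is a lower bound.
    """
    return n_params * bytes_per_param / 1e9

# A hypothetical 7B-parameter model in fp16 (2 bytes per parameter):
print(weight_memory_gb(7e9, 2))    # 14.0 -> fits on a single 16-24 GB GPU

# The same model quantized to 4-bit (0.5 bytes per parameter):
print(weight_memory_gb(7e9, 0.5))  # 3.5 -> fits on an ordinary consumer card
```

Which is why a warehouse of GPUs buys you training throughput and serving capacity, not a categorically "more powerful AI" at inference time.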
Even the internet activist types bought into this, because people complaining about image generators just could not be convinced that the Stable Diffusion models ran locally on extremely limited hardware (the number of arguments where people would imply some hardware gate while I'm sitting there with the web GUI open in another window on my 4-year-old PC).
I would generally agree, but the market isn't rational about the future prospects of a company. It's rational about "can I make money off this stock" and nothing else matters in the slightest.
Riding hype, and dumping at the first sign of issues, follows that perfectly well.
Sure, but it's good to recognize that Meta never stopped publishing, even after OpenAI and DeepMind, most notably, stopped sharing the good sauce. From CLIP to DINOv2 and the Llama series, it's a serious track record to be remembered.
But there is a big difference: Llama is still way behind ChatGPT, and one of the key reasons to open-source it could have been to use the open-source community to catch up with ChatGPT. DeepSeek, on the contrary, is already on par with ChatGPT.
The R1 distills are still very, very good. I've used Llama 405b, and I would say dsr1-32b is about the same quality, or maybe a bit worse (subjectively within error), and the 70b distill is better.
Right, so it sounds like it's working then given how much people are starting to care about them in this sphere.
We can laugh at that (as I like to do with everything from Facebook's React to Zuck's MMA training), or we can see how others (like DeepSeek and, to a lesser extent, Mistral, and to an even lesser extent, Claude) are doing the same thing to help themselves (and each other) catch up. What they're doing now, by opening these models, will be felt for years to come. It's draining OpenAI's moat.
There's no need to read it uncharitably. I'm the last person you could call an FB fan - I think overall they're a strong net negative to society - but their open-source DL work is quite nice.
This. Even their lesser-known work is pretty solid[1] (used it the other day and was frankly kind of amazed at how well it performed under the circumstances). Facebook/Meta sucks like most social media does, but, not unlike Elon Musk, they are on record as having made some contributions to society as a whole.
<< And as if releasing llama is one of the main reasons we got here in AI…
Wait... are you saying it wasn't? Just releasing it in that form was a big deal (and heavily discussed on HN when it happened). Not to mention, a lot of the work that followed built on Llama, partly because it let researchers and curious people dig deeper into the internals.