This is the world we are entering: "commercial AI" rather than public, peer-reviewed AI. No benchmarks. No discussion of pros and cons. No careful comparison with the state of the art. Just big numbers and big announcements.
They released the product to the public… we might not have formal academic studies, but millions of people trying it and determining its utility vs. the competition is as good a test as any.
If pushing the context window turns out not to be the right approach, it's not like there won't be 10 other companies champing at the bit to prove them wrong with their own hypotheses. And it's entirely possible there are multiple correct answers for different use cases.
> millions of people trying it and determining its utility vs. the competition is as good a test as any.
Disagree. We aren't polling these people. How do I even get a distilled view of what their thoughts are?
It's a far cry from the level of evaluation that existed before. The lack of benchmarks (until the last week or so - thank you huggingface and lm-sys!) has been very noticeable.
You will get people claiming that LLaMa outperforms ChatGPT, etc. We have no sense of how performance degrades over longer sequence lengths... or even what sort of sparse attention technique they are using for longer sequences (most of which have known problems). It's absurd.
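To make "sparse attention" concrete: here is a minimal NumPy sketch of one common pattern, a causal sliding window. The function and parameters are illustrative, not any vendor's actual implementation, and the known problem is visible in the mask itself: tokens outside the window are simply invisible.

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Causal sliding-window attention: each token attends only to
    itself and the previous `window - 1` tokens. True = allowed."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    return (j <= i) & (i - j < window)

mask = sliding_window_mask(seq_len=8, window=3)
print(mask.astype(int))
# Token 7 cannot attend to token 0 at all: anything outside the
# window has to reach the current token indirectly, layer by layer.
```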
Biological evolution doesn't do any special testing except rewarding whatever survives. And it works fine. Marketplaces implement the same algorithm, only faster and just as effectively.
There are many ways to find truth besides math and science.
Obviously, those two are the gold standard for difficult questions.
But when time is short (competitors at your heels), rewards are fast (lots of hype fueling prospective customers), and the tech isn’t even that hard (deep learning isn’t rocket science, lots of good ideas are panning out), then any organization that needs to acquire its own resources to survive should operate on a try-evaluate-ship loop as fast as they can.
Occasional missteps won’t be nearly as fatal as being slow and irrelevant.
Yeah, it's a weird comment to call it not "public, peer-reviewed" when this article is about how it went public, giving people the opportunity to review it.
If I started selling a previously unknown cancer treatment over-the-counter in CVS, people would be justified in calling it not peer-reviewed, untested, etc. even if it is available to the public (giving people the opportunity to try it).
It could also end up like the megapixel race after the transition to digital cameras, with companies adding more and more context just because consumers' minds are already imprinted with the idea that more is better. So in a few years we might have models with a window of 30 megatokens and it'll mean absolutely nothing.
The field moved to hyper-scale engineering a few years ago. The science behind the engineering is still progressing (e.g., LoRA is open science; see the sketch below), and it seems like whatever these companies are adding is not fundamentally new (considering the success of LLaMa and the recent Google memo that admits they have no moat).
And the various "model cards" are not really in-depth research but rather cursory looks at model outputs. Even the benchmarks are mostly based on standardized tests designed for humans, which is not a valid way to evaluate an AI. In any case, these companies care more about the public perception of their models, so they tend to release evaluations of political sensitivity. But that's not necessarily the most interesting thing about these models, nor particularly valuable science.
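For context, LoRA is a good example of how simple the openly published pieces are: freeze the pretrained weights and train a tiny low-rank correction. A rough PyTorch sketch of the idea (names and hyperparameters here are illustrative, not the paper's reference code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-Rank Adaptation: freeze the pretrained weight W and learn a
    low-rank update, so the effective weight is W + (alpha / r) * B @ A.
    Only A and B (a tiny fraction of the parameters) are trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zeros: no change at init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
y = layer(torch.randn(4, 512))  # trains 2 * 512 * 8 params, not 512 * 512
```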
Your comment reads to me (someone in the field) like it is informed just by reading popular articles on the topic since 2022. The "Google memo" should basically have no impact on how you are thinking about these things, imo.
The field has taken massive steps backward in just the last year when it comes to open science.
> And the various "model cards" are not really in-depth research but rather cursory looks at model outputs
Because they are no longer releasing any details! Not because there hasn't been any progress in the last year.
The existence of commercial products doesn't eliminate researchers' ability to publish work. Also, users are smart. ML-powered search has existed for many years, with users voting with their feet based on black boxes and "big numbers and big announcements".
I keep seeing comments like this, but the impact on open research in the last year has been absolutely massive and negative.
The fact that these big industrial research labs have all collectively decided to take a step back from publishing anything with technical details or evaluation is bad.
I agree it is bad for researchers, but I think you should consider that "comments like this" are coming from users.
AI was a highly unusual field in terms of sharing its latest research. Car companies don't share their latest engine research with each other. Car users are happy with Consumer Reports, and researchers shouting that the degradation of the Journal of Engine Research is massive and negative will fall on deaf ears.
It's hard to engage in motte-and-bailey-style conversations with different commenters.
The original GP was saying there was little impact on research. Your comment is a retreat to a more defensible position that I don't have an opinion on.
I didn't say anything about impact on research. I said that research can continue in parallel to commercial enterprises, and end users of commercial products don't need research papers with benchmarks to know what is better.
That's the main reason I don't want a new car. It would take a $20M audit to assess a new car for potentially catastrophic software defects, and almost all new cars would almost certainly fail the audit.