
If training runs for a SOTA-scale model are now in the ~$6MM range, I think on the contrary: closed labs are screwed, in the same way that Linux clobbered Windows for server-side deployments. Why couldn't Windows just copy whatever Linux did? Well, the codebases and research directions diverged, and on top of that MS had to profit off of licensing, so for wide-scale deployments Linux was cheaper, and it was faster to ship a fix for your problem by contributing a patch than it was to beg and wait for MS... causing a virtuous cycle (or, for Microsoft, a vicious cycle) where high-tech companies with the skills to operate Linux deployments collaborated on improving Linux, and as a result saw much lower costs for their large deployments plus much more flexibility, which then incentivized more companies to do the same. The open models are getting much cheaper to run, and if you want something different you can just run your own finetune on your own hardware (sketched below).
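
To make "run your own finetune" concrete, here's roughly what that looks like today with the standard Hugging Face stack (transformers + peft + datasets). A sketch only: the base checkpoint, the local JSONL file, and the LoRA hyperparameters are placeholders, not anything any particular lab actually ships.

    # Sketch of a local LoRA finetune with the Hugging Face stack (transformers,
    # peft, datasets, accelerate). Base checkpoint, data file, and hyperparameters
    # are placeholders; swap in whatever you actually run.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base = "Qwen/Qwen2.5-7B"  # any open-weights checkpoint you have access to
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

    # Low-rank adapters: only a few million parameters get trained, which is
    # what makes "on your own hardware" realistic for a single box.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM"))

    # Your in-house data; here a hypothetical local file of {"text": ...} rows.
    data = load_dataset("json", data_files="company_corpus.jsonl")["train"]
    data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                    remove_columns=data.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="my-finetune",
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=8,
                               num_train_epochs=1, learning_rate=2e-4,
                               logging_steps=10),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    model.save_pretrained("my-finetune")  # adapter weights never leave your box

The point is that the adapter weights and the training data never leave your own machines, which is exactly the flexibility argument from the Linux-on-servers days.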

Worse for the proprietary labs is how loudly they've trumpeted safety regulations. They can't just release a model without extensive safety testing, or their entire regulatory push falls apart. DeepSeek can just post a new model to Hugging Face whenever they feel like it; most of their Tiananmen-style filtering isn't baked into the model weights, it's bolted on at their API layer. Ditto for anyone running finetunes. In fact, circumventing filtering is one of the most common reasons to run a finetune... A week after R1's release, there are already uncensored versions of the Llama and Qwen distills published on HF. The open-source ecosystem simply publishes faster.
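
For anyone who hasn't tried it, "just pull the distill and run it locally" is a handful of lines with the stock transformers stack. Again a sketch: the model id is DeepSeek's published R1 Qwen distill on HF, everything else (prompt, generation settings, hardware) is up to you.

    # Sketch of running the published R1 Qwen distill locally.
    # No API sits between you and the weights, so no API-layer filtering applies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "..."  # any prompt you like; nothing rewrites it on the way in or out
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(out[0], skip_special_tokens=True))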

With massively expensive training runs, you could imagine a world where model development remained very centralized and the few big labs would easily fend off open-source competition: after all, who would give away the results of their $100MM investment? Pray that Zuck continues? But if the training runs are cheap... well, there are lots of players who might be interested in cutting the legs out from under the centralized big labs. High Flyer — the quant firm that owns DeepSeek — is no longer dependent on OpenAI for any future trading projects that use LLMs, for the cost of $6MM... not to mention being immune to any future U.S. export controls on access to LLMs. That seems very worthwhile!

As LeCun says: DeepSeek benefited from Llama, and the next version of Llama will likely benefit from DeepSeek (i.e. massively reduced training costs). So both companies have an incentive to keep publishing their results and techniques, and that's bad news for the proprietary labs, who need the LLMs themselves to be profitable and not just the applications built on top of them... because the open models will keep eating away at their margins, at least for large-scale deployments by competent tech companies (i.e. like Linux on servers).



> Why couldn't Windows just copy whatever Linux did?

They kinda did: https://en.wikipedia.org/wiki/Azure_Linux


Azure Linux is Linux. Microsoft is one of the biggest contributors to Linux overall, in terms of commits per release, and has been for years now. That doesn't mean Windows is doing what Linux did - Windows is still almost entirely different from Linux at both the kernel and userspace level, and improvements in one have little to no bearing on the other.



