Hacker News: razster's comments

My fear is that these large "AI" companies will lobby to have these open-source options removed or banned; it's a growing concern. I'm not sure how else to explain how much I enjoy what HF provides: I religiously browse their site for new and exciting models to try.

ModelScope is the Chinese equivalent of Hugging Face and a good backup. All the open models are Chinese anyway.

Not true! Mistral is really really good, but I agree that there isn't a single decent open model from the USA.

Arcee is working on that, see a blog post about their newest in progress model here: https://www.arcee.ai/blog/trinity-large

It's still not fully post-trained and it's a non-reasoning model, but it's worth keeping an eye on if you don't want to use the Chinese models that are currently the best open-weight options.


Mistral is cool and I wish them success, but it consistently ranks extremely low on benchmarks while still being expensive. Chinese models like DeepSeek might rank almost as low as Mistral, but they are significantly cheaper. And Kimi is the best of both worlds, with incredible benchmark results while still being incredibly cheap.

I know things change rapidly, so I'm not counting them out quite yet, but I don't see them as a serious contender currently.


Sure, benchmarks are fake and I use Mistral over equivalently sized models most of the time because it's better in real life. It runs plenty fast for me, I don't pay for inference.

> it consistently ranks extremely low on benchmarks

As general-purpose chatbots, small Mistral models are better than comparably sized Chinese models, as they have better SimpleQA scores and more general knowledge of Western culture.


It’s really hard to beat qwen coder, especially for role play where the instruction following is really useful. I don’t think their corpus is lacking in western knowledge, although I wonder if Chinese users get even better results from it?

> It’s really hard to beat qwen coder, for role play

I am not sure you actually tried that. Mistrals are the widely accepted go-to models for roleplay and creative writing. None of the Qwens are good at prose, except their latest big Qwen 3.5.

> I don’t think their corpus is lacking in western knowledge,

It absolutely is, especially when it comes to pop-culture knowledge.


Instruct and Coder just follow instructions so well, though. I guess I've just never been able to make Mistral work well.

Qwen3 30B A3B and that big 400+ B Coder were absolutely terrible at editing fiction. I would tell them what to change in the prose and they'd just regurgitate text with no changes.

Did you try asking Gemini what model to use and how to configure/set it up? It has worked wonders for me, ironically (since I’m using a big model to setup smaller local models).

> Did you try asking Gemini what model to use and how to configure/set it up?

That would be suboptimal, as Gemini's knowledge cutoff is too old. I'm long past the need for such advice anyway, as I've been using local models since mid-2024.


Gemini will search the web for most things (at least if you are using it via the web search interface); it isn't limited to the knowledge it was trained on. Actually, I'm a bit mortified that not everyone knows this. If you ask Gemini (from the search interface) about a current event that happened yesterday, it will use search to pull in context and work with that. The same goes for a model that was released yesterday.

It's only at very low-level model access that search isn't used. Local models also need to be configured to use search, and I haven't had a use case for that yet.

Gemini seems to call this "grounding with Google Search". If you have Gemini installed in your enterprise, it will also search internal data sources for context.


> Gemini will search the web for most things (at least if you are using it via the web search interface), it isn’t limited to the knowledge it was trained on.

If it decides to do so, and even then the baked-in knowledge would influence the result.

In any case I do not need Gemini or any other LLMs to figure out setting for my llama.cpp, thank you very much.


It has always searched the web for me, and it can give me pretty good guidance about a model released in the last week. All models ATM are trying to reduce dependence on internal knowledge, mostly through RAG. Anyway, this part of LLMs has gotten much better in the last 6 months.

If you are able to figure out the right settings for a model that was released last week, then great for you! But it sounds like you just don't trust LLMs to use current knowledge, and have some misconceptions about how they satisfy recent-knowledge requests.


Why are you talking price when we are talking local AI?

That doesn't make any sense to me. Am I missing something?


15 missed calls from your local power company

Your electricity is free?

Apple silicon is crazy efficient, as well as being comparable to GPUs in performance for the Max and Ultra chips.

If you have the hardware to run expensive models, is the cost of electricity much of a factor? According to Google, the average price in the Silicon Valley area is $0.448 per kWh. An RTX 5090 costs about $4,000 and has a peak power consumption of 1000 W. Maxing out that GPU for a whole year would cost $3,925 at that rate, which is comparable to the price of the hardware itself.
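The arithmetic above can be sketched as follows (the $0.448/kWh rate and 1000 W peak draw are the figures quoted in the comment, not authoritative numbers):

```python
# Rough worst-case electricity cost of maxing out an RTX 5090 for a year.
POWER_KW = 1.0          # peak draw: 1000 W
RATE_PER_KWH = 0.448    # quoted Silicon Valley average, $/kWh
HOURS_PER_YEAR = 24 * 365

energy_kwh = POWER_KW * HOURS_PER_YEAR       # 8760 kWh
annual_cost = energy_kwh * RATE_PER_KWH
print(f"${annual_cost:,.2f}")                # $3,924.48, close to the ~$4,000 GPU price
```

In practice the real number is lower, since a GPU rarely sits at peak draw around the clock; this is the upper bound the comment is using.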

At that point it'd be cheaper to get an expensive subscription to a cloud-platform AI product. I understand the case for local LLMs, but it seems silly to worry about pricing for cloud-based offerings while not worrying about pricing for locally run models, especially since running locally can often be more expensive.

for almost the entire year, yes.

To be fair there are lots of worse models than OpenAI's GPT-OSS-120b. It's not a standout when positioned next to the latest releases from China, but prior to the current wave it was considered one of the stronger local models you can reasonably run.

They can try. I don't think they'll be able to get the toothpaste back in the tube. The data will just move out of the country.

Many of the models on hugging face are already Chinese. It’s kind of obvious that local AI is going to flourish more in China than the USA due to hardware constraints.

it’s only a matter of time. we have all seen first hand how … wrong … these companies behave, almost on a regular basis.

there's a small tinfoil-hat part of me that suspects part of their obscene investments and cornering of the hardware market is driven by a conscious attempt to stop open-source local AI from taking off. they want it all: the money, the control, and to be the only source of information for us.


How do you choose which models to try for which workflows? Do you have objective tests that you run, or do you just get a feel for them while using them in your daily workflow?

Not sure if you've been keeping an eye on the front page of HN, but methinks the AI agents are starting to post. I haven't figured it out yet, but it has been getting odd around here. Might be nothing.

You mean cutting into the profit/MONEY of these large corporations? How will they survive!?

As a human living on this planet, with roughly another 50 years left, I say we allow our actions to continue. We are unable to stop those in power and with high influence from doing anything; we deserve what is coming. Earth will be fine without us. Good luck everyone!

What I get out of this is that these models are trained on basic coding, not enterprise-level code where you have thousands and thousands of project files all intertwined and linked with dependencies. It didn't have access to all of that.


The Trump Coin pushing agent kind of kills the fun.


Sir, my tin hat is on.


I'd be a bit more worried when Z-Image Edit/Base is released. Flux.2 Klein is out and it's on par with ZIT, and with some fine-tuning can just about hit Flux.2. Adding on top of that is Qwen Image Edit 2511 for additional refinement. Anything is possible. The folks at r/StableDiffusion are falling over the possible release of Z-Image-Omni-Base, a hold-me-over until the actual base is out. I've heard it's equal to Flux.2. Crazy times.


The MSI motherboard I use has 3, and with a PCIe expansion card installed I have 7 M.2s. There are some expansion cards with 8 M.2 slots. You can also get SATA-to-M.2 adapters, or my favorite: USB-C enclosures that hold 2 M.2 drives. Getting great speeds from that little device.


I've paired my Z-Image Turbo with SeedVR2 upscaling, running on an RTX 3060 12GB with 32GB of system RAM; it generates in 40 seconds. I'm holding out for Z-Image Edit, which is a larger model; once that is out, things are going to get interesting. Oh, and training your own ZIT LoRA takes 5 hours for 3000 steps. So fast.
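For a sense of scale, the quoted LoRA training figures (5 hours for 3000 steps) work out to about 6 seconds per step:

```python
# Back-of-envelope LoRA training speed from the figures in the comment.
train_hours = 5
total_steps = 3000

seconds_per_step = train_hours * 3600 / total_steps
steps_per_minute = 60 / seconds_per_step
print(seconds_per_step, steps_per_minute)  # 6.0 s/step, 10 steps/min
```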


Z-Image Base and Z-Image Edit have been announced as being the same size as Turbo (or, at least, the whole set has been announced as being in the 6B size class), but slower: apparently 50 steps with CFG, going by the announced 100 NFEs, compared to Turbo's 9 NFEs (Turbo, in the use they reference, doesn't use CFG).
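The NFE arithmetic behind that inference: with classifier-free guidance, each sampling step runs the model twice (one conditional pass, one unconditional pass), so 100 NFEs implies 50 steps; Turbo's 9 NFEs at one pass per step means 9 steps. A minimal sketch of the bookkeeping:

```python
# NFE (number of function evaluations) bookkeeping for diffusion sampling.
def nfes(steps: int, cfg: bool) -> int:
    # With CFG, each step needs a conditional and an unconditional forward pass.
    passes_per_step = 2 if cfg else 1
    return steps * passes_per_step

base_nfes = nfes(steps=50, cfg=True)    # Z-Image Base/Edit: 100 NFEs
turbo_nfes = nfes(steps=9, cfg=False)   # Z-Image Turbo: 9 NFEs
print(base_nfes, turbo_nfes)
```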

