
No, they don't. Why would they? Most of them use a single inference engine, most likely developed in-house, or they go with something like vLLM; llama.cpp in particular is off their radar.

The reason is simple: there isn't much money in it. llama.cpp is free and targets the lower end of the hardware spectrum. Corporations will run something else or, even more likely, offload the task to a contractor.



The chat template issues are actually not on llama.cpp's side; they affect all engines (including vLLM, SGLang, etc.). For example, see https://www.reddit.com/r/unsloth/comments/1l97eaz/deepseekr1... , which fixed tool calling for DeepSeek R1.
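To make the failure mode concrete: engines render a Jinja chat template to turn a message list (including tool calls) into the exact prompt string the model was trained on, so a template bug silently breaks tool calling everywhere the template is reused. Below is a minimal sketch with a made-up toy template and role markers (not DeepSeek R1's real template), just to show the mechanism.

```python
from jinja2 import Template

# Hypothetical toy chat template for illustration only -- real templates
# (e.g. DeepSeek R1's) are far more involved, but the rendering step is the same.
CHAT_TEMPLATE = (
    "{% for m in messages %}"
    "<|{{ m['role'] }}|>{{ m['content'] }}"
    "{% if 'tool_calls' in m %}"
    "{% for t in m['tool_calls'] %}"
    "<tool_call>{{ t['name'] }}({{ t['arguments'] }})</tool_call>"
    "{% endfor %}"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|assistant|>{% endif %}"
)

def render_prompt(messages, add_generation_prompt=True):
    """Render a message list into a prompt string, as an inference engine would."""
    return Template(CHAT_TEMPLATE).render(
        messages=messages, add_generation_prompt=add_generation_prompt
    )

messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"name": "get_weather", "arguments": '{"city": "Paris"}'}],
    },
]
print(render_prompt(messages))
```

If the `tool_calls` branch is malformed (wrong delimiters, missing loop), the model never sees tool calls in the format it expects, which is the class of bug the linked fix addresses.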



