
No, they don't. Why would they? Most of them use a single inference engine, most likely developed in-house, or they go with something like vLLM; llama.cpp in particular is off their radar.

The reason is simple: there isn't much money in it. llama.cpp is free and targets the lower end of the hardware spectrum. Corporations will run something else or, even more likely, offload the task to a contractor.



The chat template issues are actually not on llama.cpp's side; they affect all engines (including vLLM, SGLang, etc.). For example, see https://www.reddit.com/r/unsloth/comments/1l97eaz/deepseekr1... , which fixed tool calling for DeepSeek R1.
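To make the failure mode concrete: engines render a Jinja chat template to turn a message list (including tool calls) into the exact prompt string the model was trained on, so a template bug silently breaks tool calling everywhere the template is reused. Below is a minimal sketch with a made-up toy template and role markers (not DeepSeek R1's real template), just to show the mechanism.

```python
from jinja2 import Template

# Hypothetical toy chat template for illustration only -- real templates
# (e.g. DeepSeek R1's) are far more involved, but the rendering step is the same.
CHAT_TEMPLATE = (
    "{% for m in messages %}"
    "<|{{ m['role'] }}|>{{ m['content'] }}"
    "{% if 'tool_calls' in m %}"
    "{% for t in m['tool_calls'] %}"
    "<tool_call>{{ t['name'] }}({{ t['arguments'] }})</tool_call>"
    "{% endfor %}"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|assistant|>{% endif %}"
)

def render_prompt(messages, add_generation_prompt=True):
    """Render a message list into a prompt string, as an inference engine would."""
    return Template(CHAT_TEMPLATE).render(
        messages=messages, add_generation_prompt=add_generation_prompt
    )

messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"name": "get_weather", "arguments": '{"city": "Paris"}'}],
    },
]
print(render_prompt(messages))
```

If the `tool_calls` branch is malformed (wrong delimiters, missing loop), the model never sees tool calls in the format it expects, which is the class of bug the linked fix addresses.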



