Hacker News

Because cost and speed. Smaller models can run on your phone for free, or in the cloud for pennies. An API call to a large LLM with a lot of context can cost orders of magnitude more, and it incurs network latency on top of that.

