>What I don’t like is the trend towards the way to do that being to open up network listeners with no authentication on them.
Yeah - but don't do that.
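For anyone who does want a local model reachable over HTTP, the fix is a couple of lines: bind to loopback only and require a token. A minimal sketch in Python (the generate() stub and the LLM_TOKEN variable are placeholders of mine, standing in for whatever model you actually serve):

```python
# Minimal sketch: a local inference endpoint that is NOT an open listener.
# Assumptions (mine, not the thread's): generate() stands in for a real
# model call; the token comes from the environment or is generated fresh.
import json
import os
import secrets
from http.server import BaseHTTPRequestHandler, HTTPServer

AUTH_TOKEN = os.environ.get("LLM_TOKEN") or secrets.token_urlsafe(32)

def generate(prompt: str) -> str:
    # Placeholder for a real call into llama.cpp / your model of choice.
    return f"echo: {prompt}"

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Reject any request that lacks the shared bearer token.
        if self.headers.get("Authorization") != f"Bearer {AUTH_TOKEN}":
            self.send_error(401, "missing or bad token")
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        reply = generate(body.get("prompt", ""))
        out = json.dumps({"text": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(out)

if __name__ == "__main__":
    print(f"token: {AUTH_TOKEN}")
    # Bind to loopback only -- never 0.0.0.0 -- so nothing off-box can connect.
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```

Anything without the Authorization header gets a 401, and nothing off the machine can reach the port at all.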
The thing about small models that can run on commodity hardware is that they break the business model of OpenAI and co. They hope they can run a service that charges a fortune but provides functionality that can't be duplicated. That gives them a moat and a huge revenue engine. Quantized models and student models (trained on the big models' outputs) show that the moat is likely to be transitory or partial at best. We can run Mistral 7B at about 1/300th of the cost of a call to GPT-4. That makes a whole load of applications viable, but it also torpedoes the monopoly pricing model they are hoping for.
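The 300x figure is easy to sanity-check with napkin math. The two per-token prices below are illustrative assumptions of mine (roughly GPT-4-era API pricing vs. self-hosted 7B compute), chosen to be the right order of magnitude rather than measured:

```python
# Napkin math only: both prices below are assumptions, not measurements.
gpt4_usd_per_mtok = 30.00      # assumed hosted-API price per 1M output tokens
mistral7b_usd_per_mtok = 0.10  # assumed self-hosted compute per 1M tokens

ratio = gpt4_usd_per_mtok / mistral7b_usd_per_mtok
print(f"hosted is ~{ratio:.0f}x the cost per token")  # ~300x
```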
All we need to do now is to stop people training on stolen data.
My Mac M2 is quite capable of running Stable Diffusion XL models and 30B-parameter LLMs under llama.cpp.
What I don’t like is the trend towards the way to do that being to open up network listeners with no authentication on them.
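For concreteness, "running it under llama.cpp" needs no listener at all; it can be done in-process. A sketch using the llama-cpp-python bindings (the GGUF filename is a placeholder of mine; n_gpu_layers=-1 offloads all layers, which uses Metal on Apple Silicon builds):

```python
# Sketch: run a quantized GGUF model locally via llama-cpp-python.
# The model path is a placeholder; swap in whatever GGUF file you have.
from llama_cpp import Llama

MODEL = "models/mistral-7b-instruct.Q4_K_M.gguf"  # placeholder filename

llm = Llama(model_path=MODEL, n_ctx=4096, n_gpu_layers=-1)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])
```

No port is opened; the model is just a library call, which sidesteps the unauthenticated-listener problem entirely.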