
hmottestad on June 10, 2024 | on: Apple's On-Device and Server Foundation Models

If they use llama.cpp, they probably run on the GPU. Apple hasn’t published much about their Neural Engine, so you kinda have to use it through Core ML. I assume they have some aces up their sleeves for running LLMs efficiently that they haven’t told anyone about yet.