If they use llama.cpp, they're probably running on the GPU. Apple hasn’t published much about the Neural Engine, so you kinda have to go through Core ML to use it. I assume they have some aces up their sleeves for running LLMs efficiently that they haven’t told anyone about yet.