Aren't these models consistently quite large and hard to run locally? It's possi...

Aren't these models consistently quite large and hard to run locally? It's possible that future Ollama releases will allow you to dynamically manage VRAM memory in a way that enables these models to run with acceleration on even modest GPU hardware (such as by dynamically loading layers for a single 'expert' into VRAM, and opportunistically batching computations that happen to rely on the same 'expert' parameters - essentially doing manually what mmap does for you in CPU-only inference) but these 'tricks' will nonetheless come at non-trivial cost in performance.