well, i got some Gemini models running on my phone, but if i switch apps Android kills the process, so the call to the server always hangs... and then the screen goes black
the new laptop only has 16GB of memory total, with another 7 dedicated to the NPU.
i tried loading Qwen 3 4B on it, but the max context i can get loaded is about 12k before the laptop crashes.
my next attempt is gonna be a 0.5B model, but i think i'll still end up having to compress the context on every call, which is my real challenge
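the per-call compression i have in mind would look roughly like this (just a sketch: the summary step is a placeholder string where a real summarization call to the model would go, and the token counter is a crude word-count stand-in, not a real tokenizer):

```python
# Rough sketch of per-call context compression: keep the system prompt
# and the most recent turns verbatim, collapse everything older into a
# one-line summary placeholder, then drop turns if still over budget.

def estimate_tokens(text: str) -> int:
    # crude stand-in for a real tokenizer
    return len(text.split())

def compress_context(messages, budget_tokens=2048, keep_recent=4):
    """messages: list of {"role": ..., "content": ...} dicts."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-keep_recent:]
    older = rest[:-keep_recent]
    out = list(system)
    if older:
        # placeholder: a real setup would ask the model to summarize these
        summary = " / ".join(m["content"][:40] for m in older)
        out.append({"role": "system",
                    "content": f"[summary of earlier turns: {summary}]"})
    out.extend(recent)
    # if still over budget, drop the oldest non-system turns
    while len(out) > 1 and sum(
            estimate_tokens(m["content"]) for m in out) > budget_tokens:
        for i, m in enumerate(out):
            if m["role"] != "system":
                out.pop(i)
                break
        else:
            break  # nothing left to drop
    return out
```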
I recommend using low-bit quantized models first, for example anywhere between q4 and q8 GGUF. Also, you don't need a high context to fiddle around and learn the ins and outs; 4k context is more than enough to figure out what you need in agentic solutions. In fact that's a good limit to impose on yourself, since it forces you to start developing decent automatic context management internally, and that will be very important when making robust agentic solutions. with all that you should be able to load an llm with no issues on many devices.
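to make the 4k limit concrete, here's a rough sketch of the kind of pre-call trimming I mean: walk the history newest-to-oldest and keep only what fits the budget, always preserving the system prompt. the words-times-1.3 token estimate and the budget numbers are made-up placeholders, swap in your model's real tokenizer for accurate counts:

```python
# Sketch of imposing a hard context budget on yourself: before every
# call, trim the message history so the prompt fits, dropping the
# oldest turns first and always keeping the system prompt.

CONTEXT_BUDGET = 4096      # hard cap you impose on yourself
RESERVED_FOR_REPLY = 512   # leave room for the model's answer

def rough_tokens(text: str) -> int:
    # very rough estimate; use the model's actual tokenizer in practice
    return int(len(text.split()) * 1.3) + 1

def fit_to_budget(messages):
    budget = CONTEXT_BUDGET - RESERVED_FOR_REPLY
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    used = sum(rough_tokens(m["content"]) for m in system)
    kept = []
    # walk newest-to-oldest, keep turns while they still fit
    for m in reversed(turns):
        cost = rough_tokens(m["content"])
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

once this works at 4k it scales to whatever budget the hardware actually allows, which is the point of practicing with the small limit.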