
>LLaMA 13b (24.5GB) and Ministral 7b (13.6GB)

But the HW requirements state 8GB of VRAM. How do those models fit in that?



They are int4-quantized.
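The arithmetic works out. A rough sketch of the memory math (ignoring the small overhead for quantization scales and activations):

```python
# Back-of-envelope VRAM math for quantized LLM weights.
# Real checkpoints also carry per-group scales/zero-points, so actual
# files are slightly larger than this estimate.
def model_size_gb(n_params: float, bits_per_param: int) -> float:
    """Weight storage in GB for a model with the given parameter count."""
    return n_params * bits_per_param / 8 / 1e9

fp16 = model_size_gb(13e9, 16)  # 13B params at 16 bits each
int4 = model_size_gb(13e9, 4)   # same model at 4 bits each
print(f"fp16: {fp16:.1f} GB, int4: {int4:.1f} GB")
# fp16: 26.0 GB, int4: 6.5 GB
```

So a 13B model drops from ~26 GB at fp16 to ~6.5 GB at int4, which fits in 8 GB of VRAM.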


Does int4 mean 4 bits per integer, or 4 bytes / 32 bits?

If it means that weights for an LLM can be 4 bits, well, that's just mind-boggling.


Four bits per parameter. (A parameter is what you're calling an integer here.)

I was skeptical of it for some time, but it seems to work because individual parameters don't need to encode much information. The knowledge is spread across a massive number of low-bit parameters.
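For intuition, here's a hypothetical sketch of how symmetric int4 quantization can work: each weight is snapped to one of 16 levels determined by a per-tensor scale, and two 4-bit values are packed into each byte. (Real schemes like GPTQ use per-group scales and more careful rounding; the function names here are made up for illustration.)

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Map float weights to 4-bit ints in [-8, 7], two per byte."""
    scale = np.abs(w).max() / 7                 # symmetric int4 range
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    nib = (q & 0x0F).astype(np.uint8)           # two's-complement low nibbles
    packed = nib[0::2] | (nib[1::2] << 4)       # pack two weights per byte
    return packed, scale

def dequantize_int4(packed: np.ndarray, scale: float) -> np.ndarray:
    lo = (packed & 0x0F).astype(np.int16)
    hi = (packed >> 4).astype(np.int16)
    lo = np.where(lo > 7, lo - 16, lo)          # sign-extend 4-bit values
    hi = np.where(hi > 7, hi - 16, hi)
    q = np.empty(lo.size + hi.size, dtype=np.int16)
    q[0::2], q[1::2] = lo, hi
    return q * scale

w = np.array([0.9, -0.4, 0.05, -0.7])
packed, scale = quantize_int4(w)
print(packed.nbytes, "bytes for", w.size, "weights")  # 2 bytes for 4 weights
print(dequantize_int4(packed, scale))
```

Each individual weight is now off by up to half a quantization step, but with billions of parameters the model tolerates that noise surprisingly well.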


4 bits



