
Depends entirely on quantization. Q6_K with max context length (262144) is ~40GB of VRAM.

Q8 with the same context wouldn't fit in 48GB of VRAM, but it did fit with 128k of context.
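
For anyone who wants to see why context length moves the number so much, here is a rough back-of-the-envelope sketch: quantized weights plus an fp16 KV cache. The model shape below (parameter count, layers, KV heads, head dim) is made up for illustration and is not the model from this thread, so don't expect it to reproduce the figures above. The bits-per-weight values are the usual llama.cpp block sizes for Q6_K and Q8_0; compute buffers and runtime overhead are ignored.

```python
# Rough VRAM estimate: quantized weights + KV cache.
# All model dimensions below are HYPOTHETICAL, chosen only for illustration.

# Approximate bits per weight for two llama.cpp quantization formats.
BITS_PER_WEIGHT = {"Q6_K": 6.5625, "Q8_0": 8.5}

GIB = 2**30


def weights_gib(n_params, quant):
    """Size of the quantized weights in GiB."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / GIB


def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Size of the KV cache in GiB (fp16 by default).

    K and V each hold n_kv_heads * head_dim values per layer per token,
    hence the leading factor of 2.
    """
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / GIB


def total_gib(n_params, quant, n_layers, n_kv_heads, head_dim, context_len):
    """Weights plus KV cache; ignores compute buffers and other overhead."""
    return weights_gib(n_params, quant) + kv_cache_gib(
        n_layers, n_kv_heads, head_dim, context_len
    )


if __name__ == "__main__":
    # Hypothetical model shape, NOT taken from the thread.
    model = dict(n_params=30e9, n_layers=48, n_kv_heads=4, head_dim=128)
    for quant in ("Q6_K", "Q8_0"):
        for ctx in (131072, 262144):
            est = total_gib(quant=quant, context_len=ctx, **model)
            print(f"{quant} @ {ctx} tokens: ~{est:.1f} GiB")
```

The KV cache term grows linearly with context, so halving the context from 262144 to 128k halves that part while the weights stay the same size. That is the kind of trade that can let a larger quant fit in the same card.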


