
I find that q6 and q5+ are subjectively as good as the raw tensor files. The quality reduction at 4 bits is very detectable, though. Of course there must be some loss of information, but perhaps there is a noise floor or something like that.
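The bit-width comparison can be sketched with a toy symmetric quantizer (illustrative only; real q4/q6 GGUF formats use per-block scales, offsets, and k-quant tricks not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=100_000).astype(np.float32)  # stand-in for a weight tensor

def quantize(w, bits):
    """Symmetric uniform quantization to `bits` bits over the whole tensor."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 levels per sign at 4 bits
    scale = np.abs(w).max() / levels       # one scale for the full tensor
    q = np.round(w / scale).clip(-levels, levels)
    return q * scale                       # dequantized values

for bits in (4, 6, 8):
    err = np.sqrt(np.mean((w - quantize(w, bits)) ** 2))
    print(f"{bits}-bit RMS error: {err:.4f}")
```

Each extra bit roughly halves the step size, so RMS error drops sharply from 4 to 6 bits, which is consistent with the perceived gap between q4 and q6.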


At what parameter count? It's been established that quantization has less of an effect on larger models; by the time you're at 70B, quantization to 4 bits is basically negligible.


Source? I’ve seen this anecdotally and heard it repeated, but is there a paper you’re referencing?


I work mostly with Mixtral and Mistral 7B these days, but I did work with some 70B models before Mistral came out, and I was not impressed with 4-bit Llama-2 70B.



