
I find that q6 and q5+ are subjectively as good as the raw tensor files. The quality reduction at 4 bits is very detectable, though. Of course there must be some loss of information, but perhaps there is a noise floor or something like that.
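The bit-width comparison can be sketched with a toy symmetric quantizer (illustrative only; real q4/q6 GGUF formats use per-block scales, offsets, and k-quant tricks not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=100_000).astype(np.float32)  # stand-in for a weight tensor

def quantize(w, bits):
    """Symmetric uniform quantization to `bits` bits over the whole tensor."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 levels per sign at 4 bits
    scale = np.abs(w).max() / levels       # one scale for the full tensor
    q = np.round(w / scale).clip(-levels, levels)
    return q * scale                       # dequantized values

for bits in (4, 6, 8):
    err = np.sqrt(np.mean((w - quantize(w, bits)) ** 2))
    print(f"{bits}-bit RMS error: {err:.4f}")
```

Each extra bit roughly halves the step size, so RMS error drops sharply from 4 to 6 bits, which is consistent with the perceived gap between q4 and q6.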


At what parameter count? It's been established that quantization has less of an effect on larger models; by the time you're at 70B, quantization to 4 bits is basically negligible.


Source? I’ve seen this anecdotally and heard it repeated, but is there a paper you’re referencing?


I work mostly with Mixtral and Mistral 7B these days, but I did work with some 70B models before Mistral came out, and I was not impressed with 4-bit Llama-2 70B.



