
3.5 bits per weight with no quality loss is state of the art - that's an awesome optimization result (a mix of 2-bit and 4-bit weights).
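A quick back-of-the-envelope check of what that mix implies, assuming the 3.5 figure is a plain average over 2-bit and 4-bit weights and ignoring any per-block scale or metadata overhead. The function name is made up for illustration and is not from the paper:

```python
def fraction_high(avg_bits, low_bits=2.0, high_bits=4.0):
    """Share of weights stored at high_bits so that the mean width is avg_bits.

    Assumes only two widths are used and no extra bits go to scales or
    other metadata (an assumption, not something the paper confirms).
    """
    if not low_bits <= avg_bits <= high_bits:
        raise ValueError("avg_bits must lie between low_bits and high_bits")
    # Solve high_bits * x + low_bits * (1 - x) = avg_bits for x.
    return (avg_bits - low_bits) / (high_bits - low_bits)


share_4bit = fraction_high(3.5)
print(f"4-bit share: {share_4bit:.0%}, 2-bit share: {1 - share_4bit:.0%}")
```

Under those assumptions, roughly three quarters of the weights would sit at 4 bits and one quarter at 2 bits. Any real format that stores scales alongside the weights would shift that split.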


I would like to see their method compared quantitatively to the best llama.cpp methods. IQ3_S has a similar bpw and pretty high quality.

I wonder whether they stretched the truth with the phrase "without loss in accuracy".



