
I have been using Qwen3.5-35B-A3B a lot in local testing, and it is by far the most capable model that could fit on my machine. I think quantization technology has really upped its game around these models, and there were two quants that blew me away:

Mudler APEX-I-Quality, and then later Byteshape Q3_K_S-3.40bpw.

Both made claims that seemed too good to be true, but I couldn't find any traces of lobotomization during long agent coding loops. With the Byteshape quant I am up to 40+ t/s, which is a speed that makes agents much more pleasant. On an RTX 3060 12GB with 32GB of system RAM, I went from slamming all my available memory to having like 14GB to spare.
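A rough back-of-the-envelope sketch of why a quant like this fits on that kind of hardware: weight storage is approximately parameter count times bits per weight, divided by 8. The assumptions here are mine: 35e9 is the nominal parameter count taken from the model name, the 3.40 bpw figure is treated as a uniform average over all tensors, and the KV cache, context buffers, and file metadata are ignored. `quant_size_gb` is a hypothetical helper, not part of any library.

```python
def quant_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: nominal 35B parameters, uniform average bits per weight.
size_q3 = quant_size_gb(35e9, 3.40)   # the 3.40 bpw quant
size_fp16 = quant_size_gb(35e9, 16)   # unquantized 16-bit weights, for scale

print(f"{size_q3:.1f} GB at 3.40 bpw vs {size_fp16:.1f} GB at 16 bpw")
```

Under those assumptions the weights come to roughly 15 GB, which is more than 12GB of VRAM alone but leaves plenty of headroom once it is split across the GPU and 32GB of system RAM. Actual usage will be higher once the context is allocated.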



Unfortunately, llama.cpp quantization technology has been stagnant for two years. The main quantization developer left or was kicked out of llama.cpp due to an attribution dispute. He created his own fork ik_llama.cpp where he has made multiple new and better quants.

Unsloth and Byteshape are just using and highlighting features that have been available the whole time. I am very invested in figuring out a solution to this dispute, or some way to get the new quants upstreamed.


Now that I have tried it out on a few tasks, Qwen3.6 is a huge jump in capability. It can make improvements to a project that Qwen3.5 always struggled with.


Could you share more about your config? I've also got a 3060 12GB and 64GB of RAM, but I've never got local models running well enough to be useful.


What can and what can't it do compared to Codex and CC?


How does it compare against Qwen3.5 27B?


I haven't run 27B that much because it only runs at like 2 tokens/sec on my computer.


Which one is best?


I would say Byteshape is smaller and faster, and I can’t really notice a quality difference. But I haven’t used it as much, since I only started using it a few days ago.



