At 4-bit quantization it should already fit quite nicely.

Aurornis · 2026-04-22T15:42:23 1776872543

Unfortunately not with a reasonable context length.

regularfry · 2026-04-22T22:14:49 1776896089

I've got 139k context with the UD-Q4_K_XL quant on a 4090, with the K and V caches both at q8_0 (ctk/ctv). Could probably squeeze out a little more, but that's enough for me for the moment.

corysama · 2026-04-23T00:00:35 1776902435

Hey, buddy! Can I bum a command line arg list off ya?

GaggiX · 2026-04-22T16:57:14 1776877034

The model uses Gated DeltaNet and Gated Attention so the memory usage of the KV cache is very low, even at BF16 precision.
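As a rough sanity check on that claim, here is a back-of-the-envelope sketch of KV cache size. The layer count, KV head count, head dimension, and the 1-in-4 ratio of full-attention layers are hypothetical placeholders, not the model's real config; the point is only that layers without a growing KV cache (such as linear-attention layers with a fixed-size state) drop out of the per-token cost.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    """Approximate KV cache size for the layers that keep a per-token cache.

    K and V each store n_kv_heads * head_dim values per token per layer,
    hence the leading factor of 2.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem


BF16 = 2.0       # bytes per element
Q8_0 = 34 / 32   # llama.cpp q8_0: 32 int8 values plus one fp16 scale per block

if __name__ == "__main__":
    # HYPOTHETICAL shapes, for illustration only (not taken from the thread).
    total_layers = 64
    attn_layers = total_layers // 4   # assume 1 in 4 layers uses full attention
    n_kv_heads, head_dim = 4, 256
    ctx = 139_000

    for name, width in (("bf16", BF16), ("q8_0", Q8_0)):
        hybrid = kv_cache_bytes(attn_layers, n_kv_heads, head_dim, ctx, width)
        dense = kv_cache_bytes(total_layers, n_kv_heads, head_dim, ctx, width)
        print(f"{name}: hybrid {hybrid / 2**30:.2f} GiB, "
              f"all-attention {dense / 2**30:.2f} GiB")
```

Plug in the real values from the model's config to get actual numbers; the fixed-size recurrent state of the linear-attention layers is ignored here because it does not grow with context length.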

kkzz99 · 2026-04-22T16:25:30 1776875130

It really depends on what you think a reasonable context length is, but I can get 50k-60k on a 4090.