On llama-server, Q4_K_M gives about 91K of context on 24 GB, which works out to roughly 70 MB of KV cache per 1K tokens. I could have gone with Q5, which would probably leave room for only about 30K tokens. I think that's pretty impressive.
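The per-1K figure is the KV-cache budget divided by the context length, so the numbers above can be sanity-checked with a few lines. This is a minimal sketch, assuming all VRAM left over after the weights goes to the KV cache (compute buffers are ignored); the function names are hypothetical and not part of llama-server.

```python
def kv_mb_per_1k(context_tokens: int, kv_budget_mb: float) -> float:
    """MB of KV cache consumed per 1,000 tokens of context."""
    return kv_budget_mb / (context_tokens / 1000)


def max_context_tokens(kv_budget_mb: float, mb_per_1k: float) -> int:
    """Largest context (in tokens) that fits in a given KV-cache budget."""
    return int(kv_budget_mb / mb_per_1k * 1000)


def implied_extra_weight_mb(ctx_before: int, ctx_after: int, mb_per_1k: float) -> float:
    """VRAM a bigger quant must take from the KV cache to shrink context
    from ctx_before to ctx_after (assumes the same per-1K KV cost)."""
    return (ctx_before - ctx_after) / 1000 * mb_per_1k


if __name__ == "__main__":
    # 91K tokens at ~70 MB per 1K implies a KV cache of ~6.4 GB.
    budget = 91 * 70
    print(budget, "MB of KV cache")
    print(max_context_tokens(budget, 70), "tokens fit")
    # If Q5 leaves ~30K tokens, its weights take roughly this much more VRAM.
    print(implied_extra_weight_mb(91_000, 30_000, 70), "MB extra for Q5")
```

Taken together, the two estimates imply the Q5 weights would need roughly 4.3 GB more than Q4_K_M; that follows only from the rough figures quoted above, not from a measurement.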
I've been getting good results with IQ4_NL and TurboQuant at 4 bits on 24 GB (3090). It easily fits 256K of context with that setup, but it slows down quite a bit after 80-100K. Quality in my testing is still good: