KV Cache Compression 900000x Beyond TurboQuant and Per-Vector Shannon Limit

EGreg

Author

Top Comments

thethirdone · Apr 21

> the ratio remains approximately 914x over TurboQuant, with compression improving rather than degrading as context length grows.

This line from the abstract made me really suspicious. Obviously, a compression scheme that incorporates the entire sequence shouldn't get worse relative to a per-element one as the length increases.

It is important to note that this paper is PURELY theoretical. I couldn't find much meat on the bone from a quick skim.

The single author, Gregory Magarshak, has only published one paper on arXiv before and appears to be a professor of business / music. I don't plan to read it any further in the hope of finding something of value.
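To illustrate the point about context length, here is a minimal sketch under a purely hypothetical cost model, not the paper's: the per-vector scheme spends a fixed number of bits on every KV vector, while the sequence-level scheme pays a one-time fixed overhead plus a small residual per token. The function name and all numbers below are assumptions chosen for illustration only.

```python
def compression_ratio(n_tokens, bits_per_vector, fixed_overhead_bits, residual_bits_per_token):
    """Ratio of per-vector storage to sequence-level storage (hypothetical cost model)."""
    # Per-vector scheme: every token costs the same number of bits.
    per_vector_total = n_tokens * bits_per_vector
    # Sequence-level scheme: one-time overhead amortized over the sequence,
    # plus a small residual cost per token.
    sequence_total = fixed_overhead_bits + n_tokens * residual_bits_per_token
    return per_vector_total / sequence_total


if __name__ == "__main__":
    # All values are made up; only the trend matters.
    for n in (1_000, 10_000, 100_000, 1_000_000):
        ratio = compression_ratio(n, bits_per_vector=512,
                                  fixed_overhead_bits=1_000_000,
                                  residual_bits_per_token=1)
        print(f"{n:>9} tokens: ratio {ratio:.2f}")
```

Under this model the ratio rises with sequence length and approaches `bits_per_vector / residual_bits_per_token` from below, so "improving rather than degrading as context length grows" is what any amortized sequence-level scheme would show against a per-vector baseline.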

aesthesia · Apr 21

> The second layer, predictive delta coding, stores only the residual of each new KV vector from the model's own prediction of it

I don't understand this. The key and value vectors for any given layer + token are created by the model. By definition, they are exactly equal to the model's prediction of them!

Extreme KV cache compression is easy to get: you can get an infinite compression ratio by just regenerating the key and value vectors on every forward pass. The point of a KV cache is to reduce the amount of repeated computation during generation, though. Compression only helps if you have an efficient decompression algorithm.
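Both points in this comment can be made concrete with a toy sketch. It assumes a single layer whose keys and values are linear projections of fixed token embeddings; in a real multi-layer transformer the deeper-layer K/V vectors also depend on the whole prefix, so regeneration is even more expensive than shown here. `ToyLayer`, `generate_with_cache`, and `generate_without_cache` are hypothetical names, not taken from the paper or any library.

```python
import random

D = 4  # toy hidden size (hypothetical)


def matvec(w, x):
    """Plain-Python matrix-vector product."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class ToyLayer:
    """One layer's K/V projections, with a counter for projection calls."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_k = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
        self.w_v = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
        self.projections = 0

    def kv(self, x):
        self.projections += 1
        return matvec(self.w_k, x), matvec(self.w_v, x)


def generate_with_cache(layer, embeddings):
    """Store each token's (K, V) once and reuse it on later steps."""
    cache = []
    for x in embeddings:
        cache.append(layer.kv(x))
    return cache


def generate_without_cache(layer, embeddings):
    """Store nothing: recompute K/V for the whole prefix at every step."""
    kv = []
    for t in range(1, len(embeddings) + 1):
        kv = [layer.kv(x) for x in embeddings[:t]]
    return kv


if __name__ == "__main__":
    rng = random.Random(1)
    embeddings = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(8)]

    cached_layer, uncached_layer = ToyLayer(), ToyLayer()
    cached = generate_with_cache(cached_layer, embeddings)
    regenerated = generate_without_cache(uncached_layer, embeddings)

    # Same weights and inputs give the same K/V, so the residual is zero.
    print("identical K/V:", cached == regenerated)
    print("projections with cache:", cached_layer.projections)
    print("projections without cache:", uncached_layer.projections)
```

For n tokens the cached path performs n projections while the cache-free path performs n(n+1)/2. The cache-free path stores nothing, yet it pays for that with exactly the repeated computation a KV cache exists to avoid.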

ddtaylor · Apr 21

Very interesting. A compression strategy that uses the model itself as the dictionary.

sabareesh · Apr 21

Sounds like speculative decoding but for KV cache

tomrod · Apr 21

Extraordinary claims! I don't follow the argument though.

Source

arxiv.org

Author

EGreg

Posted

April 21, 2026 at 02:11 AM