43 Points · 34 Comments · by EGreg (Author)

Top Comments

thethirdone · Apr 21
> the ratio remains approximately 914x over TurboQuant, with compression improving rather than degrading as context length grows.

This line from the abstract got me really suspicious. Obviously a compression scheme that exploits the entire sequence shouldn't get worse relative to a per-element one as the length increases.

It is important to note that this paper is PURELY theoretical. I couldn't find much meat on the bone from a quick skim.

The single author, Gregory Magarshak, has only published one paper on arxiv before and appears to be a professor of business / music. I don't plan to give it a closer read in the hope of finding something of value.

aesthesia · Apr 21
> The second layer, predictive delta coding, stores only the residual of each new KV vector from the model's own prediction of it

I don't understand this. The key and value vectors for any given layer + token are created by the model. By definition, they are exactly equal to the model's prediction of them!
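
For what it's worth, in classical compression "predictive delta coding" usually means predicting each new vector from *previously stored* data (not from the current input) and persisting only the quantized residual. A minimal numpy sketch of that generic scheme — the toy stream, predictor, and quantization step are all my own illustration, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Toy "KV stream": each vector is correlated with the previous one.
stream = [rng.standard_normal(d)]
for _ in range(15):
    stream.append(0.9 * stream[-1] + 0.1 * rng.standard_normal(d))

def predict(history):
    # Hypothetical predictor: repeat the last *reconstructed* vector.
    return history[-1] if history else np.zeros(d)

scale = 0.05                      # uniform quantization step (assumed)
stored, reconstructed = [], []    # persist residuals; rebuild on decode
for v in stream:
    pred = predict(reconstructed)
    q = np.round((v - pred) / scale).astype(np.int8)  # tiny-int residual
    stored.append(q)
    reconstructed.append(pred + q * scale)            # decoder's view

# Closed-loop prediction (predict from reconstructed, not true, values)
# bounds the error at scale/2 per element, with no drift over time.
err = max(np.abs(v - w).max() for v, w in zip(stream, reconstructed))
assert err <= scale / 2 + 1e-9
```

The key property is that the predictor only ever sees data the decoder also has — which is exactly why "predicting the current vector from itself" wouldn't make sense as a coding scheme.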

Extreme KV cache compression is easy to get---you can get an infinite compression ratio by just regenerating the key and value vectors on every forward pass. The point of a KV cache is to reduce the amount of repeated computation during generation, though. Compression only helps if you have an efficient decompression algorithm.
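
To make that concrete, here's a toy single-head attention sketch (shapes and weights invented for illustration) showing that the cache changes only how much is recomputed, never the outputs:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(q, K, V):
    return softmax(q @ K.T / np.sqrt(d)) @ V

xs = rng.standard_normal((6, d))  # toy token embeddings

# Without a cache: recompute K and V for the whole prefix every step.
def step_no_cache(t):
    K = xs[: t + 1] @ Wk
    V = xs[: t + 1] @ Wv
    return attend(xs[t] @ Wq, K, V)

# With a cache: append one K,V row per step, never recompute old ones.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
outs_cached = []
for t in range(len(xs)):
    K_cache = np.vstack([K_cache, xs[t] @ Wk])
    V_cache = np.vstack([V_cache, xs[t] @ Wv])
    outs_cached.append(attend(xs[t] @ Wq, K_cache, V_cache))

# Identical outputs; the cache only saves the O(t) recomputation per step.
assert all(np.allclose(step_no_cache(t), o)
           for t, o in enumerate(outs_cached))
```

So "infinite compression by regenerating K and V" is really just running without a cache — any real compression scheme has to beat that recomputation cost, not just the storage cost.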

ddtaylor · Apr 21
Very interesting. A compression strategy that uses the model itself as the dictionary.
sabareesh · Apr 21
Sounds like speculative decoding but for KV cache
tomrod · Apr 21
Extraordinary claims! I don't follow the argument though.
Source: arxiv.org · Posted by EGreg on April 21, 2026 at 02:11 AM

