Cut your Redis memory usage by up to 90%

If you treat Redis as a bag of single String keys, I have great news for you. By taking advantage of listpack, you can cut your Redis memory usage by up to 90% without dropping a single byte of your actual data. Let's assume you write keys like SET user:1203:blocked 1, one key per user's blocked status. It works, but most of what ends up in memory is not the data you assigned. There is plenty of overhead in a single key.

Why all those keys are expensive

When you store your data by running SET user:1203:blocked 1, it feels like you are storing 1 byte of data. But that's not true. In fact, most of the stored bytes are not yours; they come from Redis itself.

Here's what a single key really costs you:

Redis wraps your value in a redisObject, which adds about 16 bytes before your actual data.
The key itself is not your typical String. It has a small header of its own.
Every key lives as an entry in Redis's main dict, which means each key reserves more memory there (roughly 24 bytes).
All of the above is also rounded up by the memory allocator to a fixed size.

One key is harmless, but remember that the number of keys grows with your user base.

How to save the memory

I assume your keys are like these:

SET user:10001:blocked 1
SET user:15089:blocked 1
...
SET user:10201:external_id AI27X
SET user:10590:external_id JO91P
...

One key per use case per user. With a million users, that is a million keys per use case, each paying the per-key overhead.

To reduce the number of keys that live in Redis's main dict, we can group them into hashes and bucket them by ID range, so that each hash stays small:

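HSET user:blocked:10 001 1
HSET user:blocked:15 089 1
...
HSET user:external_id:10 201 AI27X
HSET user:external_id:10 590 JO91P
...

The bucket is the user ID divided by 1000 (integer division) and the field is the remainder, so one hash holds a thousand users. A thousand fields is above the default hash-max-listpack-entries of 512, so this bucket size relies on raising that setting (see The catch). To read a user, compute the bucket and use HGET as usual:

HGET user:blocked:10 001

Same data, but a million top-level keys collapse into a thousand.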
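
On the application side, the "user ID divided by 1000" rule can be sketched in a few lines of Python. This is only an illustration: hash_location and is_blocked are hypothetical names, the three-digit zero-padded field is an assumption, and client stands for any object with a redis-py style hget() method.

```python
BUCKET_SIZE = 1000  # users per hash, matching the bucketing rule above


def hash_location(use_case: str, user_id: int) -> tuple[str, str]:
    """Return (hash key, field) for a user under the bucketing scheme."""
    bucket, offset = divmod(user_id, BUCKET_SIZE)
    # Zero-padding the field to three digits is an assumption of this sketch.
    return f"user:{use_case}:{bucket}", f"{offset:03d}"


def is_blocked(client, user_id: int) -> bool:
    """Read one user's flag; `client` is any object with a redis-py style hget()."""
    key, field = hash_location("blocked", user_id)
    value = client.hget(key, field)
    # redis-py returns bytes unless decode_responses=True, so accept both.
    return value in (b"1", "1")


print(hash_location("blocked", 10001))      # ('user:blocked:10', '001')
print(hash_location("external_id", 10590))  # ('user:external_id:10', '590')
```

Keeping the mapping in one function means the bucket size can only change in one place, which matters because every reader and writer must agree on it.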

The proof

Don't put your faith in theory alone; let's test it ourselves.

First run, one key per user (baseline):

redis-cli FLUSHALL
redis-cli INFO memory | grep "used_memory:"
seq 1 1000000 | awk '{v=($1%7==0); print "SET user:"$1":blocked " v}' | redis-cli --pipe
redis-cli INFO memory | grep "used_memory:"

Result:

Second run, 1000 users per bucketed hash using hashtable:

redis-cli FLUSHALL
redis-cli CONFIG SET hash-max-listpack-entries 1
redis-cli INFO memory | grep "used_memory:"
seq 1 1000000 | awk '{v=($1%7==0); b=int(($1-1)/1000); print "HSET user:blocked:"b" "$1" "v}' | redis-cli --pipe
redis-cli INFO memory | grep "used_memory:"
redis-cli OBJECT ENCODING user:blocked:0

Result:

Third run, 1000 users per bucketed hash using listpack:

redis-cli FLUSHALL
redis-cli CONFIG SET hash-max-listpack-entries 1024
redis-cli INFO memory | grep "used_memory:"
seq 1 1000000 | awk '{v=($1%7==0); b=int(($1-1)/1000); print "HSET user:blocked:"b" "$1" "v}' | redis-cli --pipe
redis-cli INFO memory | grep "used_memory:"
redis-cli OBJECT ENCODING user:blocked:0

Result:

Layout	Encoding	Used memory	Savings vs baseline
1M individual keys	`string`	53.8 MB	—
1M in 1000-field hashes	`hashtable`	46.1 MB	≈ 14%
1M in 1000-field hashes	`listpack`	6.9 MB	≈ 87%

That's the up to 90% we talked about.
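
To make the table easier to reason about, here is a small Python sketch that recomputes the savings column and turns each figure into a rough per-user cost. The numbers are copied from the table; treating 1 MB as 1,000,000 bytes is an assumption, and used_memory also includes Redis's own baseline, so the per-user values are approximate.

```python
USERS = 1_000_000

# used_memory after loading 1M users, copied from the results table (MB).
used_mb = {"string": 53.8, "hashtable": 46.1, "listpack": 6.9}


def savings_pct(baseline_mb: float, layout_mb: float) -> float:
    """Percentage of memory saved relative to the baseline layout."""
    return (baseline_mb - layout_mb) / baseline_mb * 100


def bytes_per_user(layout_mb: float, users: int = USERS) -> float:
    """Rough per-user cost, assuming 1 MB = 1,000,000 bytes."""
    return layout_mb * 1_000_000 / users


baseline = used_mb["string"]
for name, mb in used_mb.items():
    print(f"{name:10s} {bytes_per_user(mb):5.1f} B/user  "
          f"saves {savings_pct(baseline, mb):4.1f}%")
```

Under these assumptions, a blocked flag costs roughly 54 bytes as its own key and roughly 7 bytes inside a listpack hash.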

What makes the hash cheap

The obvious part is that we eliminate the per-value redisObject. We also stop repeating the key name: a long key like user:10001:blocked shrinks into a short hash field, and the bucket key is paid once per thousand users.
But eliminating a small portion of the overhead does not, by itself, cut memory significantly. The overhead does not vanish; it relocates. The hash has its own overhead: you now pay for a dictEntry inside the hash, an SDS for the field and another for the value, and you lose the shared integer trick on the value. Sum it up and the net saving is close to zero. That is why the hashtable encoding saved us only ≈ 14%.
This is where listpack steps into the spotlight. The per-field dictEntry cost is gone, the bucket array cost is gone, and the SDS cost for every field and every value is replaced by a compact encoding. With most of the overhead gone, you get most of your memory back.

The catch

Listpack is an optimization Redis applies to a hash while the hash stays small, and it goes away the moment your hash grows past either of two limits: hash-max-listpack-entries (default 512) and hash-max-listpack-value (default 64). The first limits how many fields a hash can hold, and the second limits the size in bytes of any field or value (the limit applies to each one separately, not to their sum).
Cross either one and Redis will turn your listpack into a hashtable without warning. A single value that is one byte too long is enough to flip the whole hash. The change is also one-way: if you later shrink the value back under the limit, the hash will not go back to listpack.
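
Because the conversion is silent, it can help to check data against the two limits before writing it. The Python sketch below approximates the rule described above; stays_listpack is a hypothetical helper, the defaults mirror the values quoted in this section, and it cannot tell you whether an existing hash has already been converted.

```python
def stays_listpack(hash_fields: dict, max_entries: int = 512,
                   max_value: int = 64) -> bool:
    """Approximate check that a hash with these fields would stay a listpack.

    max_entries and max_value mirror hash-max-listpack-entries and
    hash-max-listpack-value; pass your server's actual settings.
    """
    if len(hash_fields) > max_entries:
        return False
    for field, value in hash_fields.items():
        # The size limit applies to each field and each value separately.
        for item in (field, value):
            if len(str(item).encode("utf-8")) > max_value:
                return False
    return True


small = {f"{i:03d}": "1" for i in range(512)}
print(stays_listpack(small))                   # True
print(stays_listpack({**small, "extra": "1"})) # False: 513 fields
print(stays_listpack({"001": "x" * 65}))       # False: value is 65 bytes
```

A check like this fits naturally in a test suite or a write path guard, with OBJECT ENCODING used on the server to confirm the real encoding.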

The tradeoffs

There is no freebie, only tradeoffs:

Reads and writes on a listpack are linear, O(n), which is why we need to keep it small.
A separate key can have its own TTL, but once you merge keys into a hash, the fields share the hash's TTL. Redis 7.4 added per-field TTL (using HEXPIRE), but it carries its own memory cost and turns your listpack into listpackEx (at least it is not a hashtable).
More complex code to manage.

Listpack is not a new tool you need to install, not a dependency to add, not even a switch to flip. It is already there, built into your Redis. You just need to tune your code to use it. Find the cases that fit and your memory usage drops several fold. Lower memory means lower bills.