Introduction
One of the biggest reasons ClickHouse can analyze massive datasets so efficiently is its columnar storage format combined with powerful compression codecs. Compression is more than just a way to save disk space—it also reduces disk I/O, improves cache utilization, and can significantly speed up analytical queries.
ClickHouse offers several built-in codecs that are optimized for different types of data. Choosing the right codec depends on how your data changes over time and how it is queried.
In this article, you'll learn how ClickHouse compression works, when to use different codecs, and practical guidelines for optimizing storage without sacrificing performance.
Why Compression Matters
Analytical databases routinely scan large volumes of data. Without compression, storage costs grow and queries spend more time reading data from disk.
Effective compression provides several advantages:
- Reduces storage requirements
- Decreases disk I/O
- Improves cache efficiency
- Speeds up analytical queries
- Lowers infrastructure costs
Because ClickHouse stores each column independently, it can compress columns much more effectively than traditional row-based databases.
Columnar Storage Makes Compression Effective
Unlike row-oriented databases, ClickHouse stores values from the same column together.
Instead of storing:
| ID | Temperature | City |
|---|---|---|
| 1 | 20 | London |
| 2 | 21 | London |
| 3 | 22 | London |
ClickHouse stores:
ID: 1, 2, 3
Temperature: 20, 21, 22
City: London, London, London
Since similar values are stored together, compression algorithms can identify patterns much more efficiently.
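To make the layout difference concrete, here is a minimal Python sketch (not ClickHouse code) that pivots the three example rows into columns and then run-length encodes the City column. The helper names `to_columns` and `run_length_encode` are made up for illustration, and run-length encoding is only a simple stand-in for the pattern detection that real codecs perform.

```python
from itertools import groupby

# The three example rows from the table above.
rows = [
    {"ID": 1, "Temperature": 20, "City": "London"},
    {"ID": 2, "Temperature": 21, "City": "London"},
    {"ID": 3, "Temperature": 22, "City": "London"},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of equal neighbors into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


columns = to_columns(rows)
city_runs = run_length_encode(columns["City"])
print(columns["City"])  # ['London', 'London', 'London']
print(city_runs)        # [('London', 3)]
```

Once the values of one column sit next to each other, the repetition becomes visible to even a trivial encoder. In the row layout, the same three strings are separated by unrelated IDs and temperatures.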
What Are Compression Codecs?
A compression codec defines how ClickHouse encodes and compresses a column's data before writing it to disk.
Each codec is designed for particular data characteristics:
- Sequential numeric values
- Slowly changing measurements
- Floating-point time series
- Random integers
- General-purpose compression
Codecs can also be combined so that one codec transforms the data before another compresses it.
Delta Codec
The Delta codec stores the difference between consecutive values instead of storing each value directly.
Original values:
100
105
110
115
120
Delta representation:
100
5
5
5
5
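Delta on its own does not shrink the data, since each difference is stored at the same width as the original value. The benefit is that the differences are small and repetitive, so a general-purpose codec such as LZ4 or ZSTD compresses them far better than the raw values.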
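The transformation is easy to reproduce outside the database. The following Python sketch shows only the arithmetic. The function names `delta_encode` and `delta_decode` are illustrative and say nothing about how ClickHouse lays out the bytes internally.

```python
def delta_encode(values):
    """Keep the first value, then store each value minus its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values, total = [], 0
    for d in deltas:
        total += d
        values.append(total)
    return values


original = [100, 105, 110, 115, 120]
encoded = delta_encode(original)
print(encoded)  # [100, 5, 5, 5, 5]
assert delta_decode(encoded) == original
```

The encoding is lossless: decoding is a running sum over the stored differences.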
Best suited for
- Auto-increment IDs
- Timestamps
- Counters
- Sequential measurements
DoubleDelta Codec
DoubleDelta stores the change between consecutive deltas.
Original values:
100
110
120
130
140
Delta values:
100
10
10
10
10
DoubleDelta:
100
10
0
0
0
This works exceptionally well when values increase at a consistent rate.
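As a rough illustration of the arithmetic, the Python sketch below keeps the first value and the first delta, then stores only the change in delta from one step to the next. The function names are hypothetical. The sketch ignores the variable-width bit encoding that an actual implementation would use for the results.

```python
def double_delta_encode(values):
    """Store the first value, the first delta, then the delta of deltas."""
    if len(values) < 2:
        return list(values)
    prev_delta = values[1] - values[0]
    encoded = [values[0], prev_delta]
    for prev, cur in zip(values[1:], values[2:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)
        prev_delta = delta
    return encoded


def double_delta_decode(encoded):
    """Invert double_delta_encode by accumulating deltas, then values."""
    if len(encoded) < 2:
        return list(encoded)
    delta = encoded[1]
    values = [encoded[0], encoded[0] + delta]
    for change in encoded[2:]:
        delta += change
        values.append(values[-1] + delta)
    return values


original = [100, 110, 120, 130, 140]
encoded = double_delta_encode(original)
print(encoded)  # [100, 10, 0, 0, 0]
assert double_delta_decode(encoded) == original
```

For a perfectly regular sequence, everything after the second entry is zero. An irregular sequence still round-trips, but the stored changes are no longer small.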
Best suited for
- Event timestamps
- Time-series metrics
- Evenly spaced sequences
Gorilla Codec
The Gorilla codec was originally developed for Facebook's time-series database.
It compresses floating-point values by XORing each value with the previous one and storing only the meaningful (non-zero) bits of the result.
Example:
100.25
100.26
100.27
100.28
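Consecutive readings like these share their sign, exponent, and leading mantissa bits, so the XOR of neighboring values starts with a long run of zeros that Gorilla does not need to store. The effect is strongest when values repeat or change slowly, and decompression stays fast.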
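The XOR step can be inspected directly in Python. This is a simplified sketch, not the full Gorilla format: it keeps each XOR result as a whole 64-bit integer instead of writing the compact bit stream, and the helper names are invented for this example.

```python
import struct


def float_to_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits):
    """Reinterpret a 64-bit pattern as a float."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def xor_encode(values):
    """Keep the first value's bits, then XOR each value with its predecessor."""
    bits = [float_to_bits(v) for v in values]
    return bits[:1] + [prev ^ cur for prev, cur in zip(bits, bits[1:])]


def xor_decode(encoded):
    """Undo xor_encode with a running XOR."""
    values, current = [], 0
    for x in encoded:
        current ^= x
        values.append(bits_to_float(current))
    return values


def leading_zeros(x):
    """Number of zero bits before the first set bit in a 64-bit word."""
    return 64 - x.bit_length()


readings = [100.25, 100.26, 100.27, 100.28]
encoded = xor_encode(readings)
for x in encoded[1:]:
    print(f"{x:064b}  leading zeros: {leading_zeros(x)}")
assert xor_decode(encoded) == readings
```

Printing the XOR results shows how many high-order bits are identical between neighbors. Two equal readings XOR to zero, which is the cheapest case of all.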
Best suited for
- Sensor readings
- CPU metrics
- Memory utilization
- Financial prices
- IoT data
T64 Codec
T64 is designed for integer values with a relatively small numeric range.
Instead of storing full-width integers, it packs values into fewer bits whenever possible.
Example:
501
503
506
508
510
Because these values occupy a narrow range, T64 stores them much more efficiently.
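The idea of dropping bits that never change can be sketched in a few lines of Python. This is an assumption-laden simplification for non-negative integers only. It does not reproduce T64's block-wise processing or bit transposition, and the names `crop_pack` and `crop_unpack` are hypothetical.

```python
def crop_pack(values):
    """Split non-negative ints into a shared high prefix and narrow low parts."""
    lo, hi = min(values), max(values)
    width = (lo ^ hi).bit_length()   # number of low bits that can differ
    prefix = lo >> width             # high bits shared by every value
    mask = (1 << width) - 1
    return prefix, width, [v & mask for v in values]


def crop_unpack(prefix, width, lows):
    """Reattach the shared prefix to each narrow value."""
    return [(prefix << width) | low for low in lows]


values = [501, 503, 506, 508, 510]
prefix, width, lows = crop_pack(values)
print(prefix, width, lows)  # 31 4 [5, 7, 10, 12, 14]
assert crop_unpack(prefix, width, lows) == values
```

In this sketch each of the five values needs only 4 bits plus one shared prefix, rather than a full 32-bit integer per value.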
Best suited for
- Small integer columns
- Status codes
- Age
- Ratings
- Enumerated values
ZSTD Compression
ZSTD (Zstandard) is one of ClickHouse's most widely used general-purpose compression algorithms. It is the default in ClickHouse Cloud, while self-managed ClickHouse defaults to LZ4.
It provides an excellent balance between:
- Compression ratio
- Compression speed
- Decompression speed
- CPU utilization
In many production environments, ZSTD delivers strong performance without requiring extensive tuning.
Combining Codecs
One of ClickHouse's strengths is the ability to chain codecs together.
For example:
CODEC(Delta, ZSTD)
The process works like this:
- Delta converts values into smaller differences.
- ZSTD compresses those differences efficiently.
This combination often performs better than either codec alone.
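The effect of chaining can be approximated outside ClickHouse. The Python sketch below serializes an evenly increasing sequence as 4-byte integers and compresses it with and without a delta step. It uses `zlib` from the standard library purely as a stand-in for ZSTD, so the absolute numbers will differ from what ClickHouse produces. The sample data and helper names are invented for illustration.

```python
import struct
import zlib


def delta_encode(values):
    """Keep the first value, then store each value minus its predecessor."""
    return values[:1] + [cur - prev for prev, cur in zip(values, values[1:])]


def to_bytes(values):
    """Serialize integers as 4-byte little-endian values, like a UInt32 column."""
    return struct.pack(f"<{len(values)}I", *values)


# Hypothetical counter that grows by 5 on every row.
values = [1_000_000 + 5 * i for i in range(10_000)]

raw = to_bytes(values)
transformed = to_bytes(delta_encode(values))

raw_size = len(zlib.compress(raw))
delta_size = len(zlib.compress(transformed))

print(f"uncompressed: {len(raw)} bytes")
print(f"zlib only:    {raw_size} bytes")
print(f"delta + zlib: {delta_size} bytes")
```

Both inputs have the same uncompressed size. The delta-transformed bytes are almost entirely one repeated pattern, which is exactly what a general-purpose compressor handles best.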
Common combinations include:
- Delta + ZSTD
- DoubleDelta + ZSTD
- Gorilla + ZSTD
- T64 + ZSTD
Defining Codecs in a Table
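Codecs are declared per column, most commonly when creating a table. An existing column's codec can also be changed later with ALTER TABLE ... MODIFY COLUMN.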
Example:
CREATE TABLE metrics
(
    timestamp DateTime CODEC(DoubleDelta, ZSTD),
    value Float64 CODEC(Gorilla, ZSTD),
    device_id UInt32 CODEC(T64, ZSTD)
)
ENGINE = MergeTree
ORDER BY timestamp;
Each column can use the codec that best matches its data characteristics.
Choosing the Right Codec
There is no single codec that works best for every workload.
Here are some practical recommendations:
| Data Type | Recommended Codec |
|---|---|
| Sequential integers | Delta + ZSTD |
| Regular timestamps | DoubleDelta + ZSTD |
| Floating-point metrics | Gorilla + ZSTD |
| Small-range integers | T64 + ZSTD |
| General-purpose data | ZSTD |
The most effective approach is to load a production-like dataset with several codec combinations and compare both storage usage (for example, the data_compressed_bytes and data_uncompressed_bytes columns in system.columns) and query performance.
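Before running a full test inside ClickHouse, a quick offline comparison on a data sample can narrow down the candidates. The Python harness below is a sketch under several assumptions: `zlib` stands in for ZSTD, applying a delta twice is only a rough analogue of DoubleDelta, the sample timestamps are synthetic, and the timings it reports are not representative of ClickHouse itself.

```python
import struct
import time
import zlib


def delta(values):
    """Keep the first value, then store each value minus its predecessor."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def to_bytes(values):
    """Serialize as signed 64-bit integers so negative differences fit."""
    return struct.pack(f"<{len(values)}q", *values)


# Candidate pipelines: a transform followed by general-purpose compression.
CANDIDATES = {
    "zlib": lambda v: v,
    "delta + zlib": delta,
    "delta twice + zlib": lambda v: delta(delta(v)),
}


def benchmark(values):
    """Return (name, compressed_size, seconds) tuples, smallest size first."""
    results = []
    for name, transform in CANDIDATES.items():
        start = time.perf_counter()
        size = len(zlib.compress(to_bytes(transform(values))))
        elapsed = time.perf_counter() - start
        results.append((name, size, elapsed))
    return sorted(results, key=lambda r: r[1])


# Synthetic sample: one reading per minute with a little jitter.
sample = [1_700_000_000 + 60 * i + (i % 7) for i in range(20_000)]

for name, size, elapsed in benchmark(sample):
    print(f"{name:<20} {size:>8} bytes  {elapsed * 1000:.2f} ms")
```

A harness like this only ranks ideas. The final decision should still come from measurements on real tables and real queries.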
Best Practices
To get the best results:
- Start with ZSTD as your baseline general-purpose codec.
- Apply Delta or DoubleDelta for sequential numeric values.
- Use Gorilla for floating-point time-series data.
- Choose T64 for compact integer ranges.
- Benchmark with real workloads before deploying changes.
- Monitor both storage savings and query latency.
Remember that higher compression is not always better: heavier codecs and higher ZSTD levels cost more CPU, which can outweigh the storage savings for your workload.
Conclusion
Compression is one of the key technologies behind ClickHouse's impressive performance. By combining columnar storage with specialized compression codecs, ClickHouse minimizes storage requirements while maximizing query efficiency.
Understanding when to use Delta, DoubleDelta, Gorilla, T64, and ZSTD allows you to build faster and more cost-effective analytical systems.
The best codec depends on your data—not on a universal rule. Measure, benchmark, and choose the option that delivers the right balance between storage efficiency and query performance.
Key Takeaways
- Columnar storage enables highly effective compression.
- Compression improves both storage efficiency and query performance.
- Delta and DoubleDelta work best for sequential numeric data.
- Gorilla is optimized for floating-point time-series values.
- T64 is ideal for small-range integer columns.
- ZSTD is an excellent general-purpose compression algorithm.
- Always benchmark codecs with real production data before standardizing on a compression strategy.
Next up in the #100DaysOfClickHouse series: Learn how ClickHouse uses data skipping indexes to reduce scans and accelerate analytical queries.







