Introduction
One of the reasons ClickHouse® delivers exceptional analytical performance is its MergeTree family of storage engines. Instead of updating data in place, ClickHouse writes data as immutable parts and continuously runs background merges and mutations to keep that data organized and queries efficient.
While these operations happen automatically, they can sometimes become difficult to monitor. A growing number of table parts can lead to "Too many parts" errors, while long-running mutations may delay updates and consume significant system resources.
Unlike query execution, merges and mutations do not always provide clear visibility into their progress. Administrators often rely on manually checking system tables, making it difficult to identify bottlenecks before they impact production workloads.
In this article, you'll learn how merges and mutations work, why monitoring them matters, and how to troubleshoot common issues.
Understanding Background Merges
ClickHouse stores data in immutable parts.
Each INSERT creates at least one new part. Over time, hundreds or even thousands of parts can accumulate.
To improve query performance and reduce storage overhead, ClickHouse automatically merges smaller parts into larger ones.
A typical merge process looks like this:
      Insert Data
           │
           ▼
Part A  Part B  Part C
   │       │       │
   └───────┴───────┘
           ▼
   Background Merge
           ▼
      Larger Part
Benefits of Background Merges
- Fewer files on disk
- Faster query execution
- Better compression
- Lower metadata overhead
What Are Mutations?
A mutation changes data that has already been written.
Operations such as:
- ALTER TABLE ... UPDATE
- ALTER TABLE ... DELETE
- ALTER TABLE ... MATERIALIZE COLUMN
- ALTER TABLE ... MATERIALIZE INDEX
are executed as background mutations rather than immediate row updates.
Example:
ALTER TABLE events
DELETE WHERE event_date < '2024-01-01';
Instead of deleting rows instantly, ClickHouse schedules a mutation that rewrites affected parts in the background.
The time required depends on:
- Table size
- Number of parts
- Available CPU
- Disk throughput
Why Monitoring Merges and Mutations Matters
When background tasks fall behind, overall database performance can degrade.
Common symptoms include:
- "Too many parts" errors
- Slow inserts
- High disk usage
- Long-running ALTER operations
- Increased CPU utilization
- Delayed DELETE or UPDATE operations
These issues often appear gradually before becoming production incidents.
Monitoring Active Merges
ClickHouse exposes active merge operations through the system.merges table.
SELECT
    database,
    table,
    elapsed,
    progress,
    num_parts
FROM system.merges;
Example output (illustrative; num_parts omitted):
| database | table | elapsed (s) | progress |
|---|---|---|---|
| analytics | events | 35 | 0.72 |
| logs | access_logs | 18 | 0.41 |
Useful columns include:
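- progress: fraction of the merge completed, from 0 to 1
- elapsed: seconds since the merge started
- num_parts: number of source parts being merged
- bytes_read_uncompressed: uncompressed bytes read so far
- bytes_written_uncompressed: uncompressed bytes written so far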
These values help determine whether merges are progressing normally.
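As a rough sketch of how these columns can be combined, the query below lists running merges, longest first, with a naive remaining-time estimate. The estimate assumes progress is linear over time, which is only an approximation; the column aliases are illustrative.

```sql
-- Running merges, longest first, with a naive linear ETA.
SELECT
    database,
    table,
    result_part_name,
    is_mutation,                                   -- 1 if this entry is a part mutation
    num_parts,
    round(elapsed, 1)        AS elapsed_s,
    round(progress * 100, 1) AS progress_pct,
    -- Assumption: progress grows linearly with elapsed time.
    round(if(progress > 0, elapsed * (1 - progress) / progress, NULL), 1) AS est_remaining_s,
    formatReadableSize(total_size_bytes_compressed) AS merge_size
FROM system.merges
ORDER BY elapsed DESC;
```

A merge whose progress_pct barely moves between two runs of this query is a candidate for closer inspection.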
Monitoring Mutations
Use system.mutations to inspect mutation status.
SELECT
    database,
    table,
    mutation_id,
    command,
    is_done,
    parts_to_do
FROM system.mutations;
Example output (illustrative; database, table, and command omitted):
| mutation_id | is_done | parts_to_do |
|---|---|---|
| mutation_25.txt | 0 | 48 |
| mutation_26.txt | 1 | 0 |
Important fields include:
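- is_done: 1 once the mutation has finished, 0 while it is still running
- parts_to_do: number of parts that still need to be rewritten
- latest_failed_part: name of the most recent part that failed to mutate
- latest_fail_reason: exception message from the most recent failure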
These columns quickly reveal stalled or failed mutations.
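One way to use these fields, shown here as a minimal sketch, is to list only unfinished mutations that have already recorded a failure, oldest first. A non-empty latest_fail_reason does not always mean the mutation is permanently stuck, since ClickHouse retries failed parts, so treat the result as a list to investigate.

```sql
-- Unfinished mutations that have reported at least one failure.
SELECT
    database,
    table,
    mutation_id,
    command,
    create_time,
    parts_to_do,
    latest_failed_part,
    latest_fail_time,
    latest_fail_reason
FROM system.mutations
WHERE is_done = 0
  AND latest_fail_reason != ''
ORDER BY create_time;
```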
Detecting Too Many Parts
Having thousands of small parts negatively affects performance.
Use the following query to identify tables with excessive part counts:
SELECT
    database,
    table,
    count() AS parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY parts DESC;
If a table contains an unusually high number of parts, background merges may not be keeping up with incoming inserts.
Common causes include:
- Very small insert batches
- Heavy concurrent ingestion
- Slow storage
- Limited background threads
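Because the "Too many parts" limit is evaluated per partition rather than per table, a per-partition breakdown is often more telling than a table-level count. The sketch below shows one possible breakdown, followed by a lookup of the thresholds configured on your server; the LIMIT of 20 and the aliases are arbitrary choices.

```sql
-- Active parts per partition, worst offenders first.
SELECT
    database,
    table,
    partition_id,
    count()                                AS active_parts,
    sum(rows)                              AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active
GROUP BY database, table, partition_id
ORDER BY active_parts DESC
LIMIT 20;

-- Server defaults for delaying and rejecting inserts.
-- Values differ between ClickHouse versions and can be overridden per table.
SELECT name, value
FROM system.merge_tree_settings
WHERE name IN ('parts_to_delay_insert', 'parts_to_throw_insert');
```

Many partitions with a small average part size usually points to small insert batches rather than slow merges.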
Viewing Merge Queue Activity
You can also monitor merge-related activity through the system.metrics table, which holds current values, and the system.events table, which holds cumulative counters.
Useful metrics include:
- Active merges
- Background pool utilization
- Pending merge tasks
- Mutation queue length
These metrics are especially useful when building Grafana dashboards for production monitoring.
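As an illustration, the query below reads the current merge and mutation gauges from system.metrics. The metric names are those used by recent ClickHouse releases and are an assumption about your version; older releases used different names for the background pool, so check the description column on your server.

```sql
-- Current merge/mutation activity and background pool gauges.
-- Metric names vary by version; the LIKE pattern is a best-effort match.
SELECT
    metric,
    value,
    description
FROM system.metrics
WHERE metric IN ('Merge', 'PartMutation')
   OR metric LIKE 'BackgroundMergesAndMutationsPool%'
ORDER BY metric;
```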
Common Reasons Merges Fall Behind
Small Insert Sizes
Frequent tiny inserts create excessive numbers of parts.
Instead of repeatedly inserting:
1 row
1 row
1 row
Batch inserts whenever possible:
10,000 rows
Larger batches significantly reduce merge pressure.
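To make the difference concrete, here is a small sketch using a hypothetical events_demo table (the table name and columns are invented for illustration). A single INSERT statement carrying many rows typically produces one part per partition it touches, whereas the same rows sent as separate statements produce one part each.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE IF NOT EXISTS events_demo
(
    event_date Date,
    user_id    UInt64,
    action     String
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);

-- One statement, many rows: far fewer parts than one INSERT per row.
INSERT INTO events_demo (event_date, user_id, action) VALUES
    ('2024-01-01', 1, 'view'),
    ('2024-01-01', 2, 'click'),
    ('2024-01-01', 3, 'view');

-- Check how many active parts the table currently has.
SELECT count() AS active_parts
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'events_demo'
  AND active;
```

In practice the batching usually happens in the client or ingestion pipeline, which buffers rows and flushes them in one INSERT.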
Heavy Disk I/O
Merges continuously read and rewrite data parts.
If storage cannot keep up, merge throughput decreases.
SSD or NVMe storage greatly improves merge performance.
Large Mutations
Large UPDATE or DELETE operations require many parts to be rewritten.
Whenever possible, perform large mutations in smaller batches to reduce resource consumption.
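One possible way to batch a large DELETE, sketched below against the events table used earlier, is to split the predicate into date ranges and wait for each mutation to finish before issuing the next. The 2023-07-01 split point is an arbitrary assumption, and how much this helps depends on how the table is partitioned and sorted.

```sql
-- Instead of one mutation covering all history:
--   ALTER TABLE events DELETE WHERE event_date < '2024-01-01';

-- Batch 1: the oldest range.
ALTER TABLE events
DELETE WHERE event_date < '2023-07-01';

-- Wait until this returns 0 before starting the next batch.
SELECT count() AS unfinished_mutations
FROM system.mutations
WHERE database = currentDatabase()
  AND table = 'events'
  AND is_done = 0;

-- Batch 2: the remaining range.
ALTER TABLE events
DELETE WHERE event_date >= '2023-07-01'
         AND event_date <  '2024-01-01';
```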
Background Thread Saturation
ClickHouse performs merges using background worker threads.
If these workers remain fully occupied, merge tasks begin to queue.
Monitoring background thread utilization helps detect this issue early.
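A minimal sketch of such a check compares busy tasks with the pool size. It assumes the metric names BackgroundMergesAndMutationsPoolTask and BackgroundMergesAndMutationsPoolSize, which exist in recent ClickHouse releases but may be named differently or be absent in older ones.

```sql
-- Share of the merge/mutation pool that is currently busy.
-- If either metric is missing on your version, the sums are 0 and
-- utilization is not meaningful (nan or inf).
SELECT
    sumIf(value, metric = 'BackgroundMergesAndMutationsPoolTask') AS busy_tasks,
    sumIf(value, metric = 'BackgroundMergesAndMutationsPoolSize') AS pool_size,
    round(busy_tasks / pool_size, 2)                              AS utilization
FROM system.metrics;
```

A utilization that stays close to 1 while part counts keep rising suggests the pool, rather than the disk, is the limiting factor.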
Best Practices
To keep merges and mutations healthy:
- Batch inserts instead of inserting individual rows.
- Monitor active part counts regularly.
- Watch for long-running mutations.
- Use fast storage for production workloads.
- Avoid frequent large DELETE operations.
- Schedule heavy maintenance during off-peak hours.
- Build dashboards around system tables for continuous monitoring.
Example Monitoring Queries
Largest tables by part count
SELECT
    database,
    table,
    count() AS parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY parts DESC
LIMIT 10;
Running merges
SELECT *
FROM system.merges;
Pending mutations
SELECT *
FROM system.mutations
WHERE is_done = 0;
Building Dashboards
Many production teams collect data from:
- system.merges
- system.mutations
- system.parts
- system.metrics
- system.events
A monitoring dashboard can display:
- Active merges
- Pending mutations
- Part count per table
- Merge throughput
- Background thread utilization
Continuous visibility into these metrics helps identify issues before they affect production workloads.
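For the merge throughput panel, one option is to sample the cumulative merge counters in system.events and let the dashboard compute a rate between samples. This is a sketch; the counters reset when the server restarts, and a counter that has not yet been incremented may not appear in the table at all.

```sql
-- Cumulative merge counters since server start.
-- A dashboard would plot the difference between consecutive samples.
SELECT
    event,
    value,
    description
FROM system.events
WHERE event IN (
    'Merge',
    'MergedRows',
    'MergedUncompressedBytes',
    'MergesTimeMilliseconds'
)
ORDER BY event;
```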
Conclusion
Background merges and mutations are essential to maintaining ClickHouse performance, but they often operate behind the scenes. Without proper monitoring, issues such as excessive part counts, stalled mutations, and merge backlogs can quietly grow until they affect ingestion and query performance.
By regularly inspecting system.merges, system.mutations, and system.parts, batching inserts efficiently, and monitoring background activity with dashboards, you can detect problems early and keep your ClickHouse cluster running smoothly.
Understanding these internal processes is a key step toward operating ClickHouse reliably at scale.
Read the full article -> https://www.quantrail-data.com/merges-and-mutations-in-clickhouse







