As ClickHouse® deployments grow beyond a single server, ensuring high availability, scalability, and fault tolerance becomes essential. While a standalone node can process billions of rows efficiently, production environments often require multiple servers working together as a cluster.

This is where ClickHouse Keeper comes in.

ClickHouse Keeper is the modern coordination service for ClickHouse®. It replaces the need for an external Apache ZooKeeper deployment while providing the same coordination capabilities required for replication, distributed DDL, and cluster management.

In this article, we'll build a production-style ClickHouse® cluster and understand how Keeper enables reliable replication across multiple nodes.

Why Clustering?

A single server has limitations. As workloads increase, organizations typically need:

High availability
Data replication
Horizontal scalability
Fault tolerance
Cluster-wide query execution

ClickHouse addresses these requirements by separating data storage from cluster coordination.

Unlike traditional distributed databases, Keeper does not store your actual data. Instead, it manages metadata required to keep replicas synchronized.

Keeper stores:

Replica metadata
Part metadata
Leader election information
Distributed DDL queue entries
Replication state

Your actual table data always remains on the ClickHouse servers.

Understanding Cluster Architecture

A ClickHouse cluster is built from three primary components:

Shards
Replicas
Keeper Nodes

A shard stores a subset of your data.

A replica maintains a copy of that shard to provide redundancy and high availability.

Example:

Shard 1
├── Replica A
└── Replica B

Shard 2
├── Replica A
└── Replica B

In this architecture:

Sharding distributes data horizontally.
Replication protects against node failures.
Keeper coordinates replication metadata.

Example Deployment

For this guide we'll use:

Keeper Nodes

keeper1.example.com
keeper2.example.com
keeper3.example.com

ClickHouse Nodes

ch-node1.example.com
ch-node2.example.com

Cluster topology:

1 Shard
2 Replicas
3 Keeper Nodes

Architecture:

                +-------------+
                |  Keeper 1   |
                +-------------+
                       |
                +-------------+
                |  Keeper 2   |
                +-------------+
                       |
                +-------------+
                |  Keeper 3   |
                +-------------+

                       ▲
                       │
        ┌──────────────┴──────────────┐
        │                             │

+------------------+       +------------------+
|   ch-node1       |       |   ch-node2       |
|    Replica 1     |       |    Replica 2     |
+------------------+       +------------------+

Configuring ClickHouse Keeper

Each Keeper node requires a unique server ID.

Example configuration for keeper1:

<clickhouse>
    <keeper_server>
        <tcp_port>9181</tcp_port>

        <server_id>1</server_id>

        <log_storage_path>
            /var/lib/clickhouse/coordination/log
        </log_storage_path>

        <snapshot_storage_path>
            /var/lib/clickhouse/coordination/snapshots
        </snapshot_storage_path>

        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>keeper1.example.com</hostname>
                <port>9234</port>
            </server>

            <server>
                <id>2</id>
                <hostname>keeper2.example.com</hostname>
                <port>9234</port>
            </server>

            <server>
                <id>3</id>
                <hostname>keeper3.example.com</hostname>
                <port>9234</port>
            </server>
        </raft_configuration>
    </keeper_server>
</clickhouse>

For the remaining Keeper nodes, only the server_id changes.

keeper2

<server_id>2</server_id>

keeper3

<server_id>3</server_id>

Restart ClickHouse on every Keeper node after updating the configuration.

Configuring Keeper Connectivity

Each ClickHouse server must know how to connect to the Keeper cluster.

Create a configuration file similar to:

<clickhouse>
    <zookeeper>
        <node>
            <host>keeper1.example.com</host>
            <port>9181</port>
        </node>

        <node>
            <host>keeper2.example.com</host>
            <port>9181</port>
        </node>

        <node>
            <host>keeper3.example.com</host>
            <port>9181</port>
        </node>
    </zookeeper>
</clickhouse>

Although you're using ClickHouse Keeper, the configuration section remains named <zookeeper> because Keeper implements the ZooKeeper protocol.

Defining the Cluster

Next, define the cluster on every ClickHouse node.

<clickhouse>
    <remote_servers>
        <analytics_cluster>
            <shard>
                <internal_replication>true</internal_replication>

                <replica>
                    <host>ch-node1.example.com</host>
                    <port>9000</port>
                </replica>

                <replica>
                    <host>ch-node2.example.com</host>
                    <port>9000</port>
                </replica>
            </shard>
        </analytics_cluster>
    </remote_servers>
</clickhouse>

The setting:

<internal_replication>true</internal_replication>

ensures that replication is handled by ReplicatedMergeTree rather than sending inserts to every replica.

Configuring Macros

Macros allow the same table definition to be used across all replicas.

On ch-node1

<clickhouse>
    <macros>
        <shard>01</shard>
        <replica>ch-node1</replica>
    </macros>
</clickhouse>

On ch-node2

<clickhouse>
    <macros>
        <shard>01</shard>
        <replica>ch-node2</replica>
    </macros>
</clickhouse>

During table creation, ClickHouse automatically substitutes these values.

Verifying Cluster Discovery

Restart ClickHouse on every node and verify that the cluster is recognized.

SELECT
    cluster,
    shard_num,
    replica_num,
    host_name
FROM system.clusters
WHERE cluster = 'analytics_cluster';

Expected output:

analytics_cluster
├── ch-node1
└── ch-node2

If no rows are returned, the cluster configuration has not been loaded successfully.

Creating a Replicated Table

The recommended approach combines macros with distributed DDL.

CREATE TABLE events
ON CLUSTER analytics_cluster
(
    id UInt64,
    event_time DateTime,
    user_id UInt64
)
ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{shard}/events',
    '{replica}'
)
ORDER BY (event_time, id);

This command automatically:

Creates the table on every node.
Registers metadata in Keeper.
Assigns unique replica names.
Starts replication automatically.

How Replication Works

Suppose data is inserted into ch-node1.

INSERT INTO events VALUES
(
    1,
    now(),
    1001
);

Replication follows this sequence:

Data is written locally.
Metadata is stored in Keeper.
Other replicas detect the new part.
Missing parts are downloaded.
Replicas become synchronized.

Replication is asynchronous and optimized for high throughput.

Monitoring Replication

Replica health can be checked using:

SELECT
    database,
    table,
    is_leader,
    is_readonly,
    queue_size,
    absolute_delay
FROM system.replicas;

Important columns include:

Column	Description
is_leader	Indicates replica leadership
is_readonly	Whether the replica can accept writes
queue_size	Pending replication tasks
absolute_delay	Replication lag

Healthy replicas typically show:

is_readonly = 0
queue_size = 0

Inspecting Replication Tasks

To troubleshoot replication delays:

SELECT
    database,
    table,
    type,
    create_time
FROM system.replication_queue;

Common task types include:

GET_PART
MERGE_PARTS
ATTACH_PART

A continuously growing queue often indicates disk, network, or resource bottlenecks.

Creating a Distributed Table

ReplicatedMergeTree manages replication.

Distributed manages query routing.

Create a distributed table:

CREATE TABLE events_dist
ON CLUSTER analytics_cluster
AS events
ENGINE = Distributed(
    'analytics_cluster',
    currentDatabase(),
    'events',
    cityHash64(user_id)
);

Now queries automatically execute across the cluster.

Example:

SELECT count()
FROM events_dist;

Distributed DDL

Keeper also coordinates distributed schema changes.

Without distributed DDL:

CREATE TABLE ...

must be executed on every server.

With ON CLUSTER:

CREATE TABLE test
ON CLUSTER analytics_cluster
(
    id UInt64
)
ENGINE = MergeTree
ORDER BY id;

the command is stored in Keeper and executed automatically across every node.

The same mechanism applies to:

CREATE
ALTER
DROP
RENAME

Monitoring Keeper Connectivity

Verify Keeper connectivity with:

SELECT *
FROM system.zookeeper
LIMIT 10
SETTINGS allow_unrestricted_reads_from_keeper = 'true';

Successful results confirm communication between ClickHouse and Keeper.

If Keeper becomes unavailable, replicas may switch to read-only mode.

Check replica status with:

SELECT
    database,
    table,
    is_readonly
FROM system.replicas;

Common Failure Scenarios

Keeper Quorum Loss

If a majority of Keeper nodes become unavailable, a leader cannot be elected.

Example:

3 Keeper Nodes
2 Nodes Offline

Replication pauses until quorum is restored.

Replica Lag

Typical symptoms:

queue_size increasing
absolute_delay increasing

Possible causes:

Slow storage
Network bottlenecks
Heavy background merges

Misconfigured Replica Names

Each replica must have a unique identifier.

Using duplicate replica names can prevent replication from working correctly.

Using macros eliminates this risk.

Production Best Practices

For production environments:

Deploy at least three Keeper nodes.
Use macros for all replicated tables.
Use ON CLUSTER for schema management.
Continuously monitor system.replicas.
Monitor system.replication_queue.
Keep Keeper storage separate from busy data disks.
Regularly test node failure and recovery procedures.
Document shard and replica topology.

Conclusion

ClickHouse Keeper simplifies cluster management by providing built-in coordination without requiring an external ZooKeeper deployment. Combined with replicated tables, distributed DDL, and distributed tables, it enables highly available and horizontally scalable ClickHouse clusters.

Understanding how Keeper manages metadata, coordinates replicas, and automates cluster-wide operations is a key step toward building reliable production-grade analytical systems.

Day 32: Configuring ClickHouse® Clusters with ClickHouse Keeper

Why Clustering?

Understanding Cluster Architecture

Example Deployment

Keeper Nodes

ClickHouse Nodes

Configuring ClickHouse Keeper

Configuring Keeper Connectivity

Defining the Cluster

Configuring Macros

Verifying Cluster Discovery

Creating a Replicated Table

How Replication Works

Monitoring Replication

Inspecting Replication Tasks

Creating a Distributed Table

Distributed DDL

Monitoring Keeper Connectivity

Common Failure Scenarios

Keeper Quorum Loss

Replica Lag

Misconfigured Replica Names

Production Best Practices

Conclusion

Tags

Author

Stats

Published

You Might Also Like

Day 33: Understanding ClickHouse® Query Execution Plans

Day 29 of 100 Days of ClickHouse® - Understanding Distributed Tables

Day 24 of 100 Days of ClickHouse: Working with the ClickHouse HTTP API

Day 27 of 100 Days of ClickHouse® - Optimizing ClickHouse® Queries for Faster Execution

Merges and Mutations in ClickHouse®

Day 26 of 100 Days of ClickHouse: Understanding Distributed Tables