alaxkar
Staff
June 2, 2026

Technical Tip: How to validate, restore, and restart ClickHouse on FortiSIEM

Description

This article describes the procedure to validate ClickHouse cluster health, preserve data directories, restore replicas, restart services, and verify successful recovery following snapshot restoration, node recovery, cluster reconfiguration, or ClickHouse-related maintenance activities.

Scope

FortiSIEM.

Solution

The following procedure applies to FortiSIEM environments that use the ClickHouse distributed storage architecture.


Prerequisites:

  • Root shell access to all ClickHouse nodes.

  • Access to FortiSIEM GUI.

  • ClickHouse services installed and operational.

  • Administrative privileges to modify ClickHouse Cluster settings.


Procedure:

  1. Verify ClickHouse cluster connectivity and replica status:

Run the following commands to verify cluster membership, shard assignments, replica health, and replication status:

clickhouse-client -q "SELECT * FROM system.clusters;"
clickhouse-client -q "SELECT host_name, shard_num, replica_num FROM system.clusters;"
clickhouse-client -q "SELECT database, table, is_readonly, is_session_expired, total_replicas, active_replicas FROM system.replicas FORMAT Vertical;"
clickhouse-client -q "SELECT * FROM system.replication_queue FORMAT Vertical;"
clickhouse-client -q "SELECT * FROM system.parts WHERE active = 1;"


Expected results:

  • All cluster nodes should be visible in system.clusters.

  • active_replicas should match total_replicas.

  • system.replication_queue should be empty or contain only a small, shrinking number of entries.

  • Active table parts should be present and accessible.
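The replica checks above can be automated with a small filter. The following is a minimal sketch, not part of FortiSIEM: `check_replicas` is a hypothetical helper name, and it assumes the query output is tab-separated with the columns in the order shown in the comment.

```shell
# check_replicas is a hypothetical helper, not a FortiSIEM or ClickHouse tool.
# It reads tab-separated rows from stdin in this column order:
#   database, table, is_readonly, is_session_expired, total_replicas, active_replicas
# It prints one line per unhealthy table and returns non-zero if any were found.
#
# Intended use (assumes clickhouse-client is on the PATH):
#   clickhouse-client -q "SELECT database, table, is_readonly, is_session_expired,
#       total_replicas, active_replicas FROM system.replicas FORMAT TabSeparated" \
#       | check_replicas
check_replicas() {
    awk -F'\t' '
        $3 != 0 || $4 != 0 || $5 != $6 {
            printf "UNHEALTHY %s.%s readonly=%s session_expired=%s active=%s/%s\n",
                   $1, $2, $3, $4, $6, $5
            bad = 1
        }
        END { exit bad + 0 }
    '
}
```

No output and a zero exit status mean that every replicated table is writable, has a live Keeper session, and has all of its replicas active.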


  2. Preserve existing ClickHouse data directories:

Before making configuration changes or restoring data, create backups of ClickHouse storage directories.

cd /
tar cvzf /tmp/data-clickhouse-hot-1.tgz /data-clickhouse-hot-1
tar cvzf /tmp/data-clickhouse-warm-1.tgz /data-clickhouse-warm-1


Generate a directory inventory to compare against after restoration:

ls -lR /data-clickhouse-hot-1 > /tmp/data-hot.out
ls -lR /data-clickhouse-warm-1 > /tmp/data-warm.out


After restoration, verify directory consistency:

diff /tmp/data-hot.out <(ls -lR /data-clickhouse-hot-1)
diff /tmp/data-warm.out <(ls -lR /data-clickhouse-warm-1)
df -h | grep data-clickhouse


Expected results:

  • No unexpected differences should appear in the directory structure.

  • ClickHouse storage volumes should be mounted and accessible.
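Before relying on the archives created above, it can help to confirm that each one is readable and complete. The sketch below is one way to do that; `verify_archive` is a hypothetical helper name, and the comparison assumes the directory has not changed since the archive was created (the counts will differ if ClickHouse writes or merges parts in between).

```shell
# verify_archive is a hypothetical helper, not a FortiSIEM tool.
# Usage: verify_archive /tmp/data-clickhouse-hot-1.tgz /data-clickhouse-hot-1
# Returns 0 when the archive is readable and lists as many entries as
# `find` reports under the directory; returns 1 otherwise.
verify_archive() {
    local archive=$1 dir=$2 listing in_archive on_disk
    # tar exits non-zero if the archive is missing, truncated, or corrupt.
    if ! listing=$(tar tzf "$archive" 2>/dev/null); then
        echo "ERROR: cannot read $archive"
        return 1
    fi
    # $(( )) normalizes any padding that wc adds on some platforms.
    in_archive=$(( $(printf '%s\n' "$listing" | wc -l) ))
    on_disk=$(( $(find "$dir" | wc -l) ))
    if [ "$in_archive" -eq "$on_disk" ]; then
        echo "OK: $archive lists $in_archive entries, matching $dir"
    else
        echo "MISMATCH: $archive lists $in_archive entries, $dir has $on_disk"
        return 1
    fi
}
```

Comparing entry counts is a coarse check; it catches truncated or partial archives but does not compare file contents.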


  3. Reconfigure ClickHouse metadata and reattach data:

Navigate to: Admin → Settings → ClickHouse Cluster.


For each ClickHouse node:

  1. Verify the assigned Data role.

  2. Correct any role inconsistencies.

  3. Select Test.

  4. Select Deploy.


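If partitions must be relocated between nodes, run move_all_partitions.sh in the background (the commands below assume the script is already present in /tmp):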

cd /tmp
chmod +x move_all_partitions.sh
nohup sh move_all_partitions.sh <source_ip> <shard_number> <destination_ip> | tee -a /tmp/move_result.txt &


Monitor progress:

tail -f /tmp/move_result.txt


If partition distribution is unbalanced, execute:

/opt/phoenix/bin/clickhouse-rebalance-partitions


  4. Restart ClickHouse services:

Restart ClickHouse services on each ClickHouse node:

systemctl restart clickhouse-server
systemctl restart ClickHouseKeeper


Verify service status:

systemctl status clickhouse-server
systemctl status ClickHouseKeeper


If any replicated table becomes read-only after a restart, restore the replica:

clickhouse-client -q "SYSTEM RESTORE REPLICA fsiem.summary"
clickhouse-client -q "SYSTEM RESTORE REPLICA fsiem.events_replicated"
clickhouse-client -q "SYSTEM RESTORE REPLICA fsiem.mv_t_events"
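Rather than restoring every table, the restore statements can be generated only for tables that system.replicas currently reports as read-only. The following is a sketch under stated assumptions: `restore_statements` is a hypothetical helper name, the input is tab-separated in the order database, table, is_readonly, and the helper only prints statements so that they can be reviewed before anything is executed. SYSTEM RESTORE REPLICA is intended for read-only replicas whose Keeper metadata has been lost, so a table that is read-only for another reason may need a different fix.

```shell
# restore_statements is a hypothetical helper, not a FortiSIEM tool.
# It reads tab-separated rows (database, table, is_readonly) from stdin
# and prints a SYSTEM RESTORE REPLICA statement for each read-only table.
# It is a dry run: nothing is sent to ClickHouse.
#
# Intended use (assumes clickhouse-client is on the PATH):
#   clickhouse-client -q "SELECT database, table, is_readonly
#       FROM system.replicas FORMAT TabSeparated" | restore_statements
restore_statements() {
    awk -F'\t' '$3 == 1 { printf "SYSTEM RESTORE REPLICA %s.%s\n", $1, $2 }'
}
```

Each printed statement can then be run individually with clickhouse-client -q once it has been reviewed.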


For replicas experiencing synchronization issues:

clickhouse-client -q "SYSTEM RESTART REPLICA fsiem.events_replicated"
clickhouse-client -q "SYSTEM RESTORE REPLICA fsiem.events_replicated"


Verify ClickHouse Keeper health:

echo stat | nc localhost 2181


Expected results:

  • ClickHouse Keeper should report a healthy status.

  • Replicas should reconnect successfully.

  • Tables should no longer appear in read-only mode.
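The Keeper check can be made scriptable by extracting the node's role from the stat response. This is a minimal sketch: `keeper_mode` is a hypothetical helper name, and it assumes the stat output contains a ZooKeeper-style `Mode:` line (for example leader, follower, or standalone).

```shell
# keeper_mode is a hypothetical helper, not a FortiSIEM tool.
# It reads the response to the "stat" command from stdin and prints the
# value of the "Mode:" line. It returns non-zero if no such line is present,
# which usually means Keeper did not answer or is not serving requests.
#
# Intended use:
#   echo stat | nc localhost 2181 | keeper_mode
keeper_mode() {
    awk -F': *' '
        $1 == "Mode" { print $2; found = 1 }
        END { exit !found }
    '
}
```

Running the helper on every Keeper node and comparing the results is a quick way to confirm that the nodes are responding; in a multi-node ensemble, one node would normally be expected to report leader.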


  5. Validate cluster recovery:

Perform final validation checks:

clickhouse-client -q "SELECT count() FROM fsiem.events_replicated;"


clickhouse-client -q "SELECT database, table, active_replicas, total_replicas FROM system.replicas FORMAT Vertical;"


Expected results:

  • Queries should complete successfully.

  • All ClickHouse nodes should be visible in system.clusters.

  • active_replicas should equal total_replicas for all replicated tables.

  • Replication queues should be empty or steadily decreasing.

  • Event data should be accessible across the cluster.
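To judge whether the replication queue is "empty or steadily decreasing", the queue size can be sampled twice and compared. The sketch below illustrates the comparison; `queue_trend` is a hypothetical helper name, and the 60-second interval in the usage comment is an arbitrary assumption.

```shell
# queue_trend is a hypothetical helper, not a FortiSIEM tool.
# It takes two samples of the replication queue size (earlier, later)
# and prints "empty", "decreasing", or "not decreasing".
#
# Intended use (assumes clickhouse-client is on the PATH):
#   before=$(clickhouse-client -q "SELECT count() FROM system.replication_queue")
#   sleep 60
#   after=$(clickhouse-client -q "SELECT count() FROM system.replication_queue")
#   queue_trend "$before" "$after"
queue_trend() {
    local before=$1 after=$2
    if [ "$after" -eq 0 ]; then
        echo "empty"
    elif [ "$after" -lt "$before" ]; then
        echo "decreasing"
    else
        echo "not decreasing"
    fi
}
```

A result of "not decreasing" across several samples suggests that replication is stalled and that the entries in system.replication_queue should be inspected.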


Verification:

The recovery procedure is considered successful when:

  • ClickHouse services are running on all nodes.

  • ClickHouse Keeper is healthy.

  • Replicated tables are writable.

  • All replicas are active.

  • Event data is available and queryable.

  • The ClickHouse Cluster and ClickHouse Config tests in the FortiSIEM GUI complete successfully.