aebadi
Staff
April 2, 2026

Troubleshooting Tip: How to troubleshoot ETCD error 'The member has been permanently removed from the cluster'

Description

This article describes how to troubleshoot and resolve an 'Etcd is down' condition in a FortiSIEM HA supervisor cluster.
Scope
  • FortiSIEM HA clusters (3 or more nodes).
  • Versions 7.x.x.
Solution

This issue typically occurs when:

  • A node is removed from the ETCD cluster.
  • The node still retains stale ETCD data.
  • The ETCD service fails to start or continuously restarts.

 

For the purposes of this article, the HA cluster nodes are defined as follows:

  • HA1 – 10.65.48.44 -> ETCD is down (unhealthy).
  • HA2 – 10.65.49.161 -> Node is healthy.
  • HA3 – 10.65.49.162 -> Node is healthy.

 

Running the following command shows ETCD as down or unstable:

 

phstatus 

 

Check the service status:

 

systemctl status etcd

 

The reported status can be misleading: it may show 'success' or 'exit code = 1' even though the service is restarting continuously.
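
One way to confirm a restart loop is to look at the restart counter kept by systemd instead of the one-line status. The sketch below is a hypothetical helper (restart_loop_hint is not part of FortiSIEM) that reads Key=Value lines as printed by systemctl show. It assumes a systemd version that exposes the NRestarts property, and treating any value above zero as a restart loop is a simplification.

```shell
# Sketch: flag a possible restart loop from `systemctl show` output.
# restart_loop_hint is a hypothetical helper name, not a FortiSIEM command.
restart_loop_hint() {
  local restarts=0 state="unknown" key value
  # Each input line looks like "NRestarts=7" or "ActiveState=activating".
  while IFS='=' read -r key value; do
    case "$key" in
      NRestarts) restarts="$value" ;;
      ActiveState) state="$value" ;;
    esac
  done
  if [ "$restarts" -gt 0 ]; then
    echo "restarting (NRestarts=$restarts, ActiveState=$state)"
  else
    echo "stable (ActiveState=$state)"
  fi
}

# Usage on the affected node (assumes the unit is named etcd, as in this article):
#   systemctl show etcd -p NRestarts -p ActiveState | restart_loop_hint
```

A counter that keeps growing between two runs a minute apart is a stronger signal than a single reading.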


Log in to a healthy node and run the following command to see the error:


etcdctl \
--endpoints=http://10.65.49.161:2379,http://10.65.49.162:2379,http://10.65.48.44:2379 \
endpoint health -w table

 

Output:

 

{"level":"warn","ts":"2026-03-24T15:01:39.396771-0700","caller":"flags/flag.go:94","msg":"unrecognized environment variable","environment-variable":"ETCDCTL_API=3"}
{"level":"warn","ts":"2026-03-24T15:01:44.399821-0700","logger":"client","caller":"v3@v3.6.4/retry_interceptor.go:65","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0002e0000/10.65.48.44:2379","method":"/etcdserverpb.KV/Range","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing: dial tcp 10.65.48.44:2379: connect: connection refused\""}
+--------------------------+--------+--------------+---------------------------+
|         ENDPOINT         | HEALTH |     TOOK     |           ERROR           |
+--------------------------+--------+--------------+---------------------------+
| http://10.65.49.162:2379 |   true |   4.342763ms |                           |
| http://10.65.49.161:2379 |   true |   4.545808ms |                           |
|  http://10.65.48.44:2379 |  false | 5.000977199s | context deadline exceeded |
+--------------------------+--------+--------------+---------------------------+
Error: unhealthy cluster

 

This indicates that the etcd endpoint on HA1 (10.65.48.44) is refusing connections, while the other two members are healthy.
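
When this check is scripted, the unhealthy endpoints can be extracted from the table instead of being read by eye. A minimal sketch, assuming the -w table layout shown above; unhealthy_endpoints is a hypothetical helper name.

```shell
# Sketch: print every endpoint whose HEALTH column is "false".
# Reads `etcdctl ... endpoint health -w table` output on stdin.
unhealthy_endpoints() {
  # Data rows are split on "|": field 2 is ENDPOINT, field 3 is HEALTH.
  # Border lines and JSON warning lines contain no "|" and are skipped.
  awk -F'|' 'NF >= 5 {
    gsub(/ /, "", $2)
    gsub(/ /, "", $3)
    if ($3 == "false") print $2
  }'
}

# Usage on a healthy node (endpoints taken from this article):
#   etcdctl --endpoints=http://10.65.49.161:2379,http://10.65.49.162:2379,http://10.65.48.44:2379 \
#     endpoint health -w table 2>/dev/null | unhealthy_endpoints
```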

On the unhealthy node, run the following command:

 

journalctl -u etcd -n 50 --no-pager | grep error

Example output:

 

the member has been permanently removed from the cluster

 

Root cause:

This issue occurs when:

  • The node was removed from the ETCD cluster.
  • The node still contains old member ID data in /var/lib/etcd.
  • ETCD detects a mismatch and refuses to start.
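
To tell this root cause apart from other start-up failures, the journal can be searched for the exact message rather than the generic word "error". A minimal sketch; removed_member_detected is a hypothetical helper name, and the 200-line window in the usage comment is an arbitrary choice.

```shell
# Sketch: succeed (exit 0) if the journal text on stdin contains the
# "permanently removed" message quoted in this article.
removed_member_detected() {
  grep -qi 'member has been permanently removed from the cluster'
}

# Usage on the unhealthy node:
#   if journalctl -u etcd -n 200 --no-pager | removed_member_detected; then
#     echo "stale member data: follow the remediation steps"
#   fi
```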

 

Remediation steps:


Step 1 - On a healthy node, identify the stale member.

 

etcdctl member list

 

Example output:

 

{"level":"warn","ts":"2026-03-31T17:30:04.513625-0700","caller":"flags/flag.go:94","msg":"unrecognized environment variable","environment-variable":"ETCDCTL_API=3"}
6219912d81186549, started, HA-Super2_10_65_49_161, http://10.65.49.161:2380, http://10.65.49.161:2379, false
debaf717ada6ae1e, started, HA-Super3_10_65_49_162, http://10.65.49.162:2380, http://10.65.49.162:2379, false
fa288190cc028248, started, AH-SUPER_10_65_48_44, http://10.65.48.44:2380, http://10.65.48.44:2379, false

 

Broken node ID: fa288190cc028248.
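
The member ID can also be looked up by the node's IP address instead of being copied by hand. A minimal sketch, assuming the default comma-separated member list output shown above and the http scheme and peer port 2380 used in this article; member_id_for_ip is a hypothetical helper name.

```shell
# Sketch: print the member ID whose peer URL matches the given IP.
# Reads `etcdctl member list` output on stdin.
member_id_for_ip() {
  local ip="$1"
  # Fields: ID, status, name, peer URL, client URL, learner flag.
  awk -F', ' -v peer="http://${ip}:2380" '$4 == peer { print $1 }'
}

# Usage on a healthy node:
#   etcdctl member list 2>/dev/null | member_id_for_ip 10.65.48.44
```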
 

Step 2 - Log in to a healthy node and remove the stale member.

 

etcdctl --endpoints=http://10.65.49.161:2379,http://10.65.49.162:2379 member remove fa288190cc028248

 

Output:

 

Member fa288190cc028248 removed from cluster 92445e086e4dd2e2

 

Step 3 - Re-add the node:

 

etcdctl --endpoints=http://10.65.49.161:2379,http://10.65.49.162:2379 member add AH-SUPER_10_65_48_44 --peer-urls=http://10.65.48.44:2380

 

Output:

 

Member 4c00291e5f9a435c added to cluster

ETCD_NAME="AH-SUPER_10_65_48_44"
ETCD_INITIAL_CLUSTER="AH-SUPER_10_65_48_44=http://10.65.48.44:2380,HA-Super2_10_65_49_161=http://10.65.49.161:2380,HA-Super3_10_65_49_162=http://10.65.49.162:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.65.48.44:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

 

Log back in to the broken node, stop the etcd service, remove the stale etcd data, and recreate the data directory as described in the next two steps.

 

Step 4 - On the broken node's CLI, stop etcd and delete the stale data.

 

systemctl stop etcd

rm -rf /var/lib/etcd/*

 

Step 5 - Recreate the data directory and reset its ownership and permissions.

 

mkdir -p /var/lib/etcd
chown etcd:etcd /var/lib/etcd
chmod 700 /var/lib/etcd
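
Before starting etcd, it can be worth confirming that the data directory is really empty and has mode 700, since rm -rf /var/lib/etcd/* does not match hidden files. A minimal sketch with a hypothetical helper, check_data_dir; it assumes GNU stat and does not verify ownership.

```shell
# Sketch: verify that a directory exists, is empty (including dotfiles),
# and has mode 700. Prints a short verdict and returns non-zero on failure.
check_data_dir() {
  local dir="$1"
  [ -d "$dir" ] || { echo "missing"; return 1; }
  [ -z "$(ls -A "$dir")" ] || { echo "not empty"; return 1; }
  [ "$(stat -c '%a' "$dir")" = "700" ] || { echo "wrong mode"; return 1; }
  echo "ok"
}

# Usage on the broken node:
#   check_data_dir /var/lib/etcd
```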

 

Step 6 - Update the configuration.

 

vi /etc/etcd/etcd.conf

 

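Set the following values (the name, initial cluster, peer URL, and cluster state match the output of the member add command in Step 3):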

 

ETCD_NAME="AH-SUPER_10_65_48_44"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.65.48.44:2380"
ETCD_LISTEN_PEER_URLS="http://10.65.48.44:2380"
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://10.65.48.44:2379"
ETCD_INITIAL_CLUSTER="AH-SUPER_10_65_48_44=http://10.65.48.44:2380,HA-Super2_10_65_49_161=http://10.65.49.161:2380,HA-Super3_10_65_49_162=http://10.65.49.162:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
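
To avoid typing errors, the same settings can be generated from three inputs: the member name, the node IP, and the ETCD_INITIAL_CLUSTER string printed by member add. A minimal sketch; render_etcd_conf is a hypothetical helper name, and the ports, http scheme, and data directory are the ones used in this article.

```shell
# Sketch: print the etcd.conf settings for a node that rejoins an
# existing cluster. Arguments: member name, node IP, initial cluster string.
render_etcd_conf() {
  local name="$1" ip="$2" initial_cluster="$3"
  cat <<EOF
ETCD_NAME="${name}"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${ip}:2380"
ETCD_LISTEN_PEER_URLS="http://${ip}:2380"
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://${ip}:2379"
ETCD_INITIAL_CLUSTER="${initial_cluster}"
ETCD_INITIAL_CLUSTER_STATE="existing"
EOF
}

# Usage (review the output before writing it to /etc/etcd/etcd.conf):
#   render_etcd_conf AH-SUPER_10_65_48_44 10.65.48.44 \
#     "AH-SUPER_10_65_48_44=http://10.65.48.44:2380,HA-Super2_10_65_49_161=http://10.65.49.161:2380,HA-Super3_10_65_49_162=http://10.65.49.162:2380"
```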

Step 7 - Start etcd:

 

systemctl daemon-reload
systemctl start etcd
systemctl status etcd

 

Expected output:

 

Active: active (running)
published local member to cluster through raft
ready to serve client requests

 

Step 8 - Verify cluster health:

 

etcdctl --endpoints=http://10.65.49.161:2379,http://10.65.49.162:2379,http://10.65.48.44:2379 endpoint health -w table

 

It is expected that all nodes show HEALTH = true.
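
The re-added member may need a short time to catch up with the cluster, so a single health check run immediately after the start can still fail. The sketch below is a generic retry helper (retry_until is a hypothetical name) that reruns a command until it succeeds or the attempts run out. It relies on etcdctl endpoint health returning a non-zero exit code when any endpoint is unhealthy.

```shell
# Sketch: run a command up to N times, sleeping between attempts.
# Arguments: attempts, delay in seconds, then the command and its arguments.
retry_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage: up to 10 checks, 3 seconds apart (both values are arbitrary):
#   retry_until 10 3 etcdctl \
#     --endpoints=http://10.65.49.161:2379,http://10.65.49.162:2379,http://10.65.48.44:2379 \
#     endpoint health
```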

 

Step 9 - Verify the member list:

 

etcdctl member list

Expected (the re-added node is listed as started, with a new member ID):

 

4c00291e5f9a435c, started, AH-SUPER_10_65_48_44 ...
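
For a scripted final check, the number of started members can be counted and compared with the expected cluster size (three in this article). A member that has been added but has not yet joined is reported by etcd as unstarted, so it is not counted. A minimal sketch; started_member_count is a hypothetical helper name.

```shell
# Sketch: count members whose status column is "started".
# Reads `etcdctl member list` output on stdin.
started_member_count() {
  awk -F', ' '$2 == "started" { n++ } END { print n + 0 }'
}

# Usage on any healthy node:
#   [ "$(etcdctl member list 2>/dev/null | started_member_count)" -eq 3 ] \
#     && echo "all members started"
```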