When HA is configured on FortiSIEM, the ETCD service, a distributed key-value store, plays an important role in database replication management. If this service fails, FortiSIEM becomes unavailable.
Follow these steps to evaluate the health of the etcd service and repair it:
- Check the health on each node of the HA cluster.
The expected output:
systemctl status etcd
● etcd.service - Etcd Server
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2025-11-25 09:21:03 CET; 2 weeks 3 days ago
 Main PID: 2011 (etcd)
    Tasks: 20 (limit: 151697)
   Memory: 249.1M
   CGroup: /system.slice/etcd.service
           └─2011 /usr/bin/etcd
etcdctl member list
b0e9ae9f045235bd, started, fsmSup410Node2_10_5_8_167, http://10.5.8.167:2380, http://10.5.8.167:2379, false
cff64fd3c11ba5af, started, fsmSup410Node1_10_5_8_166, http://10.5.8.166:2380, http://10.5.8.166:2379, false
etcdctl endpoint health --cluster
http://10.5.8.166:2379 is healthy: successfully committed proposal: took = 1.884484ms
http://10.5.8.167:2379 is healthy: successfully committed proposal: took = 2.575399ms
etcdctl endpoint status --cluster -w table
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://10.5.8.167:2379 | b0e9ae9f045235bd | 3.5.13  | 162 MB  | true      | false      | 4         | 1463537    | 1463537            |        |
| http://10.5.8.166:2379 | cff64fd3c11ba5af | 3.5.13  | 162 MB  | false     | false      | 4         | 1463537    | 1463537            |        |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Examples of unhealthy output:
etcdctl endpoint status --cluster -w table
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+----------------------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX |        ERRORS        |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+----------------------+
| http://10.5.8.x:2379   | b0e9ae9f045235bd | 3.5.13  | 2.1 GB  | true      | false      | 2         | 441985     | 441985             | memberID:xxxx NOSPACE|
| http://10.5.8.x:2379   | cff64fd3c11ba5af | 3.5.13  | 2.1 GB  | false     | false      | 2         | 441985     | 441985             | memberID:xxxx NOSPACE|
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+----------------------+
See Troubleshooting Tip: etcd NOSPACE error.
etcdctl member list
{"level":"warn","ts":"2025-11-13T15:48:24.139794+0100","logger":"etcdclient","caller":"v3@v3.5.13/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcdendpoints://0xc000360000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded
If you see errors like these, apply the following fixes in order:
- Fix 1 - Recovering ETCD: Check network ports and restart.
Make sure that the network is available and the required ports are open between the nodes.
From the node 1 CLI:
telnet nodeX 2379
telnet nodeX 2380
Repeat the commands from the other nodes.
If the connection fails, review the network.
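If telnet is not installed on the appliance, the same reachability test can be sketched with bash's built-in /dev/tcp device. This is a minimal sketch, not part of the official procedure; NODE_IP is a placeholder for the peer node's address.

```shell
#!/bin/bash
# Minimal sketch of a port-reachability check using bash's /dev/tcp,
# for systems where telnet is not installed.
# NODE_IP is a placeholder: set it to the peer node's IP address.
check_port() {
  host=$1; port=$2
  # Try to open a TCP connection within 2 seconds.
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} open"
  else
    echo "${host}:${port} CLOSED"
  fi
}

check_port "$NODE_IP" 2379   # etcd client port
check_port "$NODE_IP" 2380   # etcd peer port
```

Run the same checks from every node toward its peers; all four combinations should report "open".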
If the link is up, review the FortiSIEM internal firewall rules to make sure the ports are open, using the following command:
firewall-cmd --list-all
If necessary, re-add the rules with the commands below (replace node1IP and node2IP with the appropriate node IPs; the document checks both ports 2379 and 2380, so rules for both are shown):
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="node1IP/24" port port="2379" protocol="tcp" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="node2IP/24" port port="2379" protocol="tcp" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="node1IP/24" port port="2380" protocol="tcp" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="node2IP/24" port port="2380" protocol="tcp" accept'
firewall-cmd --reload
Then restart the etcd service:
systemctl restart etcd
Repeat step 1 to check the health.
- Fix 2 - Recovering ETCD: Reset configuration on one HA node.
If the previous action did not work and only one node was disconnected, the ETCD service configuration can be reset on the disconnected node.
- On the CLI of the disconnected node, as the root user:
systemctl stop etcd
sleep 10
rm -rf /var/lib/etcd/*
sed -i 's/new/existing/g' /etc/etcd/etcd.conf <----- Sets ETCD_INITIAL_CLUSTER_STATE to 'existing' so the node rejoins the existing cluster.
- On the node that remained alive:
etcdctl member remove b0e9ae9f045235bd <----- Use the member ID of the disconnected node, as shown by etcdctl member list.
etcdctl member add fsmSup410Node2_10_5_8_167 --peer-urls=http://10.5.8.167:2380 <----- Use the same name and URL as before.
- On the node that has been disconnected:
systemctl start etcd
Repeat step 1 to check the health.
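The member ID needed above can also be extracted from the `etcdctl member list` output rather than copied by hand. A minimal sketch, assuming the comma-separated listing shown in step 1 (the member name below is the example from step 1; substitute the name from your own listing):

```shell
#!/bin/bash
# Sketch: extract a member's ID from `etcdctl member list` output.
# The listing format is: "ID, status, name, peer URL, client URL, learner".
member_id() {
  # $1 = member name; the listing is read from stdin
  awk -F', ' -v n="$1" '$3 == n {print $1}'
}

# Usage on the surviving node (example names/URLs from step 1):
#   ID=$(etcdctl member list | member_id fsmSup410Node2_10_5_8_167)
#   etcdctl member remove "$ID"
#   etcdctl member add fsmSup410Node2_10_5_8_167 --peer-urls=http://10.5.8.167:2380
```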
- Fix 3 - Recovering ETCD: Reset ETCD configuration on all the HA nodes.
If the previous actions did not fix the ETCD service on all nodes, you can reset the ETCD configuration on every node to start from a fresh state. On each node's CLI, as the root user:
systemctl stop etcd
rm -rf /var/lib/etcd/*
vi /etc/etcd/etcd.conf <----- Make sure ETCD_INITIAL_CLUSTER_STATE='new' and all expected nodes are listed in ETCD_INITIAL_CLUSTER, then save.
systemctl start etcd
python3.11 /opt/phoenix/confd/bin/updateConfd.py --nodes 'http://10.5.8.166:2379,http://10.5.8.167:2379'
Repeat step 1 to check the health.
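For reference, a hypothetical excerpt of /etc/etcd/etcd.conf for node 1 of the two-node cluster used in the examples above (names and URLs taken from the step 1 output; other settings in the file are left untouched):

```ini
# Hypothetical excerpt only: member name and cluster list for node 1.
ETCD_NAME="fsmSup410Node1_10_5_8_166"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER="fsmSup410Node1_10_5_8_166=http://10.5.8.166:2380,fsmSup410Node2_10_5_8_167=http://10.5.8.167:2380"
```

On node 2, ETCD_NAME would instead name that node, while ETCD_INITIAL_CLUSTER stays identical on both nodes.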