FortiSIEM
FortiSIEM provides Security Information and Event Management (SIEM) and User and Entity Behavior Analytics (UEBA)
mbenvenuti
Staff
Article Id 422881
Description This article shows how to troubleshoot the ETCD service.
Scope FortiSIEM 7.4.1+.
Solution

When HA is configured on FortiSIEM, the ETCD service is a distributed key-value store that plays an important role in database replication management. If this service fails, FortiSIEM becomes unavailable.

 

Follow these steps to evaluate the health of the etcd service and fix it:

 

  1. Check the health on each node of the HA cluster.

 

The expected output:

 

systemctl status etcd
● etcd.service - Etcd Server
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2025-11-25 09:21:03 CET; 2 weeks 3 days ago
Main PID: 2011 (etcd)
Tasks: 20 (limit: 151697)
Memory: 249.1M
CGroup: /system.slice/etcd.service
└─2011 /usr/bin/etcd

 

etcdctl member list

b0e9ae9f045235bd, started, fsmSup410Node2_10_5_8_167, http://10.5.8.167:2380, http://10.5.8.167:2379, false
cff64fd3c11ba5af, started, fsmSup410Node1_10_5_8_166, http://10.5.8.166:2380, http://10.5.8.166:2379, false

 

etcdctl endpoint --cluster health
http://10.5.8.166:2379 is healthy: successfully committed proposal: took = 1.884484ms
http://10.5.8.167:2379 is healthy: successfully committed proposal: took = 2.575399ms

 

etcdctl endpoint status --cluster -w table
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://10.5.8.167:2379 | b0e9ae9f045235bd | 3.5.13 | 162 MB | true | false | 4 | 1463537 | 1463537 | |
| http://10.5.8.166:2379 | cff64fd3c11ba5af | 3.5.13 | 162 MB | false | false | 4 | 1463537 | 1463537 | |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
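
When the health check is repeated after each fix, it can help to reduce the output to a single number. The following is a minimal sketch and not part of the official procedure: the function name `count_unhealthy` is hypothetical, and it assumes only that a healthy endpoint line contains `is healthy`, as in the expected output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the output of the health command on stdin and
# prints how many endpoint lines are NOT reported as healthy.
count_unhealthy() {
    awk '/^https?:\/\// && !/is healthy/ { n++ } END { print n + 0 }'
}

# Example, run on a cluster node (stderr is merged in case errors go there):
#   etcdctl endpoint --cluster health 2>&1 | count_unhealthy
```

A result of 0 means every listed endpoint reported healthy. Any other value means the full output should be read.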

 

Examples of unhealthy output:

 

etcdctl endpoint status --cluster -w table

+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://10.5.8.x:2379 | b0e9ae9f045235bd | 3.5.13 | 2.1 GB | true | false | 2 | 441985 | 441985 | memberID:xxxx NOSPACE|
| http://10.5.8.x:2379 | cff64fd3c11ba5af | 3.5.13 | 2.1 GB | false | false | 2 | 441985 | 441985 | memberID:xxxx NOSPACE|
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
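
The ERRORS column is easy to overlook in a wide table. As a sketch only, the hypothetical function below reads the table output on stdin and prints one line per endpoint whose ERRORS column is not empty. It assumes the ten-column layout shown above, with `|` as the column separator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<endpoint>: <errors>" for every row of the
# status table whose ERRORS column (the 10th column) is not empty.
list_endpoint_errors() {
    awk -F'|' '
        NF >= 11 && $2 !~ /ENDPOINT/ {
            ep = $2; err = $11
            gsub(/^[ \t]+|[ \t]+$/, "", ep)    # trim the endpoint cell
            gsub(/^[ \t]+|[ \t]+$/, "", err)   # trim the errors cell
            if (err != "") print ep ": " err
        }'
}

# Example, run on a cluster node:
#   etcdctl endpoint status --cluster -w table | list_endpoint_errors
```

No output means no endpoint reported an error in that column.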

 

See Troubleshooting Tip: etcd NOSPACE error.

 

etcdctl member list
{"level":"warn","ts":"2025-11-13T15:48:24.139794+0100","logger":"etcdclient","caller":"v3@v3.5.13/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcdendpoints://0xc000360000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded
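
The two failure signatures shown in this article can be told apart by simple string matching. The sketch below is illustrative only: the function name and the returned labels are hypothetical, and it recognizes nothing beyond the two strings that appear in the outputs above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads etcdctl output on stdin and prints a label for
# the failure signature it recognizes.
classify_etcd_output() {
    local text
    text=$(cat)
    case "$text" in
        *NOSPACE*)                     echo "nospace" ;;  # see the NOSPACE article
        *"context deadline exceeded"*) echo "timeout" ;;  # continue with the fixes below
        *)                             echo "no-known-error" ;;
    esac
}

# Example, run on a cluster node:
#   etcdctl member list 2>&1 | classify_etcd_output
```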

 

For this error, follow the next steps:

 

  2. Fix 1 - Recovering ETCD: Check the network ports and restart.

 

Make sure that the network is available and that ports 2379 and 2380 are open between the nodes:

 

From the CLI of node 1:

 

telnet nodeX 2379

telnet nodeX 2380
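
If telnet is not installed, the same reachability test can be approximated with bash alone. This is a sketch under assumptions: it relies on bash's built-in `/dev/tcp` redirection and on the coreutils `timeout` command being available, and the function names are hypothetical. Replace the placeholder arguments with the real node names or IP addresses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TCP connection to host:port can be
# opened within 3 seconds.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Hypothetical helper: checks both etcd ports on every host given as argument
# and returns non-zero if any of them is unreachable.
check_etcd_ports() {
    local host port rc=0
    for host in "$@"; do
        for port in 2379 2380; do
            if port_open "$host" "$port"; then
                echo "${host}:${port} open"
            else
                echo "${host}:${port} unreachable"
                rc=1
            fi
        done
    done
    return "$rc"
}

# Example, run from each node against the other nodes:
#   check_etcd_ports nodeX
```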

 

Repeat the commands from the other nodes.

If the connection is not possible, review the network.

If the network link is up, review the local firewall rules on FortiSIEM to make sure the ports are open, using the following command:

 

firewall-cmd --list-all
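
The listing can be long, so a quick filter for the etcd ports may help. The hypothetical function below is a sketch that reads the listing on stdin and succeeds if it finds the given port either in a rich rule (`port="2379"`) or in a plain port entry (`2379/tcp`). Those two patterns are assumptions about how the rule was added.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the firewall listing on stdin mentions the
# given TCP port in a rich rule or in a plain port entry.
has_port_rule() {
    local port=$1
    grep -Eq "port=\"${port}\"|(^|[[:space:]])${port}/tcp"
}

# Example, run on a cluster node:
#   firewall-cmd --list-all | has_port_rule 2379 && echo "2379 is allowed"
```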

 

If necessary, re-create the rules with the commands below (replace nodeXIP with the appropriate node IP):

 

firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="node1IP/24" port port="2379" protocol="tcp" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="node2IP/24" port port="2379" protocol="tcp" accept'
firewall-cmd --reload
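
The commands above cover port 2379, while the connectivity test earlier in this fix also checks port 2380. As a sketch only, the hypothetical function below prints (without executing) the equivalent commands for both ports and for every source address passed to it, so that they can be reviewed before being run. Including port 2380 is an assumption and is not stated in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the firewall-cmd commands for ports 2379 and
# 2380 for each source address given as argument. Nothing is executed.
print_etcd_rules() {
    local src port
    for src in "$@"; do
        for port in 2379 2380; do
            printf "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"%s\" port port=\"%s\" protocol=\"tcp\" accept'\n" "$src" "$port"
        done
    done
    echo "firewall-cmd --reload"
}

# Example (node1IP and node2IP are placeholders, as in the commands above):
#   print_etcd_rules node1IP/24 node2IP/24
```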

 

systemctl restart etcd

 

Repeat step 1 to check the health.

 

  3. Fix 2 - Recovering ETCD: Reset the configuration on one HA node.

 

If the previous action did not work and only one node is disconnected, the ETCD service configuration can be reset on the disconnected node.

  • On the CLI of the disconnected node, as the root user:

 

systemctl stop etcd

sleep 10

rm -rf /var/lib/etcd/*

sed -i 's/new/existing/g' /etc/etcd/etcd.conf
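
The sed command above replaces every occurrence of the string new in the file. A narrower variant is sketched below, assuming that the setting being changed is the `ETCD_INITIAL_CLUSTER_STATE` line of `/etc/etcd/etcd.conf` and that GNU sed is in use. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrites only the ETCD_INITIAL_CLUSTER_STATE line of
# the given file, keeping whatever quoting style the line already uses.
set_cluster_state() {
    local conf=$1 state=$2
    sed -i -E "s/^(ETCD_INITIAL_CLUSTER_STATE=)(['\"]?)[a-z]+(['\"]?)\$/\1\2${state}\3/" "$conf"
}

# Example, run as root on the disconnected node:
#   set_cluster_state /etc/etcd/etcd.conf existing
```

Other lines that happen to contain the word new, such as a node name, are left untouched.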

 

  • On the node that is still running:

 

etcdctl member remove b0e9ae9f045235bd   # Use the member ID of the disconnected node.

etcdctl member add fsmSup410Node2_10_5_8_167 --peer-urls=http://10.5.8.167:2380   # Use the same name and peer URL as before.
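
The member ID and peer URL needed for these two commands can be read from the `etcdctl member list` output on the surviving node. The sketch below assumes the comma-separated layout shown in step 1 (ID, status, name, peer URL, client URL, learner flag). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read `etcdctl member list` output on stdin and print
# the ID or the peer URL of the member with the given name.
member_id_by_name() {
    awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

member_peer_url_by_name() {
    awk -F', ' -v name="$1" '$3 == name { print $4 }'
}

# Example, run on the node that is still running:
#   etcdctl member list | member_id_by_name fsmSup410Node2_10_5_8_167
```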

 

  • On the node that has been disconnected:

 

systemctl start etcd

 

Repeat step 1 to check the health.

 

  4. Fix 3 - Recovering ETCD: Reset the ETCD configuration on all HA nodes.

 

If the previous actions did not fix the ETCD service on all nodes, the ETCD configuration can be reset on all nodes to start from a fresh state. Reset the ETCD service configuration from the CLI of each node, as the root user:


systemctl stop etcd
rm -rf /var/lib/etcd/*
vi /etc/etcd/etcd.conf   # Make sure ETCD_INITIAL_CLUSTER_STATE='new' and that all expected nodes are listed in ETCD_INITIAL_CLUSTER, then save.
systemctl start etcd
python3.11 /opt/phoenix/confd/bin/updateConfd.py --nodes 'http://10.5.8.166:2379,http://10.5.8.167:2379'
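
The value passed to `--nodes` is a comma-separated list of client URLs on port 2379, one per HA node. As a small sketch, the hypothetical function below builds that string from a list of node IP addresses. The `http://` scheme and port 2379 are taken from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins the given node IPs into the comma-separated
# list of client URLs used in the --nodes argument above.
build_nodes_arg() {
    local ip out=""
    for ip in "$@"; do
        out+="${out:+,}http://${ip}:2379"
    done
    printf '%s\n' "$out"
}

# Example:
#   python3.11 /opt/phoenix/confd/bin/updateConfd.py --nodes "$(build_nodes_arg 10.5.8.166 10.5.8.167)"
```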

 

Repeat step 1 to check the health.
