This article describes how to troubleshoot blades on the chassis that are stuck in the 'Waiting for data heartbeat' status in an HA cluster.
FortiGate 6000/7000 series, v5.6, v6.0, v6.2, v6.4.
In the output of 'diag load-balance status', the blades on the chassis are stuck in the 'Waiting for data heartbeat' status.
- Check the output of 'diag test application chlbd 1' on both chassis.
- The output of this command shows the ELBC master and the last received update counter.
- The 'last_rx of update msg' counter should not differ greatly between the Master and Slave chassis.
- Check the output of 'diagnose sys fortiswitch-heartbeat status' on the faulty chassis.
- The output of this command shows the HB-Tx flag, the status of the blade, etc.
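The counter comparison from the first check can be sketched as a small helper. This is only an illustration: the two values are assumed to be read by hand from the 'last_rx of update msg' field on each chassis, and the `max_gap` threshold is a hypothetical placeholder, because the article does not state what counts as a high difference.

```python
def last_rx_gap_is_suspicious(master_last_rx: int,
                              slave_last_rx: int,
                              max_gap: int = 10) -> bool:
    """Return True when the two counters differ by more than max_gap.

    master_last_rx / slave_last_rx: values copied manually from the
    'last_rx of update msg' field on the Master and Slave chassis.
    max_gap: hypothetical threshold, not a documented Fortinet value.
    """
    return abs(master_last_rx - slave_last_rx) > max_gap


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(last_rx_gap_is_suspicious(3, 4))
    print(last_rx_gap_is_suspicious(2, 500))
```

Choose the threshold from what a healthy cluster in the same environment normally shows.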
1) If the outputs of Step 1 and Step 2 look the same and the blade is still stuck in the 'Waiting for data heartbeat' status, try the following:
a) Power the slot that is stuck in the 'Waiting for data heartbeat' status off and then on.
b) If the above step does not fix the problem, power cycle the whole chassis.
c) If the above two steps fail and the cluster is in HA, break the HA cluster by removing physical connectivity in the order data ports -> mgmt ports -> HA ports so that the chassis becomes standalone, then check whether the blades come up.
If the blades come up, there is most likely a communication issue between the HA ports; an intermediate switch might be blocking some packets. To troubleshoot further, check connectivity with the sniffer and ping.
i) To find the IP addresses, use the command:
# diagnose ip address list | grep "SN\|10.101.11"
ii) Example: ping the FPC02 elbc-base-ctrl channel IP address from the MBD.
# config vdom
diagnose ip address list | grep "SN\|10.101.11"
execute enter vsys_ha
execute ping 10.101.11.4
iii) Sniffer: run the sniffer below and, at the same time, capture packets on the switch side connected to the faulty chassis, then check whether two-way UDP broadcast packets (port 703) are visible.
# config vdom
diag sniffer option view-option
diag sniffer options filter-out-internal-pkts disable
diag sniffer options slot current
diag sniffer packet any 'port 703 and proto 17' 6 20
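Once the sniffer output has been saved to a text file, the two-way check can be automated roughly as follows. This is a minimal sketch under stated assumptions: each packet summary line is assumed to contain `<interface> <in|out> <src>.<sport> -> <dst>.<dport>: udp`, "two-way" is interpreted here as seeing port 703 packets sent by both expected addresses, and the function names and sample addresses are hypothetical. Adjust the pattern if the capture on the unit looks different.

```python
import re
from typing import Iterable, Set

# Assumed shape of a packet summary line (hex dump lines will not match):
#   "<time> <interface> <in|out> <src>.<sport> -> <dst>.<dport>: udp <len>"
PACKET_RE = re.compile(
    r"(?P<iface>\S+)\s+(?P<dir>in|out)\s+"
    r"(?P<src>\d+(?:\.\d+){3})\.(?P<sport>\d+)\s+->\s+"
    r"(?P<dst>\d+(?:\.\d+){3})\.(?P<dport>\d+):\s+udp\b"
)


def heartbeat_senders(capture_text: str, port: int = 703) -> Set[str]:
    """Return the source IPs that sent UDP packets on the given port."""
    senders = set()
    for line in capture_text.splitlines():
        match = PACKET_RE.search(line)
        if match is None:
            continue  # not a packet summary line
        if int(match["sport"]) == port or int(match["dport"]) == port:
            senders.add(match["src"])
    return senders


def missing_senders(capture_text: str,
                    expected: Iterable[str],
                    port: int = 703) -> Set[str]:
    """Return the expected source IPs that never appear as a sender."""
    return set(expected) - heartbeat_senders(capture_text, port)


if __name__ == "__main__":
    # Made-up sample lines; real addresses come from 'diagnose ip address list'.
    sample = (
        "0.100000 ha1 in 10.101.11.4.703 -> 10.101.11.255.703: udp 100\n"
        "0.200000 ha1 out 10.101.11.3.703 -> 10.101.11.255.703: udp 100\n"
    )
    print(missing_senders(sample, ["10.101.11.3", "10.101.11.4"]))
```

An empty result means both addresses were seen sending on port 703. Any address returned is a side whose packets never reached the capture point, which is where to look on the intermediate switch.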
2) If the blades do not come up after the chassis is made standalone, collect the output of the following commands and contact TAC.
Example: FPC02 is stuck in the 'Waiting for data heartbeat' status. Run the commands simultaneously on the FIM/MBD and the FPC/FPM.
1) On the MBD/FIM, in 'Global'
# diagnose load-balance set slot current
# diagnose sys bcm_intf cli "0:"
# diagnose sniffer options slot current
# diagnose sniffer packet f-slot4 "ether proto 0x8895" 6 20 (the "f-slot4" is for troubleshooting with FPC02)
# fnsysctl ifconfig f-slot4 (repeat 5 times, interval ~10 seconds)
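The "repeat 5 times, interval ~10 seconds" collection can be scripted from a management host. The sketch below is only an outline: `run_command` is a hypothetical callable that sends one CLI command over whatever session is in use (SSH, console) and returns its text output. It is not a Fortinet API.

```python
import time
from typing import Callable, List, Tuple


def collect_repeated(run_command: Callable[[str], str],
                     command: str,
                     repeats: int = 5,
                     interval_s: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep
                     ) -> List[Tuple[float, str]]:
    """Run `command` `repeats` times, pausing `interval_s` between runs.

    run_command is a hypothetical transport function supplied by the caller.
    Returns a list of (unix_timestamp, output) pairs in collection order.
    """
    samples = []
    for attempt in range(repeats):
        samples.append((time.time(), run_command(command)))
        if attempt < repeats - 1:
            sleep(interval_s)  # no pause after the final sample
    return samples


if __name__ == "__main__":
    # Stand-in runner that only echoes the command, for demonstration.
    for stamp, output in collect_repeated(
            lambda cmd: "ran: " + cmd,
            "fnsysctl ifconfig f-slot4",
            interval_s=0.0):
        print(stamp, output)
```

Keeping a timestamp with each sample makes it easier to line up interface counter changes with the sniffer capture taken at the same time.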
3) On FPC02, in 'Global'
# diagnose sys fortiswitch-heartbeat status
# fnsysctl cat /proc/net/np6_0/int-link
4) On FPC02, in 'mgmt-vdom'
# diagnose sniffer options filter-out-internal-pkts disable