FortiGate
ssudhakar
Staff
Article Id 212934
Description

This article describes how to troubleshoot blades on a chassis that are stuck in the 'Waiting for data heartbeat' status in an HA cluster.

Scope

FortiGate 6000/7000 series, v5.6, v6.0, v6.2, v6.4.

Solution

In the output of 'diagnose load-balance status', the blades on the chassis are stuck in the 'Waiting for data heartbeat' status.

 

lb-both.png

 

Troubleshooting:

 

Step 1:

- Check the output of 'diagnose test application chlbd 1' on both chassis.

- The output of this command shows the ELBC master and the last received update counter.

- The 'last_rx of update msg' counter should not differ greatly between the Master and Slave chassis.

 

chlbd.PNG
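
The counter comparison in Step 1 can be sketched in Python. This is only an illustration: the exact layout of the chlbd output is not reproduced in text here, so the sample lines, the regular expression, and the `max_gap` threshold are all assumptions, and this article does not state what counts as a high difference.

```python
import re

# Assumed layout: the counter label followed on the same line by an integer.
LAST_RX = re.compile(r"last_rx of update msg[^\d\n]*(\d+)")


def last_rx(output: str) -> int:
    """Return the first 'last_rx of update msg' value found in the text."""
    match = LAST_RX.search(output)
    if match is None:
        raise ValueError("counter not found in output")
    return int(match.group(1))


def gap_is_high(master_output: str, slave_output: str, max_gap: int) -> bool:
    """True when the two chassis counters differ by more than max_gap."""
    return abs(last_rx(master_output) - last_rx(slave_output)) > max_gap


# Made-up sample lines, not real FortiGate output.
master = "last_rx of update msg : 3"
slave = "last_rx of update msg : 250"
print(gap_is_high(master, slave, max_gap=30))  # max_gap is a hypothetical threshold
```

Passing the threshold as a parameter keeps the judgment about what is "high" with the operator rather than hard-coding a value the article does not give.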

 

Step 2:

- Check the output of 'diagnose sys fortiswitch-heartbeat status' on the faulty chassis.

- The output of this command shows the HB-Tx flag, the status of the blade, and so on.

 

fortiswitch-1.png

 

Step 3:

1) If the output of Step 1 and Step 2 looks the same and the blade is still stuck in the 'Waiting for data heartbeat' status, try the following:

 

a) Power off/on the slot which is stuck in the 'Waiting for data heartbeat' status.

 

b) If the above step does not fix the problem, power cycle the whole chassis.

 

c) If the above two steps fail and the chassis is part of an HA cluster, break the cluster by removing physical connectivity in this order: data ports -> mgmt ports -> HA ports. Then run the chassis standalone and check whether the blades come up.

 

If the blades come up, there is most likely a communication issue between the HA ports; for example, an intermediate switch might be blocking some packets. To troubleshoot further, check connectivity with the sniffer and ping.

 

        i) To find the IP addresses, use the command:

 

# diagnose ip address list | grep "SN\|10.101.11"

 

        ii) Example: ping the elbc-base-ctrl channel IP address of FPC02 from the MBD.

 

# config vdom

edit mgmt-vdom

diagnose ip address list | grep "SN\|10.101.11"

execute enter vsys_ha

execute ping 10.101.11.4

 

vsys_ha-1.png

               

        iii) Sniffer: run the sniffer below and, at the same time, capture packets on the switch port connected to the faulty chassis. Check whether two-way UDP broadcast packets on port 703 are visible.

 

# config vdom

edit mgmt-vdom

diag sniffer option view-option

diag sniffer options filter-out-internal-pkts disable

diag sniffer options slot current

diag sniffer packet any 'port 703 and proto 17' 6 20
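
Checking a long capture for two-way traffic by eye is error-prone, so the check can be sketched in Python. This is a rough sketch, not a Fortinet tool: it assumes each packet header line in the sniffer output has the form `<time> <interface> <in|out> <src>.<port> -> <dst>.<port>: udp <len>`, and the addresses and interface name in the sample are made up.

```python
import re

# Assumed header layout (an assumption, adjust to the real capture):
# "<time> <interface> <in|out> <src>.<port> -> <dst>.<port>: udp <len>"
HEADER = re.compile(
    r"^\S+\s+(?P<iface>\S+)\s+(?P<dir>in|out)\s+"
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+)\s+->\s+"
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+):\s+udp\b"
)


def directions_seen(capture: str, port: int = 703) -> set:
    """Return the set of directions ('in', 'out') seen for UDP packets on port."""
    seen = set()
    for line in capture.splitlines():
        m = HEADER.match(line.strip())
        if m and port in (int(m["sport"]), int(m["dport"])):
            seen.add(m["dir"])
    return seen


def is_two_way(capture: str, port: int = 703) -> bool:
    """True when packets on the port were both sent and received."""
    return directions_seen(capture, port) == {"in", "out"}


# Made-up sample capture; hex dump lines are ignored by the parser.
sample = """\
1.000000 ha1 out 10.0.0.1.703 -> 10.0.0.255.703: udp 100
0x0000   ffff ffff ffff 0009 0f09 0001 0800 4500
1.500000 ha1 in 10.0.0.2.703 -> 10.0.0.255.703: udp 100
"""
print(is_two_way(sample))
```

If only one direction shows up, comparing against the capture taken on the switch side at the same time helps narrow down where the packets are lost.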

 

2) If the blades do not come up after the chassis is made standalone, collect the output of the following commands and contact TAC.

 

Example: if FPC02 is stuck in the 'Waiting for data heartbeat' status, collect the output of the following commands simultaneously on the FIM/MBD and on the FPC/FPM.

 

1)  On the MBD/FIM, in 'Global'
      ------------------------------

   # diagnose load-balance set slot current
   # diagnose sys fortiswitch-heartbeat status
   # diagnose sys fortiswitch-heartbeat config
   # diagnose test application chlbd 1

   # diagnose sys bcm_intf cli "0:"
   # diagnose sys bcm_intf cli "ps"
   # diagnose debug enable
   # diagnose debug application elbcd -1
   # diagnose test application elbcd 1
   # diagnose debug reset


2)  On the MBD/FIM, in 'mgmt-vdom'
    ------------------------------

   # diagnose sniffer options slot current
   # diagnose sniffer options filter-out-internal-pkts disable

   # diagnose sniffer packet f-slot4 "ether proto 0x8895" 6 20 (the "f-slot4" is for troubleshooting with FPC02) 

   # fnsysctl ifconfig f-slot4 (repeat 5 times, interval ~10 seconds)

 

3) On FPC02, in 'Global'
    ------------------------------

# diagnose sys fortiswitch-heartbeat status
# diagnose sys fortiswitch-heartbeat config

# diagnose test application chlbd 1

# fnsysctl cat /proc/net/np6_0/int-link
# fnsysctl cat /proc/net/np6_system/nplink
# diagnose debug enable
# diagnose debug application chlbd -1 (wait 3 minutes)
# diagnose debug reset

 

4) On FPC02, in 'mgmt-vdom'
    ------------------------------

# diagnose sniffer options filter-out-internal-pkts disable
# diagnose sniffer packet elbc-ctrl/1 "ether proto 0x8895" 6 20
# fnsysctl ifconfig elbc-ctrl/1 (repeat 5 times, interval ~10 seconds)
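
The 'fnsysctl ifconfig' steps above ask for five samples roughly 10 seconds apart. A minimal Python sketch of that sampling loop follows. It is an illustration under assumptions: `run_command` is a hypothetical callable that returns the command output as text (how it reaches the device is left to the reader), and the `RX packets:<n>` field is assumed to follow the usual Linux-style ifconfig layout.

```python
import re
import time

# Assumed Linux-style ifconfig field, e.g. "RX packets:120 errors:0 ...".
RX_PACKETS = re.compile(r"RX packets:(\d+)")


def collect_samples(run_command, count=5, interval_s=10.0, sleep=time.sleep):
    """Call run_command count times, waiting interval_s between calls."""
    samples = []
    for i in range(count):
        samples.append(run_command())
        if i < count - 1:  # no wait after the final sample
            sleep(interval_s)
    return samples


def rx_deltas(samples):
    """Return the change in the RX packet counter between consecutive samples."""
    counts = []
    for text in samples:
        match = RX_PACKETS.search(text)
        if match is None:
            raise ValueError("RX packets field not found")
        counts.append(int(match.group(1)))
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


# Demo with made-up output and a no-op sleep so it finishes immediately.
fake_outputs = iter(f"RX packets:{n} errors:0" for n in (100, 100, 104, 110, 110))
demo = collect_samples(lambda: next(fake_outputs), sleep=lambda seconds: None)
print(rx_deltas(demo))
```

Injecting `sleep` makes the loop easy to dry-run; with the default `time.sleep` it waits the full interval between samples.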

 

https://docs.fortinet.com/document/fortigate/6.0.0/handbook/644870/ha-heartbeat
