Troubleshooting Note : FortiGate HA message "HA master heartbeat interface intf_name lost neighbor information"

rmetzger · 07-07-2009

Description

When a FortiGate is running in HA mode, HA log messages such as the following examples may appear:

2009-02-16 11:06:34 device_id=FG2001111111 log_id=0105035001 type=event subtype=ha pri=critical vd=root msg="HA slave heartbeat interface internal lost neighbor information"

or

2009-02-16 11:06:40 device_id=FG2001111111 log_id=0105035001 type=event subtype=ha pri=notice vd=root msg="Virtual cluster 1 of group 0 detected new joined HA member"

or

2009-02-16 11:06:40 device_id=FG2001111111 log_id=0105035001 type=event subtype=ha pri=notice vd=root msg="HA master heartbeat interface internal get peer information"

Scope
FortiGate running in HA mode.

Solution

These log messages mean that the FortiGate devices in the HA cluster stopped seeing each other's heartbeat packets for longer than the period given by (hb-interval x hb-lost-threshold). hb-interval is expressed in units of 100 ms, so with the default values this period is 2 x 100 ms x 6 = 1.2 s.
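Before changing anything, it can help to confirm which heartbeat values are currently in effect. A minimal sketch, assuming the FortiOS version in use lists hb-interval and hb-lost-threshold in the HA settings output:

```
FGT# get system ha
```

If the two values are not at their defaults, the detection period differs from 1.2 s and should be recalculated from the values shown.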

Step 1 : On all devices in the cluster, check the heartbeat port(s) for errors that would indicate a physical issue preventing heartbeat packets from being sent or received. Use the following command:

FGT# diag hardware dev nic <Heartbeat port Name>

Step 2 : If the problem is caused by traffic peaks at certain times, increase the HA heartbeat tolerance by setting the following parameters: "set hb-lost-threshold 12" and "set hb-interval 4". This multiplies the loss detection interval by 4; the values can be increased further if needed.
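Both parameters belong to the HA configuration. As a sketch of how the values suggested in Step 2 could be applied from the CLI (the values are those given above and should be adjusted to the environment):

```
config system ha
    set hb-interval 4
    set hb-lost-threshold 12
end
```

With these values the cluster waits 4 x 100 ms x 12 = 4.8 s before declaring the heartbeat lost.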

Step 3 : If the problem persists, disable session-pickup as a next step in order to reduce the load on the heartbeat interface.
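Session pickup is also part of the HA configuration. A minimal sketch of disabling it, assuming that losing session failover is acceptable in the environment:

```
config system ha
    set session-pickup disable
end
```

With session pickup disabled, sessions are no longer synchronized to the other cluster members, so existing sessions have to be re-established after a failover.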

In this situation, monitoring the CPU can be useful. This can be done from the Log Event config ("CPU & memory usage"), although the 5-minute polling interval may not capture a short peak, or by polling the appropriate CPU usage MIB.

Check whether there have been any recent changes in the network or an increase in traffic that could lead to this.

Does this problem happen frequently, and does it always happen at the same time of day?

To monitor the CPU of the devices in the cluster and troubleshoot further, use the following commands:

# get sys perf status

and

# diag sys top 2

Repeating these commands at frequent intervals will show the CPU activity and the number of sessions.

See also the related article : "Troubleshooting Tip : Simple steps to monitor CPU and Memory on a FortiGate".

If the problem recurs, gather the following information (a console connection might be necessary if connectivity is lost) and provide it to Technical Support when opening a ticket:

- Debug log : download it from the GUI under System --> Maintenance --> Advanced --> debug log

- CLI commands output:

# diag sys top 2 ; keep it running for 20s
# get sys perf status ; repeat this command multiple times to get good samples
# get sys ha status
# diag sys ha status
# diag sys ha dump all
# diag sys ha dump 2
# diag sys ha dump 3
# diag netlink dev list
# diag hardware dev nic <Heartbeat port Name>
# execute log filter category event
# execute log display

Related Articles

Troubleshooting Tip : Monitor CPU and Memory on a FortiGate

Troubleshooting Note : FortiGate HA synchronization messages and cluster verification steps

