FortiGate
rmetzger
Staff
Article Id 196075

Description

 
This article explains the following HA log messages, which may appear when a FortiGate is running in HA mode:

 

2009-02-16 11:06:34 device_id=FG2001111111 log_id=0105035001 type=event subtype=ha pri=critical vd=root msg="HA secondary heartbeat interface internal lost neighbor information"

or

2009-02-16 11:06:40 device_id=FG2001111111 log_id=0105035001 type=event subtype=ha pri=notice vd=root msg="Virtual cluster 1 of group 0 detected new joined HA member"

or

2009-02-16 11:06:40 device_id=FG2001111111 log_id=0105035001 type=event subtype=ha pri=notice vd=root msg="HA primary heartbeat interface internal get peer information"


Scope


FortiGate is running in HA mode.


Solution

 

These log messages mean that the FortiGate devices in the HA cluster stopped seeing each other for the period given by (hb-interval x hb-lost-threshold), where hb-interval is expressed in units of 100 ms. With the default values this period is 1.2 s. To troubleshoot:
  • Step 1: On all devices in the cluster, check the heartbeat port(s) for errors that would indicate a physical issue preventing heartbeat packets from being sent or received. Use the following command:
 
FGT# diagnose hardware deviceinfo nic <Heartbeat port Name>

  • Step 2: If the problem is caused by traffic peaks at certain times, increase the HA tolerance by setting the following parameters under 'config system ha': 'set hb-lost-threshold 12' and 'set hb-interval 4'. This multiplies the loss-detection interval by 4 (from 1.2 s to 4.8 s); the values can be increased further if needed.
  • Step 3: If the problem persists, disable session-pickup to reduce the load on the heartbeat interface.
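
The arithmetic behind Step 2 can be checked with a short Python sketch. It assumes that hb-interval is counted in units of 100 ms and that the defaults are hb-interval 2 and hb-lost-threshold 6, which is what yields the 1.2 s quoted above. The function name is illustrative only and is not part of FortiOS.

```python
# Sketch: heartbeat loss-detection window for a FortiGate HA cluster.
# Assumption: hb-interval is expressed in units of 100 ms.
HB_INTERVAL_UNIT_MS = 100


def detection_window_s(hb_interval: int, hb_lost_threshold: int) -> float:
    """Seconds without heartbeats before a cluster peer is considered lost."""
    return hb_interval * HB_INTERVAL_UNIT_MS * hb_lost_threshold / 1000


# Assumed default values (hb-interval 2, hb-lost-threshold 6).
default_window = detection_window_s(2, 6)
# Values suggested in Step 2 (hb-interval 4, hb-lost-threshold 12).
tuned_window = detection_window_s(4, 12)

print(f"default: {default_window} s, tuned: {tuned_window} s")
```

Doubling both parameters quadruples the window, so a cluster tuned this way tolerates a longer burst of lost heartbeats at the cost of slower failure detection.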

Monitoring the CPU can be relevant in this situation. It can be enabled from the Log Event config ('CPU & memory usage'), although the 5-minute polling interval may be too coarse to show a short peak. Alternatively, poll the appropriate CPU usage MIB.
 
Check if there have been any changes in the network or an increase in traffic recently that could lead to this.
Also determine whether the problem happens frequently, and whether it always happens at the same time of day.
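
To see whether heartbeat losses cluster at a particular time of day, exported event logs can be bucketed by hour. The following Python sketch assumes log lines in the exact format shown in the Description (date, time, then key=value fields); real exports may differ, and the function names are hypothetical helpers, not a Fortinet tool.

```python
import re
from collections import Counter

# Matches key=value and key="quoted value" fields in a log line.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_line(line: str) -> dict:
    """Split one log line into its key/value fields plus the hour of day."""
    _date, time, rest = line.split(" ", 2)
    fields = {key: value.strip('"') for key, value in FIELD_RE.findall(rest)}
    fields["hour"] = int(time.split(":")[0])
    return fields


def ha_loss_events_by_hour(lines) -> Counter:
    """Count HA 'lost neighbor information' events per hour of day."""
    hours = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = parse_line(line)
        is_ha = fields.get("subtype") == "ha"
        if is_ha and "lost neighbor information" in fields.get("msg", ""):
            hours[fields["hour"]] += 1
    return hours


# Sample input: two of the log lines quoted in the Description.
sample = [
    '2009-02-16 11:06:34 device_id=FG2001111111 log_id=0105035001 type=event '
    'subtype=ha pri=critical vd=root '
    'msg="HA secondary heartbeat interface internal lost neighbor information"',
    '2009-02-16 11:06:40 device_id=FG2001111111 log_id=0105035001 type=event '
    'subtype=ha pri=notice vd=root '
    'msg="Virtual cluster 1 of group 0 detected new joined HA member"',
]

print(ha_loss_events_by_hour(sample))
```

A count concentrated in one or two hours would point to a scheduled traffic peak, whereas an even spread is more consistent with a physical or configuration issue on the heartbeat link.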
 
To monitor the CPU of the devices in the cluster and troubleshoot further, use the following procedure and commands:
 
get sys perf status
 
And:
 
diagnose sys top 2
 
Repeated at frequent intervals, these commands show the CPU activity and the number of sessions.
 
See also the related article: 'Troubleshooting Tip: Simple steps to monitor CPU and Memory on a FortiGate'.

 

If the problem is recurring, gather the following information (a console connection might be necessary if connectivity is lost) and provide it to Technical Support when opening a ticket:

Debug log: download it from the GUI under System -> Maintenance -> Advanced -> Debug log.

 

CLI commands output:

 

diagnose sys top 2  <--  Keep it running for 20s.
get sys perf status <-- Repeat this command multiple times to get good samples.
get sys ha status
diagnose sys ha status
diagnose sys ha dump all
diagnose sys ha dump 2
diagnose sys ha dump 3
diagnose netlink dev list

diagnose hardware deviceinfo nic <Heartbeat port Name>
execute log filter category event
execute log display

 

 

Related articles:

Troubleshooting Tip : Monitor CPU and Memory on a FortiGate

Troubleshooting Note : FortiGate HA synchronization messages and cluster verification steps