Troubleshooting Tip: A possible root cause for instability in HA Cluster triggering repeated failovers

FortiArt · ‎04-03-2025

Description

This article presents a possible root cause for instability in an HA cluster configured with monitored interface(s) triggering repeated failovers.

Scope

FortiGate.

Solution

Introduction:

When a monitored interface in an HA cluster goes down, it triggers a failover among the cluster members. If the monitored interface flaps up and down, it triggers frequent failovers, which makes the cluster unstable. The repeated failovers may also strain system resources such as memory and CPU, especially if the session-pickup setting is enabled.

Scenario:

Here, it is assumed that the FortiGate is configured as follows: a system link-monitor uses wan1 to ping an external server, for example, 8.8.8.8.

config system link-monitor
    edit "wan1-ping-server"
        set srcintf "wan1"
        set server "8.8.8.8"
        set update-cascade-interface enable
        set update-static-route enable
    next
end

Checking the link-monitor status (diagnose sys link-monitor status) shows the status flapping between alive and dead. This indicates a reachability problem, which may be caused by an ISP issue or by a routing issue on an intermediate router in the path to the destination.

The system HA cluster is configured as per the following (port1 is the monitored interface):

config system ha
    set group-name "FGT-HA"
    set mode a-p
    set monitor "port1"
end

The next step is to relate the flapping link-monitor on wan1 to the repeated failovers in the HA cluster, even though wan1 itself is not an HA monitored interface.

Root Cause:

Check the system interface settings of the link-monitor source interface, i.e., wan1. Confirm whether the fail-detect setting is enabled and which interface(s) it is tied to through the fail-alert-interfaces setting. In the following configuration, fail-detect on wan1 uses the detectserver option and brings down port1 when a failure is detected. Each time the link-monitor marks the server as dead, port1 goes down, and because port1 is the HA monitored interface, this triggers a failover. The link-monitor is therefore the source of the flapping on the monitored interface:

config system interface
    edit "wan1"
        set ip 192.168.1.254 255.255.255.0
        set fail-detect enable
        set fail-detect-option detectserver link-down
        set fail-alert-method link-down
        set fail-alert-interfaces "port1"
    next
end
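
As a minimal sketch of one way to break this dependency, assuming port1 does not need to follow failures detected on wan1, fail-detect can be disabled on wan1 so that link-monitor state changes no longer bring port1 down. This is an illustrative assumption, not a fix prescribed by this article; whether it is appropriate depends on why fail-detect was configured in the first place. The line starting with # is an annotation and can be omitted when entering the commands.

config system interface
    edit "wan1"
        # Assumption: port1 does not need to follow wan1 failures
        set fail-detect disable
    next
end

With this change, the link-monitor can still update static routes through update-static-route, but it should no longer drive the state of the HA monitored interface.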

Additionally, the HA monitored interface status can be verified from the 'get system ha status' command by checking the 'mondev' status in the output.

Technical Tip: 'get system ha status' showing warning with 'mondev down' message
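
The checks described in this article can be run in sequence, as in the following sketch. The interface name wan1 comes from the scenario above. Output differs per unit, so none is shown. The last command is an additional check not mentioned in this article and is assumed to be available on the running FortiOS version. Lines starting with # are annotations.

# Check whether the link-monitor status alternates between alive and dead
diagnose sys link-monitor status
# Review fail-detect and fail-alert-interfaces on the link-monitor source interface
show system interface wan1
# Check the monitored interface state (mondev) in the HA status
get system ha status
# Review recent HA events to correlate failovers with the link-monitor flaps
diagnose sys ha history read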

Note: There may be other causes that trigger the flapping behavior for the system HA cluster units. This article shows only one possible root cause.

Related articles:

Technical Tip: Configuring HA Monitored Interfaces for Failover

Technical Tip: Possible cause of HA monitor interface not triggering HA failover

Technical Tip: Troubleshooting unexpected High Availability (HA) failover
