FortiGate
saleha
Staff & Editor
Article Id 394737
Description

 

This article describes the behavior of an HA cluster when deployed to use link-monitor as failover criteria. An example and illustration of this deployment can be found in Technical Tip: Combining Remote Link Monitoring with FGCP cluster High Availability.

 

Scope

 

FortiOS - HA Cluster - Link-Monitor.

 

Solution

 

One of the main advantages of deploying an HA cluster with link-monitor (remote link monitoring) is that it allows the firewall administrator to monitor a link through a VLAN interface, which the regular HA interface monitoring does not support.
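
As a minimal sketch of this use case, the configuration below points a link-monitor at a VLAN interface and references that VLAN in the HA settings. The interface name 'vlan100', the VLAN ID 100, and the link-monitor name 'ha-monitor-vlan' are hypothetical. The server, ha-priority, and threshold values mirror the example used later in this article. Interface addressing is omitted and must be adapted to the environment.

config system interface
    edit "vlan100"
        set vdom "root"
        set interface "port2"
        set vlanid 100
    next
end

config system link-monitor
    edit "ha-monitor-vlan"
        set srcintf "vlan100"
        set server "8.8.8.8"
        set ha-priority 5
    next
end

config system ha
    set pingserver-monitor-interface "vlan100"
    set pingserver-failover-threshold 4
end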

 

By default, the option 'pingserver-secondary-force-reset' is enabled once an interface is selected under 'pingserver-monitor-interface'. For example:

 

config system ha

    .....

    set pingserver-monitor-interface "port1"
    set pingserver-secondary-force-reset enable

    .....

end

 

The behavior of this setting depends on the HA cluster failover election method: uptime or priority. When the 'override' option is enabled on any member of the cluster, FortiOS first checks which member has the higher priority value. When override is disabled, uptime is the first election criterion.

 

For more details about the HA cluster failover election method, see Technical Tip: FortiGate HA Primary unit selection process when override is disabled vs enabled.

 

The following illustrates the HA cluster primary election in both scenarios when the link-monitor goes down on the primary. To save time, the 'pingserver-flip-timeout' value is set to its minimum, which is 6 minutes.

 

First Scenario: Override is disabled.

 

In this scenario, the primary member of the cluster is the one with the longest uptime, as shown in the output of the command 'get system ha status' below:

 

uptime.png

 

The following is the link-monitor and HA configuration:


config system link-monitor
    edit "ha-monitor"
        set srcintf "port2"
        set server "8.8.8.8"
        set failtime 2
        set ha-priority 5
        set update-cascade-interface disable
        set update-policy-route disable
    next
end

config system ha
    set group-id 103
    set group-name "remote-mon"
    set mode a-p
    set password ENC pblkR52GmqtF+jcUMjpP/cbYKVdH4H7AXFYfwVn0rKln2/NlvVpxqDEFxa++M0cmqIxwQn2Qz+RrJWZIOJGISP0wmtV5S0pkcZktee9IKFWW7uBpBll36t7ETBlj5ulSjM5lH1DkTqI7Y3fMKyE3/CVyCWsdMcaBYvtwWTeKgoQkIGV0Hiq+NjGNk7F6Vqu4kgd8U1lmMjY3dkVA
    set hbdev "port5" 50
    set session-pickup enable
    set session-pickup-connectionless enable
    set ha-mgmt-status enable
    config ha-mgmt-interfaces
        edit 1
            set interface "port3"
            set gateway 10.9.31.254
        next
    end
    set override disable
    set priority 130
    set monitor "port2"
    set pingserver-monitor-interface "port2"
    set pingserver-secondary-force-reset enable
    set pingserver-failover-threshold 4
    set pingserver-flip-timeout 6
end

 

The next step is to force a failover when the link-monitor for port2 is down:

 

uptime_first.png
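
To confirm what triggered the failover, the state of the link-monitor and of the cluster can be checked from the CLI on each member. The commands below are a suggested sequence rather than part of the original test, and the output format varies by FortiOS version.

get system ha status
diagnose sys link-monitor status
diagnose sys ha dump-by group

The first command shows which member currently holds the primary role, the second shows whether the monitored server is reachable, and the third shows the ping server failure counters and the flip timer.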

 

After link-monitor forces a failover, the flip timer starts counting down:

 

diagnose sys ha dump-by group
HA information.
group-id=103, group-name='remote-mon'
has_no_aes128_gcm_sha256_member=0

gmember_nr=2
'FGVM02TM23001313': ha_ip_idx=1, hb_packet_version=4, last_hb_jiffies=9225072, linkfails=1, weight/o=0/0, support_aes128_gcm_sha256=1
hbdev_nr=1: port5(mac=0043..05, last_hb_jiffies=9225072, hb_lost=0),
'FGVM02TM23001318': ha_ip_idx=0, hb_packet_version=38, last_hb_jiffies=0, linkfails=0, weight/o=0/0, support_aes128_gcm_sha256=1

vcluster_nr=1
vcluster-1: start_time=1748958892(2025-06-04 01:54:52), state/o/chg_time=3(standby)/2(work)/1748958892(2025-06-04 01:54:52)
pingsvr_flip_timeout/expire=360s/263s <-----
mondev: port2(prio=50,is_aggr=0,status=1)
'FGVM02TM23001313': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, mem_failover=0, uptime/reset_cnt=792/1
'FGVM02TM23001318': ha_prio/o=1/1, link_failure=0, pingsvr_failure=5, flag=0x00000000, mem_failover=0, uptime/reset_cnt=0/4

 

After the flip timer expires, no further failover occurs unless the link-monitor on the current primary goes down, even if the current primary has a lower uptime than the secondary member of the cluster:

 

uptime_second.png

 

Second Scenario: Override is enabled.

 

This is the case where the primary election method is highest priority:

 

config system ha
    .....
    set override enable
    .....
end
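
Because the member with the highest priority is preferred when override is enabled, each member normally carries a different priority. The following is a minimal sketch, assuming hypothetical priority values of 200 and 100 (the example cluster in this article uses 130 on one member).

On the member intended to be primary:

config system ha
    set override enable
    set priority 200
end

On the other member:

config system ha
    set override enable
    set priority 100
end

The 'override' and 'priority' settings are not synchronized between cluster members, so they are configured on each unit individually.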

 

Status of the cluster before any failover by link-monitor:

 

priority.png

 

When the link-monitor for port2 goes down, the cluster fails over to the secondary, which has the lower priority, as expected, and starts the flip timer:

 

priority_first.png

 

However, in this case, once the flip timer reaches 0, the cluster performs another failover back to the member with the highest priority, as illustrated by the following command output and image:

diagnose sys ha dump-by group
HA information.
group-id=103, group-name='remote-mon'
has_no_aes128_gcm_sha256_member=0

gmember_nr=2
'FGVM02TM23001313': ha_ip_idx=1, hb_packet_version=10, last_hb_jiffies=17700913, linkfails=1, weight/o=0/0, support_aes128_gcm_sha256=1
hbdev_nr=1: port5(mac=0043..05, last_hb_jiffies=17700913, hb_lost=0),
'FGVM02TM23001318': ha_ip_idx=0, hb_packet_version=90, last_hb_jiffies=0, linkfails=0, weight/o=0/0, support_aes128_gcm_sha256=1

vcluster_nr=1
vcluster-1: start_time=1748976732(2025-06-04 06:52:12), state/o/chg_time=3(standby)/2(work)/1749043349(2025-06-05 01:22:29)
pingsvr_flip_timeout/expire=360s/0s <-----
'FGVM02TM23001313': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, mem_failover=0, uptime/reset_cnt=0/2
'FGVM02TM23001318': ha_prio/o=1/1, link_failure=0, pingsvr_failure=0, flag=0x00000000, mem_failover=0, uptime/reset_cnt=66253/7

 

priority2.png

 

This is why, when override is enabled, it is critically important to fine-tune the 'pingserver-flip-timeout' value. If the flip timer expires while the member with the higher priority still cannot reach the ISP, the failback to that member still occurs, its link-monitor fails again, and the cluster fails over to the lower-priority member once more. The flip timer then restarts, and the cluster will not automatically fail back until the timer again runs down to 0. This causes a flip-flop effect until the member with the higher priority has restored its connection to the ISP.
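
As a sketch of the tuning step, the flip timer can be raised from the 6-minute test value used in this article. The value 60 below is only an illustrative assumption; the appropriate value depends on how long an upstream outage typically lasts in the environment.

config system ha
    set pingserver-flip-timeout 60
end

The remaining time can then be followed on the 'pingsvr_flip_timeout/expire' line of 'diagnose sys ha dump-by group', as in the outputs shown earlier.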