hbac
Staff
April 20, 2023

Troubleshooting Tip: HA does not fail over when the remote link monitor status is ‘die’ due to an EXE_FAIL_OVER flag

Description

This article describes an issue where an HA cluster does not fail over even though remote link monitoring is configured as described in the article 'Technical Tip: Combining remote link monitoring with a high availability FGCP cluster'.

Scope

FortiGate HA.

Solution

The following entry may appear under 'Primary selected using' in the output of 'get system ha status':


<2022/12/22 21:16:37> FG100ETK11111111 is selected as the primary because it has EXE_FAIL_OVER flag set.

In this state, the cluster does not fail over when the link monitor status changes to ‘die’. The following diagnostic output is seen when the link monitor is up:


diagnose sys link-monitor status
Link Monitor: ha-failover-WAN, Status: alive, Server num(1), Flags=0x1 init, Create time: Mon Mar 6 19:48:55 2023
Source interface: WAN (38)
Source IP: 176.57.x.x
Interval: 500 ms
Peer: 8.8.8.8(8.8.8.8)
Source IP(176.57.x.x)
protocol: ping, state: alive
Latency(Min/Max/Avg): 5.701/6.733/6.099 ms
Jitter(Min/Max/Avg): 0.020/0.998/0.425
Packet lost: 0.000%
Number of out-of-sequence packets: 0
Fail Times(0/5)
Packet sent: 4163, received: 4163, Sequence(sent/rcvd/exp): 4164/4164/4165

FortiGate1 # di sys ha status
[Debug_Zone HA information]
HA group member information: is_manage_primary=1.
FG100ETK11111111: Primary, serialno_prio=1, usr_priority=255, hostname=FortiGate1 <<<< FortiGate1 is the primary
FG100ETK22222222: Secondary, serialno_prio=0, usr_priority=128, hostname=FortiGate2
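
To check the link monitor state from a script instead of by eye, the output can be parsed. The following is a minimal Python sketch. It assumes the output of 'diagnose sys link-monitor status' has already been captured as text, for example from an SSH session. The function name parse_link_monitor is hypothetical, and the patterns are derived only from the sample output in this article, so other FortiOS versions may format these lines differently.

```python
import re

# Abridged sample of 'diagnose sys link-monitor status' output from this article.
SAMPLE_ALIVE = """\
Link Monitor: ha-failover-WAN, Status: alive, Server num(1), Flags=0x1 init, Create time: Mon Mar 6 19:48:55 2023
Source interface: WAN (38)
protocol: ping, state: alive
Packet lost: 0.000%
Fail Times(0/5)
"""


def parse_link_monitor(text: str) -> dict:
    """Extract the monitor name, status, packet loss and fail counter.

    Assumption: the regular expressions mirror the sample output shown in
    this article; they are not based on a documented output format.
    """
    result = {}
    m = re.search(r"Link Monitor: ([^,]+), Status: (\w+)", text)
    if m:
        result["name"] = m.group(1)
        result["status"] = m.group(2)
    m = re.search(r"Packet lost: ([\d.]+)%", text)
    if m:
        result["packet_loss_pct"] = float(m.group(1))
    m = re.search(r"Fail Times\((\d+)/(\d+)\)", text)
    if m:
        # Assumed to be (consecutive failures, failure threshold).
        result["fail_times"] = (int(m.group(1)), int(m.group(2)))
    return result


if __name__ == "__main__":
    print(parse_link_monitor(SAMPLE_ALIVE))
```

Missing fields are left out of the returned dictionary rather than guessed, so an unexpected output format shows up as absent keys.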


The following diagnostic output is seen when the link monitor is down:


diagnose sys link-monitor status
Link Monitor: ha-failover-WAN, Status: die, Server num(1), Flags=0x1 init, Create time: Mon Mar 6 20:28:15 2023
Source interface: WAN (38)
Source IP: 176.57.x.x
Interval: 500 ms
Peer: 8.8.8.8(8.8.8.8)
Source IP(176.57.x.x)
protocol: ping, state: die
Packet lost: 100.000%
Number of out-of-sequence packets: 0
Recovery times(0/5) Fail Times(3/5)
Packet sent: 158, received: 69, Sequence(sent/rcvd/exp): 159/70/71

FortiGate1 # di sys ha status
[Debug_Zone HA information]
HA group member information: is_manage_primary=1.
FG100ETK11111111: Primary, serialno_prio=1, usr_priority=255, hostname=FortiGate1 <<<< FortiGate1 is still the primary
FG100ETK22222222: Secondary, serialno_prio=0, usr_priority=128, hostname=FortiGate2
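
The symptom in this output is that the link monitor reports 'die' while the monitoring unit is still listed as Primary. The following minimal Python sketch performs that check. It assumes both outputs are available as text and that the link monitor output comes from the unit passed as monitoring_unit. The function names are hypothetical, and the patterns come only from the sample output above, without the '<<<<' annotations added in this article.

```python
import re
from typing import Optional

LM_DOWN = """\
Link Monitor: ha-failover-WAN, Status: die, Server num(1), Flags=0x1 init, Create time: Mon Mar 6 20:28:15 2023
protocol: ping, state: die
Packet lost: 100.000%
"""

HA_STATUS = """\
[Debug_Zone HA information]
HA group member information: is_manage_primary=1.
FG100ETK11111111: Primary, serialno_prio=1, usr_priority=255, hostname=FortiGate1
FG100ETK22222222: Secondary, serialno_prio=0, usr_priority=128, hostname=FortiGate2
"""


def link_monitor_status(text: str) -> Optional[str]:
    """Return the overall status ('alive', 'die', ...) from the link monitor output."""
    m = re.search(r"Link Monitor: [^,]+, Status: (\w+)", text)
    return m.group(1) if m else None


def primary_hostname(text: str) -> Optional[str]:
    """Return the hostname of the member listed as Primary in 'diagnose sys ha status'."""
    m = re.search(r"^\S+: Primary,.*?hostname=(\S+)", text, re.MULTILINE)
    return m.group(1) if m else None


def failover_missing(lm_text: str, ha_text: str, monitoring_unit: str) -> bool:
    """True when the monitor on `monitoring_unit` is down but that unit is still primary.

    Assumption: `monitoring_unit` is the hostname of the unit whose link
    monitor output is passed in `lm_text`.
    """
    return (
        link_monitor_status(lm_text) == "die"
        and primary_hostname(ha_text) == monitoring_unit
    )


if __name__ == "__main__":
    if failover_missing(LM_DOWN, HA_STATUS, "FortiGate1"):
        print("Link monitor is down but FortiGate1 is still primary; check 'get system ha status'.")
```

A True result only indicates the symptom. The cause still has to be confirmed from the 'Primary selected using' history.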

The 'get system ha status' output shows why FortiGate1 remains the primary:

FortiGate # get sys ha status
HA Health Status:
WARNING: FG100ETK11111111 has pingsvr down;
Model: FortiGate-100E
Mode: HA A-P
Group: 0
Debug: 0
Cluster Uptime: 104 days 23:34:58
Cluster state change time: 2022-12-22 21:16:37
Primary selected using:
<2022/12/22 21:16:37> FG100ETK11111111 is selected as the primary because it has EXE_FAIL_OVER flag set.
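
The cause appears in the 'Primary selected using' history. The following Python sketch scans captured 'get system ha status' output for the EXE_FAIL_OVER reason and the pingsvr warning, then returns the command this article uses to clear the flag. The function names and the wording of the findings are hypothetical. The argument 1 in the command is copied from this article's example.

```python
import re
from typing import List, Optional

HA_STATUS = """\
HA Health Status:
WARNING: FG100ETK11111111 has pingsvr down;
Model: FortiGate-100E
Mode: HA A-P
Primary selected using:
<2022/12/22 21:16:37> FG100ETK11111111 is selected as the primary because it has EXE_FAIL_OVER flag set.
"""

# Command from this article that clears the flag; it may itself trigger a failover.
UNSET_COMMAND = "execute ha failover unset 1"


def exe_fail_over_unit(text: str) -> Optional[str]:
    """Return the serial number selected as primary because of EXE_FAIL_OVER, if any.

    Only the first matching history entry is checked.
    """
    m = re.search(
        r"(\S+) is selected as the primary because it has EXE_FAIL_OVER flag set",
        text,
    )
    return m.group(1) if m else None


def pingsvr_down_units(text: str) -> List[str]:
    """Return serial numbers reported with 'has pingsvr down' in the health status."""
    return re.findall(r"WARNING: (\S+) has pingsvr down", text)


def advise(text: str) -> List[str]:
    """Build human-readable findings from 'get system ha status' output."""
    findings = []
    unit = exe_fail_over_unit(text)
    down = pingsvr_down_units(text)
    if unit and unit in down:
        findings.append(f"{unit} has pingsvr down but is held as primary by EXE_FAIL_OVER.")
    if unit:
        findings.append(f"Consider running '{UNSET_COMMAND}' (may trigger a failover).")
    return findings


if __name__ == "__main__":
    for line in advise(HA_STATUS):
        print(line)
```

The sketch only reports findings and does not run the command. This is deliberate, because clearing the flag can cause a failover.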

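This happens when a failover was previously forced with the command 'execute ha failover set 1' and the flag was never unset. While the EXE_FAIL_OVER flag remains set, FortiGate1 stays the primary even though its link monitor reports ‘die’.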

To clear the flag, run the following command:

execute ha failover unset 1

Caution:

This command may trigger an HA failover and is intended for testing purposes.