Created on 04-08-2011 06:45 AM
Edited on 03-11-2025 01:38 AM by Jean-Philippe_P
Description
Scope
FortiGate.
Solution
With HA override enabled, the cluster's Primary is normally the unit with the highest priority. The Primary is therefore always the same unit, which makes it easier to identify. However, this setting has drawbacks:
- If the Primary fails and recovers, it triggers a double failover. The first failover is expected because the other unit takes over. The second happens when the original Primary comes back up and, because override is enabled, reclaims the Primary role. To avoid this, configure HA with 'set override disable' (under 'config system ha').
- If the cluster monitors a link that flaps on one node but stays stable on the other, failover happens repeatedly and can cut network access entirely.
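The double failover can be illustrated with a small Python model. This is only a sketch. It assumes that with override enabled the Primary is simply the available unit with the highest priority, and it ignores monitored interfaces and the serial-number tie-break. The two hypothetical units A (priority 200) and B (priority 100) and the function names are illustrative only.

```python
# Simplified model: with override enabled, the Primary is the available
# unit with the highest priority. Monitored interfaces and the
# serial-number tie-break are deliberately left out of this sketch.

def elect_primary_override_enabled(priorities, up):
    """Return the name of the available unit with the highest priority."""
    candidates = [name for name in priorities if up[name]]
    return max(candidates, key=lambda name: priorities[name])


def count_failovers(primaries):
    """Count how many times the Primary role moved to another unit."""
    return sum(1 for prev, cur in zip(primaries, primaries[1:]) if prev != cur)


priorities = {"A": 200, "B": 100}  # hypothetical units A and B

# Hypothetical event sequence: both units up, A fails, A recovers.
states = [
    {"A": True, "B": True},
    {"A": False, "B": True},
    {"A": True, "B": True},
]

primaries = [elect_primary_override_enabled(priorities, s) for s in states]
print(primaries)                   # ['A', 'B', 'A']
print(count_failovers(primaries))  # 2: one failure plus one recovery
```

A single outage of A therefore moves the Primary role twice. That second move is the one 'set override disable' is meant to avoid.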
Example: a two-unit cluster with units A and B.
Unit A:
- Is the preferred Primary.
- Has a priority of 200.
- Is configured with HA override disabled.
Unit B:
- Is the preferred Secondary.
- Has a priority of 100.
- Is configured with HA override disabled.
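How units A and B compete for the Primary role with override disabled can be sketched as a simplified Python model. The sketch follows the behaviour described in this article. The HA uptime difference is used only when it exceeds 5 minutes, and below that the priority decides. Real FGCP elections also consider monitored interfaces and the serial number, which are omitted here. `Unit`, `elect_primary`, `UPTIME_MARGIN_S` and the uptime values are hypothetical.

```python
from dataclasses import dataclass

UPTIME_MARGIN_S = 300  # 5-minute margin, as described in this article


@dataclass
class Unit:
    name: str
    priority: int
    ha_uptime_s: int  # hypothetical HA uptime of the unit, in seconds


def elect_primary(a: Unit, b: Unit, margin: int = UPTIME_MARGIN_S) -> Unit:
    """Return the unit expected to become Primary (override disabled, simplified)."""
    diff = a.ha_uptime_s - b.ha_uptime_s
    if abs(diff) > margin:
        # The uptime difference is significant: the unit with the higher HA uptime wins.
        return a if diff > 0 else b
    # The uptime difference is within the margin and is ignored: priority decides.
    # (On equal priority, real FGCP would compare serial numbers; here 'a' wins.)
    return a if a.priority >= b.priority else b


# Uptimes within 5 minutes of each other: A wins on priority (200 > 100).
print(elect_primary(Unit("A", 200, 60), Unit("B", 100, 180)).name)  # A
# B has 15 minutes more HA uptime: B wins despite its lower priority.
print(elect_primary(Unit("A", 200, 0), Unit("B", 100, 900)).name)   # B
```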
- t = 0 s: A and B have just booted.
- The HA uptime difference is less than 5 minutes, so it is ignored in the Primary election.
- A is promoted to Primary because its priority is higher than B's (200 > 100).
- t = 1 min: A is rebooted.
- A leaves the cluster and rejoins it as Primary 2 minutes later.
This is expected because the HA uptime difference between A and B is less than 5 minutes. As a result, the HA uptime (age) condition is ignored in the election, and A's higher priority wins over B's.
- t = 15 min: A is rebooted again.
- This time, A rejoins the cluster as Secondary.
This is because the HA uptime difference between A and B is now greater than 5 minutes, so B's higher HA uptime takes precedence over A's higher priority.
- The status is now: B=Primary, A=Secondary.
- Later, during a maintenance window:
- The administrator wants the preferred Primary, A, to be the cluster Primary again.
- The administrator connects to the CLI of B (the current Primary) and runs the following command:
diag sys ha reset-uptime
- This resets B's HA uptime, so A now has the higher HA uptime.
- A is promoted to Primary.
- B is demoted to Secondary.
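The whole timeline, including the effect of 'diag sys ha reset-uptime', can be replayed with a simplified Python model. This is a sketch, not FGCP itself. Each unit's HA uptime is measured from the moment it last restarted, and the election uses uptime only beyond the 5-minute margin, otherwise priority. Monitored interfaces and serial numbers are ignored. The time A takes to rejoin after the second reboot and the time of the maintenance window are not given in the article, so the 2 minutes and t = 5000 s used here are hypothetical. The names `Unit`, `elect` and `ha_start_s` are illustrative only.

```python
from dataclasses import dataclass

UPTIME_MARGIN_S = 300  # 5-minute margin used throughout this article


@dataclass
class Unit:
    name: str
    priority: int
    ha_start_s: int = 0  # time (s) at which this unit's HA uptime last restarted

    def uptime(self, now: int) -> int:
        return now - self.ha_start_s


def elect(a: Unit, b: Unit, now: int) -> str:
    """Simplified override-disabled election: uptime beyond the margin, then priority."""
    diff = a.uptime(now) - b.uptime(now)
    if abs(diff) > UPTIME_MARGIN_S:
        return a.name if diff > 0 else b.name
    return a.name if a.priority >= b.priority else b.name


a, b = Unit("A", 200), Unit("B", 100)
history = []

# t = 0 s: both units boot together, so the uptime difference is 0.
history.append(("t=0s", elect(a, b, 0)))

# t = 1 min: A reboots and rejoins 2 minutes later with a fresh HA uptime.
a.ha_start_s = 180
history.append(("t=3min", elect(a, b, 180)))

# t = 15 min: A reboots again (assumed to rejoin 2 minutes later).
a.ha_start_s = 1020
history.append(("t=17min", elect(a, b, 1020)))

# Maintenance window (hypothetical t = 5000 s): reset-uptime is run on B.
b.ha_start_s = 5000
history.append(("after reset-uptime", elect(a, b, 5000)))

for label, primary in history:
    print(f"{label}: Primary = {primary}")
```

The printed Primaries are A, A, B and then A again, matching the sequence described in the example.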
To check the HA uptime difference between members:
diagnose sys ha dump-by group
'FGVM16TM24000014': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, mem_failover=0, uptime/reset_cnt=407/0 <- '407' is a difference measured in seconds.
'FGVM16TM24000037': ha_prio/o=1/1, link_failure=0, pingsvr_failure=0, flag=0x00000000, mem_failover=0, uptime/reset_cnt=0/2 <----- '0' is for the device with the lowest HA uptime and '2' is the number of times HA uptime has been reset for this device.
The output above shows the HA uptime difference between members. The member with 0 in the uptime field has the lowest HA uptime. In this example, the device whose serial number ends in 14 has an HA uptime 407 seconds higher than the other member of the cluster. The reset_cnt field shows how many times the HA uptime has been reset on that device.
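When checking many clusters, the uptime/reset_cnt field can be extracted from saved 'diagnose sys ha dump-by group' output with a short Python script. This is a sketch. The regular expression assumes the line format of the sample output in this article, which may differ between FortiOS versions. The function name `parse_dump_by_group` is hypothetical.

```python
import re

# Matches the quoted serial number and the uptime/reset_cnt=U/R field of a member line.
MEMBER_RE = re.compile(
    r"'(?P<serial>[^']+)':.*?uptime/reset_cnt=(?P<uptime>\d+)/(?P<resets>\d+)"
)


def parse_dump_by_group(output: str) -> dict:
    """Map each member serial number to (relative HA uptime in s, reset count)."""
    members = {}
    for line in output.splitlines():
        m = MEMBER_RE.search(line)
        if m:
            members[m.group("serial")] = (int(m.group("uptime")), int(m.group("resets")))
    return members


# Sample lines taken from the output shown in this article (annotations removed).
sample = """\
'FGVM16TM24000014': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, mem_failover=0, uptime/reset_cnt=407/0
'FGVM16TM24000037': ha_prio/o=1/1, link_failure=0, pingsvr_failure=0, flag=0x00000000, mem_failover=0, uptime/reset_cnt=0/2
"""

members = parse_dump_by_group(sample)
oldest = max(members, key=lambda s: members[s][0])
spread = max(u for u, _ in members.values()) - min(u for u, _ in members.values())
print(oldest, "has the highest HA uptime, ahead by", spread, "seconds")
print("Difference above 5 minutes:", spread > 300)
```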
To confirm the HA override setting:
sh system ha | grep override
set override enable
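The same check can be automated against saved 'show system ha' output with a small Python function. This is a sketch. It relies on 'show' listing only settings changed from their default, and on the default for override being 'disable', so a missing line is read as disabled. It also skips related keywords such as 'override-wait-time', which the grep above would also match. The function name and the sample configuration are hypothetical.

```python
def ha_override_enabled(show_output: str) -> bool:
    """Return True if 'set override enable' appears in 'show system ha' output."""
    for line in show_output.splitlines():
        tokens = line.split()
        # Match exactly 'set override <value>', not 'set override-wait-time ...'.
        if len(tokens) >= 3 and tokens[:2] == ["set", "override"]:
            return tokens[2] == "enable"
    # 'show' omits settings left at their default, and the default is 'disable'.
    return False


# Hypothetical 'show system ha' output.
sample = """\
config system ha
    set priority 200
    set override enable
end
"""
print(ha_override_enabled(sample))  # True
```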
Related articles:
Technical Tip: How to use failover flag to change Active unit
Technical Tip: Different options to trigger an HA failover (FGCP)