vprabhu_FTNT
Staff
Article Id 193422

Description


This article describes how to troubleshoot HA synchronization issues when a cluster is out of sync.

 

Scope

 

FortiGate.

Solution


For a multi-VDOM FortiGate, run the following commands in 'config global' mode.


get system ha status              <- Shows detailed HA information and the cluster failover reason.

get sys ha status

HA Health Status: OK
Model: FortiGate-VM64-KVM
Mode: HA A-P
Group: 9
Debug: 0
Cluster Uptime: 14 days 5:9:44
Cluster state change time: 2019-06-13 14:21:15

 

The master is selected using the following:

 

<date:02> FGVMXXXXXXXXXX44 is selected as the master because it has the largest value of uptime. <- This is the reason for the last failover.
<date:01> FGVMXXXXXXXXXX46 is selected as the master because it has the largest value of uptime.
<date:00> FGVMXXXXXXXXXX44 is selected as the master because it has the largest value of override priority.
ses_pickup: enable, ses_pickup_delay=disable
override: disable

 

Configuration Status:

 

FGVMXXXXXXXXXX44(updated 3 seconds ago): in-sync
FGVMXXXXXXXXXX46(updated 4 seconds ago): in-sync

 

System Usage stats:

 

FGVMXXXXXXXXXX44(updated 3 seconds ago):
sessions=42, average-cpu-user/nice/system/idle=0%/0%/0%/100%, memory=64%

FGVMXXXXXXXXXX46(updated 4 seconds ago):
sessions=5, average-cpu-user/nice/system/idle=0%/0%/0%/100%, memory=54%

 

HBDEV stats:

 

FGVMXXXXXXXXXX44(updated 3 seconds ago):
port8: physical/10000full, up, rx-bytes/packets/dropped/errors=2233369747/7606667/0/0, tx=3377368072/8036284/0/0

FGVMXXXXXXXXXX46(updated 4 seconds ago):
port8: physical/10000full, up, rx-bytes/packets/dropped/errors=3377712830/8038866/0/0, tx=2233022661/7604078/0/0

 

MONDEV stats:

 

FGVMXXXXXXXXXX44(updated 3 seconds ago):
port1: physical/10000full, up, rx-bytes/packets/dropped/errors=1140991879/3582047/0/0, tx=319625288/2631960/0/0

FGVMXXXXXXXXXX46(updated 4 seconds ago):
port1: physical/10000full, up, rx-bytes/packets/dropped/errors=99183156/1638504/0/0, tx=266853/1225/0/0

Master: Prim-FW         , FGVMXXXXXXXXXX44, cluster index = 1
Slave : Bkup-Fw         , FGVMXXXXXXXXXX46, cluster index = 0
number of vcluster: 1
vcluster 1: work 169.254.0.2
Master: FGVMXXXXXXXXXX44, operating cluster index = 0
Slave : FGVMXXXXXXXXXX46, operating cluster index = 1
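
When the status output has been saved to a file or captured over SSH, the 'Configuration Status' section can be checked programmatically. The following Python sketch is only an illustration: it assumes the section looks like the sample above, and that a unit which is not synchronized reports a value other than 'in-sync' (shown here as 'out-of-sync'). The function names are hypothetical and not part of any Fortinet tool.

```python
import re

# Matches lines such as "FGVMXXXXXXXXXX44(updated 3 seconds ago): in-sync".
STATUS_LINE = re.compile(r"^(\S+?)\(updated [^)]*\):\s*(\S+)$")


def config_sync_status(ha_status_output):
    """Return {serial: state} read from the 'Configuration Status:' section."""
    states = {}
    in_section = False
    for raw in ha_status_output.splitlines():
        line = raw.strip()
        if line == "Configuration Status:":
            in_section = True
            continue
        if not in_section or not line:
            continue
        match = STATUS_LINE.match(line)
        if match is None:
            break  # the first non-matching line ends the section
        states[match.group(1)] = match.group(2)
    return states


def out_of_sync_members(ha_status_output):
    """Return the serial numbers whose state is anything other than 'in-sync'."""
    states = config_sync_status(ha_status_output)
    return [serial for serial, state in states.items() if state != "in-sync"]


if __name__ == "__main__":
    # Assumed sample: the second unit is shown as out of sync for illustration.
    sample = """Configuration Status:
FGVMXXXXXXXXXX44(updated 3 seconds ago): in-sync
FGVMXXXXXXXXXX46(updated 4 seconds ago): out-of-sync
System Usage stats:
"""
    print(out_of_sync_members(sample))
```

An empty list means every unit listed in the section reported 'in-sync'.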

 

diag sys ha checksum cluster  <- Shows the checksums for each cluster unit and the VDOM in order to determine where there is a difference.

================== FGVMXXXXXXXXXX44 ==================
is_manage_master()=1, is_root_master()=1
debugzone
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: d3 b5 fc 60 f3 f0 f0 d0 ea e4 a1 7f 1d 17 05 fc
Cust-A: 84 af 8f 23 b5 31 ca 32 c1 0b f2 76 d2 57 d1 aa
all: 04 ae 37 7e dc 84 aa a4 42 3d db 3c a2 09 b0 g5

checksum
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: d3 b5 fc 60 f3 f0 f0 d0 ea e4 a1 7f 1d 17 05 fc
Cust-A: 84 af 8f 23 b5 31 ca 32 c1 0b f2 76 d2 57 d1 aa
all: 04 ae 37 7e dc 84 aa a4 42 3d db 3c a2 09 b0 g5

================== FGVMXXXXXXXXXX46 ==================
is_manage_master()=0, is_root_master()=0
debugzone
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: d3 b5 fc 60 f3 f0 f0 d0 ea e4 a1 7f 1d 17 05 fc
Cust-A: 84 af 8f 23 b5 31 ca 32 c1 0b f2 76 d2 57 d1 bc
all: 04 ae 37 7e dc 84 aa a4 42 3d db 3c a2 09 b0 60

checksum
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: d3 b5 fc 60 f3 f0 f0 d0 ea e4 a1 7f 1d 17 05 fc
Cust-A: 84 af 8f 23 b5 31 ca 32 c1 0b f2 76 d2 57 d1 bc
all: 04 ae 37 7e dc 84 aa a4 42 3d db 3c a2 09 b0 60


From this point on, collect the output of each command on both firewalls so that the results can be compared.

Output from a single firewall is not enough to locate a mismatch. (See How to access the second firewall.)

In the output above, compare the checksum of each context on one unit with the checksum of the same context on the other unit.

The 'global' and 'root' checksums are identical on both units, so these contexts are synchronized.

The problem is not there. However, the checksum for VDOM 'Cust-A' is different, so this VDOM needs to be checked.

Whenever a single checksum is different, the 'all' checksum is also different.
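
The comparison described above can also be automated. The Python sketch below is a minimal illustration that assumes the saved output follows the layout shown earlier: a '=====' header line carrying the serial number, then 'debugzone' and 'checksum' tables with one 'name: bytes' line per context. The function names are hypothetical, and the checksums in the sample are shortened for readability.

```python
import re

# Header such as "================== FGVMXXXXXXXXXX44 ==================".
HEADER = re.compile(r"^=+\s*([^=\s]+)\s*=+$")
# Table entry such as "root: d3 b5 fc 60".
ENTRY = re.compile(r"^([^:\s]+):\s*(.+)$")


def parse_cluster_checksums(output):
    """Return {serial: {context: checksum}} from each unit's 'checksum' table."""
    units = {}
    serial = None
    table = None
    for raw in output.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            serial = header.group(1)
            units[serial] = {}
            table = None
        elif line in ("debugzone", "checksum"):
            table = line
        elif serial and table == "checksum":
            entry = ENTRY.match(line)
            if entry:
                # Normalise spacing so that only the bytes are compared.
                units[serial][entry.group(1)] = " ".join(entry.group(2).split())
    return units


def mismatched_contexts(units):
    """Return the contexts whose checksum is not identical on every unit."""
    contexts = []
    for checksums in units.values():
        for name in checksums:
            if name not in contexts:
                contexts.append(name)
    return [
        name for name in contexts
        if len({checksums.get(name) for checksums in units.values()}) > 1
    ]


if __name__ == "__main__":
    sample = """================== FGVMXXXXXXXXXX44 ==================
checksum
global: c5 33
Cust-A: d1 aa
================== FGVMXXXXXXXXXX46 ==================
checksum
global: c5 33
Cust-A: d1 bc
"""
    print(mismatched_contexts(parse_cluster_checksums(sample)))
```

Only the 'checksum' table is compared in this sketch; the 'debugzone' table is skipped, which is a simplification rather than a recommendation.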

 

Another option is to check the difference directly from the GUI, as shown in the screenshot below:

[Screenshot: Out_of_sync.png]

 

With the above information, it is possible to see directly where the configuration difference might be. In this case, compare the configurations of both FortiGate firewalls under:

config system global

config system interface

config system ha

config system console

 

Issue these commands for a more granular view of mismatched VDOMs:

 

diag sys ha checksum show <vdom_name>
diag sys ha checksum show global

 

For the above example, the only relevant output will come from the following:

 

diag sys ha checksum show Cust-A

 

Once the mismatched object has been identified on both cluster units, run the following command, replacing <object_name> with the actual object name:

 

diag sys ha checksum show Cust-A <object_name>

 

This shows where in the object the differences are. Then look at that specific place in the configuration for differences.
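
To find the mismatched object, the per-VDOM output collected on each unit can be compared with a short script. The Python sketch below rests on an assumption that is not confirmed by this article: that 'diag sys ha checksum show Cust-A' prints one 'object: checksum' pair per line. The function names, object names and checksum values are illustrative only.

```python
def parse_object_checksums(output):
    """Return {object_name: checksum} from lines of the form 'name: checksum'."""
    checksums = {}
    for raw in output.splitlines():
        name, separator, value = raw.strip().partition(":")
        if separator and name.strip() and value.strip():
            checksums[name.strip()] = value.strip()
    return checksums


def differing_objects(primary_output, secondary_output):
    """Return the sorted object names whose checksum differs or is missing on one unit."""
    primary = parse_object_checksums(primary_output)
    secondary = parse_object_checksums(secondary_output)
    names = sorted(set(primary) | set(secondary))
    return [name for name in names if primary.get(name) != secondary.get(name)]


if __name__ == "__main__":
    # Made-up sample values; real checksums are longer.
    on_primary = "firewall.address: 1111\nfirewall.policy: 2222\n"
    on_secondary = "firewall.address: 1111\nfirewall.policy: 3333\n"
    print(differing_objects(on_primary, on_secondary))
```

Each name returned is a candidate for <object_name> in the command above.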

 

The grep option can also be used to display checksums for only part of the configuration.

For example, to display system-related configuration checksums in the root VDOM or log-related checksums in the global configuration:

 

diagnose sys ha checksum show root | grep system
diagnose sys ha checksum show global | grep log

 

Remember: repeat the above commands on all devices to compare the mismatch, then check the corresponding area in the configuration file.

If no mismatch is found, a simple re-calculation of the checksums can fix the out-of-sync problem.

The re-calculated checksums should match and the out-of-sync error messages should stop appearing.

The following command re-calculates all HA checksums (run it on both units):

 

diagnose sys ha checksum recalculate

 

Or, more specific:

 

diagnose sys ha checksum recalculate [<your_vdom_name> | global]

 

Entering the command without options recalculates all checksums. A VDOM name can be specified to just recalculate the checksums for that VDOM. Enter 'global' to recalculate the global checksum. It should match on all devices in the cluster.

Run the following commands to debug HA synchronization:

 

diag debug app hasync 255
diag debug enable
execute ha synchronize start

 

diagnose debug application hatalk -1   <- To check the Heartbeat communication between HA devices.

 

Run the following commands to check mismatches instantly:

 

diag debug config-error-log read               <- (1)
diag hardware device disk                      <- (2)
show sys storage                               <- (3)
show wanopt storage                            <- (4)

 

(1): Check the output to identify configuration lines that were not accepted. Try to configure the listed items manually on the device.
(2): Check the device disk on both devices; the size and availability should match.
(3): Check the size of the storage disk; it should match on both devices.
(4): Check the size of the wanopt disk; it should match on both devices.

 

If the cluster is still not in sync, isolate the Secondary FortiGate from the cluster. This process will require physical access to the FortiGates.

 

Important: before starting this process, take a backup of the FortiGate configuration.

 

First, disconnect all network cables from the secondary unit except for the heartbeat cables.

After that, disconnect the heartbeat cable. This will disconnect the secondary FortiGate from the network.

Next, connect to the secondary FortiGate and perform a factory reset:

 

execute factoryreset

 

See Technical Tip: How to reset a FortiGate with the default factory settings/without losing management ... for detailed instructions.

 

After the FortiGate comes back online, log in again and configure the HA settings. Make sure to keep the priority low for the secondary FortiGate in the HA settings.

After that is configured, connect the HA cable to the heartbeat interface of the secondary. Do not connect any other cables at this time.

The secondary FortiGate should appear in the HA status. If it does not appear there, do not proceed to the next step.

 

The secondary FortiGate should have joined the cluster in the secondary role. After confirming this, connect all of the other network cables to the secondary FortiGate as per the previous setup.

Afterwards, check the status of the configuration sync: it should be in sync.