FortiGate
dbabic
Staff
Article Id 196067

Description

 

This article describes methods for forcing synchronization of an HA cluster before resorting to a cluster rebuild.


Scope

 

High Availability synchronization.


Solution

 

For this procedure, it is recommended to have access to all units through SSH (e.g. PuTTY).
 
Note: It is possible to connect to the other units with 'exec ha manage X <username>', where X is the member ID (available IDs can be listed with 'exec ha manage ?').

To check the FortiGate HA status in CLI:
 
# get sys ha status
# diagnose sys ha checksum cluster 
 
All cluster members need to have the same checksum values (compare the last digits of the 'all' checksum).

Further, check which part of the checksum is not matching, as described here.
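The comparison can also be done offline on a saved copy of the command output. The following is a minimal sketch, assuming the output layout shown in Note 2 of this article. The function names and the sample values for the second member are hypothetical and used only for illustration.

```python
import re

# Sample output in the layout shown in this article. The second member's
# root and all values are invented here to illustrate a mismatch.
SAMPLE = """\
================== FGVMXXXXXXXXXX1 ==================
is_manage_master()=1, is_root_master()=1
debugzone
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: d3 b5 fc 60 f3 f0 f0 d0 ea e4 a1 7f 1d 17 05 fc
all: 04 ae 37 7e dc 84 aa a4 42 3d db 3c a2 09 b0 60

checksum
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: d3 b5 fc 60 f3 f0 f0 d0 ea e4 a1 7f 1d 17 05 fc
all: 04 ae 37 7e dc 84 aa a4 42 3d db 3c a2 09 b0 60

================== FGVMXXXXXXXXXX2 ==================
is_manage_master()=0, is_root_master()=0
debugzone
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff 00
all: 0f 1e 2d 3c 4b 5a 69 78 87 96 a5 b4 c3 d2 e1 f0

checksum
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff 00
all: 0f 1e 2d 3c 4b 5a 69 78 87 96 a5 b4 c3 d2 e1 f0
"""

HEADER = re.compile(r"^=+\s*(\S+)\s*=+$")
CHECKSUM = re.compile(r"^(\w[\w.-]*):\s*((?:[0-9a-f]{2}\s*)+)$")


def parse_checksums(text):
    """Return {serial: {section: {scope: checksum}}} from the saved output."""
    members = {}
    serial = section = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            serial = header.group(1)
            members[serial] = {}
            section = None
        elif serial is None or not line:
            continue
        elif line in ("debugzone", "checksum"):
            section = line
            members[serial][section] = {}
        elif section:
            match = CHECKSUM.match(line)
            if match:
                members[serial][section][match.group(1)] = match.group(2).strip()
    return members


def mismatched_scopes(members, section="checksum"):
    """Return the scopes (global, VDOM names, all) that differ between members."""
    scopes = set()
    for data in members.values():
        scopes.update(data.get(section, {}))
    bad = []
    for scope in sorted(scopes):
        values = {data.get(section, {}).get(scope) for data in members.values()}
        if len(values) > 1:
            bad.append(scope)
    return bad


members = parse_checksums(SAMPLE)
print("Scopes out of sync:", mismatched_scopes(members))
```

A scope that is missing on one member is also reported as a mismatch, which covers the case where a unit returns no checksum lines at all.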
If the checksums do not match, perform the following steps and log ALL the output, in case a Technical Support case needs to be opened with Fortinet later:

1) Simple recalculation of checksums might help.
On the Primary unit:
 
# diagnose sys ha checksum recalculate    <- then check again if synchronized.
 
On Backup units:
 
# diagnose sys ha checksum recalculate    <- then check again if synchronized.
 
2) Restart the synchronization process and monitor the debug output for errors (check both units at the same time).
 
Note: The user may be logged out of the backup units during this process – this is a good sign (explained here).

On the Primary unit:
# execute ha synchronize stop
# diag debug reset
# diag debug enable
# diag debug console timestamp enable
# diag debug application hasync -1
# diag debug application hatalk -1
# execute ha synchronize start

On Backup units:
 
# diag debug reset
# diag debug enable
# execute ha synchronize stop
# diag debug console timestamp enable
# diag debug application hasync -1
# diag debug application hatalk -1
# execute ha synchronize start
 
The debug output also shows whether the checksums match.
Disable debugging once the Backup units are in sync with the Primary unit, or after enough output has been captured (5-6 minutes):
 
# diag debug disable
# diag debug reset
 
3) Manual synchronization.

In certain specific scenarios, the cluster fails to synchronize due to some elements in the configuration.
To avoid rebuilding the cluster, compare the configurations and perform the changes manually.

a) Obtain the configurations from both units clearly marked as Primary and Secondary/Backup.
Make sure the console output is standard (no '---More---' text appears*), log the SSH output, and issue the command 'show' on both units**.
Note*: To remove paginated display: 
 
# config system console 
    set output standard
end

Note**: Do NOT issue 'show full-configuration' unless absolutely necessary.

b) Use any available comparison tool to check the two files side by side (e.g. Notepad++ with the 'Compare' plugin).
 
c) Certain fields can be ignored (hostname, SN, interface dedicated to management if configured, password hashes, certificates, HA priorities and override settings, and disk labels).
 
d) Perform configuration changes in the CLI on the Backup units to match the configuration of the Primary. If errors occur and they are self-explanatory, act accordingly. If an error is not self-explanatory and the configuration cannot be changed (added/deleted), make sure the error is logged and included in a TAC case.

After all the differences found in the comparison are corrected, check the cluster status once again.
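As an alternative to a graphical comparison tool, the two saved 'show' outputs can be compared with a short script. The following is a minimal sketch using Python's standard difflib module. The ignore patterns ('set hostname' and lines containing 'ENC') and the sample configurations are assumptions for illustration and need to be adapted to the real files. HA priority and override lines are deliberately not filtered, because 'set priority' also appears outside 'config system ha', so those differences should be judged manually.

```python
import difflib
import re

# Assumed patterns for unit-specific lines that can be ignored (step c).
IGNORED = [
    re.compile(r"^\s*set hostname\b"),
    re.compile(r"\sENC\s"),  # encrypted password/secret hashes
]

# Invented sample configurations, for illustration only.
PRIMARY = """\
config system global
    set hostname "FGT-A"
end
config system admin
    edit "admin"
        set password ENC abc123
    next
end
config firewall address
    edit "srv1"
        set subnet 10.0.0.10 255.255.255.255
    next
end
"""

SECONDARY = """\
config system global
    set hostname "FGT-B"
end
config system admin
    edit "admin"
        set password ENC def456
    next
end
config firewall address
    edit "srv1"
        set subnet 10.0.0.11 255.255.255.255
    next
end
"""


def significant_diff(primary_cfg, secondary_cfg):
    """Return unified-diff lines after dropping the ignorable fields."""

    def keep(text):
        return [
            line.rstrip()
            for line in text.splitlines()
            if not any(pattern.search(line) for pattern in IGNORED)
        ]

    return list(
        difflib.unified_diff(
            keep(primary_cfg),
            keep(secondary_cfg),
            "primary",
            "secondary",
            n=2,
            lineterm="",
        )
    )


diff = significant_diff(PRIMARY, SECONDARY)
print("\n".join(diff))
```

With the sample above, only the 'set subnet' line is reported as different. In practice the two strings would be read from the logged session files.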


4) Restart the HA daemons / restart the units, one by one.
 
Note: This step requires a maintenance window and might need physical access to both units, as it can affect the traffic.

If no output is generated by the hasync or hatalk debug, a restart of these daemons may be needed. This can be done by running the following commands on one unit at a time:
 
# diag sys top   <- note the process IDs of hasync and hatalk.
 
or
 
# diag sys top-summary | grep hasync
# diag sys top-summary | grep hatalk
# diag sys kill 11 <pid#>    <- repeat for both noted processes.
 
After these commands, the daemons normally restart with different process IDs (check with 'diag sys top').

Since FortiOS 6.2, there is an easier way to determine the process ID (in case the process does not show up in the 'diag sys top' output):
 
# diag sys process pidof hasync
# diag sys process pidof hatalk
# diag sys kill 11 <pid#>             <- repeat for both noted processes.
 
After these commands, the daemons normally restart with different process IDs (check with 'diag sys process pidof').

 

In certain conditions, this does not solve the problem, or the daemons fail to restart.
Be prepared for this situation, as a hard reboot may be necessary (either exec reboot from the console or plug/unplug the power supply).
After reboot, check the disk status for both units (if diskscan is needed, perform it before anything else), then check the cluster status (checksums) once again.

5) If all the above methods fail, a cluster rebuild may be needed.

 
Note 1: Primary and Secondary with different disk statuses.

If the Primary and Secondary units have different disk statuses, the cluster will fail to form.
The following error could be seen on the console of the Secondary:

'Slave and master have different hdisk status. Cannot work with HA master. Shutdown the box!'

The output of the following commands needs to be collected from both cluster members:
 
# get sys status
# exec disk list
 
If one of the cluster members shows the log disk status as 'Need format' or 'Not Available', that unit needs to be disconnected from the cluster and its log disk needs to be formatted.
This requires a reboot. It can be done by executing the following command:
 
# execute formatlogdisk    <- a confirmation for reboot follows.
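When the 'get sys status' output of both units has been saved to files, the log disk state can be checked with a short script before deciding which unit to format. This is a minimal sketch. It assumes the status output contains a line starting with 'Log hard disk:', which should be verified against the actual output of the firmware in use. The function names and sample text are hypothetical.

```python
import re

# States named in this article as requiring a disk format.
BAD_STATES = ("need format", "not available")

# Invented excerpt for illustration; only the assumed 'Log hard disk:' line matters.
SAMPLE_SECONDARY = """\
Hostname: FGT-B
Log hard disk: Need format
"""


def log_disk_state(status_output):
    """Return the text after 'Log hard disk:' or None if the line is absent."""
    match = re.search(
        r"^\s*Log hard disk:\s*(.+?)\s*$",
        status_output,
        re.MULTILINE | re.IGNORECASE,
    )
    return match.group(1) if match else None


def needs_format(status_output):
    """True when the reported log disk state is one of BAD_STATES."""
    state = log_disk_state(status_output)
    return state is not None and state.lower() in BAD_STATES


print("Secondary needs format:", needs_format(SAMPLE_SECONDARY))
```

A return value of None from log_disk_state means the expected line was not found, in which case the output should be inspected manually.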
 
If the problem persists, open a ticket with Technical Support with the output of the following commands from both units in the cluster:
 
# get sys status
# exec disk list
 
Note 2: Secondary unit not seen in the cluster.
 
When checking the checksums, the second unit may be missing or may show incomplete output, as follows:
 
FortiVM1# diag sys ha checksum cluster
================== FGVMXXXXXXXXXX1 ==================
is_manage_master()=1, is_root_master()=1
debugzone
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: d3 b5 fc 60 f3 f0 f0 d0 ea e4 a1 7f 1d 17 05 fc
all: 04 ae 37 7e dc 84 aa a4 42 3d db 3c a2 09 b0 60

checksum
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: d3 b5 fc 60 f3 f0 f0 d0 ea e4 a1 7f 1d 17 05 fc
all: 04 ae 37 7e dc 84 aa a4 42 3d db 3c a2 09 b0 60

================== FGVMXXXXXXXXXX2 ==================

FortiVM1#


This happens when hasync cannot communicate properly with the other unit.
What can be done:
-    make sure the units are running the same firmware (check with '# get system status').
-    reboot both units one at a time, starting with the Secondary.
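The incomplete output described in Note 2 can be detected automatically in a logged session. The following is a minimal sketch, assuming the header and checksum line layout shown in the sample output of this note. The function name is hypothetical.

```python
import re

# The incomplete output from Note 2: the second member has a header but no checksums.
SAMPLE = """\
================== FGVMXXXXXXXXXX1 ==================
is_manage_master()=1, is_root_master()=1
debugzone
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: d3 b5 fc 60 f3 f0 f0 d0 ea e4 a1 7f 1d 17 05 fc
all: 04 ae 37 7e dc 84 aa a4 42 3d db 3c a2 09 b0 60

checksum
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: d3 b5 fc 60 f3 f0 f0 d0 ea e4 a1 7f 1d 17 05 fc
all: 04 ae 37 7e dc 84 aa a4 42 3d db 3c a2 09 b0 60

================== FGVMXXXXXXXXXX2 ==================

FortiVM1#
"""

HEADER = re.compile(r"^=+\s*(\S+)\s*=+$")
CHECKSUM_LINE = re.compile(r"^\w[\w.-]*:\s*[0-9a-f]{2}\b")


def members_without_checksums(text):
    """Return serial numbers whose section contains no checksum lines."""
    counts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            counts[current] = 0
        elif current and CHECKSUM_LINE.match(line):
            counts[current] += 1
    return [serial for serial, count in counts.items() if count == 0]


silent = members_without_checksums(SAMPLE)
print("Members returning no checksums:", silent)
```

Any serial number reported here points to a unit that hasync could not query, so the firmware check and the reboot sequence above apply to it.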

 

Related Articles:

Technical Note: How to create a log file of a session using PuTTY

https://community.fortinet.com/t5/FortiGate/Technical-Tip-Rebuilding-an-HA-cluster/ta-p/195429