Description
Scope
Solution
To check the FortiGate HA synchronization status (the configuration checksums of all cluster members) in the CLI:
diagnose sys ha checksum cluster
Next, check which part of the checksum does not match between the units (for example, global or a specific VDOM such as root).
Once a specific VDOM checksum has been identified as different, it is possible to check which configuration in that VDOM has the mismatched checksum: Troubleshooting Tip: How to troubleshoot HA synchronization issue using GUI.
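As an illustration of this comparison, the following Python sketch parses saved 'diagnose sys ha checksum cluster' output and lists the sections whose checksums differ or are missing. It assumes the output layout shown later in this article (a '=== <serial> ===' header per unit, followed by 'debugzone' and 'checksum' blocks of '<name>: <hex bytes>' lines). The function names are hypothetical and this is not a Fortinet tool.

```python
import re
from typing import Dict, List, Optional, Tuple

# Layout assumed from the sample checksum output in this article.
HEADER = re.compile(r"^=+\s+(\S+)\s+=+$")                 # "===== <serial> ====="
ENTRY = re.compile(r"^(\S+):\s+((?:[0-9a-f]{2}\s*)+)$")   # "<name>: <hex bytes>"


def parse_checksums(text: str) -> Dict[str, Dict[str, str]]:
    """Return {serial: {section: checksum}} taken from each unit's 'checksum' block."""
    units: Dict[str, Dict[str, str]] = {}
    serial: Optional[str] = None
    in_checksum = False
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            serial = header.group(1)
            units[serial] = {}
            in_checksum = False
            continue
        if serial is None:
            continue  # ignore anything before the first unit header
        if line in ("debugzone", "checksum"):
            in_checksum = line == "checksum"
            continue
        entry = ENTRY.match(line)
        if entry and in_checksum:
            units[serial][entry.group(1)] = " ".join(entry.group(2).split())
    return units


def find_mismatches(units: Dict[str, Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (sections that differ or are missing on some unit, units with no checksums)."""
    empty = sorted(serial for serial, sums in units.items() if not sums)
    sections = sorted({name for sums in units.values() for name in sums})
    mismatched = [
        name for name in sections
        if len({sums.get(name) for sums in units.values()}) > 1
    ]
    return mismatched, empty
```

For example, capture the CLI output to a text file and call find_mismatches(parse_checksums(text)). With the sample output shown later in this article, the Secondary unit (FGVMXXXXXXXXXX2) is reported as having no checksum data.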
- Force the Backup unit to synchronize with the Primary unit. On the Backup unit:
execute ha synchronize start
Simple recalculation of the checksums might help:
diagnose sys ha checksum recalculate
- Restart the synchronization process and monitor the debug output for errors (check both units simultaneously).
Note:
The user may be logged out of the Backup units during this process. This is a good sign: Troubleshooting Note: FortiGate HA synchronization messages and cluster verification steps.
On the Primary unit:
diagnose debug reset
diagnose debug enable
diagnose debug console timestamp enable
diagnose debug application hasync -1
diagnose debug application hatalk -1
execute ha synchronize start
On Backup units:
diagnose debug reset
diagnose debug enable
diagnose debug console timestamp enable
diagnose debug application hasync -1
diagnose debug application hatalk -1
execute ha synchronize stop
execute ha synchronize start
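When the debug output of both units is logged to files (for example, with PuTTY session logging), a short Python sketch like the following can help locate candidate error lines. The keyword list is an assumption, not a list of documented FortiOS messages, and the script name in the usage comment is hypothetical; always review the full log around any match.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical keyword list: adjust it to the messages seen in your own logs.
KEYWORDS = re.compile(r"\b(error|fail(?:ed|ure)?|mismatch|timeout)\b", re.IGNORECASE)


def suspicious_lines(log_lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for every log line that matches KEYWORDS."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_lines, start=1)
        if KEYWORDS.search(line)
    ]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python scan_debug.py primary.log backup.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, text in suspicious_lines(handle):
                print(f"{path}:{number}: {text}")
```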
- Manual synchronization. In certain scenarios, the cluster fails to synchronize because of specific elements in the configuration. To avoid rebuilding the cluster, compare the configurations and apply the changes manually.
- Obtain the configurations from both units (Primary and Secondary/Backup). Make sure the console output is standard so that no '---More---' text appears (under 'config system console', set 'output standard'). Log the SSH session output (for example, with PuTTY session logging) and run the command 'show' on both units.
- Use any available comparison tool to check the two files side by side (for example, Notepad++ with the 'Compare' plugin).
- Certain fields can be ignored: hostname, serial number, the dedicated management interface (if configured), password hashes, certificates, HA priorities and override settings, and disk labels.
- Apply the configuration changes in the CLI of the Backup units so that they match the Primary. If errors occur and are self-explanatory, act accordingly. If an error is not self-explanatory and the configuration cannot be changed (added or deleted), log the errors and present them in a TAC case.
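The comparison can also be scripted. The following Python sketch, assuming both 'show' outputs were saved as plain text, drops lines that are expected to differ (hostname and alias under 'config system global', priority and override under 'config system ha', and 'ENC' password hashes) and prints a unified diff of the rest. The ignore list and the script name are hypothetical starting points: certificates, disk labels, and management interface settings are not handled.

```python
import difflib
from typing import List, Set, Tuple

# (innermost 'config' block, setting) pairs expected to differ between units.
# Illustrative assumption based on the fields listed above; extend as needed.
IGNORED: Set[Tuple[str, str]] = {
    ("config system global", "hostname"),
    ("config system global", "alias"),
    ("config system ha", "priority"),
    ("config system ha", "override"),
}


def normalize(config_text: str) -> List[str]:
    """Strip indentation and drop lines that should not be compared."""
    kept: List[str] = []
    stack: List[str] = []  # open 'config ...' lines, innermost last
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' header lines
        words = line.split()
        if words[0] == "config":
            stack.append(line)
        elif words[0] == "end" and stack:
            stack.pop()
        elif words[0] == "set" and len(words) > 1:
            block = stack[-1] if stack else ""
            if (block, words[1]) in IGNORED or "ENC" in words:
                continue  # per-unit value or encrypted secret
        kept.append(line)
    return kept


def config_diff(primary: str, backup: str) -> List[str]:
    """Unified diff: '-' lines exist only on the Primary, '+' only on the Backup."""
    return list(
        difflib.unified_diff(
            normalize(primary), normalize(backup),
            fromfile="primary", tofile="backup", lineterm="",
        )
    )


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        # Usage (hypothetical file names): python compare_config.py primary.txt backup.txt
        with open(sys.argv[1], encoding="utf-8") as p, open(sys.argv[2], encoding="utf-8") as b:
            print("\n".join(config_diff(p.read(), b.read())))
```

Tracking the enclosing 'config' block matters because the same setting name (for example 'priority') also appears in unrelated tables such as static routes, where a difference is a real mismatch.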
- Restart the HA daemons / restart the units, one by one.
Note:
This step requires a maintenance window and might need physical access to both units, as it can affect traffic.
If no output is generated in the hasync or hatalk debug, these daemons may need to be restarted. Run the following commands on one unit at a time:
On versions earlier than v6.4:
diagnose sys top-summary | grep hatalk
diagnose sys kill 11 <pid#> <----- Repeat for both noted processes.
On v6.4 and above (where 'diagnose sys top-summary' does not exist):
diagnose sys process pidof hatalk
diagnose sys kill 11 <pid#> <----- Repeat for both noted processes.
In certain conditions, this does not solve the problem, or the daemons fail to restart. Be prepared for this situation, as a hard reboot may be necessary (either 'execute reboot' from the console or unplugging and reconnecting the power supply).
- If all the above methods fail, a cluster rebuild may be needed.
If the Primary and Secondary units have different disk statuses, the cluster fails to synchronize. Check the disk status on both units with:
execute disk list
In this situation, the cluster checksum output may show no checksum values for the Secondary unit, for example:
================== FGVMXXXXXXXXXX1 ==================
is_manage_primary()=1, is_root_primary()=1
debugzone
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: d3 b5 fc 60 f3 f0 f0 d0 ea e4 a1 7f 1d 17 05 fc
all: 04 ae 37 7e dc 84 aa a4 42 3d db 3c a2 09 b0 60
checksum
global: c5 33 93 23 26 9f 4d 79 ed 5f 29 fa 7a 8c c9 10
root: d3 b5 fc 60 f3 f0 f0 d0 ea e4 a1 7f 1d 17 05 fc
all: 04 ae 37 7e dc 84 aa a4 42 3d db 3c a2 09 b0 60
================== FGVMXXXXXXXXXX2 ==================
FortiVM1#
What can be done:
- Make sure the units are running the same firmware via 'get system status'.
- Reboot both units one at a time, starting with the Secondary.
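To compare firmware versions without reading the output by eye, a Python sketch such as the following can extract the version and build from saved 'get system status' output of each unit. It assumes the 'Version:' line has the form 'Version: <model> vX.Y.Z,buildNNNN,...', which is an assumption that may not hold for every release; the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed 'Version:' line format: "Version: <model> vX.Y.Z,buildNNNN,<date> (...)".
VERSION = re.compile(r"^Version:.*?\bv(\d+\.\d+\.\d+),build(\d+)", re.MULTILINE)


def firmware(status_output: str) -> Optional[Tuple[str, str]]:
    """Return (version, build) from 'get system status' output, or None if not found."""
    match = VERSION.search(status_output)
    return (match.group(1), match.group(2)) if match else None


def same_firmware(primary_status: str, backup_status: str) -> bool:
    """True only when both outputs contain a version and they are identical."""
    primary, backup = firmware(primary_status), firmware(backup_status)
    return primary is not None and primary == backup
```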
An additional method to recover HA synchronization:
Step 1. Obtain the Configuration File of the Primary Unit.
Step 2. Edit the file for use on the Secondary unit by making the following modifications:
config system global
set hostname XXXX <----- Name of the Secondary device.
set alias "Secondary Serial Number"
end
Step 3. Under 'config system ha', apply the settings that correspond to the Secondary unit (for example, its HA priority).
Step 4. Disconnect the cables from the LAN ports of the Secondary unit.
Step 5. Disconnect the cable from the heartbeat interface of the Secondary unit.
Step 6. Connect to the GUI of the Secondary unit and restore the modified configuration file.
Step 7. Reconnect the cable to the heartbeat interface of the Secondary unit.
Step 8. Reconnect the cables to the LAN ports of the Secondary unit.
Step 9. Run the following commands to check the HA status after the modification:
get system ha status
diagnose system ha checksum show
diagnose system ha checksum show global
After this modification, it should synchronize again.
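Steps 2 and 3 can be scripted. The following Python sketch, assuming the Primary configuration was downloaded as plain text, rewrites 'set' lines directly inside a named block ('config system global' or 'config system ha') and appends any missing keys before that block's closing 'end'. The function name set_block_values and all example values are hypothetical; review the resulting file before restoring it.

```python
from typing import Dict, List


def set_block_values(config_text: str, block: str, values: Dict[str, str]) -> str:
    """Rewrite 'set <key> ...' lines directly inside `block` (for example
    'config system global') and append keys that are not present before the
    block's closing 'end'. Values are inserted verbatim, so quote strings."""
    out: List[str] = []
    pending: Dict[str, str] = {}
    depth = 0  # nesting level relative to the target block; 0 means outside it
    for raw in config_text.splitlines():
        line = raw.strip()
        words = line.split()
        indent = raw[: len(raw) - len(raw.lstrip())]
        if depth == 0:
            if line == block:
                depth, pending = 1, dict(values)
            out.append(raw)
            continue
        if words and words[0] == "config":
            depth += 1  # entering a nested block
        elif line == "end":
            depth -= 1
            if depth == 0:  # closing the target block: add keys not seen yet
                out.extend(f"{indent}    set {k} {v}" for k, v in pending.items())
                pending = {}
        elif depth == 1 and len(words) > 1 and words[0] == "set" and words[1] in pending:
            out.append(f"{indent}set {words[1]} {pending.pop(words[1])}")
            continue
        out.append(raw)
    return "\n".join(out)
```

For example, call it once for 'config system global' (hostname and alias) and once for 'config system ha' (for example, priority), then save the result as the file to restore in Step 6. Nested blocks such as HA management interfaces are left untouched.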
Note:
In some cases, accessing the Secondary FortiGate's CLI from the Primary FortiGate's CLI results in frequent disconnections while checking the configuration on the Secondary, and the cluster remains out of sync. The solution is to reboot the Secondary FortiGate, but only after following all the steps given above.
Note:
If the previous steps do not resolve the issue, the configuration file from the Primary unit may need to be downloaded and manually edited:
- Modify the HA parameters.
- Update the hostname.
- Adjust the management interface, if applicable.
Once edited, deploy the file to the Secondary unit. This approach is straightforward and effective as a last resort.
Related articles:
Technical Tip: How to create a log file of a session using PuTTY
Technical Tip: Rebuilding an HA cluster
Technical Tip: Correcting an out-of-sync HA cluster