zexex
New Contributor

HA out of sync since 7.2.10

Since I upgraded to 7.2.10 (from 7.2.9), my HA won't sync, and it has been that way for 14 days. It complained about the DNS table and nothing helped, so I did a factory reset on the secondary unit and configured the HA parameters only. I rebooted and connected only the HA cables. This didn't help either: 30 tables were out of sync (why???).

I did another factory reset, took the same config as on the primary unit, changed only the hostname and the HA priority, and restored it on the secondary. HA is still out of sync, this time "only" rule.fmwp and firewall.internet-service-name. I rebooted many times and executed "diag sys ha checksum recalculate" on both units. Nothing!

Fortinet, what have you done... Is anybody else having such annoying sync issues, or any ideas how to get rid of them?

BTW, "exec update-now" on the secondary unit failed with code -6, so I connected the device to the internet (isolated) and ran "exec update-now" again. No errors there, but still the same result...

salemneaz
Staff

Hi,

Would you please run the command "diag debug crashlog read" on the primary unit and check whether the HA daemon is crashing? When HA goes out of sync, manually synchronizing the configuration can help.

Article Reference:

--------------------------------

https://community.fortinet.com/t5/FortiGate/Technical-Tip-Procedure-for-HA-manual-synchronization/ta...

https://community.fortinet.com/t5/FortiGate/Troubleshooting-Tip-How-to-troubleshoot-HA-synchronizati...

Salem
Shashwati
Staff

Hello, please try restarting the HA sync processes using the following commands:

 

fnsysctl killall hasync
fnsysctl killall hatalk
 
Then run the debug to collect HA logs on both firewalls:
 
diag debug reset
diag debug enable
execute ha synchronize stop
diag debug console timestamp enable
diag debug application hasync -1
diag debug application hatalk -1
execute ha synchronize start
 
diag debug reset
diag debug disable   [run this to stop debug]
IgorDM
New Contributor II

This is my output:

 

2024-10-16 16:25:05 <hasync:WARN> conn=0x9ddbde0 abort: rt=-1, dst=169.254.0.1, sync_type=3(fib)
2024-10-16 16:25:07 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088707
2024-10-16 16:25:08 <hasync:WARN> conn=0x9e7bfd0, peer closed the connection: dst=169.254.0.1, sync_type=18(byod)
2024-10-16 16:25:09 <hasync:WARN> conn=0x9df2150 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:25:09 <hasync:WARN> conn=0x9df2150 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:25:09 <hasync:WARN> conn=0x9df1d40 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:25:09 <hasync:WARN> conn=0x9df1d40 abort: rt=-1, dst=169.254.0.1, sync_type=6(proxy)
2024-10-16 16:25:09 <hasync:WARN> conn=0x9e456b0 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:25:09 <hasync:WARN> conn=0x9e456b0 abort: rt=-1, dst=169.254.0.1, sync_type=6(proxy)
execute ha synchronize start
starting synchronize with HA primary...

FortiGate-61F-1 # 2024-10-16 16:25:17 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088717
2024-10-16 16:25:19 <hasync:WARN> conn=0x9e59470 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:25:19 <hasync:WARN> conn=0x9e59470 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:25:19 <hasync:WARN> conn=0x9df2a90 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:25:19 <hasync:WARN> conn=0x9df2a90 abort: rt=-1, dst=169.254.0.1, sync_type=24(mcast)
2024-10-16 16:25:25 <hasync:WARN> conn=0x9e598b0 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:25:25 <hasync:WARN> conn=0x9e598b0 abort: rt=-1, dst=169.254.0.1, sync_type=27(capwap)
2024-10-16 16:25:27 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088727
2024-10-16 16:25:29 <hasync:WARN> conn=0x9e5a7f0 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:25:29 <hasync:WARN> conn=0x9e5a7f0 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:25:37 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088737
2024-10-16 16:25:39 <hasync:WARN> conn=0x9e73640 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:25:39 <hasync:WARN> conn=0x9e73640 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:25:47 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088747
2024-10-16 16:25:50 <hasync:WARN> conn=0x9e05130 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:25:50 <hasync:WARN> conn=0x9e05130 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:25:57 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088757
2024-10-16 16:25:58 <hasync:WARN> conn=0x9e0db90, peer closed the connection: dst=169.254.0.1, sync_type=18(byod)
2024-10-16 16:26:00 <hasync:WARN> conn=0x9e0e280 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:26:00 <hasync:WARN> conn=0x9e0e280 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:26:07 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088767
2024-10-16 16:26:10 <hasync:WARN> conn=0x9ddf5e0 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:26:10 <hasync:WARN> conn=0x9ddf5e0 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:26:11 <hasync:WARN> conn=0x9e7bc80 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:26:11 <hasync:WARN> conn=0x9e7bc80 abort: rt=-1, dst=169.254.0.1, sync_type=3(fib)
2024-10-16 16:26:17 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088777
2024-10-16 16:26:22 <hasync:WARN> conn=0x9df1d40 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:26:22 <hasync:WARN> conn=0x9df1d40 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:26:22 <hasync> reap child: pid=24216, status=0
2024-10-16 16:26:22 <hasync> reap child: pid=24217, status=0
2024-10-16 16:26:22 <hasync> reap child: pid=24218, status=0
2024-10-16 16:26:22 <hasync> reap child: pid=24219, status=0
2024-10-16 16:26:22 <hasync> reap child: pid=24220, status=0
2024-10-16 16:26:22 <hasync> reap child: pid=24221, status=0
2024-10-16 16:26:22 <hasync> reap child: pid=24222, status=0
2024-10-16 16:26:24 <hasync> reap child: pid=24223, status=0
2024-10-16 16:26:24 <hasync> reap child: pid=24224, status=0
2024-10-16 16:26:24 <hasync> reap child: pid=24225, status=0
2024-10-16 16:26:27 <hasync> reap child: pid=24226, status=0
2024-10-16 16:26:27 <hasync> reap child: pid=24227, status=0
2024-10-16 16:26:27 <hasync> reap child: pid=24228, status=0
2024-10-16 16:26:27 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088787
2024-10-16 16:26:27 <hasync> reap child: pid=24229, status=0
2024-10-16 16:26:27 <hasync> reap child: pid=24230, status=0
2024-10-16 16:26:27 <hasync> reap child: pid=24231, status=0
2024-10-16 16:26:27 <hasync> reap child: pid=24232, status=0
2024-10-16 16:26:32 <hasync:WARN> epoll HUP/ERR: conn=0x9e72fa0, dst=169.254.0.1, evt=0x8(ERR)
2024-10-16 16:26:32 <hasync:WARN> conn=0x9e72fa0 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:26:37 <hasync:WARN> conn=0x9e73190, peer closed the connection: dst=169.254.0.1, sync_type=18(byod)
2024-10-16 16:26:37 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088797
2024-10-16 16:26:43 <hasync:WARN> conn=0x9e47240 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:26:43 <hasync:WARN> conn=0x9e47240 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:26:47 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088807
2024-10-16 16:26:53 <hasync:WARN> conn=0x9e0e150 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:26:53 <hasync:WARN> conn=0x9e0e150 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:26:57 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088817
2024-10-16 16:27:02 <hasync:WARN> conn=0x9e7d240 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:27:02 <hasync:WARN> conn=0x9e7d240 abort: rt=-1, dst=169.254.0.1, sync_type=14(diff)
2024-10-16 16:27:03 <hasync:WARN> epoll HUP/ERR: conn=0x9e59230, dst=169.254.0.1, evt=0x8(ERR)
2024-10-16 16:27:03 <hasync:WARN> conn=0x9e59230 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:27:07 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088827
2024-10-16 16:27:08 <hasync:WARN> conn=0x9def140, peer closed the connection: dst=169.254.0.1, sync_type=18(byod)
2024-10-16 16:27:12 <hasync:WARN> conn=0x9df2ae0 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:27:12 <hasync:WARN> conn=0x9df2ae0 abort: rt=-1, dst=169.254.0.1, sync_type=14(diff)
2024-10-16 16:27:13 <hasync:WARN> conn=0x9e585b0 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:27:13 <hasync:WARN> conn=0x9e585b0 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:27:17 <hasync:WARN> conn=0x9decb30 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:27:17 <hasync:WARN> conn=0x9decb30 abort: rt=-1, dst=169.254.0.1, sync_type=3(fib)
2024-10-16 16:27:17 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088837
2024-10-16 16:27:19 <hasync> reap child: pid=24235, status=0
2024-10-16 16:27:20 <hasync> reap child: pid=24236, status=0
2024-10-16 16:27:20 <hasync> reap child: pid=24237, status=0
2024-10-16 16:27:20 <hasync> reap child: pid=24238, status=0
2024-10-16 16:27:20 <hasync> reap child: pid=24239, status=0
2024-10-16 16:27:20 <hasync> reap child: pid=24240, status=0
2024-10-16 16:27:20 <hasync> reap child: pid=24241, status=0
2024-10-16 16:27:22 <hasync:WARN> conn=0x9dec970 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:27:22 <hasync:WARN> conn=0x9dec970 abort: rt=-1, dst=169.254.0.1, sync_type=14(diff)
2024-10-16 16:27:24 <hasync:WARN> conn=0x9e0d8e0 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:27:24 <hasync:WARN> conn=0x9e0d8e0 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:27:25 <hasync> reap child: pid=24242, status=0
2024-10-16 16:27:25 <hasync> reap child: pid=24244, status=0
2024-10-16 16:27:25 <hasync> reap child: pid=24243, status=0
2024-10-16 16:27:25 <hasync> reap child: pid=24245, status=0
2024-10-16 16:27:25 <hasync> reap child: pid=24246, status=0
2024-10-16 16:27:25 <hasync> reap child: pid=24247, status=0
2024-10-16 16:27:25 <hasync> reap child: pid=24248, status=0
2024-10-16 16:27:25 <hasync:WARN> conn=0x9decf70 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:27:25 <hasync:WARN> conn=0x9decf70 abort: rt=-1, dst=169.254.0.1, sync_type=27(capwap)
2024-10-16 16:27:27 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088847
2024-10-16 16:27:32 <hasync:WARN> conn=0x9df1d40 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:27:32 <hasync:WARN> conn=0x9df1d40 abort: rt=-1, dst=169.254.0.1, sync_type=14(diff)
2024-10-16 16:27:32 <hasync:WARN> conn=0x9e0e020, peer closed the connection: dst=169.254.0.1, sync_type=4(ipsec)
2024-10-16 16:27:35 <hasync:WARN> conn=0x9e36ce0 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:27:35 <hasync:WARN> conn=0x9e36ce0 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:27:38 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088858
2024-10-16 16:27:40 <hasync:WARN> conn=0x9e36ce0, peer closed the connection: dst=169.254.0.1, sync_type=18(byod)
2024-10-16 16:27:42 <hasync:WARN> conn=0x9e39fb0 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:27:42 <hasync:WARN> conn=0x9e39fb0 abort: rt=-1, dst=169.254.0.1, sync_type=14(diff)
2024-10-16 16:27:46 <hasync:WARN> conn=0x9e39870 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:27:46 <hasync:WARN> conn=0x9e39870 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
2024-10-16 16:27:48 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088868
2024-10-16 16:27:52 <hasync:WARN> conn=0x9e47240 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:27:52 <hasync:WARN> conn=0x9e47240 abort: rt=-1, dst=169.254.0.1, sync_type=14(diff)
2024-10-16 16:27:56 <hasync:WARN> conn=0x9e39750 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:27:56 <hasync:WARN> conn=0x9e39750 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
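
A debug capture like the one above is easier to read once it is tallied. The following is a minimal Python sketch, not a Fortinet tool: the regular expressions are assumptions derived only from the line shapes visible in this output, and `summarize` and `SAMPLE` are hypothetical names. It counts hasync warnings per sync_type, counts connect failures per peer and error text, and reports how long the unit has been in its current state according to the last hatalk line (`now` minus `chg_time`).

```python
import re
from collections import Counter

# Patterns inferred from the log lines in this thread (assumption, not a spec).
# e.g. "<hasync:WARN> conn=0x9df2150 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)"
SYNC_TYPE_RE = re.compile(r"<hasync:WARN>.*\bsync_type=(\d+)\((\w+)\)")
# e.g. "connect(169.254.0.1) failed: 110(Connection timed out)"
CONNECT_FAIL_RE = re.compile(r"connect\(([\d.]+)\) failed: (\d+)\(([^)]*)\)")
# e.g. "<hatalk> ... state/chg_time/now=3(standby)/1729088646/1729088707"
HATALK_RE = re.compile(r"<hatalk>.*state/chg_time/now=\d+\((\w+)\)/(\d+)/(\d+)")


def summarize(log_text):
    """Return (warnings per sync_type, connect failures, last hatalk state)."""
    by_sync_type = Counter()
    connect_failures = Counter()
    last_state = None  # (state name, seconds spent in that state)
    for line in log_text.splitlines():
        m = SYNC_TYPE_RE.search(line)
        if m:
            by_sync_type[m.group(2)] += 1
        m = CONNECT_FAIL_RE.search(line)
        if m:
            connect_failures[(m.group(1), m.group(3))] += 1
        m = HATALK_RE.search(line)
        if m:
            last_state = (m.group(1), int(m.group(3)) - int(m.group(2)))
    return by_sync_type, connect_failures, last_state


# A few lines copied from the output above, used as sample input.
SAMPLE = """\
2024-10-16 16:25:05 <hasync:WARN> conn=0x9ddbde0 abort: rt=-1, dst=169.254.0.1, sync_type=3(fib)
2024-10-16 16:25:07 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1729088646/1729088707
2024-10-16 16:25:08 <hasync:WARN> conn=0x9e7bfd0, peer closed the connection: dst=169.254.0.1, sync_type=18(byod)
2024-10-16 16:25:09 <hasync:WARN> conn=0x9df2150 connect(169.254.0.1) failed: 110(Connection timed out)
2024-10-16 16:25:09 <hasync:WARN> conn=0x9df2150 abort: rt=-1, dst=169.254.0.1, sync_type=5(conf)
"""

if __name__ == "__main__":
    types, failures, state = summarize(SAMPLE)
    print("warnings per sync_type:", dict(types))
    print("connect failures:", dict(failures))
    print("last hatalk state:", state)
```

In this capture nearly every failure is a connect timeout to 169.254.0.1, so a tally like this mainly confirms that the secondary cannot open sync connections to its peer at all, rather than failing on one particular table.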

Toshi_Esumi
SuperUser

I wouldn't waste any more time on this; just open a ticket and have TAC look at it.
Did you run "diag debug config-error-log read" on both devices once the first upgrade was done? That likely would have given you a hint about what went wrong during the upgrade process.
On the other hand, I've recently been experiencing HA issues during/after upgrades on multiple unrelated clusters, so TAC created a bug report and DEV is looking into the root cause.
That's another reason to open a TAC case.

Toshi

IgorDM
New Contributor II

Same problem here!

Out of sync with "rule.fmwp"

 

I have a case open for several days with Fortinet.

They asked me to disconnect the out-of-sync firewall from the HA cluster and reset it, but my problem is that it is at a company site 10 hours away by car.

 

I have an appointment with their support tomorrow to understand what to do!

 

PS: My problem started after upgrade from 7.2.8 to 7.2.9 - FGT61F

IgorDM
New Contributor II

I just solved the problem.

I ran these commands on both firewalls at the same time:

 

fnsysctl killall hasync
fnsysctl killall hatalk
Toshi_Esumi

That means something happened to one or both of the daemons after the upgrade process and they couldn't recover by themselves. There may be (or must be) a problem in HA with the upgrade process.

Toshi

crakit
New Contributor

We had the same issue. HA was out of sync since upgrading from 7.2.8 to 7.2.9. I first compared the checksums by running this command on both devices:

diagnose sys ha checksum test


I then compared the two results (compared both in Notepad++ using the Compare plugin). I had no problem there. Then I ran this command that fixed the issue right away:

diagnose sys ha checksum recalculate

According to this KB, this command can be used when the checksums match but sync isn't working.
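
For anyone who prefers not to use Notepad++, the comparison of the two "diagnose sys ha checksum test" outputs can be sketched in Python with the standard `difflib` module. This is only an illustration under assumptions: `diff_outputs` is a hypothetical helper, it does a plain line-by-line diff and does not interpret the checksum format, and the placeholder strings in the demo are not real FortiOS output.

```python
import difflib


def diff_outputs(primary_text, secondary_text):
    """Return unified-diff lines between two saved CLI outputs.

    An empty list means the two outputs are identical line for line
    (trailing whitespace is ignored).
    """
    a = [line.rstrip() for line in primary_text.splitlines()]
    b = [line.rstrip() for line in secondary_text.splitlines()]
    return list(
        difflib.unified_diff(a, b, fromfile="primary", tofile="secondary", lineterm="")
    )


if __name__ == "__main__":
    # Placeholder text only; paste the saved output of each unit here instead.
    primary = "entry-a: 1111\nentry-b: 2222\n"
    secondary = "entry-a: 1111\nentry-b: 3333\n"
    for line in diff_outputs(primary, secondary):
        print(line)
```

Lines that legitimately differ between the units, such as a hostname in a captured prompt, will show up in the diff as well and have to be ignored by eye.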

UPDATE: I've upgraded the cluster from 7.2.9 to 7.2.10 this morning and HA is still up and synced after the upgrade. :)

UPDATE 2: Even though the FortiGate reported it as synchronised, it was not. After the upgrade to 7.2.10, all changes made to the FortiGate after HA had failed to sync were lost. I was able to reproduce the same error in a test on 7.2.9. The "fix" I provided still "worked", but as soon as I made a change to a policy, HA went back to a failed status. I reapplied the fix, upgraded to 7.2.10, made a change to a policy, and it seems to be OK now, but all work done after HA failed and before applying 7.2.10 is lost. I do have the backup configuration :)

Toshi_Esumi

Like OP's case, you should open a ticket with TAC to find the root cause.

Toshi
