CSKUM
New Contributor

Fortigate HA Active-Passive out of sync all the time

Hello,

 

A few days ago we started having trouble with our active-passive cluster of two FortiGate 1000F units running firmware 7.2.10.

After we make changes on the primary unit, those changes do not propagate to the secondary, and after a few minutes the HA cluster shows as out of sync. We have waited a couple of hours, but the units did not synchronize on their own.

The only way to get them synchronized again is to force it manually from the CLI:

 

diagnose sys ha checksum recalculate

execute ha synchronize start

 

After executing those commands a couple of times on both the primary and the secondary, the cluster becomes synchronized.

 

Any ideas what happened?

Szymon Malinowski
15 REPLIES
CSKUM

I've tried everything from the link you provided except rebuilding the HA from scratch (resetting the secondary to factory defaults). I've replaced the cable connecting the two FortiGates. I even switched the heartbeat interface from the HA port to port7. Same result.

Almost every time I get synchronization back manually and then add something new on the primary unit, the secondary goes out of sync. New objects added on the primary don't show up on the secondary. Sometimes it works, but not very often. A few minutes ago I ran a test: I added 3 address objects on the primary one by one and checked whether they showed up on the secondary. They did. But when I removed them from the primary, they weren't removed from the secondary, and HA went out of sync again.

When I debug HA from the CLI I get multiple WARNINGs, but I don't know whether that is normal:

 

2025-02-10 13:00:56 <hasync:WARN> conn=0xc4e4440, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
2025-02-10 13:01:01 <hasync:WARN> conn=0xc4d1c20, peer closed the connection: dst=169.254.0.2, sync_type=18(byod)
2025-02-10 13:01:03 <hasync:WARN> conn=0xc536520, peer closed the connection: dst=169.254.0.2, sync_type=12(auth)
2025-02-10 13:01:04 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1739187627/1739188864
2025-02-10 13:01:06 <hasync:WARN> conn=0xc4f9cd0, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
2025-02-10 13:01:14 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1739187627/1739188874
2025-02-10 13:01:16 <hasync:WARN> conn=0xc4e4440, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
2025-02-10 13:01:24 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1739187627/1739188884
2025-02-10 13:01:26 <hasync:WARN> conn=0xc4d1c20, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
2025-02-10 13:01:32 <hasync:WARN> conn=0xc536520, peer closed the connection: dst=169.254.0.2, sync_type=18(byod)
2025-02-10 13:01:34 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1739187627/1739188894
2025-02-10 13:01:36 <hasync:WARN> conn=0xc4f9cd0, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
2025-02-10 13:01:44 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1739187627/1739188904
2025-02-10 13:01:46 <hasync:WARN> conn=0xc4e4440, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
2025-02-10 13:01:54 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1739187627/1739188914
2025-02-10 13:01:56 <hasync:WARN> conn=0xc4e4440, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
2025-02-10 13:02:03 <hasync:WARN> conn=0xc4d1c20, peer closed the connection: dst=169.254.0.2, sync_type=18(byod)
2025-02-10 13:02:04 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1739187627/1739188924
2025-02-10 13:02:06 <hasync:WARN> conn=0xc56dc20, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
2025-02-10 13:02:14 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1739187627/1739188934
2025-02-10 13:02:16 <hasync:WARN> conn=0xc4f9cd0, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
2025-02-10 13:02:24 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1739187627/1739188944
2025-02-10 13:02:26 <hasync:WARN> conn=0xc4e4440, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
2025-02-10 13:02:33 <hasync:WARN> conn=0xc4d1c20, peer closed the connection: dst=169.254.0.2, sync_type=18(byod)
2025-02-10 13:02:34 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1739187627/1739188954
2025-02-10 13:02:36 <hasync:WARN> conn=0xc56dc20, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
2025-02-10 13:02:44 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1739187627/1739188964
2025-02-10 13:02:46 <hasync:WARN> conn=0xc4e4440, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
2025-02-10 13:02:54 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1739187627/1739188974
2025-02-10 13:02:56 <hasync:WARN> conn=0xc56dc20, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
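
To see which sync types keep getting their connection closed, the capture above can be tallied offline. This is only a rough sketch: it assumes the debug output was saved as text, the function name is made up for illustration, and the sample is just a few lines copied from the output above, not the full capture.

```python
import re
from collections import Counter

# A few lines copied from the debug output above (not the full capture).
SAMPLE = """\
2025-02-10 13:01:01 <hasync:WARN> conn=0xc4d1c20, peer closed the connection: dst=169.254.0.2, sync_type=18(byod)
2025-02-10 13:01:03 <hasync:WARN> conn=0xc536520, peer closed the connection: dst=169.254.0.2, sync_type=12(auth)
2025-02-10 13:01:04 <hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=3(standby)/1739187627/1739188864
2025-02-10 13:01:06 <hasync:WARN> conn=0xc4f9cd0, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
2025-02-10 13:01:16 <hasync:WARN> conn=0xc4e4440, peer closed the connection: dst=169.254.0.2, sync_type=14(diff)
"""

# Matches only the "peer closed the connection" warnings from hasync.
WARN_RE = re.compile(
    r"<hasync:WARN> .*peer closed the connection: "
    r"dst=(?P<dst>[\d.]+), sync_type=(?P<num>\d+)\((?P<name>\w+)\)"
)


def count_closed_by_sync_type(text):
    """Count 'peer closed the connection' warnings per sync_type."""
    counts = Counter()
    for line in text.splitlines():
        match = WARN_RE.search(line)
        if match:
            counts[f"{match['num']}({match['name']})"] += 1
    return counts


if __name__ == "__main__":
    for sync_type, n in count_closed_by_sync_type(SAMPLE).most_common():
        print(f"{sync_type}: {n}")
```

In the capture above, most of the warnings are for sync_type=14(diff), with 18(byod) and 12(auth) appearing less often. The tally only summarizes the log and does not explain why the peer closes the connection.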

Szymon Malinowski
Toshi_Esumi

When you ran the command below on both units as described in the KB, what did you see in the hatalk application debug output? Didn't the command recover the sync?

  fnsysctl killall hasync

 

It might be another HA-related bug in 7.2.x, possibly one that has already been reported. You should open a ticket with TAC to get it evaluated.


Toshi

CSKUM

This is what I did today. I manually synced both devices so the cluster was synchronized. Then I enabled debugging with these commands on both FortiGates:

 

diag debug app hasync 255
diag debug enable

diagnose debug application hatalk -1

diagnose debug application hasync -1

 

After that I added a firewall address object on the primary. It showed up on the secondary almost immediately, and the units were still in sync. Then I added a new address group and added the new address to it. Same effect: the address group immediately showed up on the secondary, and the cluster was still in sync.

Then I removed the new address group and the new address from the primary. They didn't disappear from the secondary, and the cluster went out of sync. I waited 15-20 minutes and it didn't sync on its own. Below is the debug output from both devices from the time when I was adding and removing the address and address group.

Primary - Debug 

Secondary - Debug 
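
The add/remove test described above amounts to comparing the set of address objects on each unit. Here is a minimal sketch of that comparison. It assumes the output of `show firewall address` from each unit has been saved as text. The two sample configs, the object names, and the function names are hypothetical and only illustrate the usual `edit "name"` layout.

```python
import re

# Hypothetical saved output of "show firewall address" from each unit.
PRIMARY = '''config firewall address
    edit "lan-servers"
        set subnet 10.0.10.0 255.255.255.0
    next
end
'''

SECONDARY = '''config firewall address
    edit "lan-servers"
        set subnet 10.0.10.0 255.255.255.0
    next
    edit "test-addr-1"
        set subnet 192.0.2.10 255.255.255.255
    next
end
'''

# Captures the object name from lines such as:  edit "test-addr-1"
EDIT_RE = re.compile(r'^\s*edit\s+"?([^"\n]+?)"?\s*$', re.MULTILINE)


def object_names(config_text):
    """Return the set of names found on 'edit' lines."""
    return set(EDIT_RE.findall(config_text))


def compare(primary_text, secondary_text):
    """List the objects that exist on only one of the two units."""
    primary = object_names(primary_text)
    secondary = object_names(secondary_text)
    return {
        "only_on_primary": sorted(primary - secondary),
        "only_on_secondary": sorted(secondary - primary),
    }


if __name__ == "__main__":
    print(compare(PRIMARY, SECONDARY))
```

An object deleted on the primary but still present on the secondary, as in the test above, would show up under `only_on_secondary`. The pattern also picks up `edit` lines inside nested tables, so treat the result as a quick check rather than a full config diff.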

Szymon Malinowski
dingjerry_FTNT

Hi @CSKUM ,

 

Could you please share the FGT config?

 

If not, please share the output of the following commands:

 

get sys ha status

get sys status

show sys ha 

Regards,

Jerry
CSKUM

Here are the results of the above commands for the primary and the secondary:

Primary 

Secondary 

Szymon Malinowski
dingjerry_FTNT

Hi @CSKUM ,

 

Thanks for the outputs.

 

1) On the Secondary, I see:

<2025/02/10 12:40:21> vcluster-1: FG1K0FTB23901110 is selected

But I don't see a similar entry on the Primary. It seems that for some time the units did not see each other.

How are the heartbeat interfaces connected? Directly to each other, or via a switch?

2) Both devices are using the same HA priority, 200. It's better to use a lower priority on the Secondary.

 

3) If you make some changes, can you check the synchronization results later?  Say, after a few minutes.
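
One way to cross-check the selection timestamp in item 1 against the hatalk debug lines posted earlier is to decode their chg_time and now fields. The sketch below assumes those fields are Unix epoch seconds and that the unit's clock runs at UTC+1. Neither is confirmed in this thread, although the now value does line up with the log timestamps under that assumption. The function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the first hatalk line in the debug output:
# state/chg_time/now=3(standby)/1739187627/1739188864
CHG_TIME = 1739187627
NOW = 1739188864


def describe_state_change(chg_time, now, utc_offset_hours=1):
    """Return (local time of last state change, time elapsed since then).

    utc_offset_hours=1 is an assumption about the unit's time zone.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    changed_at = datetime.fromtimestamp(chg_time, tz)
    elapsed = timedelta(seconds=now - chg_time)
    return changed_at.strftime("%Y-%m-%d %H:%M:%S"), elapsed


if __name__ == "__main__":
    changed_at, elapsed = describe_state_change(CHG_TIME, NOW)
    print(f"last state change: {changed_at}, {elapsed} before the log line")
```

Under those assumptions the last state change decodes to 2025-02-10 12:40:27, about 20 minutes before the debug lines and a few seconds after the 12:40:21 selection entry quoted in item 1.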

Regards,

Jerry
CSKUM

1. This is strange, because the primary is FG1K0FTB23900927 and the secondary is FG1K0FTB23901110. They are connected directly via a 25 cm RJ45 patch cord. We used to have them connected via the dedicated HA port, but once the problem started we switched to one of the LAN ports, port8 to be exact, and set that port as the heartbeat interface in the HA configuration.

2. No, they are not. The primary is set to 200 and the secondary to 150. I checked on both devices and it's the same: 200/150.

3. I waited over 15-20 minutes after I removed the test group and test address. Nothing happened. But when I was adding the test address and test group, the changes showed up on the secondary immediately.

Szymon Malinowski
dingjerry_FTNT

Hi @CSKUM ,

 

The output from the Secondary device contains:

FortiGate-1000F-100 # show sys ha 

I guess you were using the "exe ha manage" command to access the Secondary device. It seems the connection to the Secondary timed out and the session fell back to the Primary device.

 

Anyway, this issue is really weird and I suggest you open a TAC ticket for further assistance.

Regards,

Jerry