Solution |
- Description:
If an FGSP cluster consists of two members and both have standalone-config-sync enabled, rebooting one of them causes out-of-sync errors because the port_ha interface on each member cannot reach the other.
This issue does not occur in AA or AP mode with FGSP. It affects FortiOS 7.0.x, 7.2.x, and 7.4.x.
The output below was captured after rebooting the secondary unit to reproduce the issue.
In this situation, packets sent by the primary FortiGate are dropped and never reach the peer.
D1-CA-DB-FW-1 (global) # get system ha status
HA Health Status: OK
Model: FortiGate-60F
Mode: ConfigSync
Group Name: D1-CA-DB-FW
Group ID: 10
Debug: 0
Cluster Uptime: 0 days 0:17:45
Cluster state change time: 2024-07-26 17:06:50
Primary selected using:
    <2024/07/26 17:06:50> vcluster-1: FGT60FTK22060168 is selected as the primary because its uptime is larger than peer member FGT60FTK2209E0GV.
    <2024/07/26 17:05:18> vcluster-1: FGT60FTK22060168 is selected as the primary because it's the only member in the cluster.
    <2024/07/26 16:55:10> vcluster-1: FGT60FTK22060168 is selected as the primary because its uptime is larger than peer member FGT60FTK2209E0GV.
    <2024/07/26 16:54:57> vcluster-1: FGT60FTK22060168 is selected as the primary because it's the only member in the cluster.
ses_pickup: enable, ses_pickup_delay=disable
override: disable
Configuration Status:
    FGT60FTK22060168(updated 4 seconds ago): in-sync
    FGT60FTK22060168 chksum dump: da bc 6a f8 89 7d ab 93 10 43 83 78 a0 17 a0 a1
    FGT60FTK2209E0GV(updated 431 seconds ago): in-sync
    FGT60FTK2209E0GV chksum dump: da bc 6a f8 89 7d ab 93 10 43 83 78 a0 17 a0 a1
D1-CA-DB-FW-2 (global) # get system ha status
HA Health Status: OK
Model: FortiGate-60F
Mode: ConfigSync
Group Name: D1-CA-DB-FW
Group ID: 10
Debug: 0
Cluster Uptime: 0 days 0:18:13
Cluster state change time: 2024-07-26 17:06:55
Primary selected using:
    <2024/07/26 17:06:55> vcluster-1: FGT60FTK22060168 is selected as the primary because its uptime is larger than peer member FGT60FTK2209E0GV.
ses_pickup: enable, ses_pickup_delay=disable
override: disable
Configuration Status:
    FGT60FTK2209E0GV(updated 4 seconds ago): out-of-sync
    FGT60FTK2209E0GV chksum dump: da bc 6a f8 89 7d ab 93 10 43 83 78 a0 17 a0 a1
    FGT60FTK22060168(updated 1721981575 seconds ago): in-sync
    FGT60FTK22060168 chksum dump: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
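The symptoms visible in the secondary's output (an out-of-sync state, an all-zero peer checksum, and an implausibly large "updated ... seconds ago" value) can be checked automatically when output is collected from many units. The following Python sketch is one possible way to do that. It is not a Fortinet tool: the function names `parse_config_status` and `find_anomalies` and the `max_age_s` threshold are hypothetical, and the line format is assumed only from the sample output in this article.

```python
import re

# Line formats are assumed from the sample output in this article only.
ENTRY = re.compile(
    r"(?P<serial>[A-Z0-9]+)\(updated (?P<age>\d+) seconds ago\):\s*"
    r"(?P<state>in-sync|out-of-sync)"
)
CHKSUM = re.compile(
    r"(?P<serial>[A-Z0-9]+) chksum dump:\s*(?P<sum>(?:[0-9a-f]{2}\s*){16})"
)


def parse_config_status(text):
    """Return {serial: {"age_s", "state", "chksum"}} from 'get system ha status' text."""
    members = {}
    for m in ENTRY.finditer(text):
        members[m["serial"]] = {
            "age_s": int(m["age"]),
            "state": m["state"],
            "chksum": None,
        }
    for m in CHKSUM.finditer(text):
        if m["serial"] in members:
            members[m["serial"]]["chksum"] = m["sum"].split()
    return members


def find_anomalies(members, max_age_s=3600):
    """List (serial, reason) pairs; max_age_s is an arbitrary illustrative threshold."""
    findings = []
    for serial, info in members.items():
        if info["state"] != "in-sync":
            findings.append((serial, "reported out-of-sync"))
        if info["chksum"] and all(b == "00" for b in info["chksum"]):
            findings.append((serial, "all-zero checksum"))
        if info["age_s"] > max_age_s:
            findings.append((serial, "stale update timestamp"))
    return findings


# Configuration Status section copied from the D1-CA-DB-FW-2 output above.
SAMPLE = """Configuration Status:
    FGT60FTK2209E0GV(updated 4 seconds ago): out-of-sync
    FGT60FTK2209E0GV chksum dump: da bc 6a f8 89 7d ab 93 10 43 83 78 a0 17 a0 a1
    FGT60FTK22060168(updated 1721981575 seconds ago): in-sync
    FGT60FTK22060168 chksum dump: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""

for serial, reason in find_anomalies(parse_config_status(SAMPLE)):
    print(f"{serial}: {reason}")
```

On the sample, the local unit is flagged as out-of-sync, and the peer is flagged for both its all-zero checksum and its stale timestamp, even though the peer's state is printed as in-sync.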
D1-CA-DB-FW-2 (global) # dia debug appli hasync -1
D1-CA-DB-FW-1 (global) # dia debug ena
D1-CA-DB-FW-1 (global) # exe ha sync start
<hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=2(work)/1721981215/1721982100
<hasync> upd_cfg_extract_av_db_version[331]-Failed av db version, obj 32
<hasync:WARN> conn=0x91c7cc0 connect(169.254.0.2) failed: 113(No route to host)
<hasync:WARN> conn=0x91c7cc0 abort: rt=-1, dst=169.254.0.2, sync_type=5(conf)
<hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=2(work)/1721981215/1721982110
<hasync> upd_cfg_extract_av_db_version[331]-Failed av db version, obj 32
<hasync:WARN> conn=0x91c9520 connect(169.254.0.2) failed: 113(No route to host)
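The key indicator in this debug output is the repeated `connect(169.254.0.2) failed: 113(No route to host)` warning from hasync. When a longer debug capture has been saved to a file, a small script can count these failures per destination. The Python sketch below illustrates the idea. The function name `summarize_connect_failures` is hypothetical, and the pattern is based only on the warning lines shown in this article, so other FortiOS builds may log a different format.

```python
import re
from collections import Counter

# Matches lines such as:
#   <hasync:WARN> conn=0x91c7cc0 connect(169.254.0.2) failed: 113(No route to host)
# The format is assumed from the sample debug output in this article.
CONNECT_FAIL = re.compile(
    r"<hasync:WARN> conn=0x[0-9a-f]+ connect\((?P<dst>[\d.]+)\) "
    r"failed: (?P<errno>\d+)\((?P<msg>[^)]*)\)"
)


def summarize_connect_failures(log_text):
    """Count hasync connect failures, keyed by (destination, errno, message)."""
    counts = Counter()
    for m in CONNECT_FAIL.finditer(log_text):
        counts[(m["dst"], int(m["errno"]), m["msg"])] += 1
    return counts


# Debug lines copied from the capture above.
SAMPLE_LOG = """<hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=2(work)/1721981215/1721982100
<hasync> upd_cfg_extract_av_db_version[331]-Failed av db version, obj 32
<hasync:WARN> conn=0x91c7cc0 connect(169.254.0.2) failed: 113(No route to host)
<hasync:WARN> conn=0x91c7cc0 abort: rt=-1, dst=169.254.0.2, sync_type=5(conf)
<hatalk> vcluster_1: ha_prio=1(secondary), state/chg_time/now=2(work)/1721981215/1721982110
<hasync> upd_cfg_extract_av_db_version[331]-Failed av db version, obj 32
<hasync:WARN> conn=0x91c9520 connect(169.254.0.2) failed: 113(No route to host)
"""

for (dst, errno, msg), n in summarize_connect_failures(SAMPLE_LOG).items():
    print(f"{dst}: {n} failed connects, errno {errno} ({msg})")
```

Repeated failures against the peer's 169.254.0.x address with "No route to host" match the behavior described in this article; a different error or destination may point to another cause.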
- Workaround:
Restart the hatalk process on the primary unit:
execute fnsysctl killall hatalk
- Resolved in: 7.2.11 (tentative), 7.4.7 (tentative), 7.6.1 (fixed).
|