Some virtual environments require a unicast HA heartbeat.
In v7.2 and later, if a non-default vrf is configured on a High Availability (HA) heartbeat interface, inbound unicast cluster communication is dropped by the Reverse Path Forwarding (RPF) check. This does not prevent the cluster from forming or passing data traffic, but it does prevent configuration synchronization and other cluster management functions.
Diagnosing the issue:
If the HA heartbeat interface shows a vrf other than '0', the issue is expected.
FGT-A # show system ha | grep hbdev
set hbdev "port2" 0
FGT-A # show full system interface port2 | grep vrf
set vrf 15
The command 'get system ha status' shows both devices in the cluster but no checksum update from the remote device.
FGT-A # get system ha status
<...>
unicast_hb: peerip=10.255.196.2, myip=10.255.196.1, hasync_port='port2'
Configuration Status:
    FGVM08TMAAAAAAAA(updated 4 seconds ago): out-of-sync
    FGVM08TMAAAAAAAA chksum dump: 27 b7 d6 3d 2b 9c 8c d9 ec a7 4a aa 2a b5 6e e9
    FGVM08TMBBBBBBBB(updated 1736791624 seconds ago): in-sync
    FGVM08TMBBBBBBBB chksum dump: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
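When the status output has been saved to a file or captured over SSH, the all-zero checksum can be spotted programmatically. The following is a minimal Python sketch, not a Fortinet tool: the function name `members_without_checksum` is hypothetical, and it assumes the 'chksum dump' lines look like the sample above and that an all-zero dump means no checksum has been received from that member.

```python
import re

# Matches "<serial> chksum dump: <16 hex bytes>" as printed under "Configuration Status:".
CHKSUM_RE = re.compile(r"(\S+)\s+chksum dump:\s+((?:[0-9a-fA-F]{2}\s*){16})")


def members_without_checksum(status_text):
    """Return the serial numbers whose checksum dump is all zeros."""
    return [
        serial
        for serial, dump in CHKSUM_RE.findall(status_text)
        if set(dump.split()) == {"00"}
    ]


# Sample based on the 'get system ha status' output shown above.
SAMPLE_STATUS = """\
Configuration Status:
    FGVM08TMAAAAAAAA(updated 4 seconds ago): out-of-sync
    FGVM08TMAAAAAAAA chksum dump: 27 b7 d6 3d 2b 9c 8c d9 ec a7 4a aa 2a b5 6e e9
    FGVM08TMBBBBBBBB(updated 1736791624 seconds ago): in-sync
    FGVM08TMBBBBBBBB chksum dump: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""

print(members_without_checksum(SAMPLE_STATUS))
```

A non-empty result only indicates that a checksum is missing. The vrf setting and the debug flow trace are still needed to confirm the cause.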
A debug flow trace filtered by the remote heartbeat IP address shows a reverse path check failure for incoming packets from that address.
FGT-A # diagnose debug flow filter addr 10.255.196.2
FGT-A # diag debug enable
FGT-A # diag debug flow trace start 100
FGT-A # id=65308 trace_id=1 func=print_pkt_detail line=5862 msg="vd-vsys_ha:15 received a packet(proto=17, 10.255.196.1:730->10.255.196.2:730) tun_id=0.0.0.0 from local. "
id=65308 trace_id=1 func=resolve_ip_tuple_fast line=5950 msg="Find an existing session, id-00000bf9, original direction"
id=65308 trace_id=2 func=print_pkt_detail line=5862 msg="vd-vsys_ha:15 received a packet(proto=17, 10.255.196.2:730->10.255.196.1:730) tun_id=0.0.0.0 from port2. "
id=65308 trace_id=2 func=resolve_ip_tuple_fast line=5950 msg="Find an existing session, id-00000bf9, reply direction"
id=65308 trace_id=3 func=print_pkt_detail line=5862 msg="vd-vsys_ha:15 received a packet(proto=6, 10.255.196.2:7716->10.255.196.1:703) tun_id=0.0.0.0 from port2. flag [S], seq 412232175, ack 0, win 29200"
id=65308 trace_id=3 func=init_ip_session_common line=6047 msg="allocate a new session-00000f94"
id=65308 trace_id=3 func=ip_route_input_slow line=1695 msg="reverse path check fail, drop"
FGT-A # diagnose debug disable
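For a long capture, it can help to extract only the traces that end in an RPF drop. The sketch below is an illustration in Python under stated assumptions: the helper name `rpf_drops` is hypothetical, and the regular expressions assume the message layout seen in the trace above, which may differ between firmware versions.

```python
import re

# One trace record: trace_id, function name, and the quoted message.
TRACE_RE = re.compile(r'trace_id=(\d+)\s+func=(\S+)\s+line=\d+\s+msg="([^"]*)"')
# Packet details inside a print_pkt_detail message.
PKT_RE = re.compile(r"proto=(\d+), (\S+?)->(\S+?)\) .*from (\S+)\.(?:\s|$)")


def rpf_drops(flow_text):
    """Return one dict per trace that logged 'reverse path check fail'."""
    packets = {}  # trace_id -> packet details seen earlier in that trace
    drops = []
    for trace_id, func, msg in TRACE_RE.findall(flow_text):
        if func == "print_pkt_detail":
            match = PKT_RE.search(msg)
            if match:
                packets[trace_id] = {
                    "proto": int(match.group(1)),
                    "src": match.group(2),
                    "dst": match.group(3),
                    "iface": match.group(4),
                }
        elif "reverse path check fail" in msg:
            info = dict(packets.get(trace_id, {}))
            info["trace_id"] = int(trace_id)
            drops.append(info)
    return drops


# Sample based on the debug flow output shown above (traces 2 and 3 only).
SAMPLE_FLOW = """\
id=65308 trace_id=2 func=print_pkt_detail line=5862 msg="vd-vsys_ha:15 received a packet(proto=17, 10.255.196.2:730->10.255.196.1:730) tun_id=0.0.0.0 from port2. "
id=65308 trace_id=2 func=resolve_ip_tuple_fast line=5950 msg="Find an existing session, id-00000bf9, reply direction"
id=65308 trace_id=3 func=print_pkt_detail line=5862 msg="vd-vsys_ha:15 received a packet(proto=6, 10.255.196.2:7716->10.255.196.1:703) tun_id=0.0.0.0 from port2. flag [S], seq 412232175, ack 0, win 29200"
id=65308 trace_id=3 func=init_ip_session_common line=6047 msg="allocate a new session-00000f94"
id=65308 trace_id=3 func=ip_route_input_slow line=1695 msg="reverse path check fail, drop"
"""

for drop in rpf_drops(SAMPLE_FLOW):
    print(drop)
```

A drop whose source is the heartbeat peer IP and whose ingress interface is the heartbeat interface matches the behavior described in this article.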
Resolving the issue:
Heartbeat interface configuration is not synchronized between devices and cannot be changed easily. It is highly recommended to schedule a maintenance window to make the required changes since some network disruption is likely.
Option 1:
Redeploy the cluster using a configuration without vrf.
Option 2:
Because the HA heartbeat interface configuration is not synchronized between HA members, restoring a modified configuration file will not resolve the issue if both devices are in the cluster at the time. Therefore, it is necessary to remove each device from the cluster, take a configuration backup of the individual device, and restore a modified backup.
- Fail over the cluster as necessary to make the device secondary. See Technical Tip: Different options to trigger an HA failover (FGCP).
- Isolate FortiGate from the network, leaving only an isolated GUI interface to manage the virtual machine. This requires administrative access to the virtual switch or virtual networks of the platform hosting the VM.
- Isolate FortiGate from the HA cluster. If needed, this can be done by changing the unicast-hb-peerip.
- Connect to the FortiGate GUI and download the FortiGate configuration backup using a super_admin account.
- Remove 'set vrf <xx>' from the heartbeat interface in the configuration file.
config system interface
    edit <hbdev>    <----- Found from 'config system ha > set hbdev'. More than one interface is possible.
        set vrf <vrf_number>    <----- Remove this line.
        set ip <address> <mask>
        set type physical
        set snmp-index <index>
    next
end
- Restore the modified backup configuration to the isolated FortiGate. The unit reboots. Verify the intended configuration after reboot.
- Restore connectivity from FortiGate to the HA cluster.
- Restore connectivity from FortiGate to the network.
- Repeat the above steps for each FortiGate having a non-default vrf on the heartbeat interface.
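The manual edit of the backup file described in the steps above can also be scripted. The following Python sketch is an assumption-laden illustration, not a supported procedure: the function names `strip_hbdev_vrf` and `_walk` are hypothetical, and the code assumes an unencrypted, plain-text backup in which every 'config', 'edit', 'next', and 'end' keyword sits on its own line. Compare the result against the original file before restoring it.

```python
import re


def _walk(config_text):
    """Yield (config path, current edit entry, line) for every line."""
    path, entries = [], []  # enclosing 'config ...' blocks and their 'edit' names
    for line in config_text.splitlines(keepends=True):
        word = line.strip()
        if word.startswith("config "):
            path.append(word[len("config "):])
            entries.append(None)
        elif word == "end" and path:
            path.pop()
            entries.pop()
        elif word.startswith("edit ") and path:
            entries[-1] = word[len("edit "):].strip('"')
        elif word == "next" and path:
            entries[-1] = None
        yield tuple(path), (entries[-1] if entries else None), line


def strip_hbdev_vrf(config_text):
    """Remove 'set vrf <n>' from heartbeat interfaces; return (text, names)."""
    hbdevs = set()
    for path, _, line in _walk(config_text):
        if path and path[-1] == "system ha" and line.strip().startswith("set hbdev "):
            hbdevs.update(re.findall(r'"([^"]+)"', line))
    kept, removed = [], []
    for path, entry, line in _walk(config_text):
        in_hbdev = bool(path) and path[-1] == "system interface" and entry in hbdevs
        if in_hbdev and re.fullmatch(r"set vrf \d+", line.strip()):
            removed.append(entry)  # drop this line from the output
            continue
        kept.append(line)
    return "".join(kept), removed


# Hypothetical, heavily shortened backup used only to exercise the function.
SAMPLE_CONFIG = """\
config system interface
    edit "port1"
        set vrf 15
    next
    edit "port2"
        set vrf 15
        set ip 10.255.196.1 255.255.255.0
    next
end
config system ha
    set hbdev "port2" 0
end
"""

new_text, changed = strip_hbdev_vrf(SAMPLE_CONFIG)
print(changed)
print(new_text)
```

Only interfaces named on the 'set hbdev' line are touched, so a vrf configured on a data interface such as port1 in the sample is left in place.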
Option 3:
This option is only suitable when the devices are not in production:
- Take a configuration backup of each device and verify console (virtual serial) access to the device.
- Change the HA heartbeat interface to an unused interface. Note that this removes any IP address on the old HA heartbeat interface and causes a temporary split-brain condition.
- On each device, unset vrf from the old heartbeat interface and reconfigure the IP address.
config system interface
    edit <old hbdev>
        unset vrf
        set ip x.x.x.x y.y.y.y
end
- On each device, restore the HA heartbeat interface.
Notes:
- Physical FortiGate clusters do not support unicast heartbeat and are not affected by this issue.
- Configuring vrf on an HA heartbeat interface is a misconfiguration and will no longer be possible in an upcoming v7.6 release.
- The misconfigured vrf may be present on one or both HA devices.
- In v7.0 firmware versions, the Reverse Path Forwarding (RPF) check was not enforced for heartbeat traffic. Devices with the misconfiguration would not show any noticeable issues until upgraded to v7.2 or later.
Related articles:
Technical Tip: Reverse Path Forwarding (RPF) implementation and use of strict-src-check enable|disab...
How to restore a configuration backup on a FortiGate HA cluster
How to access the FortiGate VM console in public cloud