Technical Tip: FortiGate VM High Availability cluster out-of-sync after upgrade to v7.2 if vrf configured on heartbeat interface

Matt_B · ‎01-13-2025

Description

This article describes a known issue that can prevent a virtual machine HA cluster from synchronizing when using unicast heartbeat if a non-default vrf is configured on the heartbeat interface.

Scope

FortiGate v7.2, v7.4, or v7.6, FortiGate VM High Availability cluster using unicast heartbeat.

Solution

Some virtual environments require the use of Unicast HA heartbeat.

In v7.2 and later, if a non-default vrf is configured on a High Availability (HA) heartbeat interface, inbound unicast cluster communication is dropped by the Reverse Path Forwarding (RPF) check. This does not prevent the cluster from forming or passing data traffic, but it does prevent configuration synchronization and other cluster management functions.

Diagnosing the issue:

If the HA heartbeat interface is configured with a vrf other than '0', the cluster is expected to be affected by this issue.

FGT-A # show system ha | grep hbdev

set hbdev "port2" 0

FGT-A # show full system interface port2 | grep vrf

set vrf 15

The command 'get system ha status' shows both devices in the cluster but no checksum update from the remote device.

FGT-A # get system ha status
<...>
unicast_hb: peerip=10.255.196.2, myip=10.255.196.1, hasync_port='port2'
Configuration Status:
FGVM08TMAAAAAAAA(updated 4 seconds ago): out-of-sync
FGVM08TMAAAAAAAA chksum dump: 27 b7 d6 3d 2b 9c 8c d9 ec a7 4a aa 2a b5 6e e9
FGVM08TMBBBBBBBB(updated 1736791624 seconds ago): in-sync
FGVM08TMBBBBBBBB chksum dump: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
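
The checksum symptom above can also be checked offline against saved CLI output. The following is a minimal Python sketch, not a Fortinet tool: it assumes the 'chksum dump' line format shown in the sample output and treats an all-zero dump as a sign that no checksum update was received from that member. The function name members_without_checksum is hypothetical.

```python
import re

# Lines copied from the 'get system ha status' sample output in this article.
SAMPLE = """\
Configuration Status:
FGVM08TMAAAAAAAA chksum dump: 27 b7 d6 3d 2b 9c 8c d9 ec a7 4a aa 2a b5 6e e9
FGVM08TMBBBBBBBB chksum dump: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
"""

# Assumed line format: "<serial> chksum dump: <hex bytes separated by spaces>"
CHKSUM_RE = re.compile(r"^(\S+) chksum dump:\s*((?:[0-9a-fA-F]{2}\s*)+)$")


def members_without_checksum(status_text):
    """Return serial numbers whose checksum dump consists only of 00 bytes."""
    missing = []
    for line in status_text.splitlines():
        match = CHKSUM_RE.match(line.strip())
        if match and all(byte == "00" for byte in match.group(2).split()):
            missing.append(match.group(1))
    return missing


if __name__ == "__main__":
    for serial in members_without_checksum(SAMPLE):
        print(f"No checksum received from {serial}")
```

An all-zero checksum alone does not prove this specific issue; it only indicates that the heartbeat interface vrf and the debug flow trace are worth checking next.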

A debug flow trace filtered by remote heartbeat IP shows reverse path check failure for incoming packets from the remote heartbeat IP address.

FGT-A # diagnose debug flow filter addr 10.255.196.2

FGT-A # diag debug enable

FGT-A # diag debug flow trace start 100

FGT-A # id=65308 trace_id=1 func=print_pkt_detail line=5862 msg="vd-vsys_ha:15 received a packet(proto=17, 10.255.196.1:730->10.255.196.2:730) tun_id=0.0.0.0 from local. "
id=65308 trace_id=1 func=resolve_ip_tuple_fast line=5950 msg="Find an existing session, id-00000bf9, original direction"
id=65308 trace_id=2 func=print_pkt_detail line=5862 msg="vd-vsys_ha:15 received a packet(proto=17, 10.255.196.2:730->10.255.196.1:730) tun_id=0.0.0.0 from port2. "
id=65308 trace_id=2 func=resolve_ip_tuple_fast line=5950 msg="Find an existing session, id-00000bf9, reply direction"
id=65308 trace_id=3 func=print_pkt_detail line=5862 msg="vd-vsys_ha:15 received a packet(proto=6, 10.255.196.2:7716->10.255.196.1:703) tun_id=0.0.0.0 from port2. flag [S], seq 412232175, ack 0, win 29200"
id=65308 trace_id=3 func=init_ip_session_common line=6047 msg="allocate a new session-00000f94"
id=65308 trace_id=3 func=ip_route_input_slow line=1695 msg="reverse path check fail, drop"

FGT-A # diagnose debug disable
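
When the trace is long, the dropped packets can be picked out of a saved capture with a short script. The sketch below is illustrative only: it assumes the message formats seen in the sample trace above ('received a packet(...)' followed by 'reverse path check fail, drop' under the same trace_id), and the function name rpf_drops is hypothetical.

```python
import re

# Lines copied from the debug flow sample in this article.
SAMPLE = '''\
id=65308 trace_id=2 func=print_pkt_detail line=5862 msg="vd-vsys_ha:15 received a packet(proto=17, 10.255.196.2:730->10.255.196.1:730) tun_id=0.0.0.0 from port2. "
id=65308 trace_id=2 func=resolve_ip_tuple_fast line=5950 msg="Find an existing session, id-00000bf9, reply direction"
id=65308 trace_id=3 func=print_pkt_detail line=5862 msg="vd-vsys_ha:15 received a packet(proto=6, 10.255.196.2:7716->10.255.196.1:703) tun_id=0.0.0.0 from port2. flag [S], seq 412232175, ack 0, win 29200"
id=65308 trace_id=3 func=init_ip_session_common line=6047 msg="allocate a new session-00000f94"
id=65308 trace_id=3 func=ip_route_input_slow line=1695 msg="reverse path check fail, drop"
'''

# Packet header line: protocol, source, destination, and ingress interface.
PKT_RE = re.compile(
    r"trace_id=(\d+) .*received a packet\(proto=(\d+), "
    r"(\S+?):(\d+)->(\S+?):(\d+)\) .* from (\w+)\."
)
# Drop line emitted when the reverse path check fails.
DROP_RE = re.compile(r"trace_id=(\d+) .*reverse path check fail, drop")


def rpf_drops(trace_text):
    """Return one dict per packet that was dropped by the reverse path check."""
    packets = {}  # trace_id -> packet details
    drops = []
    for line in trace_text.splitlines():
        pkt = PKT_RE.search(line)
        if pkt:
            trace_id, proto, src, sport, dst, dport, iface = pkt.groups()
            packets[trace_id] = {
                "proto": int(proto), "src": src, "sport": int(sport),
                "dst": dst, "dport": int(dport), "iface": iface,
            }
            continue
        drop = DROP_RE.search(line)
        if drop and drop.group(1) in packets:
            drops.append(packets[drop.group(1)])
    return drops


if __name__ == "__main__":
    for d in rpf_drops(SAMPLE):
        print(f"RPF drop: {d['src']}:{d['sport']} -> {d['dst']}:{d['dport']} "
              f"proto={d['proto']} in={d['iface']}")
```

In the sample, only the TCP packet arriving on port2 from the peer heartbeat IP is reported, which matches the failure described above.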

Resolving the issue:

Heartbeat interface configuration is not synchronized between devices and cannot be changed easily. It is highly recommended to schedule a maintenance window to make the required changes since some network disruption is likely.

Option 1:

Redeploy the cluster using a configuration without vrf.

Option 2:
Because the HA heartbeat interface configuration is not synchronized between HA members, restoring a modified configuration file will not resolve the issue if both devices are in the cluster at the time. Therefore, it is necessary to remove each device from the cluster, take a configuration backup of the individual device, and restore a modified backup.

Fail over the cluster as necessary to make the device secondary. See Technical Tip: Different options to trigger an HA failover (FGCP).
Isolate FortiGate from the network, leaving only an isolated GUI interface to manage the virtual machine. This requires administrative access to the virtual switch or virtual networks of the platform hosting the VM.
Isolate FortiGate from the HA cluster. If needed, this can be done by changing the unicast-hb-peerip.
Connect to the FortiGate GUI and download the FortiGate's backup configuration using a super_admin account.
Remove 'set vrf <xx>' from the heartbeat interface in the configuration file.

config system interface
edit <hbdev> <----- found from 'config system ha > set hbdev'. More than one interface is possible.
set vrf <vrf_number> <----- Remove this line.
set ip <address> <mask>
set type physical
set snmp-index <index>
next
end
Restore the modified backup configuration to the isolated FortiGate. The unit reboots. Verify the intended configuration after reboot.
Restore connectivity from FortiGate to the HA cluster.
Restore connectivity from FortiGate to the network.
Repeat the above steps for each FortiGate having a non-default vrf on the heartbeat interface.
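
The manual edit in Option 2 (removing 'set vrf' from the heartbeat interface in the backup file) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a supported procedure: it assumes an unencrypted plain-text backup, that 'set hbdev' appears directly under 'config system ha', and that no multi-line value inside 'config system interface' contains a bare 'end' line. The function names and the sample configuration are hypothetical; review the result by hand before restoring it.

```python
import re

# Hypothetical, shortened backup excerpt used only for illustration.
SAMPLE = """\
config system interface
    edit "port1"
        set vrf 15
        set ip 192.0.2.1 255.255.255.0
    next
    edit "port2"
        set vrf 15
        set ip 10.255.196.1 255.255.255.252
        set type physical
    next
end
config system ha
    set mode a-p
    set hbdev "port2" 0
end
"""


def heartbeat_interfaces(config_text):
    """Return the interface names listed in 'set hbdev' under 'config system ha'."""
    names = []
    depth = 0  # nesting depth inside 'config system ha'
    for raw in config_text.splitlines():
        line = raw.strip()
        if depth == 0:
            if line == "config system ha":
                depth = 1
        elif line.startswith("config "):
            depth += 1
        elif line == "end":
            depth -= 1
        elif depth == 1 and line.startswith("set hbdev "):
            names.extend(re.findall(r'"([^"]+)"', line))
    return names


def strip_heartbeat_vrf(config_text):
    """Return the config with 'set vrf <n>' removed from heartbeat interfaces only."""
    hb = set(heartbeat_interfaces(config_text))
    out = []
    depth = 0       # nesting depth inside 'config system interface'
    current = None  # interface being edited at depth 1
    for raw in config_text.splitlines(keepends=True):
        line = raw.strip()
        if depth == 0:
            if line == "config system interface":
                depth = 1
        elif line.startswith("config "):
            depth += 1
        elif line == "end":
            depth -= 1
            if depth == 0:
                current = None
        elif depth == 1 and line.startswith("edit "):
            current = line[5:].strip().strip('"')
        elif depth == 1 and line == "next":
            current = None
        elif depth == 1 and current in hb and re.fullmatch(r"set vrf \d+", line):
            continue  # drop the vrf line of a heartbeat interface
        out.append(raw)
    return "".join(out)


if __name__ == "__main__":
    print(strip_heartbeat_vrf(SAMPLE))
```

Interfaces that are not listed in 'set hbdev' keep their vrf setting, so in the hypothetical sample only port2 is changed.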

Option 3:

This option is only suitable when the devices are not in production:

Take a configuration backup of each device and verify console (virtual serial) access to the device.
Change the HA heartbeat interface to an unused interface. Note that this removes any IP address on the old HA heartbeat interface and causes a temporary split-brain condition.
On each device, unset vrf from the old heartbeat interface and reconfigure the IP address.

config system interface
edit <old hbdev>
unset vrf
set ip x.x.x.x y.y.y.y
end
On each device, restore the HA heartbeat interface.

Notes:

Physical FortiGate clusters do not support unicast heartbeat and are not affected by this issue.
Configuring vrf on an HA heartbeat interface is a misconfiguration and will no longer be possible in an upcoming v7.6 release.
The misconfigured vrf may be present on one or both HA devices.
In v7.0 firmware versions, the Reverse Path Forwarding check was not enforced for heartbeat traffic. Devices with the misconfiguration would not show any noticeable issues until an upgrade to v7.2 or later.

Related articles:
Technical Tip: Reverse Path Forwarding (RPF) implementation and use of strict-src-check enable|disab...

How to restore a configuration backup on a FortiGate HA cluster

How to access the FortiGate VM console in public cloud
