hamidr
Staff
Article Id 419012
Description
This article describes how an interrupted FortiGate HA failover in Google Cloud can delete SDN-managed routes without recreating them, explains why this happens, and provides guidance on how to reduce the risk.

Scope
FortiGate, FortiOS v7.0+.

Solution
FortiGate High Availability in Google Cloud Platform.
FortiGate High Availability (HA) in Google Cloud Platform (GCP) operates fundamentally differently from on-premises environments because the underlying GCP Virtual Private Cloud (VPC) networking fabric does not support Layer 2 (L2) features critical for traditional HA, such as Gratuitous ARP (GARP), shared MAC addresses, or floating IPs.

To manage failovers, FortiGate utilizes the GCP SDN Connector (Software-Defined Network Connector). This feature integrates with the GCP API to dynamically update custom static VPC routes so that traffic is always directed to the active FortiGate instance.

In an Active-Passive (A-P) cluster, only the Primary (Active) node makes these route changes. When a failover occurs, the newly promoted Primary node uses the SDN Connector to call the GCP API and replace the existing custom routes with routes whose Next Hop is its own internal IP address, steering traffic away from the failed instance.

This mechanism provides failover without Layer 2 support, but it is vulnerable to split-brain conditions, in which a loss of communication between the two nodes causes both to briefly assume the Primary role and compete for control of the VPC routes.
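To make the ownership rule and the split-brain risk concrete, here is a minimal Python sketch. It is a simplified in-memory model, not FortiOS or GCP code: the `Node` class, the `steer_route` function, and the IP addresses are hypothetical, and the GCP API calls are collapsed into a single next-hop write. Only a node that believes it is Primary changes the route; when both nodes believe they are Primary, each overwrites the other's change, and the final next hop depends only on which write landed last.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical FortiGate HA member in a simplified model."""

    name: str
    internal_ip: str
    believes_primary: bool


def steer_route(route: dict, node: Node) -> None:
    """Set the route's next hop to this node, but only if it believes it is Primary.

    The real delete -> insert API calls are abstracted into one write here.
    """
    if node.believes_primary:
        route["next_hop_ip"] = node.internal_ip


route = {"dest_range": "0.0.0.0/0", "next_hop_ip": "10.0.1.10"}
fgt_a = Node("fgt-a", "10.0.1.10", believes_primary=False)
fgt_b = Node("fgt-b", "10.0.1.11", believes_primary=True)

# Normal failover: only fgt-b (the new Primary) changes the route.
for node in (fgt_a, fgt_b):
    steer_route(route, node)
print(route["next_hop_ip"])  # 10.0.1.11

# Split-brain: both nodes believe they are Primary, so the last writer wins.
fgt_a.believes_primary = True
for node in (fgt_b, fgt_a):
    steer_route(route, node)
print(route["next_hop_ip"])  # 10.0.1.10
```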

Short Description of FortiGate HA in GCP.
FortiGate HA in GCP has two important characteristics:

  • Each firewall instance uses unique NIC IPs (no shared VIPs).
  • Failover depends on GCP route manipulation instead of floating IPs.


When a node becomes primary:

  • It becomes responsible for programming GCP custom routes.
  • The SDN Connector performs API calls (compute.routes.delete → compute.routes.insert).


During a FortiGate HA failover in GCP, the newly promoted primary node runs the following SDN route update workflow:

  1. Delete the existing GCP routes that point to the old primary.
  2. Insert new GCP routes pointing to itself (the new primary).


This 'delete → insert' sequence is required because GCP routes cannot be modified in place: the API offers no way to change an existing route's next hop.
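The following sketch models this constraint, assuming a hypothetical in-memory `RouteTable` that, like GCP, offers only insert and delete and requires unique route names. `repoint_routes` is an illustrative helper, not the SDN Connector's implementation. It shows that, when a route name is reused, the old route must be deleted before the new one can be inserted, and that the affected destinations have no custom route between the two steps.

```python
from __future__ import annotations


class RouteTable:
    """Hypothetical in-memory stand-in for a VPC's custom static routes.

    Like GCP, it offers only insert and delete (no update), and route
    names must be unique.
    """

    def __init__(self) -> None:
        self.routes: dict[str, dict] = {}

    def insert(self, name: str, dest_range: str, next_hop_ip: str) -> None:
        if name in self.routes:
            raise ValueError(f"route {name!r} already exists")
        self.routes[name] = {"dest_range": dest_range, "next_hop_ip": next_hop_ip}

    def delete(self, name: str) -> dict:
        return self.routes.pop(name)


def repoint_routes(table: RouteTable, names: list[str], new_next_hop: str) -> None:
    """Move the given routes to a new next hop using delete, then insert."""
    # Step 1: delete the routes that point to the old primary.
    deleted = {name: table.delete(name) for name in names}
    # Between step 1 and step 2, these destinations have no custom route.
    # Step 2: re-create each route with the new primary as next hop.
    for name, old in deleted.items():
        table.insert(name, old["dest_range"], new_next_hop)


table = RouteTable()
table.insert("default-via-fgt", "0.0.0.0/0", "10.0.1.10")
repoint_routes(table, ["default-via-fgt"], "10.0.1.11")
print(table.routes["default-via-fgt"]["next_hop_ip"])  # 10.0.1.11
```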


Because of this, HA behavior and routing stability depend on:

  • Reliable heartbeat communication,
  • HA role stability,
  • Consistent completion of the SDN update sequence.


Problem Scenario.
In some situations, such as a brief heartbeat loss, a rapid failback, or a short-lived split-brain condition, the FortiGate may:

  • Start the failover,
  • Successfully delete the GCP routes,
  • Then revert to the previous HA state before the route-insert step occurs.


Because the route-creation step is interrupted:

  • The routes are deleted,
  • But no new routes are created,
  • And the cluster returns to a stable primary/secondary state with missing GCP custom routes.
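The interrupted sequence can be reproduced in a few lines of Python. This is a hypothetical model: `failover_with_revert` and its `still_primary` callback are illustrative names standing in for the HA role check, and the code mirrors only the sequence described in this article, not FortiOS internals. When the role reverts between the two steps, the function returns after the delete step and the route dictionary is left empty.

```python
from __future__ import annotations

from typing import Callable


def failover_with_revert(routes: dict[str, dict], new_next_hop: str,
                         still_primary: Callable[[], bool]) -> None:
    """Simplified model of the delete -> insert workflow.

    Hypothetical: `routes` holds only the FortiGate-managed custom routes
    (name -> route), and `still_primary` reports the node's HA role at the
    moment step 2 would start.
    """
    # Step 1: delete every managed route that points to the old primary.
    deleted = {name: routes.pop(name) for name in list(routes)}
    if not still_primary():
        # The HA role reverted (for example, a rapid failback): step 2 never
        # runs, and the local `deleted` record is discarded on return.
        return
    # Step 2: re-create the routes with the new primary as next hop.
    for name, old in deleted.items():
        routes[name] = {**old, "next_hop_ip": new_next_hop}


routes = {"default-via-fgt": {"dest_range": "0.0.0.0/0", "next_hop_ip": "10.0.1.10"}}
failover_with_revert(routes, "10.0.1.11", still_primary=lambda: False)
print(routes)  # {}
```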


FortiOS does not keep a complete local record of the routes it deleted, so it cannot automatically recreate them once the HA state returns to normal. As a result, the routes stay missing, and traffic that depends on them is blackholed until they are recreated.
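Because FortiOS cannot tell which routes went missing, one option is to compare the VPC's current custom routes against an expected inventory kept elsewhere, for example in infrastructure-as-code. The sketch below assumes such an inventory exists; `find_missing_routes` and the route names are hypothetical, and in practice the current list could be exported with `gcloud compute routes list` or the Compute Engine API.

```python
from __future__ import annotations


def find_missing_routes(expected: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return the names of expected routes that are absent from the VPC.

    Both arguments map route name -> destination range. `expected` is a
    hypothetical inventory kept outside FortiOS; `current` would come from
    the VPC's actual route list.
    """
    return sorted(name for name in expected if name not in current)


expected = {"default-via-fgt": "0.0.0.0/0", "onprem-via-fgt": "192.168.0.0/16"}
current = {"onprem-via-fgt": "192.168.0.0/16"}
print(find_missing_routes(expected, current))  # ['default-via-fgt']
```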


Workarounds to mitigate this issue.


  1. Adjust HA Timers.

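Making the cluster slower to declare a peer failure reduces the chance of half-completed SDN updates. The relevant heartbeat settings are under 'config system ha':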


hb-interval # Interval between heartbeat packets, in units of 100 ms by default (not seconds).
hb-lost-threshold # Number of missed heartbeats required to signal a peer failure.


If the cluster experiences temporary CPU spikes, short hbdev interruptions, or brief network instability, the secondary unit may temporarily promote itself to primary, trigger route deletions, and then return to secondary before the route-creation step completes.
Increasing these values reduces the chance of unnecessary or partial failovers, at the cost of slower detection of genuine failures.
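As a rough worked example, the time a peer must stay silent before it is declared failed is approximately `hb-interval` × heartbeat unit × `hb-lost-threshold`. The sketch assumes the 100 ms heartbeat unit; `detection_time_ms` is an illustrative helper, and the example values are not tuning recommendations for any particular deployment.

```python
HB_UNIT_MS = 100  # assumption: hb-interval counted in 100 ms units


def detection_time_ms(hb_interval: int, hb_lost_threshold: int) -> int:
    """Approximate heartbeat silence (ms) before the peer is declared failed."""
    return hb_interval * HB_UNIT_MS * hb_lost_threshold


print(detection_time_ms(2, 6))    # 1200
print(detection_time_ms(10, 30))  # 30000
```

In this model, the second pair of values lets a heartbeat interruption shorter than about 30 seconds pass without a failover, but a genuine failure would also take about 30 seconds to detect.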


  2. Add a Second Heartbeat Interface (If the architecture allows).
    Redundant heartbeat paths reduce the chance of spurious HA transitions.
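A simplified model shows why a second path helps: if the peer is declared lost only when heartbeats stop on every heartbeat interface, an interruption on one path alone no longer triggers a transition. `peer_declared_lost` and the interface names `port3`/`port4` are hypothetical, and FortiOS heartbeat-interface priorities are not modeled.

```python
from __future__ import annotations


def peer_declared_lost(missed_by_hbdev: dict[str, int], hb_lost_threshold: int) -> bool:
    """Return True only if every heartbeat interface has missed >= threshold heartbeats."""
    if not missed_by_hbdev:
        raise ValueError("at least one heartbeat interface is required")
    return all(missed >= hb_lost_threshold for missed in missed_by_hbdev.values())


# One heartbeat path: an interruption on port3 alone trips the threshold.
print(peer_declared_lost({"port3": 6}, hb_lost_threshold=6))              # True
# Two paths: the same interruption on port3 is masked by port4.
print(peer_declared_lost({"port3": 6, "port4": 0}, hb_lost_threshold=6))  # False
```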


  3. Consider Load-Balancer-Based Designs.
    Using Google Cloud Load Balancers or similar services to manage traffic forwarding reduces dependency on the route deletion/creation logic within the FortiGates.
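One such design uses an internal passthrough Network Load Balancer as the next hop of the custom route, so the route itself is never rewritten during failover and the load balancer's health checks decide which FortiGate receives traffic. The Python sketch below is a simplified, hypothetical model of that idea (first healthy backend wins), not GCP's actual backend-selection logic; `Backend`, `pick_backend`, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """A hypothetical load-balancer backend (one FortiGate instance)."""

    name: str
    healthy: bool


def pick_backend(backends: list) -> str:
    """Return the first healthy backend (simplified health-check steering)."""
    for backend in backends:
        if backend.healthy:
            return backend.name
    raise RuntimeError("no healthy backend")


# The custom route's next hop is the load balancer and is never rewritten.
route = {"dest_range": "0.0.0.0/0", "next_hop": "ilb-fortigate"}
backends = [Backend("fgt-a", healthy=True), Backend("fgt-b", healthy=True)]
print(pick_backend(backends))  # fgt-a

backends[0].healthy = False  # fgt-a fails its health check
print(pick_backend(backends))  # fgt-b
print(route["next_hop"])       # ilb-fortigate
```

Failover here changes only which backend is selected; no route is deleted or inserted, so there is no window in which the route is missing.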


Note: AWS, Azure, and OCI support in-place route updates through their APIs, so they are not affected by this specific route deletion/insertion edge case.


Related documents:
HA for FortiGate-VM on GCP
FortiGate architecture in Google Cloud