aadenola
Staff
Article Id 214842
Description

This article describes the split-brain scenario in a HA setup.

 

Split-brain occurs when the FortiGates in an HA cluster cannot communicate with each other over the heartbeat interface, so each FortiGate assumes the role of the primary unit even though virtual clustering (vcluster) is not configured.

 

When HA is configured without vcluster, one unit is always the primary and the other is the secondary, whether the mode is active-active or active-passive. In a split-brain scenario, however, both FortiGates act as the primary.

 

Common symptoms of split-brain:
 
  • When trying to connect to the FortiGate cluster via administrative access, connections work intermittently. Sometimes traffic will hit one FortiGate. Other times it will hit the other.
  • Sessions cannot be established through the FortiGate, and traffic is dropped.
  • When logging into the FortiGates via console, 'get sys ha status' shows each FortiGate as the primary.
 

To avoid a split-brain scenario:

 

  • In a two-member HA configuration, use back-to-back links for the heartbeat interface instead of connecting through a switch.
  • Use redundant HA heartbeat interfaces.
  • In a configuration where members are in different locations, ensure the heartbeat interval and lost-heartbeat threshold allow more time than the worst-case latency on the links.
Scope

FortiGate, High Availability.
Solution

Causes of split-brain:

 

  1. Incomplete upgrade (only one unit was upgraded successfully).
  2. Split-brain is usually caused by complete loss of the heartbeat link or links. This can be a physical connectivity issue, or less commonly, something blocking the heartbeat packets between the HA members.
  3. Congestion or latency on the heartbeat links that exceeds the heartbeat interval and lost-heartbeat threshold.

 

Congestion on the heartbeat link can occur when the same link also carries session synchronization traffic. To reduce latency, use a separate link or interface for session sync.
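
The third cause above comes down to simple arithmetic: a peer is declared lost once no heartbeat has arrived for roughly the heartbeat interval multiplied by the lost-heartbeat threshold. The Python sketch below illustrates that check. It assumes the interval is expressed in units of 100 ms (verify this against the FortiOS version in use); the function names and the sample values are hypothetical and chosen only for illustration.

```python
# Assumption: the heartbeat interval is configured in units of 100 ms, and a
# peer is declared lost after `hb_lost_threshold` consecutive missed heartbeats.
HB_UNIT_MS = 100


def heartbeat_timeout_ms(hb_interval: int, hb_lost_threshold: int) -> int:
    """Approximate time without heartbeats before the peer is considered lost."""
    return hb_interval * HB_UNIT_MS * hb_lost_threshold


def tolerates_delay(hb_interval: int, hb_lost_threshold: int,
                    worst_case_delay_ms: float) -> bool:
    """True if the heartbeat timeout is longer than the worst expected delay."""
    return heartbeat_timeout_ms(hb_interval, hb_lost_threshold) > worst_case_delay_ms


if __name__ == "__main__":
    # Hypothetical values: interval 2 (200 ms) with threshold 6.
    print(heartbeat_timeout_ms(2, 6))       # 1200
    # A link that can stall for 1.5 s would exceed that timeout.
    print(tolerates_delay(2, 6, 1500.0))    # False
    # Interval 4 with threshold 10 gives 4000 ms of tolerance.
    print(tolerates_delay(4, 10, 1500.0))   # True
```

If the check returns False for the measured worst-case delay between sites, the interval or threshold is likely too short for that link.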

 

Below are the troubleshooting steps:

 

  1. Identify the heartbeat port and confirm that it is up.

    show system ha
    diagnose hardware deviceinfo nic xxx  <----- Where xxx is the port name.

  2. Verify that the heartbeat ports are sending and receiving heartbeat packets.

    diagnose sniffer packet any "ether proto 0x8890" 4  <----- NAT/Route mode heartbeat.
    diagnose sniffer packet any "ether proto 0x8891" 4  <----- Transparent mode heartbeat.
    diagnose sniffer packet any "ether proto 0x8893" 4  <----- Configuration synchronization.

  3. Verify that the HA configuration matches between the HA members.

    Verify that settings such as HA mode, group-name, group-id, and password are the same on all members. Different group-ids result in different virtual MAC addresses, so a group-id mismatch might not cause a MAC conflict. However, an IP conflict can still occur.

  4. Check that both units run the same firmware version.

    get system status

  5. Use a dedicated interface for session sync instead of the heartbeat interface.
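
The configuration comparison in the steps above can be partly automated once the output of 'show system ha' has been saved from each unit. The following Python sketch compares the settings the article names (mode, group-id, group-name). The sample configuration text, the function names, and the decision to skip the password (its stored form is encrypted and may not be directly comparable) are assumptions made for illustration, not FortiOS behaviour confirmed by this article.

```python
import shlex

# Settings that should be identical on every HA member.
# "password" is intentionally excluded (assumption: the stored value is
# encrypted and may differ textually even when the password is the same).
MUST_MATCH = ("mode", "group-id", "group-name")


def parse_ha_config(text: str) -> dict:
    """Turn 'set <key> <value...>' lines into a {key: [values]} dict."""
    settings = {}
    for line in text.splitlines():
        parts = shlex.split(line.strip())
        if len(parts) >= 3 and parts[0] == "set":
            settings[parts[1]] = parts[2:]
    return settings


def ha_mismatches(unit_a: str, unit_b: str, keys=MUST_MATCH) -> dict:
    """Return {key: (value_on_a, value_on_b)} for every key that differs."""
    a, b = parse_ha_config(unit_a), parse_ha_config(unit_b)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical saved configurations; priority may legitimately differ.
UNIT_A = """
config system ha
    set group-id 10
    set group-name "cluster1"
    set mode a-p
    set hbdev "port2" 50
    set priority 200
end
"""
UNIT_B = UNIT_A.replace("set group-id 10", "set group-id 20").replace(
    "set priority 200", "set priority 100")

if __name__ == "__main__":
    print(ha_mismatches(UNIT_A, UNIT_B))  # {'group-id': (['10'], ['20'])}
```

An empty result means the compared settings match; any key that appears in the result should be corrected on one of the units.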

     

 

Example:

 

Primary unit:
 
dia sys ha history  read
version=1.1
HA state change time: 2022-06-16 12:55:36
message_count=8/512
<2022-06-16 12:55:36> FGVMEVIJGWSKGW55 is elected as the cluster primary of 1 member
<2022-06-16 12:55:36> member FGVMEV_FDLRD6Y15 lost heartbeat on hbdev port2
<2022-06-16 12:55:36> heartbeats from FGVMEV_FDLRD6Y15 are lost on all hbdev
<2022-06-16 12:55:32> hbdev port2 link status changed: 1->0
 
Secondary unit:
 
dia sys ha history  read
version=1.1
HA state change time: 2022-06-16 12:55:36
message_count=6/512
<2022-06-16 12:55:36> member FGVMEVIJGWSKGW55 lost heartbeat on hbdev port2
<2022-06-16 12:55:36> FGVMEV_FDLRD6Y15 is elected as the cluster primary of 1 member
<2022-06-16 12:55:36> heartbeats from FGVMEVIJGWSKGW55 are lost on all hbdev
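
The pattern in the two histories above (each unit electing itself primary of a one-member cluster) can be checked with a short script. The Python sketch below is a heuristic only: the regular expression is written against the log lines shown in this article, the function names are hypothetical, and other FortiOS versions may format these messages differently.

```python
import re

# Matches lines such as:
# <2022-06-16 12:55:36> FGVMEVIJGWSKGW55 is elected as the cluster primary of 1 member
ELECTED = re.compile(
    r"<(?P<ts>[\d-]+ [\d:]+)> (?P<serial>\S+) "
    r"is elected as the cluster primary of (?P<n>\d+) member"
)


def primary_elections(history: str) -> list:
    """Return (timestamp, serial, member_count) for each election line."""
    return [(m["ts"], m["serial"], int(m["n"])) for m in ELECTED.finditer(history)]


def looks_like_split_brain(history_a: str, history_b: str) -> bool:
    """True if each unit elected a different serial as primary of 1 member."""
    solo_a = {s for _, s, n in primary_elections(history_a) if n == 1}
    solo_b = {s for _, s, n in primary_elections(history_b) if n == 1}
    return bool(solo_a) and bool(solo_b) and solo_a.isdisjoint(solo_b)


# Excerpts copied from the example output in this article.
HISTORY_UNIT_1 = """\
<2022-06-16 12:55:36> FGVMEVIJGWSKGW55 is elected as the cluster primary of 1 member
<2022-06-16 12:55:36> member FGVMEV_FDLRD6Y15 lost heartbeat on hbdev port2
<2022-06-16 12:55:32> hbdev port2 link status changed: 1->0
"""
HISTORY_UNIT_2 = """\
<2022-06-16 12:55:36> member FGVMEVIJGWSKGW55 lost heartbeat on hbdev port2
<2022-06-16 12:55:36> FGVMEV_FDLRD6Y15 is elected as the cluster primary of 1 member
"""

if __name__ == "__main__":
    print(looks_like_split_brain(HISTORY_UNIT_1, HISTORY_UNIT_2))  # True
```

The check does not compare timestamps, so it only indicates that both units have at some point elected themselves as the sole primary; the history on each unit should still be read to confirm when it happened.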
 