Sepideh
Staff
Article Id 356961
Description

This article describes potential issues that may occur during a failover on a FortiGate 7000 chassis, along with troubleshooting steps and solutions for resolving those issues.

Scope

FortiGate 7000.
Solution

A failover may be triggered manually or may occur unexpectedly. In either case, two primary issues may follow:

 

  1. If IPsec is used, IPsec tunnels may fail to establish.
  2. The primary chassis may become 'out-of-sync', meaning that the FIM and FPM blades on the primary chassis are no longer synchronized with each other.

 

To address the first issue, bounce the affected IPsec tunnels (bring them down and back up).

 

To start troubleshooting the second issue, first examine the output of the following commands:

 

get system status
diagnose load-balance status
diagnose sys confsync showcsum
diagnose sys confsync status
diagnose debug crashlog read | grep "YYYY-MM-DD"
show system global
diagnose sys ha hadiff status
diagnose sys confsync diffcsum
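
The "YYYY-MM-DD" in the crashlog command is a placeholder for the date of the failover. As a small convenience, the list above can be generated with the date already filled in. This is only a sketch: the function name build_diagnostic_commands is hypothetical, and the script just prints the commands so that they can be pasted into the FIM CLI.

```python
from datetime import date


def build_diagnostic_commands(day):
    """Return the troubleshooting commands with the crashlog filter set to `day`."""
    return [
        "get system status",
        "diagnose load-balance status",
        "diagnose sys confsync showcsum",
        "diagnose sys confsync status",
        # The crashlog is filtered on the date in YYYY-MM-DD form.
        f'diagnose debug crashlog read | grep "{day.isoformat()}"',
        "show system global",
        "diagnose sys ha hadiff status",
        "diagnose sys confsync diffcsum",
    ]


if __name__ == "__main__":
    # Print the commands for today's date, one per line.
    for command in build_diagnostic_commands(date.today()):
        print(command)
```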

 

Example:

In the example provided, two FortiGate 7000 chassis are involved. Initially, 7K-C1 was the Primary chassis and 7K-C2 was the Secondary. However, due to a failover, 7K-C2 has become the new Primary chassis. The issue arose because the original Primary, 7K-C1, experienced a desynchronization across its FIM and FPM blades. It is important to note that HA between the two chassis is functioning correctly, with no issue in maintaining the connection. The main problem lies in the desynchronization of the 7K-C1 FIM and FPM blades.

 

7K-C1 Chassis: Original Primary chassis

7K-C2 Chassis: Original Secondary chassis (The 7K-C2 chassis has become the new Primary because of the out-of-sync issue across the 7K-C1 FIM and FPM blades.)

 

7K-C1 [FIM01] (global) # diagnose sys confsync status
==========================================================================
Slot: 3 Module SN: FPM20FTxxxxxxxxx
ELBC: svcgrp_id=1, chassis=1, slot_id=3
ELBC HB devs:
elbc-ctrl1: active=1, hb_count=237
elbc-ctrl2: active=0, hb_count=0
ELBC mgmt devs:
b-chassis: mgmtip_set=1

zone: self_idx:1, primary_idx:0, ha_primary_idx:255, members:2
FPM20FTBxxxxxxxxx, Secondary, uptime=236.65, priority=19, slot_id=1:3, idx=1, flag=0x4, in_sync=0
FIM41FTBxxxxxxxxx, Primary, uptime=4944717.69, priority=1, slot_id=1:1, idx=0, flag=0x10, in_sync=1
b-chassis: state=3(connected), ip=169.254.2.15, last_hb_time=633.12, hb_nr=904

==========================================================================
Current slot: 1 Module SN: FIM41FTxxxxxxxxx
ELBC: svcgrp_id=1, chassis=1, slot_id=1

ha zone: ha_primary_sn:F78F1ATBxxxxxxxxx, ha_primary_idx:0
Ha Member: F78F1ATBxxxxxxxxx, mode=a-p, role=Secondary, slot_id=1:1, idx=1, in_sync=1
Ha Member: F78F1ATBxxxxxxxxx, mode=a-p, role=Primary, slot_id=2:1, idx=0, in_sync=1


zone: self_idx:1, primary_idx:1, ha_primary_idx:0, members:2 ha_member:1
FIM41FTBxxxxxxxxx, Primary, uptime=4944717.69, priority=1, slot_id=1:1, idx=1, flag=0x10, in_sync=1
FPM20FTBxxxxxxxxx, Secondary, uptime=236.65, priority=19, slot_id=1:3, idx=2, flag=0x4, in_sync=0
b-chassis: state=3(connected), ip=169.254.2.3, last_hb_time=4944991.55, hb_nr=827
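
On a chassis with many blades, looking for 'in_sync=0' by eye is easy to get wrong. The following is a minimal Python sketch that lists the out-of-sync members from a saved copy of this output. It assumes the member lines keep the comma-separated layout shown above; the function name find_out_of_sync is hypothetical and not part of any Fortinet tool. The 'Ha Member:' lines are deliberately ignored, since only the blade zone members are of interest here.

```python
import re

# Matches blade member lines such as:
# FPM20FTBxxxxxxxxx, Secondary, uptime=236.65, ..., slot_id=1:3, ..., in_sync=0
MEMBER_RE = re.compile(
    r"^\s*(?P<serial>\S+),\s*(?P<role>Primary|Secondary),"
    r".*?slot_id=(?P<slot>[\d:]+),.*?in_sync=(?P<in_sync>[01])\s*$"
)


def find_out_of_sync(output):
    """Return sorted (serial, role, slot_id) tuples for members with in_sync=0."""
    flagged = set()  # a set, because each blade is listed once per slot section
    for line in output.splitlines():
        match = MEMBER_RE.match(line)
        if match and match.group("in_sync") == "0":
            flagged.add(
                (match.group("serial"), match.group("role"), match.group("slot"))
            )
    return sorted(flagged)
```

For the output above, this reports only the FPM in slot 1:3, which matches the 'in_sync=0' entries visible in both slot sections.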

 

7K-C1 [FIM01] (global) # diag load-balance status
==========================================================================
Current slot: 1 Module SN: FIM41FTxxxxxxxxx
Primary FPM Blade: N/A

Slot 3:
Status:Dead Function:Active
Link: Base: Up Fabric: Up
Heartbeat: Management: Good Data: Failed
Status Message:"Waiting for configuration sync."
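
The same check can be scripted for the load-balance output. The sketch below is an illustration only: it assumes the per-slot layout shown above, and it treats a slot as unhealthy when its status is 'Dead', when a heartbeat is anything other than 'Good', or when the status message mentions configuration sync. These criteria are taken from this one example and may not cover every state the command can report. The function name unhealthy_slots is hypothetical.

```python
import re


def unhealthy_slots(output):
    """Return {slot_number: [reasons]} for slots that do not look healthy."""
    problems = {}
    slot = None
    for raw in output.splitlines():
        line = raw.strip()
        header = re.match(r"^Slot\s+(\d+):$", line)
        if header:
            slot = int(header.group(1))
            continue
        if slot is None:
            continue  # still in the chassis-level header lines
        status = re.match(r"^Status:\s*(\S+)", line)
        if status and status.group(1) == "Dead":
            problems.setdefault(slot, []).append("status Dead")
        if line.startswith("Heartbeat:"):
            for name, state in re.findall(r"(Management|Data):\s*(\w+)", line):
                if state != "Good":
                    problems.setdefault(slot, []).append(f"{name} heartbeat {state}")
        if line.startswith("Status Message:"):
            message = line.split(":", 1)[1].strip().strip('"')
            if "configuration sync" in message:
                problems.setdefault(slot, []).append(message)
    return problems
```

Applied to the output above, the result contains a single entry for slot 3 with three reasons: the 'Dead' status, the failed Data heartbeat, and the "Waiting for configuration sync." message.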

 

To resolve the issue of the Primary chassis being out-of-sync, five solutions are proposed:

 

Solution 1: To recalculate the checksum.

 

7K-C1 [FIM01] (global):

# diagnose sys confsync csum-recalculate

 

Solution 2: To kill the 'confsyncd' process.

 

7K-C1 [FIM01] (global):

# diagnose sys process pidof confsyncd >> x
# diagnose sys process dump x
# diagnose sys process pstack x

In these commands, 'x' stands for the process ID (PID) returned by the first command.

 

Solution 3: To upload a configuration backup from the FIM blade to the FPM blade.

 

Solution 4: To power cycle the FPM blade ONLY.

  • A remote or local connection to the SMM console port is required.
  • After connecting to the FortiGate console port, press 'Ctrl+T' multiple times until the SMM prompt appears.
  • To restart the slot, deactivate it and then activate it again with 'fru deactivate <slot-ID>' followed by 'fru activate <slot-ID>'. In the commands below, 'x' is the slot ID:

admin@SMM:

# fru deactivate x

# fru activate x

7K-C1 [FIM01] (global):

# diagnose load-balance status

 

Solution 5: Reboot.

This can involve the following steps:

 

Step 1: Reboot the 7K-C1 Chassis:

It is recommended to isolate the 7K-C1 chassis before rebooting it; however, the reboot can also be done while the chassis remains in the HA cluster. This should not impact the current primary chassis (7K-C2).


The output for the following commands should be gathered before and after the reboot:

 

7K-C1 [FIM01] (global):

# get system status

# diagnose load-balance status

# diagnose sys confsync showcsum

# diagnose sys confsync status

# diag debug crashlog read | grep "YYYY-MM-DD"
# execute reboot
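
Once the outputs have been saved before and after the reboot, they can be compared line by line. The following is a minimal sketch using Python's standard difflib module; it assumes each capture is available as a text string, and the function name diff_captures is hypothetical. Counters such as 'uptime' and 'hb_nr' change on every run, so the lines worth reading in the diff are the ones where 'in_sync' or the slot status changed.

```python
import difflib


def diff_captures(before, after, label="capture"):
    """Return unified-diff lines between two saved CLI outputs."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=f"{label} (before reboot)",
            tofile=f"{label} (after reboot)",
            lineterm="",
        )
    )


if __name__ == "__main__":
    before = "FPM20FTBxxxxxxxxx, Secondary, slot_id=1:3, in_sync=0"
    after = "FPM20FTBxxxxxxxxx, Secondary, slot_id=1:3, in_sync=1"
    for line in diff_captures(before, after, "confsync status"):
        print(line)
```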

 

*** If the issue persists, proceed with rebooting the 7K-C2 chassis.

 

Step 2: Reboot the 7K-C2 Chassis:
The output of the commands listed in Step 1 should be collected both before and after the reboot. Then proceed as follows:

 

  1. It is necessary to isolate the 7K-C2 and 7K-C1 chassis.
  2. On the 7K-C1 chassis, the cables should be removed in the following order: Data ---> Mgmt ---> HA. After this, verify if the chassis blades come back in-sync.
  3. Then the cables should be reconnected in the order of HA ---> Mgmt ---> Data.
  4. Afterward, the sync status should be verified on both chassis:


# diagnose sys confsync status

 

  5. Finally, a failover to the 7K-C1 chassis should be performed, followed by rebooting or power cycling the 7K-C2 chassis.

 

7K-C1 [FIM01] (global):

# execute ha failover set <cluster_id>

# execute ha failover status

# get system ha status

# execute ha failover unset <cluster_id>

 

7K-C2 [FIM01] (global):
# execute reboot

 

If the issue remains unresolved, proceed with Step 3.

 

Step 3: Reload the configuration to the whole chassis:
If Step 1 and Step 2 do not resolve the out-of-sync issue, a configuration reload should be performed on the whole chassis. The procedure involves breaking the HA, isolating the 7K-C1 chassis, and reloading the configuration on it. After the reload, the cables should be reconnected, and HA must be reconfigured on both the 7K-C1 and the 7K-C2 chassis to restore synchronization.
