jcovarrubias
Staff
Article Id 355981
Description

This article describes a scenario where a High-Availability (HA) FortiGate pair upgrade becomes stuck in SENT-IMAGE status.

Scope

FortiGate.

Solution

Pre-requisites: Understanding of the High-Availability upgrade process as documented in Technical Tip: FortiGate HA upgrade procedure and the status during the upgrade.

 

Issue Description: During Stage #1 of the upgrade process, the primary FortiGate sends the target software image to the secondary FortiGate. Normally, the secondary reboots after receiving the image and the primary then advances to Stage #2. In some cases, however, the secondary completes its reboot with the new image but the primary never progresses to the next stage.

 

In this scenario, the primary FortiGate repeats the following debug messages indefinitely (captured after running 'diagnose debug application hatalk -1' and 'diagnose debug enable'):

 

<hatalk> entering hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
<hatalk> leaving hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
<hatalk> entering hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
<hatalk> leaving hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
<hatalk> entering hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
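When a long debug capture has been saved to a file, the repeating pattern can be confirmed without reading it by eye. The following Python sketch is only an illustration: the function name and the sample text are hypothetical, and the regular expression assumes the log lines look exactly like the ones quoted in this article (including the `uprade_state` spelling shown in the output).

```python
import re

# Pattern modelled on the debug lines quoted in this article.
# The "uprade_state" spelling is copied from that output as-is.
TIMER_LINE = re.compile(
    r"<hatalk> (?P<phase>entering|leaving) hatalk_upgrade_timer_func: "
    r"uprade_state=(?P<code>\d+)\((?P<state>[A-Z-]+)\)"
)


def count_timer_entries(log_text: str, state: str = "SENT-IMAGE") -> int:
    """Count how many times the upgrade timer was entered in the given state."""
    count = 0
    for line in log_text.splitlines():
        match = TIMER_LINE.search(line)
        if match and match.group("phase") == "entering" and match.group("state") == state:
            count += 1
    return count


# Sample text: the five lines shown in this article.
SAMPLE = """\
<hatalk> entering hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
<hatalk> leaving hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
<hatalk> entering hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
<hatalk> leaving hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
<hatalk> entering hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
"""

print(count_timer_entries(SAMPLE))  # prints 3
```

A count that keeps growing across captures, with no change of state, matches the stuck behaviour described here. How many repetitions count as "stuck" is a judgment call and is not defined by this article.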

 

Configuration analysis:

 

The relevant high-availability (HA) configuration settings include:

 

config system ha
    set group-id 100
    set group-name "NAME"
    set mode a-p
    set hbdev "port4" 100
    set hb-interval 20
    set hb-lost-threshold 60

 

Root Cause:

 

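The configuration sets hb-interval to 20, which is expressed in units of 100 ms and therefore gives a 2-second heartbeat interval. With hb-lost-threshold set to 60, the peer is declared lost only after 60 consecutive missed heartbeats, that is, 60 × 2 = 120 seconds. Analysis shows that the primary FortiGate stopped receiving heartbeat packets while the secondary unit rebooted, but reception resumed before the 120-second lost threshold was reached. The primary therefore never registered the loss of the secondary, and the upgrade stalled.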
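The arithmetic behind the 120-second figure can be written out as a short Python sketch. It assumes hb-interval is expressed in units of 100 ms, which is what makes a value of 20 equal to 2 seconds; the function and constant names are hypothetical and exist only for this illustration.

```python
# Assumption: hb-interval is configured in units of 100 ms.
HB_INTERVAL_UNIT_MS = 100


def heartbeat_loss_window_ms(hb_interval: int, hb_lost_threshold: int) -> int:
    """Time without heartbeats (in ms) before the peer is considered lost."""
    return hb_interval * HB_INTERVAL_UNIT_MS * hb_lost_threshold


# Values taken from the HA configuration in this article.
window_ms = heartbeat_loss_window_ms(hb_interval=20, hb_lost_threshold=60)
print(f"{window_ms // 1000} seconds")  # prints "120 seconds"
```

Integer milliseconds are used to keep the calculation exact.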

 

By design, the high-availability (HA) protocol detects a member's reboot only when the reboot lasts longer than the hb-lost-threshold timer. When the reboot completes before this threshold is reached, as observed in this scenario, the primary FortiGate cannot detect the reboot completion and remains in SENT-IMAGE status.
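The condition described above can be modelled with a minimal Python sketch. This is a simplified illustration, not the actual hatalk logic: it treats a reboot as detected only if the longest gap between heartbeat arrivals exceeds the lost-threshold window. The function names and the arrival timestamps are hypothetical.

```python
from typing import Sequence


def longest_gap_s(arrivals_s: Sequence[float]) -> float:
    """Longest interval, in seconds, between consecutive heartbeat arrivals."""
    return max((b - a for a, b in zip(arrivals_s, arrivals_s[1:])), default=0.0)


def reboot_detected(arrivals_s: Sequence[float], loss_window_s: float) -> bool:
    """Simplified model: the peer is seen as lost only if a gap exceeds the window."""
    return longest_gap_s(arrivals_s) > loss_window_s


LOSS_WINDOW_S = 120  # 2-second interval x 60 missed heartbeats, as in this article

# Hypothetical timelines: heartbeats every 2 s, interrupted by the reboot.
fast_reboot = [0, 2, 4, 94, 96]    # 90 s without heartbeats
slow_reboot = [0, 2, 4, 154, 156]  # 150 s without heartbeats

print(reboot_detected(fast_reboot, LOSS_WINDOW_S))  # False: gap is under the window
print(reboot_detected(slow_reboot, LOSS_WINDOW_S))  # True: gap exceeds the window
```

In this model the fast reboot corresponds to the scenario in this article: the heartbeat gap ends before the window expires, so the loss is never registered.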