Description
This article describes a scenario where a High-Availability (HA) FortiGate pair upgrade becomes stuck in SENT-IMAGE status.
Scope
FortiGate
Solution
Prerequisites: Familiarity with the High-Availability upgrade process and the status values it reports, as documented in 'Technical Tip: FortiGate HA upgrade procedure and the status during the upgrade'.
Issue Description: During Stage #1 of the upgrade process, the primary FortiGate transmits the target software image to the secondary FortiGate. The standard procedure involves the secondary FortiGate rebooting after image reception, followed by the primary unit advancing to Stage #2. However, in certain circumstances, despite the secondary FortiGate completing its reboot cycle with the new image, the primary unit fails to progress to the next stage.
In this scenario, after debugging is enabled with 'diagnose debug application hatalk -1' and 'diagnose debug enable', the primary FortiGate prints the following messages in an endless loop:
<hatalk> entering hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
<hatalk> leaving hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
<hatalk> entering hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
<hatalk> leaving hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
<hatalk> entering hatalk_upgrade_timer_func: uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000
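As a minimal sketch (not a Fortinet tool), the repeating pattern above can be detected in captured debug output with a short Python script. The function name `stuck_state` and the `min_repeats` threshold are hypothetical choices made for illustration. The pattern assumes that the timer lines keep exactly the format shown above, including the `uprade_state` spelling printed by the daemon.

```python
import re
from typing import Optional

# Matches the "entering" timer lines shown above. The debug output itself
# spells the field "uprade_state", so the pattern keeps that spelling.
TIMER_RE = re.compile(
    r"entering hatalk_upgrade_timer_func: uprade_state=(\d+)\(([A-Z-]+)\)"
)


def stuck_state(debug_text: str, min_repeats: int = 3) -> Optional[str]:
    """Return the upgrade state name if every timer entry reports the same
    state at least min_repeats times; otherwise return None.

    min_repeats is a hypothetical heuristic, not a FortiOS setting.
    """
    states = [m.group(2) for m in TIMER_RE.finditer(debug_text)]
    if len(states) >= min_repeats and len(set(states)) == 1:
        return states[0]
    return None


# Three "entering" lines copied from the debug output above.
sample = "\n".join(
    "<hatalk> entering hatalk_upgrade_timer_func: "
    "uprade_state=3(SENT-IMAGE), daemon_bits=0x00000000"
    for _ in range(3)
)
print(stuck_state(sample))  # SENT-IMAGE
```

When the same state appears in every timer entry over a long capture, the upgrade is stuck at that state rather than briefly passing through it.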
Configuration analysis:
The relevant high-availability (HA) configuration settings include:
config system ha
    set group-id 100
    set group-name "NAME"
    set mode a-p
    set hbdev "port4" 100
    set hb-interval 20
    set hb-lost-threshold 60
end
Root Cause:
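The configuration sets hb-interval to 20. Because this value is expressed in units of 100 ms, the heartbeat interval is 2 seconds. With hb-lost-threshold set to 60, the secondary is declared lost only after 60 heartbeats are missed, which takes 120 seconds (60 × 2 s). Analysis showed that the primary FortiGate stopped receiving heartbeat packets while the secondary unit rebooted. Reception then resumed before the 120-second threshold was reached. As a result, the primary never considered the secondary lost, and the upgrade failed to progress.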
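The arithmetic behind the 120-second figure can be written out as a small Python helper. This sketch assumes that hb-interval is counted in 100 ms units, which is how a value of 20 yields the 2-second interval stated above. The function name is illustrative only.

```python
# Assumption: hb-interval is expressed in 100 ms units, consistent with
# hb-interval 20 corresponding to the 2-second interval described above.
HB_INTERVAL_UNIT_MS = 100


def lost_window_seconds(hb_interval: int, hb_lost_threshold: int) -> float:
    """Time without heartbeats before the peer is treated as lost."""
    # Multiply in integer milliseconds first to avoid float rounding noise.
    window_ms = hb_interval * HB_INTERVAL_UNIT_MS * hb_lost_threshold
    return window_ms / 1000


print(lost_window_seconds(20, 60))  # 120.0
```

Any secondary reboot that completes in less than this window goes unnoticed by the primary.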
By design, the HA protocol detects a member reboot only when the member is unreachable for longer than the hb-lost-threshold window. If the reboot completes within that window, as in this scenario, the primary FortiGate does not detect that the secondary has rebooted and therefore remains in SENT-IMAGE status.
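The detection rule can be illustrated with a simplified Python model. This is not FortiOS code. It assumes that the primary advances only after seeing a gap between two consecutive heartbeats that reaches the lost window. The heartbeat timestamps and the "STAGE-2" label are hypothetical; only SENT-IMAGE comes from the debug output above.

```python
from typing import Sequence


def reboot_detected(arrivals_s: Sequence[float], lost_window_s: float) -> bool:
    """True if any gap between consecutive heartbeat arrivals reaches the
    lost window, i.e. the primary would have declared the peer lost."""
    return any(
        later - earlier >= lost_window_s
        for earlier, later in zip(arrivals_s, arrivals_s[1:])
    )


def upgrade_state_after_reboot(arrivals_s: Sequence[float],
                               lost_window_s: float) -> str:
    # Hypothetical labels: only SENT-IMAGE comes from the debug output.
    if reboot_detected(arrivals_s, lost_window_s):
        return "STAGE-2"
    return "SENT-IMAGE"


WINDOW_S = 120.0  # 2 s interval x 60 lost heartbeats, from the config above

fast_reboot = [0, 2, 4, 94, 96]    # 90 s of silence (hypothetical)
slow_reboot = [0, 2, 4, 154, 156]  # 150 s of silence (hypothetical)

print(upgrade_state_after_reboot(fast_reboot, WINDOW_S))  # SENT-IMAGE
print(upgrade_state_after_reboot(slow_reboot, WINDOW_S))  # STAGE-2
```

The fast reboot leaves a 90-second gap, which is shorter than the 120-second window. In this model the primary therefore never registers the reboot and stays in SENT-IMAGE, matching the behavior described in this article.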
Copyright 2024 Fortinet, Inc. All Rights Reserved.