FortiGate
FortiGate Next Generation Firewall utilizes purpose-built security processors and threat intelligence security services from FortiGuard labs to deliver top-rated protection and high performance, including encrypted traffic.
GeorgeZhong
Staff
Staff
Article Id 402009
Description This article describes a scenario where the RADIUS and other authentication suddenly stop working in the secondary unit in the FortiGate HA cluster. This issue may randomly happen in v7.4.
Scope FortiGate v7.4+.
Solution

In the HA A-P cluster, the primary unit is handling most of the local and remote authentication activities, such as Radius, LDAP, and so on, by its 'fnbamd' daemon.

 

But in some scenarios, the secondary unit still needs to handle the remote authentication. For example, when the administrator logs into the secondary unit with the Radius admin account, the 'fnbamd' daemon on the secondary unit will handle this Radius authentication, relay the Radius request to the primary, and the primary unit will initiate the Radius request traffic to the Radius server on UDP 1812.

In this scenario, In such cases, the 'fnbamd' daemon runs on both units. The authentication process can be observed by executing the following debug commands on both units:

 

diagnose debug application fnbamd -1

diagnose debug enable

 

RADIUS packets can be captured on the primary unit with this command:

 

diagnose sniffer packet any 'port 1812' 4 0 l

 

For complete packet captures:

 

diagnose sniffer packet any 'port 1812' 6 0 l

 

Occasionally, remote authentication (e.g., RADIUS) may fail on the secondary unit even with valid credentials. During troubleshooting, the following behaviours may be observed:

  1. The 'fnbamd' debug commands above do not show ANY output on the secondary unit.
  2. No corresponding Radius request packets were captured on the primary unit corresponding to the secondary unit’s authentication attempt.
  3. 'fnbamd' daemon is stuck in 'R' state and consumes 99.9% percent of the user CPU (not total CPU) on secondary unit as shown below:

 

diagnose sys top


Run Time: 31 days, 4 hours and 24 minutes
8U, 0N, 0S, 92I, 0WA, 0HI, 0SI, 0ST; 15918T, 9505F
fnbamd 888 R 99.9 0.1 4
hasync 155 S < 0.1 0.1 0
hatalk 292 S < 0.1 0.1 0
proxyd 305 S 0.1 0.0 10
ipshelper 15708 S < 0.0 1.9 2
ipsengine 15709 S < 0.0 0.8 2

 

This indicates the 'fnbamd' daemon is fully stuck. In this scenario, the only workaround is to kill/restart the 'fnbamd' daemon. Steps are as below:

  1. Check 'fnbamd' process ID:

 

diagnose sys process pidof fnbamd

 

  1. Kill/restart the process ID by sending signal 11 (SIGSEGV):

 

diagnose sys kill 11 <pid>

 

  1. Verify the process has restarted (the PID should change):

 

diagnose sys process pidof fnbamd

 

After the process restarts, remote authentication on the secondary unit should resume. A process stack backtrace will be recorded in the crash log due to signal 11. This can be verified with:

 

diagnose debug crashlog read

 

This issue has been acknowledged by development under the engineering ticket ID 1163152. Its patch will be included in v7.4.9 Build 2811, v7.6.4 Build 3580, and v8.0.0 Build 0041.