ftrapani
Staff
Article Id 336377
Description This article explains how to identify and fix IPsec engine stuck issues caused by a specific traffic pattern on NP6 platforms.
Scope NP6 devices.
Solution

On FortiGate devices, IPsec traffic can generally be offloaded to hardware through dedicated IPsec sub-engines.

However, under certain conditions, especially in Ethernet-over-MPLS networks, a large volume of traffic carrying Layer 2 padding can cause an IPsec sub-engine to become stuck on the NP6 platform.

 

A stuck IPsec sub-engine can cause:

 

  • Packet drops when a new IPsec session reaches the problematic sub-engine.
  • Sessions falling back to the kernel: when IPsec sessions are already offloaded to NP6 and an SA re-key occurs, the new SA cannot be offloaded if it hits the problematic IPsec sub-engine, and the originally offloaded sessions are flushed to the kernel.
  • Tunnel instability.
  • BGP neighborship flaps in 'BGP per Overlay' scenarios.

If this issue happens, several side effects could be observed, such as:

 

  • A significant percentage of tunnels that are not fully offloaded. As an example:

device (VDOM1) # diagnose vpn tunnel list | grep -c "npu_flag=00"    --> 00 = not offloaded
52
device (VDOM1) # diagnose vpn tunnel list | grep -c "npu_flag=01"    --> 01 = only egress traffic is offloaded
1270
device (VDOM1) # diagnose vpn tunnel list | grep -c "npu_flag=02"    --> 02 = only ingress traffic is offloaded
899
device (VDOM1) # diagnose vpn tunnel list | grep -c "npu_flag=03"    --> 03 = fully offloaded
2415
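
The per-flag counts above can also be produced in one pass from a saved copy of the 'diagnose vpn tunnel list' output. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes that each relevant output line carries a single 'npu_flag=NN' field, which mirrors what 'grep -c' counts.

```python
import re
from collections import Counter

# Matches the two-digit offload flag shown in 'diagnose vpn tunnel list'.
NPU_FLAG_RE = re.compile(r"npu_flag=([0-9a-fA-F]{2})")


def count_npu_flags(tunnel_list_text):
    """Count lines per npu_flag value (one count per line, like 'grep -c')."""
    counts = Counter()
    for line in tunnel_list_text.splitlines():
        match = NPU_FLAG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def fully_offloaded_share(counts):
    """Fraction of counted entries with npu_flag=03 (fully offloaded)."""
    total = sum(counts.values())
    return counts["03"] / total if total else 0.0


if __name__ == "__main__":
    # Counts taken from the example in this article.
    example = Counter({"00": 52, "01": 1270, "02": 899, "03": 2415})
    print(f"fully offloaded: {fully_offloaded_share(example):.1%}")  # 52.1%
```

With the example counts, only about half of the entries are fully offloaded, which is the kind of ratio that should prompt a check of the IPsec sub-engines.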

 

  • As a consequence of the first point, many ESP packets are sent to the kernel, which can be seen with:

device (VDOM1) # diagnose sniffer packet any "esp" 4 0 a


This could also cause softirq CPU spikes:

 

device (global) # diag sys mpstat 2 2
Gathering data, wait 2 sec, press any key to quit.
..0..1
TIME CPU %usr %nice %sys %iowait %irq %soft %steal %idle
10:24:56 AM all 1.25 0.00 1.57 0.00 0.02 10.21 0.00 86.95
0 0.00 0.00 1.00 0.00 0.00 0.00 0.00 99.00
...
72 0.00 0.00 4.00 0.00 0.00 35.50 0.00 60.50
73 0.50 0.00 1.00 0.00 0.00 20.00 0.00 78.50
...
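
On units with many cores, the affected CPUs are easy to miss in the full listing. The Python sketch below filters a saved 'diag sys mpstat' output for cores whose %soft value exceeds a threshold. The function name and the 15% default threshold are assumptions chosen for illustration, not Fortinet recommendations, and the column layout is assumed to match the header shown above.

```python
def high_softirq_cpus(mpstat_text, threshold=15.0):
    """Return {cpu_id: %soft} for per-CPU rows at or above the threshold.

    Assumes the layout shown in this article: each per-CPU row is the CPU
    number followed by eight values
    (%usr %nice %sys %iowait %irq %soft %steal %idle).
    """
    hot = {}
    for line in mpstat_text.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # separators such as '...' or progress dots
        cpu, values = fields[-9], fields[-8:]
        if not cpu.isdigit():
            continue  # header line, banner text, or the 'all' summary row
        try:
            soft = float(values[5])  # %soft is the sixth value
        except ValueError:
            continue
        if soft >= threshold:
            hot[int(cpu)] = soft
    return hot
```

Applied to the output above, this returns CPUs 72 and 73, the two cores showing elevated softirq load.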

 

Issue Identification.

 

The status of IPSec sub-engines can be seen with the command 'fnsysctl cat /proc/net/np6_x/ipsec-stats' where 'x' is the number of the NP6 processor to be verified. It is possible to check the number of NP6 processors in the FortiGate unit with 'diagnose npu np6 port-list'.

 

Example for a 3600E:

 

device (global) # diagnose npu np6 port-list
Chip   XAUI Ports   Max     Cross-chip
                    Speed   offloading
------ ---- ------- ------- ----------
NP#0-5 0-3  port1   25000M  Yes          --> 3600E has 6 NP6, going from np6_0 to np6_5.
NP#0-5 0-3  port2   25000M  Yes
NP#0-5 0-3  port3   25000M  Yes
NP#0-5 0-3  port4   25000M  Yes
NP#0-5 0-3  port5   25000M  Yes
NP#0-5 0-3  port6   25000M  Yes
NP#0-5 0-3  port7   25000M  Yes
NP#0-5 0-3  port8   25000M  Yes
NP#0-5 0-3  port9   25000M  Yes
NP#0-5 0-3  port10  25000M  Yes
NP#0-5 0-3  port11  25000M  Yes
NP#0-5 0-3  port12  25000M  Yes
NP#0-5 0-3  port13  25000M  Yes
NP#0-5 0-3  port14  25000M  Yes
NP#0-5 0-3  port15  25000M  Yes
NP#0-5 0-3  port16  25000M  Yes
NP#0-5 0-3  port17  25000M  Yes
NP#0-5 0-3  port18  25000M  Yes
NP#0-5 0-3  port19  25000M  Yes
NP#0-5 0-3  port20  25000M  Yes
NP#0-5 0-3  port21  25000M  Yes
NP#0-5 0-3  port22  25000M  Yes
NP#0-5 0-3  port23  25000M  Yes
NP#0-5 0-3  port24  25000M  Yes
NP#0-5 0-3  port25  25000M  Yes
NP#0-5 0-3  port26  25000M  Yes
NP#0-5 0-3  port27  25000M  Yes
NP#0-5 0-3  port28  25000M  Yes
NP#0-5 0-3  port29  25000M  Yes
NP#0-5 0-3  port30  25000M  Yes
NP#0-5 0-3  port31  100000M Yes
NP#0-5 0-3  port32  100000M Yes
NP#0-5 0-3  port33  100000M Yes
NP#0-5 0-3  port34  100000M Yes
NP#0-5 0-3  port35  100000M Yes
NP#0-5 0-3  port36  100000M Yes

 

To check whether a sub-engine is stuck, run 'fnsysctl cat /proc/net/np6_x/ipsec-stats' several times at 5-second intervals for each NP6 processor.

If, in the output for NP6 processor 'x':

  1. The nr_busy counters are non-zero and keep increasing.
  2. The idle status never goes to ff (ff means that all sub-engine modules are idle).

then it is very likely that an IPsec engine on that processor is stuck.

 

Below is a real example:

 

device (global) # fnsysctl cat /proc/net/np6_2/ipsec-stats
Counters        IB0             IB1             OB0             OB1
--------------- --------------- --------------- --------------- ---------------
active-SA       0               702             518             716
timeout         1               0               1               0
invalid_idx     0               0               0               0
tbl_full        0               0               0               0
nr_flush        0               0               0               0
nr_busy         1793994801      0               3702759723      0
nr_cache_off    0               0               0               0
cache_disabled  0               0               0               0
nr_write_check_err 0            0               0               0
nr_retry_fail   0               0               0               0
--------------- --------------- --------------- --------------- ---------------
IPSec0-eng-mask(enc/dec) = 0xff/0xff
IPSec1-eng_mask(enc/dec) = 0xff/0xff
Idle status of IPSec Engine 0:0   --> never going to 0:ff
Idle status of IPSec Engine 1:fe   --> never going to 1:ff
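
The two checks described above (nr_busy increasing, idle status never reaching ff) can be applied automatically to several saved outputs of 'fnsysctl cat /proc/net/np6_x/ipsec-stats' taken a few seconds apart. The Python sketch below is a rough illustration under stated assumptions: the function names are hypothetical, the parsing relies on the line formats shown in the example above, and the result is reported per NP6 processor because the mapping between the IB/OB counter columns and the individual engines is not assumed.

```python
import re

NR_BUSY_RE = re.compile(r"^\s*nr_busy\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)", re.MULTILINE)
IDLE_RE = re.compile(r"Idle status of IPSec Engine\s*(\d+):\s*([0-9a-fA-F]+)")


def parse_snapshot(text):
    """Return (nr_busy counters, {engine: idle status}) for one output."""
    busy_match = NR_BUSY_RE.search(text)
    if busy_match is None:
        raise ValueError("nr_busy line not found")
    busy = tuple(int(value) for value in busy_match.groups())
    idle = {int(eng): int(status, 16) for eng, status in IDLE_RE.findall(text)}
    return busy, idle


def looks_stuck(snapshots):
    """Apply both checks to two or more outputs, ordered oldest first."""
    if len(snapshots) < 2:
        raise ValueError("at least two snapshots are needed")
    parsed = [parse_snapshot(text) for text in snapshots]
    # Check 1: some nr_busy column grows in every interval (so it is
    # non-zero). Counter wrap-around is not handled in this sketch.
    columns = zip(*(busy for busy, _ in parsed))
    busy_growing = any(
        all(a < b for a, b in zip(col, col[1:])) for col in columns
    )
    # Check 2: some engine never reports ff (all modules idle).
    engines = parsed[0][1].keys()
    never_idle = any(
        all(idle.get(eng) != 0xFF for _, idle in parsed) for eng in engines
    )
    return busy_growing and never_idle
```

A True result only indicates that both symptoms are present in the supplied outputs; more snapshots over a longer period give a more reliable answer.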

 

Issue Fix.

This state can be fixed and prevented by applying the following settings:

 

config system npu
    set strip-esp-padding enable
    set strip-clear-text-padding enable
end

 

This will instruct the NP6 processor to strip clear text padding and ESP padding before sending the packets to the IPsec engine.

 

Note: This change requires a device reboot to take effect.

 

Related article: 

Troubleshooting Tip: Inbound IPsec traffic dropped due to layer 2 padding