On FortiGate devices, IPsec traffic can generally be offloaded to hardware through dedicated IPsec sub-engines.
However, under certain conditions, especially in Ethernet-over-MPLS networks, a large volume of traffic carrying Layer 2 padding can cause an IPsec sub-engine to become stuck on the NP6 platform.
A stuck IPsec sub-engine can cause:
- Packet drops when a new IPsec session reaches the problematic sub-engine.
- Loss of offloading after a rekey: if sessions are already offloaded to NP6 and an SA rekey occurs, and offloading the new SA hits the problematic IPsec sub-engine, the new SA cannot be offloaded and the originally offloaded sessions are flushed to the kernel.
- Tunnel instability.
- BGP neighborship flaps in 'BGP per Overlay' scenarios.
If this issue happens, several side effects could be observed, such as:
- A significant percentage of tunnels that are not fully offloaded. For example:
device (VDOM1) # diagnose vpn tunnel list | grep -c "npu_flag=00"    --> 00 = not offloaded
52
device (VDOM1) # diagnose vpn tunnel list | grep -c "npu_flag=01"    --> 01 = only egress traffic is offloaded
1270
device (VDOM1) # diagnose vpn tunnel list | grep -c "npu_flag=02"    --> 02 = only ingress traffic is offloaded
899
device (VDOM1) # diagnose vpn tunnel list | grep -c "npu_flag=03"    --> 03 = fully offloaded
2415
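The four grep counts above can be combined into a single ratio. The following is a minimal Python sketch, assuming the output of 'diagnose vpn tunnel list' has been saved to a text file off the device; the function names are hypothetical and not part of FortiOS. Like 'grep -c', it counts lines that contain a given flag rather than individual occurrences.

```python
import re
from collections import Counter

# Meaning of each npu_flag value, as described in this article.
NPU_FLAG_MEANING = {
    "00": "not offloaded",
    "01": "only egress traffic is offloaded",
    "02": "only ingress traffic is offloaded",
    "03": "fully offloaded",
}

NPU_FLAG_RE = re.compile(r"npu_flag=([0-9a-fA-F]{2})")


def count_npu_flags(tunnel_list_text):
    """Count lines per npu_flag value, mirroring 'grep -c "npu_flag=NN"'."""
    counts = Counter()
    for line in tunnel_list_text.splitlines():
        # set() ensures a line is counted once per flag value, as grep -c does.
        for flag in set(NPU_FLAG_RE.findall(line)):
            counts[flag] += 1
    return counts


def not_fully_offloaded_ratio(counts):
    """Return the share of counted entries whose flag is not 03."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts.get("03", 0)) / total


if __name__ == "__main__":
    # Counts taken from the example in this article.
    example = Counter({"00": 52, "01": 1270, "02": 899, "03": 2415})
    for flag, n in sorted(example.items()):
        print(f"npu_flag={flag} ({NPU_FLAG_MEANING[flag]}): {n}")
    print(f"not fully offloaded: {not_fully_offloaded_ratio(example):.1%}")
```

With the counts from the example, roughly half of the entries are not fully offloaded, which is the kind of proportion that warrants checking the sub-engine state.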
- As a consequence of the first point, many ESP packets are handled by the kernel instead of the NP6, which can be seen with:
device (VDOM1) # diagnose sniffer packet any "esp" 4 0 a
This can also cause softirq CPU spikes:
device (global) # diag sys mpstat 2 2
Gathering data, wait 2 sec, press any key to quit.
..0..1
TIME         CPU   %usr  %nice   %sys  %iowait   %irq  %soft  %steal  %idle
10:24:56 AM  all   1.25   0.00   1.57     0.00   0.02  10.21    0.00  86.95
             0     0.00   0.00   1.00     0.00   0.00   0.00    0.00  99.00
...
             72    0.00   0.00   4.00     0.00   0.00  35.50    0.00  60.50
             73    0.50   0.00   1.00     0.00   0.00  20.00    0.00  78.50
...
Issue Identification.
The status of IPSec sub-engines can be seen with the command 'fnsysctl cat /proc/net/np6_x/ipsec-stats' where 'x' is the number of the NP6 processor to be verified. It is possible to check the number of NP6 processors in the FortiGate unit with 'diagnose npu np6 port-list'.
Example for a 3600E:
device (global) # diagnose npu np6 port-list
Chip     XAUI  Ports    Max Speed  Cross-chip offloading
-------  ----  -------  ---------  ---------------------
NP#0-5   0-3   port1    25000M     Yes    --> 3600E has 6 NP6, going from np6_0 to np6_5.
NP#0-5   0-3   port2    25000M     Yes
NP#0-5   0-3   port3    25000M     Yes
NP#0-5   0-3   port4    25000M     Yes
NP#0-5   0-3   port5    25000M     Yes
NP#0-5   0-3   port6    25000M     Yes
NP#0-5   0-3   port7    25000M     Yes
NP#0-5   0-3   port8    25000M     Yes
NP#0-5   0-3   port9    25000M     Yes
NP#0-5   0-3   port10   25000M     Yes
NP#0-5   0-3   port11   25000M     Yes
NP#0-5   0-3   port12   25000M     Yes
NP#0-5   0-3   port13   25000M     Yes
NP#0-5   0-3   port14   25000M     Yes
NP#0-5   0-3   port15   25000M     Yes
NP#0-5   0-3   port16   25000M     Yes
NP#0-5   0-3   port17   25000M     Yes
NP#0-5   0-3   port18   25000M     Yes
NP#0-5   0-3   port19   25000M     Yes
NP#0-5   0-3   port20   25000M     Yes
NP#0-5   0-3   port21   25000M     Yes
NP#0-5   0-3   port22   25000M     Yes
NP#0-5   0-3   port23   25000M     Yes
NP#0-5   0-3   port24   25000M     Yes
NP#0-5   0-3   port25   25000M     Yes
NP#0-5   0-3   port26   25000M     Yes
NP#0-5   0-3   port27   25000M     Yes
NP#0-5   0-3   port28   25000M     Yes
NP#0-5   0-3   port29   25000M     Yes
NP#0-5   0-3   port30   25000M     Yes
NP#0-5   0-3   port31   100000M    Yes
NP#0-5   0-3   port32   100000M    Yes
NP#0-5   0-3   port33   100000M    Yes
NP#0-5   0-3   port34   100000M    Yes
NP#0-5   0-3   port35   100000M    Yes
NP#0-5   0-3   port36   100000M    Yes
To check whether a sub-engine is stuck, run 'fnsysctl cat /proc/net/np6_x/ipsec-stats' several times, about 5 seconds apart, for each NP6 processor.
If, for NP6 processor 'x':
- The nr_busy counters are non-zero and keep increasing.
- The idle status never reaches ff (ff means that all sub-engine modules are idle).
Then it is very likely that an IPsec engine on NP6 processor 'x' is stuck.
Below is a real example:
device (global) # fnsysctl cat /proc/net/np6_2/ipsec-stats
Counters                    IB0             IB1             OB0             OB1
--------------- --------------- --------------- --------------- ---------------
active-SA                     0             702             518             716
timeout                       1               0               1               0
invalid_idx                   0               0               0               0
tbl_full                      0               0               0               0
nr_flush                      0               0               0               0
nr_busy              1793994801               0      3702759723               0
nr_cache_off                  0               0               0               0
cache_disabled                0               0               0               0
nr_write_check_err0                           0               0               0
nr_retry_fail                 0               0               0               0
--------------- --------------- --------------- --------------- ---------------
IPSec0-eng-mask(enc/dec) = 0xff/0xff
IPSec1-eng_mask(enc/dec) = 0xff/0xff
Idle status of IPSec Engine 0:0     --> never going to 0:ff
Idle status of IPSec Engine 1:fe    --> never going to 1:ff
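The two conditions above (nr_busy non-zero and increasing, idle status never reaching ff) can be checked mechanically across several captures. The following is a minimal Python sketch, assuming each run of 'fnsysctl cat /proc/net/np6_x/ipsec-stats' has been saved as a text snapshot taken about 5 seconds apart; the function names are hypothetical, and the parsing relies only on the 'nr_busy' row and the 'Idle status of IPSec Engine' lines shown in the example.

```python
import re

# Four nr_busy columns in the order printed by ipsec-stats: IB0, IB1, OB0, OB1.
BUSY_RE = re.compile(r"\bnr_busy\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")
# Lines such as "Idle status of IPSec Engine 1:fe" (value is hexadecimal).
IDLE_RE = re.compile(r"Idle status of IPSec Engine\s+(\d+):([0-9a-fA-F]+)")


def parse_snapshot(text):
    """Return (nr_busy tuple, {engine: idle status}) for one saved capture."""
    match = BUSY_RE.search(text)
    if match is None:
        raise ValueError("nr_busy row not found in snapshot")
    busy = tuple(int(v) for v in match.groups())
    idle = {int(eng): int(val, 16) for eng, val in IDLE_RE.findall(text)}
    return busy, idle


def looks_stuck(snapshots):
    """Apply the article's two criteria to snapshots in capture order."""
    if len(snapshots) < 2:
        raise ValueError("at least two snapshots are needed")
    parsed = [parse_snapshot(s) for s in snapshots]
    busy_rows = [busy for busy, _ in parsed]

    # Criterion 1: some nr_busy column grows between every pair of
    # consecutive snapshots (a growing counter is necessarily non-zero).
    busy_increasing = any(
        all(later[i] > earlier[i]
            for earlier, later in zip(busy_rows, busy_rows[1:]))
        for i in range(4)
    )

    # Criterion 2: some engine never reports idle status ff in any snapshot.
    engines = set()
    for _, idle in parsed:
        engines.update(idle)
    never_idle = [e for e in sorted(engines)
                  if all(idle.get(e) != 0xFF for _, idle in parsed)]

    return busy_increasing and bool(never_idle)
```

A True result is only an indication that the NP6 processor behind those snapshots deserves a closer look; more snapshots over a longer interval make the result more reliable.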
Issue Fix.
This state can be fixed and prevented by applying the following settings:
config system npu
set strip-esp-padding enable
set strip-clear-text-padding enable
end
This will instruct the NP6 processor to strip clear text padding and ESP padding before sending the packets to the IPsec engine.
Note: These settings require a device reboot to take effect.
Related article:
Troubleshooting Tip: Inbound IPsec traffic dropped due to layer 2 padding