ftrapani
Staff
Article Id 324204
Description This article describes how to correlate high CPU usage with the number of IP fragments crossing the network.
Scope FortiGate NP6, NP6xlite, NP6lite.
Solution

Fragmented packets cannot be accelerated on NP6 processors, so a large number of fragments can increase CPU usage. This article helps to:

 

  • Understand if CPU usage can be related to fragmented packets
  • Capture fragmented packets with 'diagnose sniffer packet' command

 

  1. CPU usage.

 

A high rate of fragmented packets per second can cause a sharp increase in the softirq usage percentage.

 

Check real-time CPU usage by running the following command:

 

diagnose sys mpstat <delay> <loops>                   

 

This command shows information about CPU usage every <delay> seconds and for the specified number of loops <loops>.

See Technical Tip: Deprecated of command '# diagnose sys top-summary'.

 

In this example, taken from a 3600E (88 cores), CPU 74 shows a high softirq percentage:

 

diag sys mpstat 5 10
Gathering data, wait 5 sec, press any key to quit.
..0..1..2..3..4
TIME CPU %usr %nice %sys %iowait %irq %soft %steal %idle
09:32:46 AM all 0.10 0.00 0.77 0.00 0.33 2.62 0.00 96.18
0 0.00 0.00 1.20 0.00 0.00 0.00 0.00 98.80
1 0.00 0.00 2.40 0.00 0.00 0.00 0.00 97.60
2 0.00 0.00 1.20 0.00 0.00 0.20 0.00 98.60
3 0.00 0.00 1.20 0.00 0.00 0.00 0.00 98.80
4 0.00 0.00 0.80 0.00 0.00 0.00 0.00 99.20
5 0.00 0.00 0.80 0.00 0.20 0.00 0.00 99.00
6 0.00 0.00 0.80 0.00 0.00 0.20 0.00 99.00
7 0.00 0.00 0.80 0.00 0.00 0.20 0.00 99.00
8 0.00 0.00 0.80 0.00 0.00 0.00 0.00 99.20
9 0.00 0.00 0.80 0.00 0.00 0.00 0.00 99.20
10 0.00 0.00 0.80 0.00 0.20 0.20 0.00 98.80
.....
74 0.00 0.00 0.20 0.00 0.00 73.00 0.00 26.80
...
87 0.20 0.00 1.20 0.00 0.60 2.60 0.00 95.40
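
The check above can also be scripted when the output is saved from a console session. The following Python sketch is only an illustration: the function name 'busy_softirq_cores' and the 50% threshold are assumptions, not part of FortiOS, and the parser relies on the column layout shown in the example output.

```python
def busy_softirq_cores(mpstat_text, threshold=50.0):
    """Return {cpu_id: soft_pct} for cores whose %soft exceeds threshold.

    The threshold is an arbitrary illustrative value, not a Fortinet limit.
    """
    hot = {}
    for line in mpstat_text.splitlines():
        fields = line.split()
        # Per-core rows: an integer CPU id followed by eight percentages.
        if len(fields) != 9 or not fields[0].isdigit():
            continue
        try:
            # Columns: CPU %usr %nice %sys %iowait %irq %soft %steal %idle
            soft = float(fields[6])
        except ValueError:
            continue
        if soft > threshold:
            hot[int(fields[0])] = soft
    return hot


SAMPLE = """\
TIME CPU %usr %nice %sys %iowait %irq %soft %steal %idle
09:32:46 AM all 0.10 0.00 0.77 0.00 0.33 2.62 0.00 96.18
0 0.00 0.00 1.20 0.00 0.00 0.00 0.00 98.80
74 0.00 0.00 0.20 0.00 0.00 73.00 0.00 26.80
87 0.20 0.00 1.20 0.00 0.60 2.60 0.00 95.40
"""

print(busy_softirq_cores(SAMPLE))
```

With the rows taken from the example, only CPU 74 is reported.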

 

CPU profiling:

To understand what is causing most of the soft interrupts, run CPU profiling on the affected core(s):

 

diag sys profile cpumask <CPU_id> <----- Where CPU_id is the number of the affected core.
diag sys profile start <----- Leave it running for 2-3 minutes.
diag sys profile stop
diag sys profile show order <----- To show the output of collected profiling.

 

In this example, CPU 74 is profiled. The output shows that the fragmentation/defragmentation functions ('inet_frag_find' and 'ip_defrag') are called very often:

 

diag sys profile cpumask 74

diag sys profile start
diag sys profile stop

diag sys profile show order
0xffffffff80208b08: 3436 poll_idle+0x18/0x30
0xffffffff8052bab4: 530 __copy_skb_header+0x214/0x4c0
0xffffffff80264ea4: 463 __do_softirq+0x64/0x150
....
0xffffffff8059e998: 289 inet_frag_find+0xd8/0x280
0xffffffff802ced1c: 257 kfree+0x6c/0xc0
0xffffffff8054e2cc: 251 sch_direct_xmit+0x5c/0x1c0
0xffffffffa003bd85: 220 esp_decrypt_finish+0x265/0x820
0xffffffff802647d4: 214 local_bh_enable_ip+0x24/0xa0
0xffffffff80395cf0: 202 __memcpy+0x0/0x120
0xffffffffa0181109: 196 find_vlan_dev_by_vid+0xc9/0x260
0xffffffff80562194: 192 ip_defrag+0x3e4/0xce0
0xffffffff8052e0a0: 185 skb_release_data+0x30/0xe0
...
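
To pick the fragmentation-related entries out of a long profile, the saved output can be filtered offline. This is a minimal Python sketch under stated assumptions: it expects the 'address: samples function+offset/size' layout shown above, the function name 'fragment_samples' is hypothetical, and matching on the substring 'frag' is a heuristic that may miss or over-match functions on other firmware builds.

```python
import re

# Matches lines such as: 0xffffffff8059e998: 289 inet_frag_find+0xd8/0x280
PROFILE_LINE = re.compile(r"^0x[0-9a-fA-F]+:\s+(\d+)\s+([A-Za-z_]\w*)\+0x")


def fragment_samples(profile_text):
    """Return {function_name: sample_count} for functions whose name contains 'frag'."""
    hits = {}
    for line in profile_text.splitlines():
        match = PROFILE_LINE.match(line.strip())
        if match and "frag" in match.group(2):
            name = match.group(2)
            hits[name] = hits.get(name, 0) + int(match.group(1))
    return hits


SAMPLE = """\
0xffffffff80208b08: 3436 poll_idle+0x18/0x30
0xffffffff8052bab4: 530 __copy_skb_header+0x214/0x4c0
0xffffffff80264ea4: 463 __do_softirq+0x64/0x150
0xffffffff8059e998: 289 inet_frag_find+0xd8/0x280
0xffffffff802ced1c: 257 kfree+0x6c/0xc0
0xffffffff80562194: 192 ip_defrag+0x3e4/0xce0
"""

print(fragment_samples(SAMPLE))
```

Because the example output is truncated, the counts are only meaningful relative to the other lines shown, not as a share of all samples.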

 

Number of fragmented packets:

To confirm the assumption, run 'diagnose snmp ip frags rate' (available from v7.0.6 onward) to display the rate of fragmented packets per second. If the rate is high, softirq usage is likely correlated with the number of fragments. See Technical Tip: How to calculate fragmented packets per second hitting a FortiGate for further details.

 

Output example:

 

diag snmp ip frags rate
ReasmReqds = 158663/s <----- Reassembling requests.
FragCreates = 158574/s <----- Created fragments.

 

In this example, the rate is very high and is likely contributing to the high CPU usage.
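
When the command is run repeatedly (for example, from a logged console session), the two counters can be extracted for trending. The Python sketch below assumes the 'Name = value/s' layout from the output example; the function name 'parse_frag_rates' is hypothetical.

```python
import re


def parse_frag_rates(output):
    """Return {counter_name: packets_per_second} parsed from 'Name = N/s' lines."""
    return {
        name: int(value)
        for name, value in re.findall(r"(\w+)\s*=\s*(\d+)/s", output)
    }


SAMPLE = """\
ReasmReqds = 158663/s <----- Reassembling requests.
FragCreates = 158574/s <----- Created fragments.
"""

print(parse_frag_rates(SAMPLE))
```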

 

  2. Capture fragmented packets.

To identify the source(s) of fragmented traffic, a specific sniffer filter can be used, namely:

 

diagnose sniffer packet any '((ip[6:2] > 0) and (not ip[6] = 64))' <level> <packet count limit> <ts format>
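
The filter reads the IPv4 header field at bytes 6-7, which holds the three flag bits (reserved, DF, MF) followed by the 13-bit fragment offset. 'ip[6:2] > 0' matches any packet with a flag or an offset set, and 'not ip[6] = 64' excludes packets whose byte 6 equals 0x40, that is, packets with only the DF bit set. The Python sketch below mirrors that logic for illustration; 'matches_fragment_filter' and 'header_with_field' are hypothetical helper names, and the headers are dummy 20-byte buffers with every other field zeroed.

```python
def matches_fragment_filter(ip_header: bytes) -> bool:
    """Mirror of '((ip[6:2] > 0) and (not ip[6] = 64))' on a raw IPv4 header."""
    flags_and_offset = int.from_bytes(ip_header[6:8], "big")
    return flags_and_offset > 0 and ip_header[6] != 0x40


def header_with_field(flags_and_offset: int) -> bytes:
    """Build a dummy 20-byte IPv4 header with only bytes 6-7 populated."""
    return bytes(6) + flags_and_offset.to_bytes(2, "big") + bytes(12)


# Offsets are stored in units of 8 bytes, so byte offset 1480 is encoded as 185.
cases = {
    "unfragmented, DF clear": 0x0000,
    "unfragmented, DF set": 0x4000,
    "first fragment (MF set, offset 0)": 0x2000,
    "last fragment (MF clear, offset 1480)": 1480 // 8,
}

for label, field in cases.items():
    print(label, matches_fragment_filter(header_with_field(field)))
```

Only the two fragment cases match; unfragmented packets are skipped whether or not DF is set.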

 

Output example:

 

diag sniffer packet any '((ip[6:2] > 0) and (not ip[6] = 64) and host 10.86.255.4)' 4 1000 l
interfaces=[any]
filters=[((ip[6:2] > 0) and (not ip[6] = 64) and host 10.86.255.4)]
2024-01-26 17:26:22.365116 Underlay_1 in 10.86.255.4 -> 10.86.0.4:ESP(spi=0xc5971903,seq=0xedd15a2) (frag 5538:1480@0+)

2024-01-26 17:26:22.365132 Underlay_1 in 10.86.255.4 -> 10.86.0.4: ESP(spi=0xc5971903,seq=0xedd15a3) (frag 5539:1480@0+)
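
On a busy unit, a capture taken without the host filter can contain many sources. The following Python sketch counts fragments per source address from saved sniffer output. It assumes the verbosity-4 line layout shown above (timestamp, interface, direction, source, '->', destination, and a trailing '(frag ...)' marker); the function name 'top_fragment_sources' is hypothetical.

```python
import re
from collections import Counter

# Captures the source IPv4 address of lines that carry a '(frag ...)' marker.
# '\S*' skips an optional '.port' suffix appended to the source address.
FRAG_LINE = re.compile(
    r"\s(?:in|out)\s+(\d+\.\d+\.\d+\.\d+)\S*\s+->\s+.*\(frag\s"
)


def top_fragment_sources(sniffer_text, limit=5):
    """Return [(source_ip, fragment_count), ...] sorted by descending count."""
    counts = Counter()
    for line in sniffer_text.splitlines():
        match = FRAG_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(limit)


SAMPLE = """\
2024-01-26 17:26:22.365116 Underlay_1 in 10.86.255.4 -> 10.86.0.4:ESP(spi=0xc5971903,seq=0xedd15a2) (frag 5538:1480@0+)
2024-01-26 17:26:22.365132 Underlay_1 in 10.86.255.4 -> 10.86.0.4: ESP(spi=0xc5971903,seq=0xedd15a3) (frag 5539:1480@0+)
"""

print(top_fragment_sources(SAMPLE))
```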

 

Further details about advanced sniffing filters can be found in Troubleshooting Tip: Filter 'diagnose sniffer packet' to collect fragmented packets only. Once the source(s) of the fragmented traffic have been identified and tuned, CPU usage should return to expected values.

 

Note:

On NP7 platforms, fragments can be offloaded (see the documentation).