DiegoBernardelli
Article Id 320048
Description This article describes how to analyze high CPU usage on a FortiGate.
Scope FortiGate, FortiOS.
Solution

It is important to understand how CPU usage is measured:
CPU usage is a time-based measurement: it is the proportion of time during which the CPU has been executing instructions rather than sitting idle.

 

CPU time is mainly consumed by three distinct spaces:

  • user space -> time spent running application daemons.
  • system -> time spent executing kernel instructions.
  • interrupts (irq/softirq) -> time spent handling interrupts.


'diagnose system top' is not a good starting point for a high-CPU investigation: the top command only shows the CPU usage of user-space daemons, so it does not provide the whole picture of the actual CPU usage.

 

Better options to start investigating are 'get sys performance status' and 'diagnose sys mpstat <refresh seconds>'.
Both commands show the actual percentage usage for each space for each CPU.

 

get sys performance status
CPU states: 0% user 0% system 0% nice 99% idle 0% iowait 1% irq 0% softirq
CPU0 states: 0% user 0% system 0% nice 99% idle 0% iowait 1% irq 0% softirq
Memory: 2055760k total, 1093300k used (53.2%), 567356k free (27.6%), 395104k freeable (19.2%)
Average network usage: 243 / 23 kbps in 1 minute, 241 / 62 kbps in 10 minutes, 223 / 4 kbps in 30 minutes
Average sessions: 54 sessions in 1 minute, 48 sessions in 10 minutes, 44 sessions in 30 minutes
Average session setup rate: 0 sessions per second in last 1 minute, 0 sessions per second in last 10 minutes, 0 sessions per second in last 30 minutes
Virus caught: 0 total in 1 minute
IPS attacks blocked: 0 total in 1 minute
Uptime: 1 days, 0 hours, 55 minutes
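The 'CPU states' lines are easy to post-process off-box. The sketch below (Python, assuming the output has been copied as plain text; the helper name is illustrative) turns one line into a dictionary of percentages:

```python
import re

def parse_cpu_states(line):
    """Parse a FortiOS 'CPU states' line into a {state: percent} dict."""
    # Matches every 'NN% name' pair anywhere in the line.
    return {state: int(pct) for pct, state in re.findall(r"(\d+)% (\w+)", line)}

line = "CPU states: 0% user 0% system 0% nice 99% idle 0% iowait 1% irq 0% softirq"
states = parse_cpu_states(line)
busy = 100 - states["idle"]  # total non-idle time for this CPU
```

With the breakdown in hand, it is immediately clear which space (user, system, irq/softirq) dominates the non-idle time.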


diagnose sys mpstat 2
Gathering data, wait 2 sec, press any key to quit.
..0..1
TIME CPU %usr %nice %sys %iowait %irq %soft %steal %idle
11:42:42 AM all 0.00 0.00 0.00 0.00 0.50 0.00 0.00 99.50
0 0.00 0.00 0.00 0.00 0.50 0.00 0.00 99.50

 

Once the main type of usage has been identified, the focus can be moved to that specific space.


User space:

At this point, 'diagnose system top' can be used to identify the application that is consuming all the CPU time and debug it with the specific application debug commands.

 

Additional information about a process (such as the last CPU it ran on, its status, syscalls, memory usage, etc.) can be gathered from the following files under /proc:

 

fnsysctl cat /proc/<pid>/status
fnsysctl cat /proc/<pid>/stat
fnsysctl cat /proc/<pid>/statm
fnsysctl cat /proc/<pid>/syscall
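As an illustration of what these files contain, the sketch below parses the single-line format of /proc/<pid>/stat, using the field positions documented in the Linux proc(5) man page (utime is field 14, stime field 15, and the last CPU the task ran on is field 39). The sample line is synthetic:

```python
def parse_proc_stat(stat_text):
    """Extract state, utime, stime and last-run CPU from a /proc/<pid>/stat line."""
    # The comm field (2nd) may contain spaces, so split after the closing ')'.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); field N of the full line is rest[N - 3].
    return {
        "state": rest[0],
        "utime": int(rest[11]),      # field 14: user-mode jiffies
        "stime": int(rest[12]),      # field 15: kernel-mode jiffies
        "processor": int(rest[36]),  # field 39: CPU the task last ran on
    }

# Synthetic example: fields 4..39 filled with their own field numbers.
sample = "42 (wad worker) S " + " ".join(str(n) for n in range(4, 40))
info = parse_proc_stat(sample)
```

Comparing utime against stime for a busy process tells whether it burns its cycles in its own code or in kernel calls made on its behalf.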

 

Refer to the Linux proc(5) man page for the meaning of these files.

 

System:

If high CPU usage is seen in system space, CPU profiling can show which functions in the source code are called most frequently, whether they are Linux kernel functions or Fortinet-specific system functions. This gives an idea of what is going on at the system level during a specific window of time. After identifying which CPU is experiencing high system usage, run the following commands:


diag sys profile cpumask <ID of the CPU to analyze>

fnsysctl date

diag sys profile start

 

Wait approximately 10 to 30 seconds.

 

diag sys profile stop
diag sys profile show order

diagnose sys profile show detail
diagnose sys profile report

 

'diag sys profile show order' will print something like this:

 

0xffffffff8025fb50: 454 vprintk_emit+0x190/0x4f0
0xffffffff80208b68: 345 poll_idle+0x18/0x30
0xffffffffa0009f73: 6 del_ip_session_expect_list+0x1283/0x1390
0xffffffff802bf1c8: 1 copy_pte_range+0x228/0x440
0xffffffff802e7558: 1 select_estimate_accuracy+0xe8/0x130
0xffffffff804f68e8: 1 __copy_skb_header+0x438/0x4e0
0xffffffff80602608: 1 __schedule+0x2b8/0x910
0xffffffffa0164feb: 1 np6_clean_rx+0xcb/0xe30
0xffffffffa080dc4b: 1 wcmod_uc_status_dump+0x311/0x156b

 

Note that poll_idle is the kernel function executed when the CPU has nothing else to run, so its sample count reflects idle time, consistent with the time-based definition of CPU usage given at the beginning.
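When comparing profile runs, it can help to turn the raw sample counts into percentages. A minimal off-box Python sketch, assuming the 'show order' output has been copied as text (the sample counts below are made up for illustration):

```python
from collections import Counter

def profile_shares(show_order_output):
    """Convert 'diag sys profile show order' output into per-function sample shares (%)."""
    counts = Counter()
    for line in show_order_output.strip().splitlines():
        _addr, samples, symbol = line.split(None, 2)
        # Keep only the function name, dropping the +offset/length suffix.
        counts[symbol.split("+")[0]] += int(samples)
    total = sum(counts.values())
    return {fn: round(100 * n / total, 1) for fn, n in counts.items()}

sample = """0xffffffff8025fb50: 60 vprintk_emit+0x190/0x4f0
0xffffffff80208b68: 30 poll_idle+0x18/0x30
0xffffffffa0164feb: 10 np6_clean_rx+0xcb/0xe30"""
shares = profile_shares(sample)
```

A function that dominates the share across several runs is a much stronger signal than a single snapshot.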

For further investigation, it will be necessary to involve Fortinet support.


IRQ / Interrupts:

Interrupts (hardware or software) are special signals triggered when certain events occur. They order the CPU to suspend its current operation and execute the handler of the raised interrupt. This means the CPU does not poll for the event to occur; instead, the interrupt signals the CPU when it is triggered, and the CPU runs the interrupt-related code. (A real-life analogy is a doorbell: when it rings, the people inside stop what they are doing to check who is at the door.)

 

On a firewall or network devices in general, the main source of interrupts is network packets received on interfaces.

 

High IRQ usage could indicate an L2 issue such as a broadcast storm, a high number of non-offloaded sessions, or simply a high session setup rate.

To investigate high softirq or irq, obtain useful information about session setup rate and offloaded vs non-offloaded sessions with 'get sys performance status'.

 

get sys performance status
CPU states: 1% user 2% system 0% nice 87% idle 0% iowait 0% irq 10% softirq
CPU0 states: 1% user 3% system 0% nice 96% idle 0% iowait 0% irq 0% softirq
..
CPU55 states: 0% user 1% system 0% nice 85% idle 0% iowait 0% irq 14% softirq
Memory: 264290500k total, 46488320k used (17.6%), 217336196k free (82.2%), 465984k freeable (0.2%)
Average network usage: 86273109 / 86082338 kbps in 1 minute, 87085803 / 87039054 kbps in 10 minutes, 86707706 / 86756164 kbps in 30 minutes
Maximal network usage: 102850930 / 102515500 kbps in 1 minute, 102850930 / 102515500 kbps in 10 minutes, 102850930 / 102515500 kbps in 30 minutes
Average sessions: 7238388 sessions in 1 minute, 6294477 sessions in 10 minutes, 6278432 sessions in 30 minutes
Maximal sessions: 7581448 sessions in 1 minute, 7581448 sessions in 10 minutes, 7871998 sessions in 30 minutes
Average session setup rate: 22146 sessions per second in last 1 minute, 24011 sessions per second in last 10 minutes, 23829 sessions per second in last 30 minutes
Maximal session setup rate: 24399 sessions per second in last 1 minute, 28758 sessions per second in last 10 minutes, 30005 sessions per second in last 30 minutes
Average NPU sessions: 2702855 sessions in last 1 minute, 3782150 sessions in last 10 minutes, 3875431 sessions in last 30 minutes
Maximal NPU sessions: 3057107 sessions in last 1 minute, 4233725 sessions in last 10 minutes, 4250696 sessions in last 30 minutes
Average nTurbo sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Maximal nTurbo sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Virus caught: 0 total in 1 minute
IPS attacks blocked: 0 total in 1 minute
Uptime: 112 days, 10 hours, 0 minutes

 

Additionally, use the command 'diagnose sys session stat':

 

diagnose sys session stat
misc info: session_count=170 setup_rate=7 exp_count=0 clash=0
memory_tension_drop=0 ephemeral=0/4259840 removeable=0 extreme_low_mem=0
npu_session_count=0
nturbo_session_count=0
delete=2, flush=0, dev_down=29/348
session walkers: active=0, vf-138, dev-29, saddr-0, npu-0, wildcard-0
TCP sessions:
8 in ESTABLISHED state
96 in SYN_SENT state
firewall error stat:
error1=00000000
error2=00000000
error3=00000000
error4=00000000
tt=00000000
cont=00000000
ips_recv=00000000
policy_deny=00007ac8
av_recv=00000000
fqdn_count=000001f7
fqdn6_count=00000000
global: ses_limit=0 ses6_limit=0 rt_limit=0 rt6_limit=0


These commands offer an approximation of the actual network load and of how much traffic is being offloaded.
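As a rough off-box check, the share of NPU-offloaded sessions can be derived from the counters above. A Python sketch (the helper name and the second sample values are illustrative):

```python
import re

def offload_ratio(session_stat_output):
    """Compute the NPU-offloaded share of sessions from
    'diagnose sys session stat' output."""
    # \b prevents 'session_count=' from matching inside 'npu_session_count='.
    total = int(re.search(r"\bsession_count=(\d+)", session_stat_output).group(1))
    npu = int(re.search(r"npu_session_count=(\d+)", session_stat_output).group(1))
    return npu / total if total else 0.0

sample = """misc info: session_count=170 setup_rate=7 exp_count=0 clash=0
npu_session_count=0
nturbo_session_count=0"""
ratio = offload_ratio(sample)
```

A low ratio under heavy traffic means the kernel is handling most packets itself, which is consistent with high softirq time.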


Additionally, a system profile (as for high system usage) can be used to investigate offloading-related functions.

 

For example: 0xffffffffa0164feb: 1 np6_clean_rx+0xcb/0xe30

 

To get an idea of how interrupts are distributed among CPUs, run the following:

 

diagnose hardware sysinfo interrupts

 

Or:

 

fnsysctl cat /proc/interrupts

 

fnsysctl cat /proc/interrupts
 CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
...
142: 3506701 0 0 0 0 0 0 0 PCI-MSI-edge np6_0-tx-rx0
143: 1 742138 0 0 0 0 0 0 PCI-MSI-edge np6_0-tx-rx1
144: 1 0 3850634 0 0 0 0 0 PCI-MSI-edge np6_0-tx-rx2
145: 1 0 0 3319842 0 0 0 0 PCI-MSI-edge np6_0-tx-rx3

...
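To spot which CPU services each interrupt without reading the table by eye, the counters can be parsed off-box. A minimal Python sketch, assuming the text format shown above (shortened sample, helper name illustrative):

```python
def busiest_cpu_per_irq(interrupts_text):
    """Map each interrupt name in /proc/interrupts output to (IRQ ID,
    index of the CPU with the highest count for that interrupt)."""
    lines = interrupts_text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        fields = line.split()
        irq = fields[0].rstrip(":")
        counts = [int(x) for x in fields[1:1 + ncpu]]
        name = " ".join(fields[1 + ncpu:])
        result[name] = (irq, counts.index(max(counts)))
    return result

sample = """ CPU0 CPU1 CPU2 CPU3
142: 3506701 0 0 0 PCI-MSI-edge np6_0-tx-rx0
145: 1 0 0 3319842 PCI-MSI-edge np6_0-tx-rx3"""
hot = busiest_cpu_per_irq(sample)
```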

 

Each line of the above output represents an interrupt, with its unique ID and a counter for each CPU, so that the user can see on which CPU the interrupt is executed. Every interrupt is configured with an affinity to a CPU or a set of CPUs; this affinity can be reviewed with the below commands:

 

fnsysctl cat /proc/irq/<IRQ ID>/smp_affinity_list --> decimal CPU list

fnsysctl cat /proc/irq/<IRQ ID>/smp_affinity --> hexadecimal CPU bitmask

 

As per the previous output, interrupt np6_0-tx-rx3 (ID 145), which is raised when NP6 0 delivers non-offloaded traffic to the kernel or when non-offloaded traffic processed by the kernel needs to be sent out through an interface, has its affinity configured on CPU 3 and has been executed on CPU 3 3319842 times.

 

fnsysctl cat /proc/irq/145/smp_affinity
08

 

fnsysctl cat /proc/irq/145/smp_affinity_list
3
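The relationship between the two files can be verified off-box: the smp_affinity hex mask is a CPU bitmap, so 0x08 (bit 3 set) corresponds to CPU 3, matching the smp_affinity_list output above. A small Python helper (the name is illustrative):

```python
def mask_to_cpus(hex_mask):
    """Expand an smp_affinity hex mask (e.g. '08') into a list of CPU IDs."""
    mask = int(hex_mask.replace(",", ""), 16)  # large masks use comma-separated groups
    return [cpu for cpu in range(mask.bit_length()) if mask >> cpu & 1]

cpus = mask_to_cpus("08")
```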