FortiGate
hgarara
Staff
Article Id 247119
Description

This article explains SoftIrqs, what causes them to increase in frequency or show high variations, and some ways to check for them in FortiGate.

Scope

All supported versions of FortiGate.
Solution

A SoftIrq is a software interrupt. On FortiGate, SoftIrq activity rises when traffic is handled by the CPU instead of being offloaded to the NPU.

 

A SoftIrq can also be invoked by a special instruction that reads data from or writes data to a hardware device (such as a hard disk). Software interrupts are also crucial when real-time capability is required (such as in industrial applications).

 

It is possible to check for SoftIrqs in FortiGate and monitor increases by using the following commands in the FortiGate CLI (example output of 'get sys performance status' is shown below):

 

dia sys mpstat

 

By default, this command fetches data continuously at 5-second intervals until Ctrl+C is pressed to stop it.

 

dia sys mpstat 3 5

 

This command fetches the same data as the command above, but at 3-second intervals and only 5 times. Customize these parameters as desired.

 

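The following command shows the softirq percentage for each CPU core together with session statistics:

get sys performance status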

CPU states: 0% user 0% system 0% nice 67% idle 0% iowait 0% irq 33% softirq

CPU0 states: 0% user 0% system 0% nice 55% idle 0% iowait 0% irq 45% softirq

CPU1 states: 0% user 0% system 0% nice 19% idle 0% iowait 0% irq 81% softirq

CPU2 states: 1% user 0% system 0% nice 32% idle 0% iowait 0% irq 67% softirq

CPU3 states: 0% user 0% system 0% nice 66% idle 0% iowait 0% irq 34% softirq

Memory: 1911192k total, 1002652k used (52.5%), 645292k free (33.8%), 263248k freeable (13.8%)

Average network usage: 4266268 / 4275456 kbps in 1 minute, 4145133 / 4155622 kbps in 10 minutes, 4091696 / 4101178 kbps in 30 minutes

Maximal network usage: 4539464 / 4547537 kbps in 1 minute, 4895169 / 4908443 kbps in 10 minutes, 4895169 / 4908443 kbps in 30 minutes

Average sessions: 291687 sessions in 1 minute, 293226 sessions in 10 minutes, 293696 sessions in 30 minutes

Maximal sessions: 292629 sessions in 1 minute, 298552 sessions in 10 minutes, 307791 sessions in 30 minutes

Average session setup rate: 2776 sessions per second in last 1 minute, 2749 sessions per second in last 10 minutes, 2742 sessions per second in last 30 minutes

Maximal session setup rate: 2893 sessions per second in last 1 minute, 3100 sessions per second in last 10 minutes, 3309 sessions per second in last 30 minutes

Average NPU sessions: 35 sessions in last 1 minute, 36 sessions in last 10 minutes, 36 sessions in last 30 minutes

Maximal NPU sessions: 37 sessions in last 1 minute, 43 sessions in last 10 minutes, 49 sessions in last 30 minutes

Average nTurbo sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes

Maximal nTurbo sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes

Virus caught: 0 total in 1 minute

IPS attacks blocked: 0 total in 1 minute

Uptime: 16 days,  17 hours,  47 minutes
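When this output is saved from several devices or at several points in time, reading the softirq column by eye becomes tedious. The following is a minimal offline sketch, assuming the output has been copied into a text string; the function name `softirq_by_core` and the threshold value are hypothetical and not part of FortiOS.

```python
import re

# Matches lines such as "CPU1 states: 0% user ... 81% softirq".
# The bare "CPU states:" line (all-core average) is captured with an empty core id.
CPU_LINE = re.compile(r"^CPU(\d*) states:.*?(\d+)% softirq", re.MULTILINE)


def softirq_by_core(output: str) -> dict[str, int]:
    """Return {core label: softirq percent} from 'get sys performance status' text."""
    result = {}
    for core, pct in CPU_LINE.findall(output):
        label = f"CPU{core}" if core else "all"
        result[label] = int(pct)
    return result


# Lines copied from the example output in this article.
SAMPLE = """\
CPU states: 0% user 0% system 0% nice 67% idle 0% iowait 0% irq 33% softirq
CPU0 states: 0% user 0% system 0% nice 55% idle 0% iowait 0% irq 45% softirq
CPU1 states: 0% user 0% system 0% nice 19% idle 0% iowait 0% irq 81% softirq
CPU2 states: 1% user 0% system 0% nice 32% idle 0% iowait 0% irq 67% softirq
CPU3 states: 0% user 0% system 0% nice 66% idle 0% iowait 0% irq 34% softirq
"""

if __name__ == "__main__":
    # THRESHOLD is an arbitrary illustrative value, not a Fortinet recommendation.
    THRESHOLD = 60
    for label, pct in softirq_by_core(SAMPLE).items():
        flag = " <-- high" if pct >= THRESHOLD else ""
        print(f"{label}: {pct}% softirq{flag}")
```

In the example output, CPU1 and CPU2 carry most of the SoftIrq load, which is the kind of per-core imbalance worth noting before continuing with the checks below.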

 

Possible reasons for SoftIrq increments:

Check network traffic. High SoftIrq usage might be caused by Layer 2 loops, broadcast storms, unwanted packets, large quantities of ARP requests, or loops in the hardware topology if multiple switches are connected to the relevant ports. STP breaking after an upgrade can be one of the main factors behind Layer 2 loops.

 

All of the reasons mentioned above prevent traffic from being offloaded from the CPU to the NPU, which raises SoftIrq frequency. In the example shown above, most of the sessions go through the CPU ('Average sessions') and not through the NPU ('Average NPU sessions'). This can also be confirmed by looking at the dashboard's 'Sessions' widget.
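The comparison between 'Average sessions' and 'Average NPU sessions' can be expressed as a ratio. The sketch below assumes, as this article does, that 'Average sessions' represents the total session count; the helper names `first_int` and `npu_offload_ratio` are hypothetical.

```python
import re


def first_int(pattern: str, text: str) -> int:
    """Return the first integer captured by pattern, or raise if it is absent."""
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1))


def npu_offload_ratio(output: str) -> float:
    """Fraction of 1-minute average sessions that are reported as NPU sessions."""
    total = first_int(r"^Average sessions: (\d+) sessions in 1 minute", output)
    npu = first_int(r"^Average NPU sessions: (\d+) sessions in last 1 minute", output)
    return npu / total if total else 0.0


# Lines copied from the example output in this article.
SAMPLE = """\
Average sessions: 291687 sessions in 1 minute, 293226 sessions in 10 minutes, 293696 sessions in 30 minutes
Average NPU sessions: 35 sessions in last 1 minute, 36 sessions in last 10 minutes, 36 sessions in last 30 minutes
"""

if __name__ == "__main__":
    ratio = npu_offload_ratio(SAMPLE)
    print(f"NPU offload ratio (1 minute): {ratio:.4%}")
```

With 35 NPU sessions out of 291687, the ratio is far below 1%, which matches the observation that almost all sessions in the example are handled by the CPU.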

 

While observing high CPU usage with 'get system performance status', it is possible to see whether SoftIrq levels are stable or increasing by executing the command repeatedly.
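If the softirq percentages from repeated runs are noted down, the stable-versus-increasing judgment can be made explicit. This is only a sketch: the function name `softirq_trend`, the tolerance value, and the sample readings are hypothetical and chosen purely for illustration.

```python
def softirq_trend(samples: list[int], tolerance: int = 5) -> str:
    """Classify successive softirq readings (in percent) by comparing first and last.

    tolerance is the change, in percentage points, that is still treated as noise.
    """
    if len(samples) < 2:
        raise ValueError("at least two readings are needed")
    delta = samples[-1] - samples[0]
    if delta > tolerance:
        return "increasing"
    if delta < -tolerance:
        return "decreasing"
    return "stable"


if __name__ == "__main__":
    # Hypothetical readings taken from four consecutive runs of the command.
    print(softirq_trend([33, 41, 52, 60]))
    print(softirq_trend([33, 35, 31, 34]))
```

Comparing only the first and last readings keeps the check simple; a longer observation window would justify a proper average or slope instead.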

 

Troubleshooting steps:

  • Check for interface drops using 'diag hardware deviceinfo nic <interface name>' and search for 'Host Tx dropped'. Execute the command multiple times to check whether the counter keeps increasing.

    Example output:

 

============ Counters ===========
Rx_CRC_Errors :0
Rx_Frame_Too_Longs:0
rx_undersize :0
Rx Pkts :64880428536
Rx Bytes :29923981233538
Tx Pkts :82496472350
Tx Bytes :42412599845273
rx_rate :0
tx_rate :0
nr_ctr_reset :0
Host Rx Pkts :64867748559
Host Rx Bytes :28413202957398
Host Tx Pkts :88100655721
Host Tx Bytes :48030145695805
Host Tx dropped :1316
FragTxCreate :0
FragTxOk :0
FragTxDrop :0

 

  • Capture packets while the behavior occurs to determine what is causing it. Run a general sniffer (with no filters) and look for unwanted or suspicious traffic related to specific ports, IPv6 traffic, floods, or any of the traffic types mentioned above as possible reasons.
  • Check for reverse path verification failures using 'diagnose debug flow' with filters corresponding to the unwanted or suspicious traffic identified in the captures. If messages like the following appear at a consistently increasing rate, it could indicate traffic being dropped in the CPU by SoftIrq.

 

id=20085 trace_id=1107 func=ip_route_input_slow line=1704 msg="reverse path check fail, drop"

 

  • One potentially useful test is to disable interfaces one at a time, such as LAN, WAN, and DMZ, to see whether disabling any one interface resolves the issue.
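For the first troubleshooting step, comparing the 'Host Tx dropped' counter between two captures of the NIC output shows whether drops are still occurring. The sketch below assumes both outputs were saved as text; `read_counter` and `dropped_delta` are hypothetical helper names, and the second capture is invented for illustration.

```python
import re


def read_counter(output: str, name: str) -> int:
    """Read one 'Name :value' counter from saved NIC output."""
    pattern = rf"^{re.escape(name)}\s*:\s*(\d+)"
    match = re.search(pattern, output, re.MULTILINE | re.IGNORECASE)
    if match is None:
        raise ValueError(f"counter not found: {name}")
    return int(match.group(1))


def dropped_delta(earlier: str, later: str, name: str = "Host Tx dropped") -> int:
    """Return how much the counter grew between two captures."""
    return read_counter(later, name) - read_counter(earlier, name)


# First capture: lines copied from the example output in this article.
FIRST = """\
Host Tx Pkts :88100655721
Host Tx Bytes :48030145695805
Host Tx dropped :1316
"""

# Second capture: hypothetical values, for illustration only.
SECOND = """\
Host Tx Pkts :88100700000
Host Tx Bytes :48030170000000
Host Tx dropped :1400
"""

if __name__ == "__main__":
    delta = dropped_delta(FIRST, SECOND)
    print(f"Host Tx dropped grew by {delta} between captures")
```

A delta that stays at zero across several captures suggests the drops are historical; a delta that keeps growing points to an ongoing problem on that interface.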

 

If sessions are not being offloaded, consider checking FortiGate's session list for possible reasons traffic is not offloading:

diagnose sys session list no_ofld_reason field - FortiGate documentation.
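When the session list is long, tallying the 'no_ofld_reason' values gives a quick view of the most common reason. The sketch below assumes the session list was saved as text and that each session prints a line beginning with 'no_ofld_reason:' followed by one or more space-separated reasons. That line format, the function name `tally_no_ofld_reasons`, and the sample excerpt are assumptions for illustration; check them against real output and the FortiGate documentation.

```python
import re
from collections import Counter

# Assumed line format: "no_ofld_reason:  <reason> [<reason> ...]".
# [ \t]* is used instead of \s* so an empty value does not swallow the next line.
REASON_LINE = re.compile(r"^no_ofld_reason:[ \t]*(.*)$", re.MULTILINE)


def tally_no_ofld_reasons(session_list: str) -> Counter:
    """Count each reason token found on no_ofld_reason lines."""
    counts = Counter()
    for value in REASON_LINE.findall(session_list):
        counts.update(value.split())
    return counts


# Hypothetical, heavily shortened excerpt: all other session fields are omitted.
SAMPLE = """\
no_ofld_reason:  non-npu-intf
no_ofld_reason:  disabled-by-policy
no_ofld_reason:  non-npu-intf
no_ofld_reason:  local
"""

if __name__ == "__main__":
    for reason, count in tally_no_ofld_reasons(SAMPLE).most_common():
        print(f"{reason}: {count}")
```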

 

Related article:
Troubleshooting Tip: FortiGate session table information.