FortiGate
hgarara
Staff
Article Id 247119
Description

This article explains SoftIrqs, what causes them to increase in frequency or show high variations, and some ways to check for them in FortiGate.

Scope

FortiGate.
Solution

A SoftIrq is a software interrupt. On FortiGate, it occurs when traffic is processed by the CPU instead of being offloaded to the NPU.

 

A SoftIrq can also be raised by an instruction that reads data from or writes data to a hardware device (such as a hard disk). Software interrupts are also crucial when real-time capability is required (such as in industrial applications).

 

SoftIrq usage can be checked, and increases monitored over time, with the following command in the FortiGate CLI:

 

dia sys mpstat

 

By default, this command fetches data every 5 seconds until Ctrl+C is pressed to stop it.

 

dia sys mpstat 3 5

 

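This command fetches the same data as the command above, but at a 3-second interval and at most 5 times. Customize these parameters as desired.

SoftIrq usage per CPU core is also reported by the following command (example output is shown below):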

 

get sys performance status

CPU states: 0% user 0% system 0% nice 67% idle 0% iowait 0% irq 33% softirq
CPU0 states: 0% user 0% system 0% nice 55% idle 0% iowait 0% irq 45% softirq
CPU1 states: 0% user 0% system 0% nice 19% idle 0% iowait 0% irq 81% softirq
CPU2 states: 1% user 0% system 0% nice 32% idle 0% iowait 0% irq 67% softirq
CPU3 states: 0% user 0% system 0% nice 66% idle 0% iowait 0% irq 34% softirq
Memory: 1911192k total, 1002652k used (52.5%), 645292k free (33.8%), 263248k freeable (13.8%)
Average network usage: 4266268 / 4275456 kbps in 1 minute, 4145133 / 4155622 kbps in 10 minutes, 4091696 / 4101178 kbps in 30 minutes
Maximal network usage: 4539464 / 4547537 kbps in 1 minute, 4895169 / 4908443 kbps in 10 minutes, 4895169 / 4908443 kbps in 30 minutes
Average sessions: 291687 sessions in 1 minute, 293226 sessions in 10 minutes, 293696 sessions in 30 minutes
Maximal sessions: 292629 sessions in 1 minute, 298552 sessions in 10 minutes, 307791 sessions in 30 minutes
Average session setup rate: 2776 sessions per second in last 1 minute, 2749 sessions per second in last 10 minutes, 2742 sessions per second in last 30 minutes
Maximal session setup rate: 2893 sessions per second in last 1 minute, 3100 sessions per second in last 10 minutes, 3309 sessions per second in last 30 minutes
Average NPU sessions: 35 sessions in last 1 minute, 36 sessions in last 10 minutes, 36 sessions in last 30 minutes
Maximal NPU sessions: 37 sessions in last 1 minute, 43 sessions in last 10 minutes, 49 sessions in last 30 minutes
Average nTurbo sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Maximal nTurbo sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Virus caught: 0 total in 1 minute
IPS attacks blocked: 0 total in 1 minute
Uptime: 16 days,  17 hours,  47 minutes

 

Possible reasons for SoftIrq increments:

Check network traffic. This behavior might be caused by Layer 2 loops, broadcast storms, unwanted packets, large quantities of ARP requests, or loops in the hardware topology when multiple switches are connected to the relevant ports. STP breaking after an upgrade can be one of the main causes of Layer 2 loops.

 

It can also happen when user traffic is not offloaded to hardware, either because offloading is disabled at the Firewall Policy level or because the traffic is traversing a non-NPU interface. In the example shown above, most of the sessions go through the CPU ('Average sessions') and not through the NPU ('Average NPU sessions'). This can also be confirmed by looking at the dashboard's 'Sessions' widget.

 

Device identification (Device Detection) on interfaces is another contributor to SoftIrqs.

 

While observing high CPU usage with 'get system performance status', it is possible to see if SoftIrq levels are stable or increasing by executing the command repeatedly.
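
The repeated check described above can be sketched in a few lines of Python. This is only an illustration, assuming the output of each run has been saved as text; the function names, the 5-point tolerance, and the second and third samples are hypothetical and not part of FortiOS.

```python
import re

def overall_softirq(status_text: str) -> int:
    """Return the softirq percentage from the aggregate 'CPU states:' line."""
    match = re.search(r"^CPU states:.*?(\d+)% softirq", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no aggregate 'CPU states:' line found")
    return int(match.group(1))

def softirq_trend(samples: list[str], tolerance: int = 5) -> str:
    """Classify successive samples as increasing, stable, or fluctuating.

    `tolerance` (percentage points) is an arbitrary illustrative value.
    """
    values = [overall_softirq(text) for text in samples]
    if len(values) < 2:
        return "unknown"
    if all(later > earlier for earlier, later in zip(values, values[1:])):
        return "increasing"
    if max(values) - min(values) <= tolerance:
        return "stable"
    return "fluctuating"

# The first line comes from the example output above; the later two samples
# are made-up values used only to exercise the function.
samples = [
    "CPU states: 0% user 0% system 0% nice 67% idle 0% iowait 0% irq 33% softirq",
    "CPU states: 0% user 0% system 0% nice 55% idle 0% iowait 0% irq 45% softirq",
    "CPU states: 0% user 0% system 0% nice 40% idle 0% iowait 0% irq 60% softirq",
]
print(softirq_trend(samples))  # prints "increasing"
```

Only the aggregate line is compared here; per-core values can differ widely, so the individual 'CPUn states' lines are worth reading as well.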

 

Troubleshooting steps:

  • Check for interface drops using 'diag hardware deviceinfo nic (interface name)' and search for 'Host Tx dropped'. Execute the command multiple times to check whether the counter keeps increasing.

Example output:

 

============ Counters ===========
Rx_CRC_Errors :0
Rx_Frame_Too_Longs:0
rx_undersize :0
Rx Pkts :64880428536
Rx Bytes :29923981233538
Tx Pkts :82496472350
Tx Bytes :42412599845273
rx_rate :0
tx_rate :0
nr_ctr_reset :0
Host Rx Pkts :64867748559
Host Rx Bytes :28413202957398
Host Tx Pkts :88100655721
Host Tx Bytes :48030145695805
Host Tx dropped :1316
FragTxCreate :0
FragTxOk :0
FragTxDrop :0

 

  • Capture packets while the behavior is occurring to determine what is causing it. Run a general sniffer (with no filter other than excluding admin access traffic) and search for unwanted/suspicious traffic related to specific ports, IPv6 traffic, floods, or any of the traffic types mentioned above as a possible reason.

    diagnose sniffer packet any "!port 22 and !port 443" 4 0 1   <----- General sniffer whose only filter excludes SSH (22) and HTTPS (443 by default) admin access.

  • Check for reverse path verification failures using 'diagnose debug flow' with filters matching the unwanted/suspicious traffic identified in the captures. If messages such as the following keep increasing, the traffic may be getting dropped by the CPU during SoftIrq processing.

 

id=20085 trace_id=1107 func=ip_route_input_slow line=1704 msg="reverse path check fail, drop"

 

  • One potentially useful test is to disable interfaces one at a time, such as LAN, WAN, and DMZ, to see whether disabling any single interface resolves the issue.
  • Device identification on interfaces requires the kernel to copy packets and establish a kernel-userspace memory mapping to share the packets for inspection, which significantly contributes to SoftIrqs. To address this, disable device identification on interfaces via the FortiGate GUI: Network -> Interfaces -> under each interface, disable 'Device Detection'.
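
For the first troubleshooting step above, the comparison between two runs of the NIC counter command can be scripted. The following is a minimal sketch, assuming the counter output has been saved as text in the 'name :value' layout shown in the example; the function names and the second snapshot are hypothetical.

```python
import re

def read_counter(nic_output: str, name: str) -> int:
    """Return the integer value of a 'name :value' counter line."""
    pattern = rf"^\s*{re.escape(name)}\s*:\s*(\d+)\s*$"
    match = re.search(pattern, nic_output, re.MULTILINE)
    if match is None:
        raise KeyError(f"counter {name!r} not found")
    return int(match.group(1))

def drop_delta(earlier: str, later: str, name: str = "Host Tx dropped") -> int:
    """Return how much the counter grew between two snapshots."""
    return read_counter(later, name) - read_counter(earlier, name)

# Excerpt of the example output above.
first = """\
Host Tx Pkts :88100655721
Host Tx Bytes :48030145695805
Host Tx dropped :1316
"""
# Hypothetical second snapshot: the value 1420 is invented for illustration.
second = first.replace(":1316", ":1420")

print(drop_delta(first, second))  # prints 104
```

A delta that stays at zero across several runs suggests the drops are historical rather than ongoing.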

 

If sessions are not being offloaded, consider checking FortiGate's session list for possible reasons traffic is not offloading:

diagnose sys session list no_ofld_reason field - FortiGate documentation.
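
Before inspecting individual sessions, the share of sessions handled by the NPU can be estimated from the 'Average sessions' and 'Average NPU sessions' lines of 'get system performance status'. The sketch below assumes those two lines appear exactly as in the example output earlier in this article; the function name is hypothetical.

```python
import re

def session_offload_ratio(status_text: str) -> float:
    """Return the share of average sessions (1-minute window) handled by the NPU."""
    total = re.search(r"Average sessions:\s*(\d+) sessions in 1 minute", status_text)
    npu = re.search(r"Average NPU sessions:\s*(\d+) sessions in last 1 minute", status_text)
    if total is None or npu is None:
        raise ValueError("expected 'Average sessions' and 'Average NPU sessions' lines")
    total_sessions = int(total.group(1))
    if total_sessions == 0:
        return 0.0
    return int(npu.group(1)) / total_sessions

# The two relevant lines from the example output earlier in this article.
sample = """\
Average sessions: 291687 sessions in 1 minute, 293226 sessions in 10 minutes, 293696 sessions in 30 minutes
Average NPU sessions: 35 sessions in last 1 minute, 36 sessions in last 10 minutes, 36 sessions in last 30 minutes
"""

ratio = session_offload_ratio(sample)
print(f"NPU-offloaded share of sessions: {ratio:.4%}")
```

In the example, only 35 of 291687 sessions are offloaded, which is consistent with the high SoftIrq values in the same output.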

 

In certain scenarios, Layer 2 loops or switch issues can cause traffic to be looped and forwarded to the firewall on the physical interface with untagged packet information. This can potentially lead to CPU core spikes on the firewall.

 

Run 'get system performance status' to verify whether CPU usage is high. If a broadcast storm or Layer 2 loop is coming from the switches, the softirq value will be high. The output should look like the example below.

 

get sys perf status
CPU states: 0% user 0% system 0% nice 55% idle 0% iowait 0% irq 45% softirq
CPU0 states: 1% user 0% system 0% nice 57% idle 0% iowait 0% irq 42% softirq
CPU1 states: 0% user 0% system 0% nice 0% idle 0% iowait 0% irq 100% softirq
CPU2 states: 0% user 0% system 0% nice 0% idle 0% iowait 0% irq 100% softirq
CPU3 states: 0% user 0% system 0% nice 0% idle 0% iowait 0% irq 100% softirq
CPU4 states: 0% user 0% system 0% nice 0% idle 0% iowait 0% irq 100% softirq
CPU5 states: 0% user 0% system 0% nice 15% idle 0% iowait 0% irq 85% softirq
CPU6 states: 0% user 0% system 0% nice 0% idle 0% iowait 0% irq 100% softirq
CPU7 states: 0% user 0% system 0% nice 0% idle 0% iowait 0% irq 100% softirq
CPU8 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU9 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU10 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU11 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU12 states: 2% user 0% system 0% nice 98% idle 0% iowait 0% irq 0% softirq
CPU13 states: 6% user 0% system 0% nice 94% idle 0% iowait 0% irq 0% softirq
CPU14 states: 1% user 0% system 0% nice 99% idle 0% iowait 0% irq 0% softirq
CPU15 states: 1% user 0% system 0% nice 99% idle 0% iowait 0% irq 0% softirq

 

If softirq reaches up to 100%, even on only half of the CPU cores, packets are likely being looped between the firewall and the switches.
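
Counting the saturated cores by eye gets tedious on larger platforms. The sketch below lists the cores whose softirq value meets a threshold, assuming the per-core line format shown above; the function name and the 90% threshold are hypothetical choices for illustration.

```python
import re

def saturated_cores(status_text: str, threshold: int = 90) -> list[int]:
    """Return the indexes of CPU cores whose softirq percentage is >= threshold."""
    cores = {}
    for match in re.finditer(r"^CPU(\d+) states:.*?(\d+)% softirq", status_text, re.MULTILINE):
        cores[int(match.group(1))] = int(match.group(2))
    return sorted(core for core, pct in cores.items() if pct >= threshold)

# A subset of the per-core lines from the example output above.
sample = """\
CPU states: 0% user 0% system 0% nice 55% idle 0% iowait 0% irq 45% softirq
CPU0 states: 1% user 0% system 0% nice 57% idle 0% iowait 0% irq 42% softirq
CPU1 states: 0% user 0% system 0% nice 0% idle 0% iowait 0% irq 100% softirq
CPU2 states: 0% user 0% system 0% nice 0% idle 0% iowait 0% irq 100% softirq
CPU5 states: 0% user 0% system 0% nice 15% idle 0% iowait 0% irq 85% softirq
CPU8 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
"""

print(saturated_cores(sample))  # prints [1, 2]
```

The aggregate 'CPU states:' line is deliberately skipped, since it averages busy and idle cores and can hide the saturation.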

 

To investigate further, run 'diagnose netlink interface packet-rate' in the CLI to see whether the firewall interface is handling a high number of packets. Run the command 4-5 times at intervals of 2-3 seconds and check whether the RX-rate and TX-rate values for the interface stay abnormally high.

 

diagnose netlink interface packet-rate
Interface   RX-rate(per second)   TX-rate(per second)
port1       600                   496504920
port_ha     25                    47
ha          26                    38
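
When the command is run several times, the output can be parsed to pick out the busy interfaces. This is a minimal sketch, assuming the three-column layout shown above; the function names and the 1,000,000 packets-per-second threshold are hypothetical and should be adapted to the platform.

```python
def parse_packet_rates(output: str) -> dict[str, tuple[int, int]]:
    """Map interface name to (rx_rate, tx_rate) from the packet-rate output."""
    rates = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly three fields: name, RX-rate, TX-rate.
        if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            rates[fields[0]] = (int(fields[1]), int(fields[2]))
    return rates

def busy_interfaces(output: str, threshold_pps: int = 1_000_000) -> list[str]:
    """Return interfaces whose RX or TX rate meets the (illustrative) threshold."""
    return [
        name
        for name, (rx, tx) in parse_packet_rates(output).items()
        if max(rx, tx) >= threshold_pps
    ]

# Example output from above.
sample = """\
diagnose netlink interface packet-rate
Interface   RX-rate(per second) TX-rate(per second)
port1              600                             496504920
port_ha          25                                47
ha                  26                                 38
"""

print(busy_interfaces(sample))  # prints ['port1']
```

The command and header lines are ignored because they do not split into a name followed by two integers.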

 

Collect the sniffer output below to identify what types of packets are reaching the firewall interface, for example ICMP, ESP, or TCP/UDP packets.

 

SSH1:


diagnose sniffer packet (interface name) '' 6 2000 l 

 

In the case of ESP packets:

 

SSH2:


diagnose sniffer packet any 'esp' 6 2000 l

 

The sniffer stops after capturing 2000 packets. The packet count can be adjusted, but be careful when running the sniffer on a device that already has high CPU usage.

 

Check on the switch side to find out why the switches are forwarding a high number of packets to the firewall. Ask the switch administrator to rate-limit the packets at the switch end, or to check whether the switches are sending untagged or otherwise illegitimate traffic.

 

This is how untagged packets look in the sniffer output. Tagged packets will include VLAN information.

 

2024-08-27 19:14:20.969882 port1-- x.x.x.x -> y.y.y.y: ESP(spi=0xdba59c0a,seq=0x61a)
2024-08-27 19:14:20.973437 port1-- x.x.x.x -> y.y.y.y: ESP(spi=0xdba59c0a,seq=0x61a)

 

In the case of looped ESP traffic, the same sequence number repeats across multiple ESP packets.
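
Repeated sequence numbers are hard to spot in a 2000-packet capture, so the check can be automated. The sketch below counts (SPI, sequence number) pairs that appear more than once, assuming the sniffer lines use the 'ESP(spi=...,seq=...)' format shown above; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the ESP summary printed by the sniffer, e.g. ESP(spi=0xdba59c0a,seq=0x61a)
ESP_RE = re.compile(r"ESP\(spi=(0x[0-9a-fA-F]+),seq=(0x[0-9a-fA-F]+)\)")

def repeated_esp_packets(sniffer_output: str) -> dict[tuple[str, str], int]:
    """Return (spi, seq) pairs seen more than once, with their occurrence counts."""
    counts = Counter(ESP_RE.findall(sniffer_output))
    return {key: seen for key, seen in counts.items() if seen > 1}

# The two lines from the example capture above.
sample = """\
2024-08-27 19:14:20.969882 port1-- x.x.x.x -> y.y.y.y: ESP(spi=0xdba59c0a,seq=0x61a)
2024-08-27 19:14:20.973437 port1-- x.x.x.x -> y.y.y.y: ESP(spi=0xdba59c0a,seq=0x61a)
"""

print(repeated_esp_packets(sample))  # prints {('0xdba59c0a', '0x61a'): 2}
```

Note that a capture taken on 'any' can legitimately show the same packet on more than one interface, so duplicates are most meaningful when they repeat on the same interface.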

 

The sniffer output gives an idea of whether the firewall or the switch is creating the loop. If the FortiGate appears to be creating the loop, contact TAC and share all of the output collected above.

 

At the firewall end, it is possible to configure an Access Control List (ACL) on the physical interface to block traffic that is untagged or not legitimate.

 

Below is an example of an ACL that blocks untagged ESP and IKE packets received on the port1 physical interface.

 

config firewall acl
    edit 1
        set interface port1
        set srcaddr "all"
        set dstaddr "all"
        set service "IKE" "ESP"
    next
end

 

Related article:
Troubleshooting Tip: FortiGate session table information.