Created on 08-21-2024 11:37 AM
Edited on 09-03-2025 03:24 AM
By Jean-Philippe_P
Description
This article describes how to detect L2 loops (also known as broadcast storms or switch/Layer 2 loops) on FortiGate based on performance commands.
Scope
All FortiGate models and versions.
Solution
High softirq CPU usage has many potential causes, such as heavy traffic volume or offloading issues. This article considers L2 loops as one such cause.
For initial context, softirq CPU usage on the FortiGate is frequently associated with the receiving and processing of incoming packets to a FortiGate network interface. Packets arrive on the network interface and trigger a software interrupt to the CPU (aka a softirq) to signal that the packet has arrived and must be processed.
For hardware FortiGates, softirq usage tends to remain very low since the bulk of the traffic flow can be offloaded to the onboard Network Processor (NP) hardware. Traffic is only handled by the CPU if inspection is actively taking place, or if the configuration does not allow for hardware offloading of any kind (for example, software switches or a lack of hardware-offloading capability). Notably, broadcast packets must be processed by the CPU, even if the traffic is not relevant to the receiving host.
With that in mind, broadcast storms/L2 loops are scenarios where broadcast packets circulate and accumulate continuously in the network, rather than each packet traversing the network once. This situation occurs when Layer 2 network switches are physically connected in such a way that a loop path can form, and it results in a rapidly growing flood of traffic circulating through the network.
The flood of traffic caused by an L2 loop can quickly overwhelm the interfaces of any connected device, resulting in high softirq CPU usage and major impacts to the network that can render connected devices unusable. Legitimate user traffic will frequently be dropped or heavily delayed/degraded during this period.
One frequently observed symptom is a deceptively low number of sessions in the performance statistics relative to the amount of CPU usage. In the example below, there are only 200 active sessions, yet there is 90%+ softirq on multiple CPU cores:
FortiGate # get sys perf stat
One way to determine whether a broadcast storm/L2 loop is occurring is to examine the interface statistics. Run the command fnsysctl ifconfig on the FortiGate and look for a large number of Received (RX/incoming) bytes relative to Transmitted (TX/outgoing) bytes. The received count will include bogus (that is, non-useful storm) L2 traffic. In this example, the output shows roughly 5800 GB of received data compared to 2 GB of transmitted data.
FGT01# fnsysctl ifconfig
Use the CLI command get system performance status to check the uptime of the FortiGate/cluster, then divide the volume of data received by the FortiGate's uptime in seconds to estimate the average rate of traffic hitting the interface. In this example, this would be roughly 85 MB/s (about 680 Mbps sustained since boot), which is abnormal when accounting for only 200 active sessions (and when factoring in historical usage for the network environment).
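The division described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name is hypothetical, and the uptime of about 19 hours is an assumed value chosen for the example, since the article does not state the actual uptime.

```python
def average_rx_rate(rx_bytes: int, uptime_seconds: float) -> tuple[float, float]:
    """Return (megabytes per second, megabits per second) averaged over the uptime."""
    if uptime_seconds <= 0:
        raise ValueError("uptime_seconds must be positive")
    bytes_per_second = rx_bytes / uptime_seconds
    # Decimal units: 1 MB = 10**6 bytes, 1 Mbit = 10**6 bits.
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


# Illustrative figures: 5800 GB received, ASSUMED uptime of about 19 hours.
mb_per_s, mbit_per_s = average_rx_rate(5800 * 10**9, 19 * 3600)
print(f"{mb_per_s:.0f} MB/s, {mbit_per_s:.0f} Mbps")
```

With these assumed inputs the result lands close to the 85 MB/s figure quoted above. A different uptime would give a different average.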
Note that the interface statistics produced by fnsysctl ifconfig accumulate from the time the FortiGate boots. If the device has been running for a long time, the average calculated over the full uptime can be low enough to suggest that no loop is occurring, even when one started recently.
For a more accurate calculation, run fnsysctl ifconfig once, wait for a short period (for example, 15 to 60 seconds), then run the command again. From there, calculate the difference in Received bytes between the two outputs (the delta of Received bytes) and divide it by the number of seconds waited to determine the rate at which data was received during the testing period.
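The two-sample calculation can be sketched as follows. This is a minimal illustration, assuming the saved command output contains a Linux-style "RX bytes:<number>" field for the interface of interest. The function names and the sample strings are hypothetical and are not real FortiGate output.

```python
import re

# ASSUMPTION: the counter appears as "RX bytes:<digits>" in the saved output.
RX_BYTES = re.compile(r"RX bytes:(\d+)")


def rx_bytes(ifconfig_block: str) -> int:
    """Extract the first RX byte counter from one interface's output block."""
    match = RX_BYTES.search(ifconfig_block)
    if match is None:
        raise ValueError("no 'RX bytes:' counter found")
    return int(match.group(1))


def rx_rate_mbps(first: str, second: str, interval_seconds: float) -> float:
    """Receive rate in Mbps between two samples taken interval_seconds apart."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = rx_bytes(second) - rx_bytes(first)
    if delta < 0:
        raise ValueError("counter decreased (reboot or counter wrap?)")
    return delta * 8 / interval_seconds / 1e6


# Made-up sample blocks, taken 30 seconds apart.
sample_1 = "RX bytes:1000000000 (953.6 MB)  TX bytes:2000000 (1.9 MB)"
sample_2 = "RX bytes:3550000000 (3.3 GB)  TX bytes:2100000 (2.0 MB)"
print(f"{rx_rate_mbps(sample_1, sample_2, 30):.0f} Mbps")
```

Pass only the block for a single interface to each call, since the sketch reads the first counter it finds.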
Another method available when checking for network loops is to run a brief network capture without any filters:
The sniffer only needs to be run for a handful of packets (10 to 100 at most) to gather a useful sample. The sample will show if there is a high volume of broadcast traffic being received compared to useful unicast traffic, and this can be used to determine if a broadcast storm is occurring.
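As a rough sketch of how such a sample could be summarized, the snippet below computes the share of frames addressed to the Ethernet broadcast address. It assumes the destination MAC addresses have already been extracted from the capture by some means. The function name is hypothetical, and no particular threshold is implied.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def broadcast_share(dest_macs: list[str]) -> float:
    """Fraction of frames in the sample whose destination is the broadcast MAC."""
    if not dest_macs:
        return 0.0
    broadcasts = sum(1 for mac in dest_macs if mac.lower() == BROADCAST_MAC)
    return broadcasts / len(dest_macs)


# Made-up sample of destination MACs from a short capture.
sample = ["FF:FF:FF:FF:FF:FF"] * 9 + ["00:11:22:33:44:55"]
print(f"{broadcast_share(sample):.0%} of sampled frames are broadcast")
```

What counts as an abnormal share depends on the environment, so the result is best compared against a capture taken during normal operation.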
To resolve a broadcast storm, the network loop must be eliminated and subsequently prevented from occurring. Disconnecting network interfaces that are creating this looped path is the recommended immediate course of action, and utilizing protocols like Spanning Tree can be useful for preventing loops from forming in the first place.
If the switching is done via FortiSwitch, refer to the following article:
To help identify the origin of a network loop, FortiOS v7.6.0 introduces a new feature: Logging MAC Address Flapping Events. With this feature, an event is logged whenever the same MAC address is learned on different FortiGate interfaces, which makes loop mitigation easier and faster.
Additionally, if the logs are sent to a monitoring tool or Syslog server, detecting and addressing them promptly will reduce the duration of severe outages caused by broadcast storms. Details about this feature are provided below: Logging MAC address flapping events
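The idea behind MAC flapping detection can be illustrated with a short sketch. This is not the FortiOS implementation and does not parse the FortiOS log format. It assumes a list of (MAC address, interface) observations gathered by a monitoring tool, and the function name is hypothetical.

```python
from collections import defaultdict


def flapping_macs(observations: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return MAC addresses seen on more than one interface, with those interfaces."""
    seen: dict[str, set[str]] = defaultdict(set)
    for mac, interface in observations:
        seen[mac.lower()].add(interface)
    return {mac: sorted(ifaces) for mac, ifaces in seen.items() if len(ifaces) > 1}


# Made-up observations: one MAC appears on both port1 and port2.
observed = [
    ("00:11:22:33:44:55", "port1"),
    ("00:11:22:33:44:55", "port2"),
    ("66:77:88:99:aa:bb", "port3"),
]
print(flapping_macs(observed))
```

The interfaces reported for a flapping MAC address are a reasonable starting point when tracing the physical path of the loop.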
Related articles:
Troubleshooting Tip: Check SoftIrq increments (recommended when experiencing high CPU usage)
Troubleshooting Tip: How high CPU usage should be investigated