Troubleshooting Tip: FortiSwitch high memory usage troubleshooting guide

Rudresh_Veerappaji · ‎07-08-2024

Description

This article describes how to analyze high memory usage on a FortiSwitch, identify the potential causes that could trigger it, and remediate them. High memory usage can cause slow or inaccessible FortiSwitch CLI/GUI access, potential packet drops, etc.

Scope

FortiSwitch.

Solution

To troubleshoot high memory usage, start by gathering the basic memory-related outputs from the FortiSwitch CLI. Then analyze which process or processes are using the most memory, whether any crashes in the crash logs correspond to the issue, and whether traffic patterns, frequent port flaps, STP issues, etc. could be triggering it.

Here are the suggested steps to analyze a high memory usage issue, with example CLI outputs and what to look for in them:

Step 1: Gather the current memory usage status:

Start by checking the overall memory usage on the FortiSwitch with the command below. Repeat the command a few times (x5) to check for a pattern (whether the memory/CPU usage is consistently increasing or spikes periodically).

FortiSwitch-1# get sys performance status

CPU states: 7% user 10% system 0% nice 83% idle

Memory states: 65% used <----- Shows the current overall memory usage.

Uptime: 77 days, 5 hours, 28 minutes
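
The repeated-sample check above can be scripted once the outputs are saved. The following is a minimal Python sketch, assuming each sample is the raw text of one 'get sys performance status' run. The function names and the rule used to label a trend are illustrative assumptions, not Fortinet tooling.

```python
import re

def memory_used_percent(output: str) -> int:
    """Extract N from a 'Memory states: N% used' line."""
    match = re.search(r"Memory states:\s*(\d+)%\s*used", output)
    if match is None:
        raise ValueError("no 'Memory states' line found")
    return int(match.group(1))

def classify_trend(samples: list[int]) -> str:
    """Label a series of memory-usage percentages (oldest first).

    Assumed rule: 'steadily increasing' means no sample is lower than
    the one before it and the last is higher than the first.
    """
    if len(samples) < 2:
        return "not enough samples"
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    if never_drops and samples[-1] > samples[0]:
        return "steadily increasing"
    if max(samples) > samples[0] and samples[-1] <= samples[0]:
        return "periodic spike"
    return "no clear pattern"

sample = (
    "CPU states: 7% user 10% system 0% nice 83% idle\n"
    "Memory states: 65% used\n"
    "Uptime: 77 days, 5 hours, 28 minutes\n"
)
print(memory_used_percent(sample))
# Hypothetical readings from five consecutive runs:
print(classify_trend([65, 67, 70, 74, 79]))
```

Five samples are enough to separate a steady climb from a single spike, but a slow leak may only show up over hours, so longer sampling intervals can be more telling.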

With the two commands below, observe the free memory and cached memory values after running each a few times (x5).

If memory usage is consistently increasing (free memory keeps shrinking), it could indicate a possible memory leak, a process hogging memory, or a device that is overloaded because it has reached the maximum throughput of the switch.

Refer to the FortiSwitch datasheets for the maximum throughput of each FortiSwitch model.

FortiSwitch-1# fn top

Mem: 220584K used, 18888K free, 7160K shrd, 844K buff, 51676K cached <-----

CPU: 16% usr 16% sys 0% nic 66% idle 0% io 0% irq 0% sirq

Load average: 2.07 2.20 2.12 2/100 1402

FortiSwitch-1# get hardware memory
MemTotal: 239472 kB
MemFree: 18196 kB <-----
Buffers: 844 kB
Cached: 51688 kB <-----
SwapCached: 0 kB
Active: 54816 kB
Inactive: 43976 kB
Active(anon): 47644 kB
Inactive(anon): 5776 kB
Active(file): 7172 kB
Inactive(file): 38200 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB <-----
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 46260 kB
Mapped: 22340 kB
Shmem: 7160 kB
Slab: 23872 kB
SReclaimable: 16732 kB
SUnreclaim: 7140 kB
KernelStack: 792 kB
PageTables: 2784 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 119736 kB
Committed_AS: 805720 kB
VmallocTotal: 1048372 kB
VmallocUsed: 3060 kB
VmallocChunk: 1019336 kB
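
To compare several 'get hardware memory' captures, the marked fields can be reduced to percentages. This is a rough Python sketch. The helper names are hypothetical, and treating Buffers plus Cached as reclaimable is a simplifying assumption (on Linux, Cached also includes shared memory that cannot be dropped), so the second figure is an upper bound.

```python
def parse_meminfo(output: str) -> dict[str, int]:
    """Map each 'Name: value kB' line to an integer number of kB."""
    values = {}
    for line in output.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values

def memory_summary(info: dict[str, int]) -> dict[str, float]:
    """Return free memory, and free plus buffers/cache, as % of MemTotal."""
    total = info["MemTotal"]
    reclaimable = info["Buffers"] + info["Cached"]  # assumption: upper bound
    return {
        "free_pct": round(100 * info["MemFree"] / total, 1),
        "free_plus_cache_pct": round(
            100 * (info["MemFree"] + reclaimable) / total, 1
        ),
    }

# Values copied from the example output above.
SAMPLE = """\
MemTotal: 239472 kB
MemFree: 18196 kB
Buffers: 844 kB
Cached: 51688 kB
SwapTotal: 0 kB
"""
info = parse_meminfo(SAMPLE)
print(memory_summary(info))
```

Running this on each capture and comparing the two percentages over time shows whether free memory is actually shrinking or is only being used as cache.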

Step 2: Identify the process/processes using the most memory:

Review the processes currently active in the FortiSwitch, and their memory usage. Use option M with the command below to sort the list by memory usage, and review the top processes using the most memory.

In this example, the process 'pyfcgid' uses the most memory. Review if the process is related to a certain configuration or traffic load and if any changes can be made to remediate.

For example, if SNMPD is listed as the top process, review the SNMP configuration both on the switch, as well as the frequency/type of SNMP queries from the server, and tune it if it is too aggressive.

FortiSwitch-1# diag sys top (and press M)

Run Time: 77 days, 3 hours and 59 minutes

7U, 3S, 90I; 233T, 73F

pyfcgid 1006 S N 0.0 13.7 <----- Shows process pyfcgid using 13.7% of allocated memory, which is not a concern here because the overall memory usage in this example is still low.

cmdbsvr 892 S 0.0 5.7

cu_swtpd 1055 S 0.0 5.2

httpsd 1007 S N 0.0 5.1

initXXXXXXXXXXX 1 S 0.0 4.6

ipconflictd 1050 S 0.0 4.3

newcli 1009 S < 0.0 4.3

sshd 1029 S 0.0 4.2

authd 1053 S 0.0 4.2

statsd 1021 S 0.0 4.2

eap_proxy 1054 S 0.0 4.2

forticron 1011 S 0.0 4.1

stpd 1047 S 1.7 4.1

Note:

The above output's column descriptions are process name, Process ID, Process state, CPU usage %, and Memory usage %.

The last column is memory usage, so pressing M sorts the list by memory and shows which process is consuming the most.
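
When the 'diag sys top' output is saved to a file, the same ranking can be reproduced offline. This is a small Python sketch based on the column layout described in the note above. The class and function names are made up for illustration, and rows that do not fit the layout (headers, annotated lines) are skipped.

```python
from typing import NamedTuple

class ProcessRow(NamedTuple):
    name: str
    pid: int
    state: str      # may include a flag, e.g. "S N" or "S <"
    cpu_pct: float
    mem_pct: float

def parse_top_rows(output: str) -> list[ProcessRow]:
    """Parse process rows: name, PID, state, CPU %, memory %."""
    rows = []
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) < 5 or not tokens[1].isdigit():
            continue  # header or blank line
        try:
            cpu, mem = float(tokens[-2]), float(tokens[-1])
        except ValueError:
            continue  # trailing text that is not a number
        state = " ".join(tokens[2:-2])
        rows.append(ProcessRow(tokens[0], int(tokens[1]), state, cpu, mem))
    return rows

def top_by_memory(rows: list[ProcessRow], count: int = 3) -> list[ProcessRow]:
    """Return the `count` rows with the highest memory percentage."""
    return sorted(rows, key=lambda r: r.mem_pct, reverse=True)[:count]

SAMPLE = """\
Run Time: 77 days, 3 hours and 59 minutes
7U, 3S, 90I; 233T, 73F
stpd 1047 S 1.7 4.1
newcli 1009 S < 0.0 4.3
cmdbsvr 892 S 0.0 5.7
pyfcgid 1006 S N 0.0 13.7
"""
rows = parse_top_rows(SAMPLE)
for row in top_by_memory(rows):
    print(row.name, row.pid, row.mem_pct)
```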

Step 3: Check the crash logs for any process crashes or port flaps:

The command 'diag debug crashlog read' displays the crash log, which records any process-related issues/terminations on the FortiSwitch. Review these logs to check for any process crashes that could indicate memory leaks or other issues.

FortiSwitch-1# diag debug crashlog read

80: 2024-06-03 17:51:05 Out of memory: kill process 1818 (httpsd) score 1490 or a child
81: 2024-06-03 17:51:05 Killed process 1718 (httpsd)

82: 2024-06-03 17:51:05 Out of memory: kill process 1831 (initXXXXXXXXXXX) score 948 or a child

83: 2024-06-03 17:51:05 Killed process 1732 (smit)
84: 2024-06-03 17:51:05 the killed daemon is /bin/getty: status=0x0
85: 1970-01-03 20:05:14 the killed daemon is /bin/getty: status=0x0
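
On a long crash log, it helps to count which processes the out-of-memory killer targeted. The Python sketch below matches only the 'Out of memory: kill process' line format seen in the example above. The pattern and function name are assumptions for illustration and may need adjusting for other firmware versions.

```python
import re
from collections import Counter

# Matches e.g. "Out of memory: kill process 1818 (httpsd) score 1490 or a child"
OOM_PATTERN = re.compile(r"Out of memory: kill process (\d+) \(([^)]+)\)")

def oom_targets(crashlog: str) -> Counter:
    """Count out-of-memory kill events per process name."""
    return Counter(match.group(2) for match in OOM_PATTERN.finditer(crashlog))

SAMPLE = """\
80: 2024-06-03 17:51:05 Out of memory: kill process 1818 (httpsd) score 1490 or a child
81: 2024-06-03 17:51:05 Killed process 1718 (httpsd)
82: 2024-06-03 17:51:05 Out of memory: kill process 1831 (initXXXXXXXXXXX) score 948 or a child
83: 2024-06-03 17:51:05 Killed process 1732 (smit)
"""
for name, count in oom_targets(SAMPLE).most_common():
    print(name, count)
```

A process that appears repeatedly in this count is a reasonable first candidate to compare against the Step 2 ranking.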

Step 4: Analyze the traffic pattern:

A sudden increase in traffic load (close to or beyond max throughput in the specification for that FortiSwitch model) could trigger higher memory/resource usage. Compare the expected traffic load on a typical day in the network, with the actual traffic rate on the FortiSwitch at the time of high memory usage.

Run the below command a few times and review the current traffic rate per port, as well as the total traffic rate through all the ports in the last row. Look for any ports sending/receiving more than expected traffic rates.

For example, if approximately 500 Mbps of throughput is expected through the switch but the FortiSwitch is seeing 3x or 4x that amount (say 2 Gbps), identify which ports are sending the most traffic to find its origin. Broadcast storms, STP loops, incorrect cabling, etc. could cause such traffic spikes.

FortiSwitch-1# diagnose switch physical-ports linerate
Rate Display Mode: LINE_RATE
Port | TX Packets | TX Rate || RX Packets | RX Rate |
-----------------------------------------------------------------------------------------------
port1 | 1594425 | 0.0018 Mbps || 18964280 | 0.0000 Mbps |
port2 | 1565814 | 0.0029 Mbps || 822325 | 0.0011 Mbps |
port3 | 0 | 0.0000 Mbps || 0 | 0.0000 Mbps |
port4 | 0 | 0.0000 Mbps || 0 | 0.0000 Mbps |
port5 | 0 | 0.0000 Mbps || 0 | 0.0000 Mbps |
port6 | 0 | 0.0000 Mbps || 0 | 0.0000 Mbps |
port14 | 0 | 0.0000 Mbps || 0 | 0.0000 Mbps |
port35 | 0 | 0.0000 Mbps || 0 | 0.0000 Mbps |
port36 | 0 | 0.0000 Mbps || 0 | 0.0000 Mbps |
port37 | 20556159 | 78.0012 Mbps || 2553175 | 82.0204 Mbps | <----- Look for any ports like this that are sending/receiving unusually large traffic rates compared to other ports and overall traffic on the FortiSwitch.
port38 | 0 | 0.0000 Mbps || 0 | 0.0000 Mbps |
port52 | 0 | 0.0000 Mbps || 0 | 0.0000 Mbps |
internal | 4201343 | 0.0136 Mbps || 5544004 | 0.0164 Mbps |
-----------------------------------------------------------------------------------------------
| 120.0195 Mbps || | 115.0379 Mbps | <----- Total throughput.
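
Picking out the busy ports from a long linerate table can be automated. This is a Python sketch that assumes the row layout shown above. The threshold is a hypothetical value to be set from the expected load in the network, and the 'internal' row is treated like any other port.

```python
import re

# port | TX packets | TX rate Mbps || RX packets | RX rate Mbps |
ROW = re.compile(
    r"^\s*(\S+)\s*\|\s*(\d+)\s*\|\s*([\d.]+)\s*Mbps\s*\|\|"
    r"\s*(\d+)\s*\|\s*([\d.]+)\s*Mbps"
)

def parse_linerate(output: str) -> dict[str, tuple[float, float]]:
    """Return {port: (tx_mbps, rx_mbps)} for every data row."""
    rates = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            rates[match.group(1)] = (float(match.group(3)), float(match.group(5)))
    return rates

def busy_ports(rates: dict[str, tuple[float, float]], threshold_mbps: float) -> list[str]:
    """List ports whose TX or RX rate is at or above the threshold."""
    return sorted(
        port for port, (tx, rx) in rates.items() if max(tx, rx) >= threshold_mbps
    )

SAMPLE = """\
Port | TX Packets | TX Rate || RX Packets | RX Rate |
port1 | 1594425 | 0.0018 Mbps || 18964280 | 0.0000 Mbps |
port37 | 20556159 | 78.0012 Mbps || 2553175 | 82.0204 Mbps |
internal | 4201343 | 0.0136 Mbps || 5544004 | 0.0164 Mbps |
"""
rates = parse_linerate(SAMPLE)
print(busy_ports(rates, threshold_mbps=50.0))  # 50 Mbps is an arbitrary example
```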

Step 5: Check for packet drops, CRC errors, oversizes, undersizes, collisions, etc. on ports:

Check for any anomaly in traffic through the FortiSwitch ports by reviewing if any of these counters are consistently increasing with high values on any of the ports: errors, drops, oversizes, undersizes, collisions, and unknowns.

Large and frequent increments of these counters could indicate an underlying issue in the network, such as misconfigurations, malicious traffic, loops, etc. Use grep to filter for just the counters and repeat the command a few times (x5) to observe the rate of increase of these counters, as illustrated below.

Note that an increment of about 1% (of total packets) collectively across these counters can typically be expected in a network.

FortiSwitch-1# diagnose switch physical-ports list

Port(port1) is HW Admin up, SW Admin up, line protocol is up
Interface Type is Serial Gigabit Media Independent Interface(SGMII/SerDes)
Address is E8:1C:BA:D2:FC:C5, None loopback
MTU 9216 bytes, Encapsulation IEEE 802.3/Ethernet-II
full-duplex, 1000 Mb/s, link type is auto
input : 5289783578 bytes, 18964686 packets, 0 errors, 8892 drops, 0 oversizes
15423 unicasts, 2930663 multicasts, 16018600 broadcasts, 0 unknowns
output : 1075184645 bytes, 1595212 packets, 0 errors, 0 drops, 0 oversizes
0 unicasts, 896556 multicasts, 698656 broadcasts
0 fragments, 0 undersizes, 0 collisions, 0 jabbers

Port(port2) is HW Admin up, SW Admin up, line protocol is up
Interface Type is Serial Gigabit Media Independent Interface(SGMII/SerDes)
Address is E8:1C:BA:D2:FC:C6, None loopback
MTU 9216 bytes, Encapsulation IEEE 802.3/Ethernet-II
full-duplex, 1000 Mb/s, link type is auto
input : 107888109 bytes, 822721 packets, 0 errors, 8 drops, 0 oversizes
4 unicasts, 822717 multicasts, 0 broadcasts, 0 unknowns
output : 977499855 bytes, 1566570 packets, 0 errors, 0 drops, 0 oversizes
49 unicasts, 903491 multicasts, 663030 broadcasts
0 fragments, 0 undersizes, 0 collisions, 0 jabbers

Check if any of the anomaly counters like errors, drops, oversizes, undersizes, collisions, or jabbers are incrementing over time, by using grep as shown below.

Run the below command multiple times to see the rate of increase. If it is increasing rapidly, it indicates a possible issue at that port.

FortiSwitch-1# diagnose switch physical-ports list | grep -E errors|port
Port(port1) is HW Admin up, SW Admin up, line protocol is up
input : 5289819279 bytes, 18964963 packets, 0 errors, 8892 drops, 0 oversizes
output : 1075461752 bytes, 1595746 packets, 0 errors, 0 drops, 0 oversizes
Port(port2) is HW Admin up, SW Admin up, line protocol is up
input : 107923072 bytes, 822992 packets, 0 errors, 8 drops, 0 oversizes
output : 977750575 bytes, 1567085 packets, 0 errors, 0 drops, 0 oversizes
Port(port3) is HW Admin up, SW Admin up, line protocol is down
input : 0 bytes, 0 packets, 0 errors, 0 drops, 0 oversizes
output : 0 bytes, 0 packets, 0 errors, 0 drops, 0 oversizes
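
Comparing two captures of the filtered output by eye is error-prone. The Python sketch below diffs the errors, drops, and oversizes counters between two snapshots. It assumes the 'Port(...)' and 'input :' / 'output :' line formats shown above. The function names and the second snapshot's values are hypothetical.

```python
import re

PORT = re.compile(r"^Port\((\S+?)\)")
COUNTERS = re.compile(
    r"^(input|output)\s*:.*?(\d+) errors, (\d+) drops, (\d+) oversizes"
)

def parse_counters(output: str) -> dict[tuple[str, str, str], int]:
    """Return {(port, direction, counter): value} from the grep output."""
    counters = {}
    port = None
    for raw in output.splitlines():
        line = raw.strip()
        match = PORT.match(line)
        if match:
            port = match.group(1)
            continue
        match = COUNTERS.match(line)
        if match and port is not None:
            names = ("errors", "drops", "oversizes")
            for name, value in zip(names, match.groups()[1:]):
                counters[(port, match.group(1), name)] = int(value)
    return counters

def increments(before: dict, after: dict) -> dict:
    """Return only the counters that grew between the two snapshots."""
    return {
        key: after[key] - before[key]
        for key in after
        if key in before and after[key] > before[key]
    }

FIRST = """\
Port(port1) is HW Admin up, SW Admin up, line protocol is up
input : 5289819279 bytes, 18964963 packets, 0 errors, 8892 drops, 0 oversizes
output : 1075461752 bytes, 1595746 packets, 0 errors, 0 drops, 0 oversizes
"""
# Hypothetical later capture in which input drops have grown.
SECOND = FIRST.replace("8892 drops", "8900 drops")
print(increments(parse_counters(FIRST), parse_counters(SECOND)))
```

Dividing each increment by the time between captures gives the rate of increase that this step asks about.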

If these counters are increasing but the cause is not clear, use packet captures with SPAN or ERSPAN to sample the traffic on the ports with anomaly counter increments, and analyze the packets in Wireshark for more insight. More details here: Packet mirroring on FortiSwitch.

Note:

Alternatively, 'diag switch physical-ports port-stats list' provides additional counters that are useful for identifying anomalies in traffic patterns.

FortiSwitch-1# diag switch physical-ports port-stats list port1

port1 Port Stats:

Rx Bytes: 5298983012
Rx Packets: 19036079
Rx Unicasts: 15423
Rx NUnicasts: 19020656
Rx Multicasts: 3002056
Rx Broadcasts: 16018600
Rx Discards: 8892
Rx Pauses: 0
Rx 64 Octets Packets: 7328789
Rx 65-127 Octets Packets: 1657866
Rx 128-255 Octets Packets: 1106032
Rx 256-511 Octets Packets: 5063607
Rx 512-1023 Octets Packets: 3874169
Rx 1024-1518 OctetsPackets: 5616
Rx 1519-Max Octets Packets: 0

Tx Bytes: 1147039948
Tx Packets: 1733188
Tx Unicasts: 0
Tx NUnicasts: 1733188
Tx Multicasts: 990908
Tx Broadcasts: 742280
Tx Discards: 0
Tx Pauses: 0
Tx 64 Octets Packets: 0
Tx 65-127 Octets Packets: 556528
Tx 128-255 Octets Packets: 434380
Tx 256-511 Octets Packets: 0
Tx 512-1023 Octets Packets: 0
Tx 1024-1518 Octets Packets: 742280
Tx 1519-Max Octets Packets: 0

Fragments: 0
Undersize: 0
Oversize: 0
Jabbers: 0
Collisions: 0
CRC Alignment Errors: 0
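
One way to use these extra counters is to check how much of a port's received traffic is broadcast, since broadcast storms were named earlier as a possible cause of traffic spikes. This is a Python sketch under that assumption. The helper names are hypothetical, and no 'normal' percentage is implied.

```python
def parse_port_stats(output: str) -> dict[str, int]:
    """Map each 'Counter name: value' line to an integer."""
    stats = {}
    for line in output.splitlines():
        name, sep, value = line.rpartition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats

def rx_share_percent(stats: dict[str, int], counter: str) -> float:
    """Return `counter` as a percentage of 'Rx Packets' (0.0 if idle)."""
    total = stats.get("Rx Packets", 0)
    if total == 0:
        return 0.0
    return round(100 * stats[counter] / total, 1)

# Values copied from the port1 example above.
SAMPLE = """\
port1 Port Stats:
Rx Bytes: 5298983012
Rx Packets: 19036079
Rx Unicasts: 15423
Rx Multicasts: 3002056
Rx Broadcasts: 16018600
Rx Discards: 8892
"""
stats = parse_port_stats(SAMPLE)
print(rx_share_percent(stats, "Rx Broadcasts"))
```

For the example counters, broadcasts make up roughly 84% of the packets received on port1. Whether that is abnormal depends on what is connected to the port.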

Step 6: Check the pdu-counters list:

Review the FortiSwitch PDU counters for frequent increments of protocol packets outside the expected range, and for any PDU counters incrementing for a protocol/feature that is not enabled or needed on the FortiSwitch.

FortiSwitch-1# diagnose switch pdu-counters list
primary CPU counters:
packet receive error : 0
Non-zero port counters:
port1:
STP packet : 794208
LLDP packet : 41078
port2:
STP packet : 782002
LLDP packet : 41077
port5:
unknown packet type : 10
port37:
LACP packet : 39159
STP packet : 1108801
LLDP packet : 410900
unknown packet type : 821790
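
The per-port PDU counters can be loaded into a dictionary to spot unexpected packet types quickly. This Python sketch assumes the indentation-free layout shown above, where a single word ending in a colon starts a port block. The function names are illustrative only.

```python
def parse_pdu_counters(output: str) -> dict[str, dict[str, int]]:
    """Return {port: {counter name: value}} from the pdu-counters list."""
    ports = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.endswith(":") and " " not in line:
            current = line[:-1]          # e.g. "port37:" starts a port block
            ports[current] = {}
        elif ":" in line and current is not None:
            name, _, value = line.rpartition(":")
            if value.strip().isdigit():
                ports[current][name.strip()] = int(value)
    return ports

def unknown_packets(ports: dict[str, dict[str, int]]) -> dict[str, int]:
    """Return the 'unknown packet type' count for each port that has one."""
    key = "unknown packet type"
    return {port: c[key] for port, c in ports.items() if key in c}

SAMPLE = """\
primary CPU counters:
packet receive error : 0
Non-zero port counters:
port1:
STP packet : 794208
LLDP packet : 41078
port5:
unknown packet type : 10
port37:
LACP packet : 39159
unknown packet type : 821790
"""
ports = parse_pdu_counters(SAMPLE)
print(unknown_packets(ports))
```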

Step 7: Check for port flaps or STP flaps:

Frequent port flaps due to a faulty SFP or cabling issues could cause protocols such as STP to spend multiple cycles reconverging (if flap-guard is not configured). Check for any frequent port flaps in the switch logs using the command below:

FortiSwitch-1# exec log display

date=2024-01-25 time=06:19:49 log_id=0105008255 type=event subtype=spanning_tree pri=notice vd=root user="stp_daemon" action="state-change" unit="primary" switch.physical-port="4EXYZXYZXYZ-0" instanceid="0" event="state migration" oldstate="forwarding" newstate="discarding" status="None" msg="primary port 4EXYZXYZXYZ-0 instance 0 changed state from forwarding to discarding"

date=2024-01-25 time=06:19:49 log_id=0105008255 type=event subtype=spanning_tree pri=notice vd=root user="stp_daemon" action="state-change" unit="primary" switch.physical-port="4EXYZXYZXYZ-0" instanceid="15" event="state migration" oldstate="learning" newstate="discarding" status="None" msg="primary port 4EXYZXYZXYZ-0 instance 0 changed state from forwarding to discarding"

date=2024-01-25 time=06:19:49 eventtime=1706192389389669819 tz="-0800" logid="0114032695" type="event" subtype="switch-controller" level="information" vd="root" logdesc="FortiSwitch link" user="Fortilink" sn="SR12XYZXYZXYZ-1" name="Fortiswitch-112D-2" msg="primary switch port port5 has gone down"

date=2024-01-25 time=06:19:49 eventtime=1706192389385657368 tz="-0800" logid="0114032695" type="event" subtype="switch-controller" level="information" vd="root" logdesc="FortiSwitch link" user="Fortilink" sn="SR12XYZXYZXYZ-2" name="Fortiswitch-112D-2" msg="primary switch port port2 has gone down"
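
To see which ports flap most often in a long log, the link-down messages can be counted per port. The Python sketch below matches only the 'has gone down' message text seen in the example logs above. The pattern and function name are assumptions for illustration.

```python
import re
from collections import Counter

# Matches e.g. msg="primary switch port port5 has gone down"
LINK_DOWN = re.compile(r'msg="[^"]*\bport (\S+) has gone down"')

def link_down_counts(log_text: str) -> Counter:
    """Count link-down log messages per port name."""
    return Counter(LINK_DOWN.findall(log_text))

SAMPLE = """\
date=2024-01-25 time=06:19:49 type="event" subtype="switch-controller" msg="primary switch port port5 has gone down"
date=2024-01-25 time=06:19:49 type="event" subtype="switch-controller" msg="primary switch port port2 has gone down"
date=2024-01-25 time=06:19:49 type=event subtype=spanning_tree msg="primary port 4EXYZXYZXYZ-0 instance 0 changed state from forwarding to discarding"
"""
for port, count in link_down_counts(SAMPLE).most_common():
    print(port, count)
```

A port with many entries in a short time window is a candidate for a cabling or SFP check.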

Frequent port flaps could, for example, trigger TCNs (Topology Change Notifications) in STP and cause repeated reconvergence. Look at the TCN counters in the STP command output to see how frequently this is happening.

In the example below, TCNs are being received frequently. To find what is sending/triggering those TCNs, trace them through the connected peer FortiSwitches until the source of the port flaps is found.

FortiSwitch-1# diagnose stp instance list

MST Instance Information, primary-Channel:

Instance ID 0 (CST)

Config Priority 24576

Bridge MAC abcesaeasa, MD5 Digest ashah31hw7a8a8s7a7sa

Root MAC ancshsajejae, Priority 20480, Path Cost 0, Remaining Hops 19

Regional Root MAC ancshsajejae, Priority 20480, Path Cost 1, Root Port _FlInK1_MLAG0_

Active Times Forward Time 15, Max Age 20, Remaining Hops 19

TCN Events Triggered 1034 (0d 3h 35m 20s ago), Received 31024 (0d 0h 0m 1s ago) <----- TCNs received counter is incrementing faster and more frequently than TCNs triggered (locally on the switch), indicating the port flaps are likely happening on another switch in the topology.
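
The TCN line can be parsed to compare triggered and received counts across captures. This is a Python sketch assuming the exact line format shown above. The 'likely remote' rule simply mirrors the reasoning in the annotation (more TCNs received than triggered, and received more recently) and is a heuristic, not a Fortinet-defined test.

```python
import re

TCN = re.compile(
    r"TCN Events Triggered (\d+) \((\d+)d (\d+)h (\d+)m (\d+)s ago\), "
    r"Received (\d+) \((\d+)d (\d+)h (\d+)m (\d+)s ago\)"
)

def _seconds(days: str, hours: str, minutes: str, seconds: str) -> int:
    """Convert the 'Nd Nh Nm Ns' age fields to seconds."""
    return ((int(days) * 24 + int(hours)) * 60 + int(minutes)) * 60 + int(seconds)

def parse_tcn(line: str) -> dict[str, int]:
    """Extract TCN counts and ages from the 'TCN Events' line."""
    match = TCN.search(line)
    if match is None:
        raise ValueError("no TCN Events line found")
    g = match.groups()
    return {
        "triggered": int(g[0]),
        "triggered_age_s": _seconds(*g[1:5]),
        "received": int(g[5]),
        "received_age_s": _seconds(*g[6:10]),
    }

def flaps_likely_remote(tcn: dict[str, int]) -> bool:
    """Heuristic: more TCNs received than triggered, and more recently."""
    return (
        tcn["received"] > tcn["triggered"]
        and tcn["received_age_s"] < tcn["triggered_age_s"]
    )

LINE = (
    "TCN Events Triggered 1034 (0d 3h 35m 20s ago), "
    "Received 31024 (0d 0h 0m 1s ago)"
)
tcn = parse_tcn(LINE)
print(tcn, flaps_likely_remote(tcn))
```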

Step 8: Review any recent configuration changes on the FortiSwitch:

If the above steps point to a specific protocol or configuration as the trigger for the high memory usage, use the FortiSwitch configuration revision list feature (enabled by default) to review recent configuration changes. Follow the steps in this document to obtain the config diffs against previous revisions/config versions:

Review recent configuration changes on the FortiSwitch

Step 9: Detailed analysis of memory usage:

If the cause of high memory usage is still not identified, collect the outputs of the additional CLI commands below and contact the Fortinet support team for further troubleshooting.

FortiSwitch-1# fn top

FortiSwitch-1# fn ps -wl

FortiSwitch-1# fn ps -lw

FortiSwitch-1# diag hardware sysinfo memory

FortiSwitch-1# diag hardware sysinfo slab

FortiSwitch-1# diag debug report

FortiSwitch-1# diag debug crashlog read

FortiSwitch-1# exec log filter view-lines 500

FortiSwitch-1# exec log display

Step 10: Possible triggers:

Here are a few common triggers of high memory usage on a FortiSwitch for reference:

Note:

Check if the firmware version on the FortiSwitch is too old or has reached End of Support, and upgrade to the latest interim release if that is feasible.

Excessive SNMP querying: This could cause high CPU/memory usage. Check if the SNMP process is among the top memory-using processes, and reduce the frequency of SNMP queries, since a large number of queries sent too frequently could cause high resource usage on a FortiSwitch.
Frequent port flaps: These could trigger STP TCNs and reconvergence, and excessive flapping uses more resources on the FortiSwitch. Review the 'exec log display' output and the STP outputs for any STP-triggered issues. Consider configuring flap-guard to protect against frequent port flaps causing STP reconvergence. Refer to Step 7 above for more details.
Power supply issues: Some FortiSwitch models have dual power supplies, and if either one has frequent issues, it could affect system performance. Review the logs to see if the power supply connection is flapping.

1969-12-31 18:01:47 log_id=0103036104 type=event subtype=system pri=warning vd=root msg="external PS not connected"

1969-12-31 18:01:47 log_id=0103036101 type=event subtype=system pri=warning vd=root msg="internal PS changes to bad state"

High temperature and fan issues: These could trigger high CPU/memory usage and cause performance issues. Check the temperature readings and the fan status.

FortiSwitch-1# diag sys pcb temp

Module Status
___________________________________
Sensor1 61.0 C <----- Abnormally high temp reading.
Sensor2 52.4 C <----- Abnormally high temp reading.

FortiSwitch-1# diag sys fan status

Module Status

___________________________________

Fan alarmed Last status(50.2 %) <----- Fan in alarm status.

Throughput exceeding the device specification: Check the performance specifications in the datasheet for the specific FortiSwitch model and confirm whether the switch is running close to them, in which case high resource usage would be expected. Each FortiSwitch model has different performance specs, so ensure the FortiSwitch is running within them. Refer to Step 1 above for more details.

Restarting a process causing high memory usage: If high memory usage corresponds to a specific process, 'diagnose sys kill 11 <process-id>' can be used to terminate and restart the process. Use this with caution (it could affect services) and under the guidance of a Fortinet support engineer.

The above list is not exhaustive; it is only an initial list of possible triggers to review during troubleshooting. Contact Fortinet support with the outputs of the commands listed in Step 9 for further troubleshooting.

Related Documents:

Configuring FortiSwitch port mirroring

Review recent configuration changes on the FortiSwitch
