Created on 07-08-2024 10:10 PM Edited on 07-11-2024 02:01 PM
Description |
This article describes how to analyze high memory usage on a FortiSwitch, identify the potential causes, and remediate them. High memory usage can make FortiSwitch CLI/GUI access slow or unavailable and can lead to packet drops, among other symptoms. |
Scope | FortiSwitch. |
Solution |
To troubleshoot high memory usage, start by gathering the basic memory-related outputs from the FortiSwitch CLI. Then identify which process or processes use the most memory, check the crash logs for crashes that could correspond to the issue, and review traffic patterns, frequent port flaps, STP issues, and so on to work through the possible triggers.
The suggested steps for analyzing a high memory usage issue are listed below, with example CLI outputs and what to look for in each:
Step 1: Gather the current memory usage status: Start by checking the overall memory usage on the FortiSwitch with the command below. Repeat the command a few times (x5) to check for a pattern, that is, whether the memory/CPU usage is consistently increasing or spikes periodically.
FortiSwitch-1# get sys performance status
CPU states: 7% user 10% system 0% nice 83% idle
Memory states: 65% used <----- Shows the current overall memory usage.
Uptime: 77 days, 5 hours, 28 minutes
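The repeated sampling described in Step 1 can be checked offline once the outputs are saved. The following is a minimal Python sketch, assuming each run of the command was captured as text. The function names and the 10-point spike threshold are illustrative assumptions, not part of any Fortinet tooling.

```python
import re


def memory_used_percent(output: str) -> int:
    """Extract the 'Memory states: NN% used' figure from one command output."""
    match = re.search(r"Memory states:\s*(\d+)%\s*used", output)
    if match is None:
        raise ValueError("no 'Memory states' line found")
    return int(match.group(1))


def classify_trend(samples: list[int], spike_points: int = 10) -> str:
    """Label a series of readings as 'rising', 'spiky', or 'stable'.

    spike_points is an assumed threshold (percentage points between the
    lowest and highest reading), chosen only for illustration.
    """
    if len(samples) < 2:
        return "stable"
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    if all(d >= 0 for d in deltas) and samples[-1] > samples[0]:
        return "rising"
    if max(samples) - min(samples) >= spike_points:
        return "spiky"
    return "stable"


SAMPLE_OUTPUT = """CPU states: 7% user 10% system 0% nice 83% idle
Memory states: 65% used
Uptime: 77 days, 5 hours, 28 minutes"""

print(memory_used_percent(SAMPLE_OUTPUT))
print(classify_trend([65, 66, 68, 70, 73]))
```

A "rising" result across five samples matches the consistently increasing pattern, while "spiky" matches the periodic spikes.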
With the two commands below, observe the free memory and cached memory values over a few runs (x5). If memory usage is consistently increasing (free memory keeps shrinking), it could indicate a memory leak, a process hogging memory, or a device that is overloaded because it is reaching the maximum throughput of the switch. Refer to the FortiSwitch datasheets for the maximum throughput of each FortiSwitch model.
FortiSwitch-1# fn top
Mem: 220584K used, 18888K free, 7160K shrd, 844K buff, 51676K cached <-----
CPU: 16% usr 16% sys 0% nic 66% idle 0% io 0% irq 0% sirq
Load average: 2.07 2.20 2.12 2/100 1402
<snippet>
FortiSwitch-1# get hardware memory
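To compare the free and cached values between runs, the 'Mem:' line of 'fn top' can be parsed into numbers. This is a minimal Python sketch, assuming the line keeps the layout shown in the example above. The helper names are hypothetical.

```python
import re

# Matches the 'Mem:' summary line as it appears in the example output.
MEM_LINE = re.compile(
    r"Mem:\s*(\d+)K used,\s*(\d+)K free,\s*(\d+)K shrd,"
    r"\s*(\d+)K buff,\s*(\d+)K cached"
)


def parse_mem_line(output: str) -> dict[str, int]:
    """Return the five values (in KB) from the 'Mem:' line."""
    match = MEM_LINE.search(output)
    if match is None:
        raise ValueError("no 'Mem:' line found")
    keys = ("used", "free", "shrd", "buff", "cached")
    return dict(zip(keys, map(int, match.groups())))


def free_is_shrinking(samples: list[dict[str, int]]) -> bool:
    """True when free memory drops between every pair of consecutive samples."""
    frees = [sample["free"] for sample in samples]
    return len(frees) >= 2 and all(b < a for a, b in zip(frees, frees[1:]))


SAMPLE_LINE = "Mem: 220584K used, 18888K free, 7160K shrd, 844K buff, 51676K cached"

print(parse_mem_line(SAMPLE_LINE))
```

Feeding five parsed samples to free_is_shrinking gives a quick yes/no on whether free memory fell on every run.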
Step 2: Identify the process/processes using the most memory: Review the processes currently active in the FortiSwitch, and their memory usage. Use option M with the command below to sort the list by memory usage, and review the top processes using the most memory.
In this example, the process 'pyfcgid' uses the most memory. Review if the process is related to a certain configuration or traffic load and if any changes can be made to remediate. For example, if SNMPD is listed as the top process, review the SNMP configuration both on the switch, as well as the frequency/type of SNMP queries from the server, and tune it if it is too aggressive.
FortiSwitch-1# diag sys top (and press M)
Run Time: 77 days, 3 hours and 59 minutes
7U, 3S, 90I; 233T, 73F
pyfcgid 1006 S N 0.0 13.7 <----- Shows process pyfcgid using 13.7% of allocated memory, but not of concern since the overall memory usage in this example is still lower.
cmdbsvr 892 S 0.0 5.7
cu_swtpd 1055 S 0.0 5.2
httpsd 1007 S N 0.0 5.1
initXXXXXXXXXXX 1 S 0.0 4.6
ipconflictd 1050 S 0.0 4.3
newcli 1009 S < 0.0 4.3
sshd 1029 S 0.0 4.2
authd 1053 S 0.0 4.2
statsd 1021 S 0.0 4.2
eap_proxy 1054 S 0.0 4.2
forticron 1011 S 0.0 4.1
stpd 1047 S 1.7 4.1
<snippet>
Note: The above output's column descriptions are process name, Process ID, Process state, CPU usage %, and Memory usage %. The last column is for memory, so by typing M, it is sorted for memory to find which process is consuming the most memory.
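When the 'diag sys top' output has been saved to a file, the same memory sort can be reproduced offline. The sketch below is a minimal Python example that assumes the column layout described in the note above (name, PID, state, an optional 'N' or '<' flag, CPU %, memory %). The class and function names are illustrative.

```python
import re
from typing import NamedTuple


class Proc(NamedTuple):
    name: str
    pid: int
    state: str
    cpu: float
    mem: float


# name, PID, one-letter state, optional 'N' or '<' flag, CPU %, memory %
PROC_LINE = re.compile(
    r"^\s*(\S+)\s+(\d+)\s+([A-Z])(?:\s+[N<])?\s+(\d+\.\d+)\s+(\d+\.\d+)"
)


def parse_processes(output: str) -> list[Proc]:
    """Parse process rows; header lines that do not match are skipped."""
    procs = []
    for line in output.splitlines():
        m = PROC_LINE.match(line)
        if m:
            procs.append(Proc(m[1], int(m[2]), m[3], float(m[4]), float(m[5])))
    return procs


def top_by_memory(procs: list[Proc], n: int = 3) -> list[Proc]:
    """Return the n processes with the highest memory percentage."""
    return sorted(procs, key=lambda p: p.mem, reverse=True)[:n]


SAMPLE_TOP = """Run Time: 77 days, 3 hours and 59 minutes
7U, 3S, 90I; 233T, 73F
stpd 1047 S 1.7 4.1
newcli 1009 S < 0.0 4.3
pyfcgid 1006 S N 0.0 13.7
cmdbsvr 892 S 0.0 5.7"""

for proc in top_by_memory(parse_processes(SAMPLE_TOP)):
    print(proc.name, proc.mem)
```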
Step 3: Check the crash logs for any process crashes or port flaps: The command 'diag debug crashlog read' displays the crash log, which records process-related issues/terminations on the FortiSwitch. Review these logs to check for any process crashes that could indicate memory leaks or other issues.
FortiSwitch-1# diag debug crashlog read
<snippet>
80: 2024-06-03 17:51:05 Out of memory: kill process 1818 (httpsd) score 1490 or a child
82: 2024-06-03 17:51:05 Out of memory: kill process 1831 (initXXXXXXXXXXX) score 948 or a child
83: 2024-06-03 17:51:05 Killed process 1732 (smit)
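On a long crash log, it can help to count which processes were selected by the out-of-memory killer. The following is a minimal Python sketch that assumes the 'Out of memory: kill process' wording shown in the example above. The function name is hypothetical.

```python
import re
from collections import Counter

# Captures the PID and the process name in parentheses.
OOM_LINE = re.compile(r"Out of memory: kill process (\d+) \(([^)]+)\)")


def oom_victims(crashlog: str) -> Counter:
    """Count out-of-memory kill entries per process name."""
    return Counter(m.group(2) for m in OOM_LINE.finditer(crashlog))


SAMPLE_CRASHLOG = """80: 2024-06-03 17:51:05 Out of memory: kill process 1818 (httpsd) score 1490 or a child
82: 2024-06-03 17:51:05 Out of memory: kill process 1831 (initXXXXXXXXXXX) score 948 or a child
83: 2024-06-03 17:51:05 Killed process 1732 (smit)"""

for name, count in oom_victims(SAMPLE_CRASHLOG).most_common():
    print(name, count)
```

A process that appears repeatedly over several days is a reasonable first candidate to review against its related configuration.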
Step 4: Analyze the traffic pattern: A sudden increase in traffic load (close to or beyond max throughput in the specification for that FortiSwitch model) could trigger higher memory/resource usage. Compare the expected traffic load on a typical day in the network, with the actual traffic rate on the FortiSwitch at the time of high memory usage.
Run the below command a few times and review the current traffic rate per port, as well as the total traffic rate through all the ports in the last row. Look for any ports sending/receiving more than expected traffic rates.
For example, if approximately 500Mbps of throughput is expected through the switch but the FortiSwitch is seeing 3x or 4x that amount, say 2Gbps, check which ports are sending the most traffic to trace its origin. Broadcast storms, STP loops, incorrect cabling, etc. could cause such traffic spikes.
FortiSwitch-1# diagnose switch physical-ports linerate
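The comparison in Step 4 is simple arithmetic once the per-port rates have been noted down. The sketch below is a minimal Python example using hypothetical per-port readings in Mbps. It does not parse the linerate output, whose format is not shown here, and all names and values are assumptions for illustration.

```python
def traffic_ratio(rates_mbps: dict[str, float], expected_total_mbps: float) -> float:
    """Observed total traffic as a multiple of the expected total."""
    if expected_total_mbps <= 0:
        raise ValueError("expected_total_mbps must be positive")
    return sum(rates_mbps.values()) / expected_total_mbps


def top_talkers(rates_mbps: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Ports carrying the most traffic, highest first."""
    return sorted(rates_mbps.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical readings, entered by hand from the per-port rates (Mbps).
SAMPLE_RATES = {"port1": 1500.0, "port2": 400.0, "port3": 100.0}

ratio = traffic_ratio(SAMPLE_RATES, expected_total_mbps=500.0)
print(f"observed traffic is {ratio:.1f}x the expected load")
print(top_talkers(SAMPLE_RATES, n=1))
```

With these sample values the total is 2Gbps against an expected 500Mbps, and port1 is the first port to investigate.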
Step 5: Check for packet drops, CRC errors, oversizes, undersizes, collisions, etc. on ports: Check for any anomaly in traffic through the FortiSwitch ports by reviewing whether any of these counters are consistently increasing with high values on any of the ports: errors, drops, oversizes, undersizes, collisions, and unknowns.
Large and frequent increments of these counters could indicate an underlying issue in the network, such as misconfigurations, malicious traffic, loops, etc. Use grep to filter for just the counters and repeat the command a few times (x5) to observe the rate of increase of these counters. Note that an increment of about 1% (of total packets) collectively for these counters could typically be expected in a network.
FortiSwitch-1# diagnose switch physical-ports list
Port(port1) is HW Admin up, SW Admin up, line protocol is up
Check if any of the anomaly counters like errors, drops, oversizes, undersizes, collisions, or jabbers are incrementing over time, by using grep as shown below. Run the below command multiple times to see the rate of increase. If it is increasing rapidly, it indicates a possible issue at that port.
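The rate of increase can be quantified by comparing two readings of the counters against the packets seen in the same interval. This is a minimal Python sketch that assumes the counter values were copied by hand into dictionaries. The names, the sample numbers, and the use of the 1% figure as a threshold are illustrative assumptions.

```python
def counter_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Increase of each anomaly counter between two readings."""
    return {name: after[name] - before.get(name, 0) for name in after}


def anomaly_percent(deltas: dict[str, int], packets_delta: int) -> float:
    """Combined anomaly increments as a percentage of packets in the interval."""
    if packets_delta <= 0:
        raise ValueError("packets_delta must be positive")
    return 100 * sum(deltas.values()) / packets_delta


def exceeds_guideline(deltas: dict[str, int], packets_delta: int,
                      limit_percent: float = 1.0) -> bool:
    """True when the combined increments exceed the assumed 1% guideline."""
    return anomaly_percent(deltas, packets_delta) > limit_percent


# Hypothetical readings for one port, taken some time apart.
BEFORE = {"errors": 10, "drops": 5}
AFTER = {"errors": 40, "drops": 25}

deltas = counter_deltas(BEFORE, AFTER)
print(deltas, anomaly_percent(deltas, packets_delta=10_000))
```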
If these counters are increasing but it is not clear what is causing the issue, packet captures with SPAN or ERSPAN can be used to get a sample of the traffic on the ports with anomaly counter increments and analyze these packets using Wireshark for more insights. More details here: Packet mirroring on FortiSwitch.
Note: Alternatively, it is possible to use 'diag switch physical-ports port-stats list', which gives a few additional counters that are useful for identifying anomalies in traffic patterns.
FortiSwitch-1# diag switch physical-ports port-stats list port1
port1 Port Stats:
Rx Bytes: 5298983012
Tx Bytes: 1147039948
Fragments: 0
Step 6: Check the pdu-counters list: Review the FortiSwitch PDU counters for frequent increments of protocol packets outside the expected range, and check whether any PDU counters are incrementing for a protocol/feature that is not enabled or needed on the FortiSwitch.
FortiSwitch-1# diagnose switch pdu-counters list
Step 7: Check for port flaps or STP flaps: Frequent port flaps due to faulty SFP or cabling issues could trigger certain protocols like STP to spend multiple cycles to reconverge (if Flap-guard is not configured). Check for any frequent port flaps in the switch logs using the command below:
FortiSwitch-1# exec log display
date=2024-01-25 time=06:19:49 log_id=0105008255 type=event subtype=spanning_tree pri=notice vd=root user="stp_daemon" action="state-change" unit="primary" switch.physical-port="4EXYZXYZXYZ-0" instanceid="0" event="state migration" oldstate="forwarding" newstate="discarding" status="None" msg="primary port 4EXYZXYZXYZ-0 instance 0 changed state from forwarding to discarding"
date=2024-01-25 time=06:19:49 log_id=0105008255 type=event subtype=spanning_tree pri=notice vd=root user="stp_daemon" action="state-change" unit="primary" switch.physical-port="4EXYZXYZXYZ-0" instanceid="15" event="state migration" oldstate="learning" newstate="discarding" status="None" msg="primary port 4EXYZXYZXYZ-0 instance 0 changed state from forwarding to discarding"
date=2024-01-25 time=06:19:49 eventtime=1706192389389669819 tz="-0800" logid="0114032695" type="event" subtype="switch-controller" level="information" vd="root" logdesc="FortiSwitch link" user="Fortilink" sn="SR12XYZXYZXYZ-1" name="Fortiswitch-112D-2" msg="primary switch port port5 has gone down"
date=2024-01-25 time=06:19:49 eventtime=1706192389385657368 tz="-0800" logid="0114032695" type="event" subtype="switch-controller" level="information" vd="root" logdesc="FortiSwitch link" user="Fortilink" sn="SR12XYZXYZXYZ-2" name="Fortiswitch-112D-2" msg="primary switch port port2 has gone down"
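With many log entries, counting STP state changes per port makes frequent flaps easier to spot. The sketch below is a minimal Python example that assumes the key=value layout of the log lines above. The sample lines are trimmed copies of those entries, and the function names are hypothetical.

```python
import shlex
from collections import Counter


def parse_log_line(line: str) -> dict[str, str]:
    """Split one log entry into fields; shlex keeps quoted values together."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def stp_changes_per_port(lines: list[str]) -> Counter:
    """Count spanning-tree state-change events per physical port."""
    counts = Counter()
    for line in lines:
        fields = parse_log_line(line)
        if (fields.get("subtype") == "spanning_tree"
                and fields.get("action") == "state-change"):
            counts[fields.get("switch.physical-port", "unknown")] += 1
    return counts


# Trimmed copies of the example entries, keeping only the fields used here.
SAMPLE_LINES = [
    'date=2024-01-25 time=06:19:49 type=event subtype=spanning_tree action="state-change" switch.physical-port="4EXYZXYZXYZ-0" instanceid="0" event="state migration" oldstate="forwarding" newstate="discarding"',
    'date=2024-01-25 time=06:19:49 type=event subtype=spanning_tree action="state-change" switch.physical-port="4EXYZXYZXYZ-0" instanceid="15" event="state migration" oldstate="learning" newstate="discarding"',
    'date=2024-01-25 time=06:19:49 type="event" subtype="switch-controller" level="information" msg="primary switch port port5 has gone down"',
]

print(stp_changes_per_port(SAMPLE_LINES))
```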
Frequent port flaps could, for example, trigger TCNs (Topology Change Notifications) in STP and cause repeated reconvergence. Look at the TCN counters in the STP commands to analyze how frequently this is happening. In the example below, TCNs are being received frequently, but it is still necessary to find which device is sending or triggering them. Trace this through the connected peer FortiSwitches to find the source of the port flaps.
FortiSwitch-1# diagnose stp instance list
MST Instance Information, primary-Channel:
Instance ID 0 (CST)
Config Priority 24576
Bridge MAC abcesaeasa, MD5 Digest ashah31hw7a8a8s7a7sa
Root MAC ancshsajejae, Priority 20480, Path Cost 0, Remaining Hops 19
Regional Root MAC ancshsajejae, Priority 20480, Path Cost 1, Root Port _FlInK1_MLAG0_
Active Times Forward Time 15, Max Age 20, Remaining Hops 19
TCN Events Triggered 1034 (0d 3h 35m 20s ago), Received 31024 (0d 0h 0m 1s ago) <----- TCNs received counter is incrementing faster and more frequently than TCNs triggered (locally on the switch), indicating the port flaps are likely happening on another switch in the topology.
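To compare the TCN counters between runs, the 'TCN Events' line can be parsed into counts and ages. This is a minimal Python sketch that assumes the line keeps the wording shown in the example above. The function names are illustrative.

```python
import re

TCN_LINE = re.compile(
    r"TCN Events\s+Triggered (\d+) \(([^)]*) ago\), Received (\d+) \(([^)]*) ago\)"
)
AGE = re.compile(r"(\d+)d (\d+)h (\d+)m (\d+)s")


def age_seconds(text: str) -> int:
    """Convert an age such as '0d 3h 35m 20s' to seconds."""
    match = AGE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized age: {text!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_tcn(output: str) -> dict[str, int]:
    """Return the triggered/received counts and how long ago each last occurred."""
    match = TCN_LINE.search(output)
    if match is None:
        raise ValueError("no 'TCN Events' line found")
    return {
        "triggered": int(match.group(1)),
        "triggered_age_s": age_seconds(match.group(2)),
        "received": int(match.group(3)),
        "received_age_s": age_seconds(match.group(4)),
    }


SAMPLE_TCN = "TCN Events Triggered 1034 (0d 3h 35m 20s ago), Received 31024 (0d 0h 0m 1s ago)"

print(parse_tcn(SAMPLE_TCN))
```

A received count that grows between runs while its age stays near zero points to flaps on another switch in the topology, as in the example output.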
Step 8: Review any recent configuration changes on the FortiSwitch: If the steps above point to a specific protocol or configuration as the trigger for the high memory usage, use the FortiSwitch configuration revision list feature (enabled by default) to review recent configuration changes. Follow the steps in this document to obtain the config diffs against previous revisions/config versions: Review recent configuration changes on the FortiSwitch
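If two configuration revisions have been exported as text files, they can also be compared offline. The following is a minimal Python sketch using the standard difflib module. The configuration snippets are illustrative placeholders rather than output from a real device, and this does not replace the revision diff procedure in the linked document.

```python
import difflib


def changed_lines(old_config: str, new_config: str) -> list[str]:
    """Return only the added ('+') and removed ('-') lines between two configs."""
    diff = list(difflib.unified_diff(
        old_config.splitlines(),
        new_config.splitlines(),
        fromfile="previous revision",
        tofile="current revision",
        lineterm="",
    ))
    # The first two entries are the '---'/'+++' file headers; skip them.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Illustrative placeholder configs, not taken from a real device.
OLD_CONFIG = '''config switch interface
    edit "port5"
        set native-vlan 10
    next
end'''

NEW_CONFIG = '''config switch interface
    edit "port5"
        set native-vlan 20
    next
end'''

for line in changed_lines(OLD_CONFIG, NEW_CONFIG):
    print(line)
```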
Step 9: Detailed analysis of memory usage: If the cause of high memory usage is still not identified, collect the output of the additional CLI commands below and contact the Fortinet support team for further troubleshooting.
FortiSwitch-1# fn top
FortiSwitch-1# fn ps -wl
FortiSwitch-1# fn ps -lw
FortiSwitch-1# diag hardware sysinfo memory
FortiSwitch-1# diag hardware sysinfo slab
FortiSwitch-1# diag debug report
FortiSwitch-1# diag debug crashlog read
FortiSwitch-1# exec log filter view-lines 500
FortiSwitch-1# exec log display
Step 10: Possible triggers: Here are a few common triggers of high memory usage on a FortiSwitch for reference:
Note: Check if the firmware version on the FortiSwitch is too old or End of support, and upgrade to the latest interim release if it is a feasible option.
1969-12-31 18:01:47 log_id=0103036104 type=event subtype=system pri=warning vd=root msg="external PS not connected"
1969-12-31 18:01:47 log_id=0103036101 type=event subtype=system pri=warning vd=root msg="internal PS changes to bad state"
FortiSwitch-1# diag sys pcb temp
Module Status
FortiSwitch-1# diag sys fan status
Module Status
___________________________________
Fan alarmed Last status(50.2 %) <----- Fan in alarm status.
The above list is not exhaustive. It is only an initial list of possible triggers to review during troubleshooting. Contact Fortinet support with the outputs of the commands listed in Step 9 for further troubleshooting.
Copyright 2024 Fortinet, Inc. All Rights Reserved.