Troubleshooting Tip: How to do initial troubleshooting of high memory utilization issues (conserve mode)

bmeta · ‎08-23-2019

Description

This article describes the general actions to take, and the information to send to Fortinet Support, when a unit unexpectedly enters conserve mode because it is running out of memory.

Scope

FortiGate.

Solution

Run the CLI command 'get system performance status'. The output will look similar to the sample below:

get system performance status
CPU states: 1% user 0% system 0% nice 99% idle 0% iowait 0% irq 0% softirq
CPU0 states: 1% user 0% system 0% nice 99% idle 0% iowait 0% irq 0% softirq
Memory: 2004540k total, 586528k used (29%), 1418012k free (71%)
Average network usage: 1 / 0 kbps in 1 minute, 0 / 0 kbps in 10 minutes, 0 / 0 kbps in 30 minutes
Average sessions: 25 sessions in 1 minute, 25 sessions in 10 minutes, 25 sessions in 30 minutes
Average session setup rate: 0 sessions per second in last 1 minute, 0 sessions per second in last 10 minutes, 0 sessions per second in last 30 minutes
Virus caught: 0 total in 1 minute
IPS attacks blocked: 0 total in 1 minute
Uptime: 0 days, 23 hours, 41 minutes

Run the command above a few times and compare patterns of memory usage, throughput, and number of sessions.

Check the total memory usage in the output:

Memory: 2004540k total, 586528k used (29%), 1418012k free (71%)

If used memory is above 75%, further checks are warranted. Either the unit is overloaded, a process or the kernel is leaking memory, or a large amount of memory is being cached.
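The 75% check above can be automated when the command output is captured to a file or a terminal log. The following is a minimal Python sketch, not a Fortinet tool: the function names are hypothetical, and the regular expression assumes the 'Memory:' line has the same layout as the sample in this article.

```python
import re

# Matches the "Memory:" line printed by 'get system performance status'.
# Assumption: the optional decimal part tolerates percentages such as "29.3%".
MEMORY_LINE = re.compile(r"Memory:\s*(\d+)k total,\s*(\d+)k used \(\d+(?:\.\d+)?%\)")


def memory_used_percent(output: str) -> float:
    """Recompute the used-memory percentage from the total and used kB values."""
    match = MEMORY_LINE.search(output)
    if match is None:
        raise ValueError("no 'Memory:' line found")
    total_kb, used_kb = int(match.group(1)), int(match.group(2))
    return 100.0 * used_kb / total_kb


def needs_further_check(output: str, threshold: float = 75.0) -> bool:
    """True when used memory is above the 75% guideline from this article."""
    return memory_used_percent(output) > threshold


sample = "Memory: 2004540k total, 586528k used (29%), 1418012k free (71%)"
print(f"{memory_used_percent(sample):.1f}% used")
print("further check needed:", needs_further_check(sample))
```

Recomputing the percentage from the raw kB values avoids depending on how the firmware rounds the figure in parentheses.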

Check the amount of traffic and compare it to the datasheet (throughput section). If it is close to the datasheet limit, the device is likely overloaded and there is a sizing issue. If the amount differs vastly between the last 1 minute and the last 30 minutes, this might indicate a traffic spike.
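To illustrate the comparison between the 1-minute and 30-minute averages, here is a hedged Python sketch. The ratio of 3 and the 1000 kbps floor are arbitrary assumptions chosen for illustration, not Fortinet guidance, and the function names are hypothetical. The two values in each group are simply added together.

```python
import re

# One "<a> / <b> kbps in <n> minute(s)" group from the 'Average network usage' line.
USAGE = re.compile(r"(\d+)\s*/\s*(\d+)\s*kbps in (\d+) minutes?")


def network_usage(line: str) -> dict:
    """Map each averaging window (in minutes) to the sum of both kbps values."""
    return {int(mins): int(a) + int(b) for a, b, mins in USAGE.findall(line)}


def looks_like_spike(line: str, ratio: float = 3.0, floor_kbps: int = 1000) -> bool:
    """Flag a spike when the 1-minute average is far above the 30-minute average.

    ratio and floor_kbps are illustrative assumptions; tune them per device.
    """
    usage = network_usage(line)
    short, long_ = usage[1], usage[30]
    if short < floor_kbps:  # ignore noise on a nearly idle unit
        return False
    return short >= ratio * long_


# First line is the sample from this article; the second is a made-up busy unit.
quiet = "Average network usage: 1 / 0 kbps in 1 minute, 0 / 0 kbps in 10 minutes, 0 / 0 kbps in 30 minutes"
busy = "Average network usage: 90000 / 30000 kbps in 1 minute, 20000 / 10000 kbps in 10 minutes, 15000 / 5000 kbps in 30 minutes"
print(looks_like_spike(quiet), looks_like_spike(busy))
```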

Average sessions: 25 sessions in 1 minute, 25 sessions in 10 minutes, 25 sessions in 30 minutes
The session table is stored in memory as well, so a higher number of sessions leads to higher memory usage.

The FortiGate datasheet also defines the maximum number of sessions the firewall can handle.
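The session count can be compared against the datasheet maximum in the same way. The sketch below is an illustration only: the datasheet value passed in is a placeholder that must be replaced with the figure for the actual model, and the function names are hypothetical.

```python
import re


def sessions_last_minute(output: str) -> int:
    """Extract the 1-minute average from the 'Average sessions' line."""
    match = re.search(r"Average sessions:\s*(\d+) sessions in 1 minute", output)
    if match is None:
        raise ValueError("no 'Average sessions' line found")
    return int(match.group(1))


def session_utilization(output: str, datasheet_max_sessions: int) -> float:
    """Percentage of the datasheet session limit currently in use."""
    return 100.0 * sessions_last_minute(output) / datasheet_max_sessions


line = "Average sessions: 25 sessions in 1 minute, 25 sessions in 10 minutes, 25 sessions in 30 minutes"
# 1000 is a placeholder, not a real datasheet value.
print(f"{session_utilization(line, datasheet_max_sessions=1000):.1f}% of the session limit")
```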

Run the CLI command 'diagnose sys top 1 45 199' to find memory usage per process instance.

'1' stands for the refresh period in seconds.
'45' stands for the number of processes displayed.
'199' stands for the number of times the command is repeated before stopping.

See part of the output as an example below:

diag sys top 1 45 199

The columns are, in order: process name, process ID, process state, CPU usage %, and memory usage %. The last column is the CPU core on which the process is running.
By default, processes are sorted by CPU usage (4th column). To find out which process is consuming the most memory, press Shift + M to sort by memory usage (5th column).

Check the memory usage percentage to see whether any process is constantly using an unreasonably high fraction of memory. Such a process may be the one causing the issue.

Note 1:

Some processes can have multiple instances like 'miglogd' in the example above. WAD and IPSengine are also such processes.

In such cases, sum the memory usage of all instances. The total should not exceed 20-25%, but this depends on the device and its total memory: on small devices with little memory, a higher figure might be normal. Security profiles such as web filtering and antivirus can increase memory usage.

diagnose sys top | grep miglogd <----- Use grep to group all instances with process name <miglogd>.
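Summing the instances by hand is error-prone when there are many of them. The following Python sketch adds up the memory column per process name from saved 'diagnose sys top' output. It assumes the column order described above (name, PID, state, CPU %, memory %, core); the sample rows and their values are invented for illustration, and the 25% limit is the upper end of the guideline in Note 1.

```python
import re
from collections import defaultdict

# name, PID, one-letter state (optionally followed by "<" or "N"), CPU %, memory %.
ROW = re.compile(
    r"^\s*(\S+)\s+(\d+)\s+[A-Z](?:\s+[<N])?\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)"
)


def memory_by_process(top_output: str) -> dict:
    """Sum the memory % of every instance that shares a process name."""
    totals = defaultdict(float)
    for line in top_output.splitlines():
        match = ROW.match(line)
        if match:  # header lines do not match and are skipped
            name, _pid, _cpu, mem = match.groups()
            totals[name] += float(mem)
    return dict(totals)


def over_limit(totals: dict, limit_percent: float = 25.0) -> list:
    """Names of processes whose combined memory % exceeds the limit."""
    return sorted(name for name, pct in totals.items() if pct > limit_percent)


# Invented rows for illustration only.
sample_rows = """\
miglogd      210      S       0.0     1.9    1
miglogd      255      S       0.0     1.4    0
ipsengine    301      S <     0.5    14.2    1
ipsengine    302      S <     0.4    13.1    0
wad          322      S       0.3     4.1    0
"""
totals = memory_by_process(sample_rows)
print(over_limit(totals))
```

Whether a total above the limit is actually abnormal still depends on the model and the enabled security profiles, as the note explains.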

Note 2:

In rare cases, the output of the 'get system performance status' command shows high memory utilization (for example, more than 90%), while the 'diagnose sys top' command does not show any process using a significant amount of memory. This can indicate that memory is being used by the kernel and/or cached. In that case, refer to the following two articles:

Technical Tip: High cached memory due to increasing file-sizes

Technical Tip: FortiGate out of memory due to memory cache on v7.0/v7.2.
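The reasoning in Note 2 amounts to a subtraction: system-wide usage minus the sum of all per-process percentages. The sketch below shows that arithmetic with made-up numbers. It is only a rough indicator, since per-process figures can count shared memory more than once, and the function name is hypothetical.

```python
def unaccounted_percent(system_used_percent: float, process_percents) -> float:
    """Memory % in use system-wide that no listed process accounts for."""
    return max(0.0, system_used_percent - sum(process_percents))


# Made-up values: 92% used overall, processes add up to 13.6%.
gap = unaccounted_percent(92.0, [3.3, 6.2, 4.1])
print(f"{gap:.1f}% is not attributed to any process (kernel or cache candidate)")
```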

To speed up troubleshooting, run the commands below to gather all the relevant logs needed:

get system status

get system performance status <----- Run this command three times, leaving 1 minute between executions.

diagnose sys top 2 40 <----- Let this command run for 1 minute, then stop it by pressing 'q', or add a repeat value to the end of the command to automate this. For example, 'diagnose sys top 2 40 30'.

diagnose sys top-summary <----- Let this command run for 1 minute, then stop it by pressing 'q' - on FortiOS 6.4 this command does not exist.

diag sys top-mem <----- Run this command 4 - 5 times.
diagnose hard sysinfo memory
diagnose hard sysinfo slab
diagnose hard sysinfo shm

diagnose autoupdate versions
diagnose hard sysinfo conserve
diagnose sys session stat
diagnose debug crashlog read <----- Lists all conserve mode entries and crashes with their timestamps, if any.

And these commands for each VDOM, if configured:

get log disk setting
get log disk filter
get log memory setting
get log memory filter
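When several VDOMs are configured, the four log commands above have to be repeated inside each one. The following Python sketch generates the command sequence to paste into the CLI. It assumes the usual 'config vdom' / 'edit <name>' / 'end' way of entering a VDOM context, and the VDOM names shown are placeholders.

```python
PER_VDOM_COMMANDS = [
    "get log disk setting",
    "get log disk filter",
    "get log memory setting",
    "get log memory filter",
]


def per_vdom_script(vdom_names) -> str:
    """Build the CLI lines that run the log commands inside each VDOM."""
    lines = []
    for name in vdom_names:
        # Assumption: 'config vdom' then 'edit <name>' enters the VDOM context.
        lines += ["config vdom", f"edit {name}", *PER_VDOM_COMMANDS, "end"]
    return "\n".join(lines)


# "root" and "example_vdom" are placeholder VDOM names.
print(per_vdom_script(["root", "example_vdom"]))
```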

Meanwhile, the following script can be used when the FortiGate starts entering conserve mode and only exits conserve mode once rebooted.

By default, the maximum log size of an auto-script is 10 MB. If that size is reached, the log is deleted and the script starts anew. Avoid commands that generate large amounts of output, such as 'execute tac report' or 'diag sys session list'.

config system auto-script
    edit "performance"
        set interval 60 <----- Will run every minute.
        set repeat 3600
        set start manual
        set script "
execute time
get system performance status
get system ha status
diagnose hardware sysinfo memory
diagnose sys session full-stat
diagnose sys top 1 20 1"
        set output-size 20 <----- Maximum log size in MB.
    next
end
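As a rough sizing check for the settings above, the following Python sketch works out how long the script keeps running and how much output each run can produce before the log limit is hit. It assumes that 'repeat' is the number of executions, that 'interval' is in seconds, and that 1 MB is 1024 * 1024 bytes. The function names are hypothetical.

```python
def total_runtime_hours(interval_seconds: int, repeat: int) -> float:
    """Total time covered by the script, assuming 'repeat' counts executions."""
    return interval_seconds * repeat / 3600


def output_budget_per_run(output_size_mb: int, repeat: int) -> float:
    """Average bytes each run may write if the log must hold every run."""
    return output_size_mb * 1024 * 1024 / repeat


hours = total_runtime_hours(interval_seconds=60, repeat=3600)
budget = output_budget_per_run(output_size_mb=20, repeat=3600)
print(f"runs for {hours:.0f} hours, about {budget:.0f} bytes of output per run")
```

If a single run of the script prints more than this budget, the log will be deleted and restarted before the last repetition, so a lower repeat count or a larger output size may be preferable.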

To start the script:

execute auto-script start SCRIPT_NAME

To stop the script:

execute auto-script stop SCRIPT_NAME

To view results for the script:

execute auto-script result SCRIPT_NAME

This will help find the process responsible for the high CPU/high memory pushing FortiGate to conserve mode at the time of the incident.

If the process is still consuming an abnormally large amount of memory, consider opening a Technical Support ticket with Fortinet TAC (https://support.fortinet.com) and attaching the output to the ticket, along with the configuration and debug.log files.

Check file systems, tmp, shm (shared memory), and cmdb (command database) directories looking for too many entries for specific processes or file names with large sizes:

fnsysctl df -k
fnsysctl du -i /tmp
fnsysctl du -a /tmp
fnsysctl du -i /dev/shm
fnsysctl du -a /dev/shm
fnsysctl du -i /dev/cmdb
fnsysctl du -a /dev/cmdb
fnsysctl ls -l /dev/shm
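To spot large entries quickly in saved 'fnsysctl du -a' output, a short Python sketch like the one below can sort the lines by size. It assumes each line has the common du layout of a numeric size followed by a path, and the sample paths and sizes are invented. The parent directory total appears in the output too, so it will normally rank first.

```python
def largest_entries(du_output: str, top: int = 3) -> list:
    """Return the 'top' largest (size, path) pairs from du-style output."""
    entries = []
    for line in du_output.splitlines():
        parts = line.split(None, 1)
        # Assumption: "<size> <path>" per line; anything else is skipped.
        if len(parts) == 2 and parts[0].isdigit():
            entries.append((int(parts[0]), parts[1].strip()))
    return sorted(entries, reverse=True)[:top]


# Invented sample; sizes are in whatever unit the command reports.
sample_du = "12\t/tmp/example_a\n2048\t/tmp/example_b\n300\t/tmp/example_c\n2360\t/tmp\n"
for size, path in largest_entries(sample_du, top=2):
    print(size, path)
```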

Analyze the collected information and CLI output, then proceed to steps such as:

Execute commands to collect the status, traffic, or sessions held by a specific daemon or process, such as ips, wad, fgtlogd, or scanunitd, among others. For a short list describing different FortiGate daemons, check the article Technical Tip: Short list of processes.
Restart the process suspected of causing high memory usage. There are multiple ways of performing this step. Some daemons can be restarted using the 'diagnose test app' command, while the majority can be restarted using the kill command:

diag sys kill <signal> <process id>

To obtain the process ID number:

diag sys process pidof <PROCESS_NAME>

Here are reference articles about restarting daemons, including different methods:
Technical Tip: Find and restart/kill a process on a FortiGate by the process ID (PID) via pidof

Technical Tip: Restarting internal processes/daemons

Check whether a specific daemon is causing the issue and which commands can be used to diagnose or analyze the problem further. The following article is an example of analyzing and troubleshooting issues related to WAD: Troubleshooting Tip: WAD troubleshooting commands.
Check the FortiOS release notes for a possible match in the Known Issues list for that specific release, for example: FortiOS 7.4.5 Release Notes Known Issues page.
If the issue was not captured in debugs and logs during a high memory usage incident, or if the problem is intermittent, the recommendation is to run a TeraTerm macro script to monitor memory usage over long intervals, using the same commands and possibly additional commands when troubleshooting a specific daemon or process. The following article illustrates how to run such a script: Technical Tip: FortiGate monitoring script.
The following articles provide TeraTerm monitoring scripts that collect debug output for troubleshooting multiple types of issues:
Technical Tip: TAC debug script with TeraTerm
Troubleshooting Tip: Using a PID process debugging Teraterm script

Note:
Crashlog entries can be correlated with SNMP monitoring data, if SNMP monitoring has been set up for this FortiGate.
External monitoring and recording, such as SNMP, can greatly help to trace when such issues started.
Use 'diag sys top' as an alternative to 'diag sys top-summary'.

Related articles:

Technical Tip: How conserve mode is triggered

Technical Tip: Basic Troubleshooting for high memory or high CPU usage

Technical Tip: Low-end FortiGate models with RAM ≤ 2GB entering conserve mode due to increased ISDB ...

