Technical Tip: How to gather information and fix high CPU and Mem utilization conditions

mdeparisse_FTNT · 04-19-2019

Description

This article describes how to check high CPU usage and how to fix it.

Scope

FortiAnalyzer, FortiManager.

Solution

Double-check the hardware resources.
Check that the system sizing matches the log requirements of the network (see, for example, the sizing guidelines for FortiAnalyzer KVM on v7.4.2).
Refer to the product's datasheet for hardware sizing (for example, FortiManager KVM on v7.4.2).
If a hardware FortiManager/FortiAnalyzer is used, refer to its product datasheet.
If a VM is used, adjust the CPU and RAM allocation of the VM.

FortiManager sizing:

Get the number of managed devices using the following command:

diag dvm device list

Compare this number with the minimum system requirements in the documentation.
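The comparison can be expressed as a small Python sketch. It assumes the managed-device count has already been read manually from the diag dvm device list output and that the rated device count is taken from the datasheet. The function name and the 80% headroom value are illustrative assumptions, not Fortinet guidance.

```python
# Illustrative sizing check for FortiManager (hypothetical helper, not a Fortinet tool).
# managed_devices: count read manually from "diag dvm device list"
# rated_devices:   limit taken from the datasheet / sizing guide (assumed input)
# headroom:        assumed fraction of the rating treated as "near the limit"


def sizing_status(managed_devices: int, rated_devices: int, headroom: float = 0.8) -> str:
    """Classify how close the managed-device count is to the rated capacity."""
    if rated_devices <= 0:
        raise ValueError("rated_devices must be positive")
    usage = managed_devices / rated_devices
    if usage > 1.0:
        return "undersized"  # more devices than the platform is rated for
    if usage > headroom:
        return "near limit"  # little spare capacity left
    return "ok"


if __name__ == "__main__":
    print(sizing_status(managed_devices=45, rated_devices=50))
```

For example, 45 managed devices on a platform rated for 50 is reported as "near limit", which is a hint to review the VM allocation or the hardware model before tuning anything else.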

FortiAnalyzer sizing:

Retrieve the number of logs received per second using one of the following commands:

diag fortilog log

diagnose fortilogd lograte (on version 7.0 and higher)
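Because the log rate fluctuates, a single reading can be misleading. The following Python sketch compares several readings with the rated rate. It assumes the logs/sec values were noted by hand from repeated runs of the command above and that the rated sustained rate comes from the datasheet. The helper and its field names are hypothetical.

```python
# Illustrative log-rate check for FortiAnalyzer (hypothetical helper).
# samples:   logs/sec values noted from several runs of the lograte command
# rated_lps: sustained logs/sec from the datasheet (assumed input)
from statistics import mean


def lograte_report(samples: list[float], rated_lps: float) -> dict:
    """Compare average and peak observed log rates with the rated rate."""
    if not samples:
        raise ValueError("need at least one sample")
    avg = mean(samples)
    peak = max(samples)
    return {
        "average": avg,
        "peak": peak,
        "average_exceeds": avg > rated_lps,  # sustained overload: system undersized
        "peak_exceeds": peak > rated_lps,    # short bursts above the rating
    }


if __name__ == "__main__":
    print(lograte_report([1200, 1500, 2400], rated_lps=2000))
```

Here the average (1700 logs/sec) stays under the assumed rating of 2000 while the peak (2400) exceeds it, which suggests bursts rather than a permanently undersized system.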

Once the hardware is adjusted, analyze the system's behavior in more depth using the following commands:

get system performance

Figure 1

exe top

Figure 2

The load average section of the exe top output shows the average 'load' over the last 1, 5, and 15 minutes. 'Load' is a measure of the amount of computational work the system performs.

A system is considered loaded when CPU usage stays above 90% over the 1-, 5-, and 15-minute intervals, and also when the system appears to run slowly.
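The 90% rule can be checked with a short Python sketch. It assumes one CPU-usage percentage per minute, collected by the operator (for example, by noting the CPU figure from repeated get system performance runs). It works on CPU percentages, not on the load average numbers themselves. The helper names are hypothetical.

```python
# Illustrative check of the "above 90% over 1, 5 and 15 minutes" rule.
# Assumes one CPU-usage percentage per minute, collected by the operator.
from statistics import mean

WINDOWS = (1, 5, 15)  # minutes, matching the load-average intervals


def window_averages(per_minute_cpu: list[float]) -> dict[int, float]:
    """Average CPU % over the most recent 1, 5 and 15 one-minute samples."""
    if len(per_minute_cpu) < max(WINDOWS):
        raise ValueError("need at least 15 one-minute samples")
    return {w: mean(per_minute_cpu[-w:]) for w in WINDOWS}


def sustained_high_cpu(per_minute_cpu: list[float], threshold: float = 90.0) -> bool:
    """True only if every window average is above the threshold."""
    return all(avg > threshold for avg in window_averages(per_minute_cpu).values())


if __name__ == "__main__":
    recent_spike = [50.0] * 10 + [99.0] * 5  # busy for the last 5 minutes only
    print(window_averages(recent_spike))
    print(sustained_high_cpu(recent_spike))
```

A short spike raises the 1- and 5-minute averages but not the 15-minute one, so it is not reported as sustained load.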

The us value in Figure 2 is the percentage of CPU time spent executing processes in user space. Similarly, the sy value is the percentage of CPU time spent running in kernel space.
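The following Python sketch shows one way to read the two values together. The suggested follow-up commands are taken from this article, but mapping them to user-space or kernel-space load is a heuristic assumption, not Fortinet guidance. The helper is hypothetical.

```python
# Illustrative reading of the us/sy fields from exe top (hypothetical helper).
# us: % of CPU time in user space; sy: % of CPU time in kernel space.
# The follow-up hints are a heuristic assumption, not Fortinet guidance.


def classify_cpu_time(us: float, sy: float) -> tuple[float, str]:
    """Return total busy % and which side dominates."""
    busy = us + sy
    if busy == 0:
        return busy, "idle"
    return busy, "userspace" if us >= sy else "kernelspace"


NEXT_STEP = {
    "userspace": "check the process list in exe top for the top CPU consumer",
    "kernelspace": "check disk I/O with exe iotop and kernel messages with diag debug klog",
    "idle": "no CPU pressure at the moment",
}

if __name__ == "__main__":
    busy, side = classify_cpu_time(us=72.5, sy=18.0)
    print(busy, side, "->", NEXT_STEP[side])
```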

exe iotop

Figure 3

diag debug sysinfo

Figure 4

diag debug crashlog read

Check that the system is not stuck in a crash loop, which can create a high CPU load. Identify this condition by running the above command and checking the timestamps: repeated, recent crashes of the same process point to a loop.
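The timestamp check can be sketched in Python as follows. It assumes the operator has transcribed (process name, crash time) pairs from the diag debug crashlog read output. The 10-minute window and 3-crash threshold are arbitrary assumptions, and the helper is hypothetical.

```python
# Illustrative crash-loop detection (hypothetical helper).
# entries: (process name, crash time) pairs transcribed from
# diag debug crashlog read; the window and threshold are assumptions.
from collections import defaultdict
from datetime import datetime, timedelta


def crash_looping_processes(
    entries: list[tuple[str, datetime]],
    window: timedelta = timedelta(minutes=10),
    min_crashes: int = 3,
) -> set[str]:
    """Return processes that crashed at least min_crashes times within window."""
    by_process: dict[str, list[datetime]] = defaultdict(list)
    for process, when in entries:
        by_process[process].append(when)

    looping = set()
    for process, times in by_process.items():
        times.sort()
        # Slide over consecutive groups of min_crashes crashes.
        for i in range(len(times) - min_crashes + 1):
            if times[i + min_crashes - 1] - times[i] <= window:
                looping.add(process)
                break
    return looping


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    log = [
        ("sqllogd", t0),
        ("sqllogd", t0 + timedelta(minutes=3)),
        ("sqllogd", t0 + timedelta(minutes=6)),
        ("oftpd", t0 + timedelta(hours=2)),
    ]
    print(crash_looping_processes(log))
```

With the sample entries, only sqllogd is reported, because it crashed three times within six minutes.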

Depending on which process is listed as the top CPU consumer, try to restart it using the following command:

diag test application <module> 99

This is often useful to restart the oftpd or sqllogd daemon. For example:

diagnose test application oftpd 99

diagnose test application sqllogd 99

If the module cannot be accessed with diagnose test application, the process has to be killed instead. List the running processes to identify it:

diagnose sys process list

Once the process responsible for consuming the CPU cycles is identified, stop and restart it using the matching command:

diagnose test application sqllogd 99

(This may kill the postgres daemon as well as restart sqllogd.)

Or:

diagnose test application logfiled 99

Or:

diagnose test application oftpd 99

If the process is not one of the above, contact the support team for further analysis.
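The decision above can be summarized in a small Python sketch. It only builds the command strings listed in this article for oftpd, sqllogd, and logfiled and does not run anything on the appliance. Any other process name returns None, meaning the support team should be contacted. The helper is hypothetical.

```python
# Illustrative lookup of the restart commands listed above (hypothetical helper).
# It only returns a command string; nothing is executed on the appliance.
from typing import Optional

# Daemons this article suggests restarting with "diagnose test application <module> 99".
RESTARTABLE = ("oftpd", "sqllogd", "logfiled")


def restart_command(process: str) -> Optional[str]:
    """Return the CLI restart command, or None if support should be contacted."""
    name = process.strip().lower()
    if name in RESTARTABLE:
        return f"diagnose test application {name} 99"
    return None


if __name__ == "__main__":
    for proc in ("sqllogd", "OFTPD", "dmworker"):
        cmd = restart_command(proc)
        print(proc, "->", cmd or "contact the support team")
```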

Note:

A specific PID may be killed from within exe top by pressing the 'k' key. (Be aware that some processes should not be killed this way, as this may result in system instability.)

Monitor the system behavior using the exe top and get system performance commands to check whether the system is now behaving normally.

If the steps above do not remedy the issue, provide the following information to support:

exe tac report

exe top <- Let it run for 10 sec.

exe iotop <- Let it run for 10 sec.

diag debug sysinfo

get sys perf <- Run it multiple times.

diag hardware info

diag debug klog

Check at what times of day the CPU usage is high and try to correlate these times with the network load. This can offer hints for the troubleshooting phase.
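One way to quantify this correlation is the Pearson coefficient, sketched below in Python. It assumes the operator has recorded hourly averages of CPU usage and of network load (for example, logs/sec) for the same hours. The sample values in the example are made up for illustration, and the helpers are hypothetical.

```python
# Illustrative correlation of CPU usage with network load (hypothetical helper).
# cpu and load: hourly averages the operator has recorded for the same hours,
# e.g. CPU % and logs/sec; the sample values below are made up.
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(vx * vy)


def high_cpu_hours(cpu_by_hour: dict[int, float], threshold: float = 90.0) -> list[int]:
    """Hours of the day whose average CPU % exceeds the threshold."""
    return sorted(h for h, v in cpu_by_hour.items() if v > threshold)


if __name__ == "__main__":
    cpu = [35.0, 40.0, 92.0, 95.0, 60.0]          # hypothetical hourly CPU %
    lps = [800.0, 900.0, 2400.0, 2600.0, 1500.0]  # hypothetical hourly logs/sec
    print(round(pearson(cpu, lps), 2))
    print(high_cpu_hours(dict(zip(range(8, 13), cpu))))
```

A coefficient close to 1 suggests that CPU usage follows the incoming load. A weak correlation points toward a cause that does not depend on traffic, for example a crash loop.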

If possible, run the attached Tera Term script and let it run until the high CPU usage is captured.

Related articles:

Technical Tip: How to improve FortiAnalyzer performances when FortiSIEM module is not needed

Technical Tip: Continuous Debug Monitoring with Bash and Crontab - PART I

Technical Tip: FortiManager/FortiAnalyzer monitoring script

Troubleshooting Tip: DMworker high CPU troubleshooting under FortiManager

Technical Tip: Customizing CLI command 'execute top' to add column to its display output

Troubleshooting Tip: FortiAnalyzer performance is slow due capacity limits for physical appliances