mdeparisse_FTNT
Article Id 197534

Description


This article describes how to check high CPU and memory usage and how to fix it.

 

Scope

 

FortiAnalyzer, FortiManager.

Solution

 

  • Double-check the hardware resources.
  • Check that the system sizing matches the network log requirements for FortiAnalyzer (for example, FortiAnalyzer KVM on v7.4.2).
  • Refer to the product's datasheet for hardware sizing (for example, FortiManager KVM on v7.4.2).
  • If using a hardware FortiManager/FortiAnalyzer appliance, refer to the product datasheet.
  • If a VM is being used, adjust the CPU and RAM allocation of the VM.

 

  1. FortiManager sizing:

Get the number of managed devices using the following command:

 

diagnose dvm device list

 

Read more about the minimum system requirements in the documentation.

Check the license to see how many devices/VDOMs can be managed, and verify whether the number of devices in the 'diagnose dvm device list' output is within that limit.

 

If additional features are enabled, such as FortiAnalyzer, FortiSOAR, or PolicyAnalyzer, extra resources are needed, sized according to each environment.

Read more about Management extension applications.

 

  2. FortiAnalyzer sizing:

Retrieve the number of logs received per second using the following command:

 

diagnose fortilog log

diagnose fortilogd lograte (on version 7.0 and higher)

 

Compare to the sustained rate from the output of the following command:

 

get system loglimits

 

If the incoming log rate exceeds the sustained rate, the system needs additional RAM and CPU to insert logs at the higher rate.
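The comparison between the incoming log rate and the sustained rate can be sketched in Python as follows. This is only an illustration, not a Fortinet tool: the function names are hypothetical, and both values are assumed to have been read manually from the CLI output of the commands above.

```python
def log_rate_headroom(incoming_logs_per_sec: float,
                      sustained_logs_per_sec: float) -> float:
    """Return the unused fraction of the sustained rate (negative when exceeded)."""
    if sustained_logs_per_sec <= 0:
        raise ValueError("sustained rate must be positive")
    return 1.0 - incoming_logs_per_sec / sustained_logs_per_sec


def is_undersized(incoming_logs_per_sec: float,
                  sustained_logs_per_sec: float) -> bool:
    """True when the incoming log rate is above the sustained rate."""
    return incoming_logs_per_sec > sustained_logs_per_sec
```

A negative headroom value indicates that the unit is receiving more logs than its sustained rate allows.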

 

Once the hardware is adjusted, analyze how the system is behaving in more depth using the following commands:

 

get system performance

 

Figure 1.
 
execute top
 
Figure 2.
 

The load average section represents the average 'load' over 1, 5, and 15 minutes. 'Load' is a measure of the amount of computational work a system performs.

A system is considered to be loaded when the CPU is above 90% for 1, 5, and 15 minutes, and also when the system appears to run slowly.
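As a rough sketch of this rule, the Python function below flags a system as loaded when all three load averages are above the threshold. It rests on an assumption that is not stated in this article: the 90% figure is interpreted as the load average divided by the number of CPU cores. The function name and parameters are hypothetical, and the values are assumed to be copied by hand from the CLI output.

```python
def is_loaded(load_averages, cpu_count, threshold=0.90):
    """Return True when the 1, 5 and 15 minute load averages all exceed
    the threshold once divided by the number of CPU cores.

    Assumption: "above 90%" is read as load average per core > 0.90.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return all(load / cpu_count > threshold for load in load_averages)
```

For example, load averages of 7.8, 7.6, and 7.5 on an 8-core unit would be flagged, whereas a short spike in the 1-minute value alone would not.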


The us value in Figure 2 is the time the CPU spends executing processes in userspace. Similarly, the sy value is the time spent on running kernelspace processes.
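When the output is saved to a file, the us and sy values can be extracted with a short Python helper such as the one below. This is a sketch under an assumption: the summary line is expected to contain fields written as a number followed by 'us' and 'sy', as in the common Linux top layout. The exact layout on a given firmware version may differ, and the function name is hypothetical.

```python
import re

# Matches "12.5 us", "12.5%us" or "3 sy" (number, optional %, then the field name).
_FIELD = re.compile(r"(\d+(?:\.\d+)?)\s*%?\s*(us|sy)\b")


def user_and_system_cpu(summary_line: str) -> dict:
    """Return {'us': <float>, 'sy': <float>} parsed from a top-style CPU line."""
    values = {name: float(number) for number, name in _FIELD.findall(summary_line)}
    if "us" not in values or "sy" not in values:
        raise ValueError("no 'us'/'sy' fields found in the line")
    return values
```

A high us value points to a userspace process, while a high sy value points to kernel activity.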

 

execute iotop

 

Figure 3.
 
diagnose debug sysinfo
 
Figure 4.
 
diagnose debug crashlog read
 

Check that the system is not stuck in a crash loop, which can create a high load on the CPU. To identify this condition, run the above command and check the timestamps for crashes that repeat at short intervals.
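The timestamp check can be sketched in Python as below. The crash log format is not reproduced here, so the function takes timestamps that have already been converted to datetime objects. The 10-minute window and the minimum of 3 crashes are illustrative assumptions, not Fortinet thresholds, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def looks_like_crash_loop(crash_times, window=timedelta(minutes=10), min_crashes=3):
    """Return True when at least min_crashes crashes fall inside one time window."""
    ordered = sorted(crash_times)
    start = 0
    for end, current in enumerate(ordered):
        # Slide the start of the window forward until it spans at most `window`.
        while current - ordered[start] > window:
            start += 1
        if end - start + 1 >= min_crashes:
            return True
    return False
```

Crashes that are hours apart do not trigger the check, while several crashes within a few minutes do.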

Depending on which process is listed as consuming the most CPU, try to restart it using the following command:

 

diagnose test application <module> 99

 

This is often useful for restarting the oftpd or sqllogd daemon. For example:

 

diagnose test application oftpd 99

diagnose test application sqllogd 99

 

If the module cannot be restarted with diagnose test application, list the running processes to identify the one to kill:

 

diagnose sys process list

 

Once the process responsible for consuming the CPU cycles is identified, kill and restart it using the corresponding command, for example:

 

diagnose test application sqllogd 99

 

This may kill the postgres daemon as well as restart sqllogd.

 

Or:

 

diagnose test application logfiled 99

 

Or:

 

diagnose test application oftpd 99

 

If the process is not one of the above, contact the support team for further analysis.

 

Note:

A specific PID can be killed from within the execute top command by pressing the 'k' key. Be aware that some processes should not be killed this way, as this may result in system instability.

Monitor the system behavior using the execute top and get system performance commands to check if the system is now behaving as normal.

If the steps above do not remedy the issue, provide the following information to support:

 

execute tac report

execute top                <- Let it run for 10 seconds.

execute iotop              <- Let it run for 10 seconds.

diagnose debug sysinfo

get sys perf               <- Run it multiple times.

diagnose hardware info

diagnose debug klog

 

Check what times of day the CPU usage is high and try to correlate these times with the network load. This can offer some hints for the troubleshooting phase.
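One way to spot the busy hours is to group collected CPU readings by hour of day, as in the Python sketch below. It assumes the readings have already been gathered as (timestamp, CPU percentage) pairs, for example from repeated get system performance runs. The function names and the 90% default threshold are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime


def average_cpu_by_hour(samples):
    """Return {hour_of_day: mean CPU %} for an iterable of (datetime, cpu_percent)."""
    buckets = defaultdict(list)
    for timestamp, cpu_percent in samples:
        buckets[timestamp.hour].append(cpu_percent)
    return {hour: sum(values) / len(values) for hour, values in sorted(buckets.items())}


def busy_hours(samples, threshold=90.0):
    """Return the hours of day whose mean CPU usage is above the threshold."""
    return [hour for hour, mean in average_cpu_by_hour(samples).items() if mean > threshold]
```

The resulting hours can then be compared with the periods of high log or management traffic on the network.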

If possible, run the attached Tera Term script and let it run until the CPU usage is identified.

 

If a specific operation consumes the CPU resources (for example, an API call that takes some time), run the commands below every 10 seconds while that operation is in progress:

 

execute top
execute iotop
execute top -b1i -n 1
execute top -b -n 1
execute iotop -b -n 5
execute iotop -bo -n 1

 

Notice which process is utilizing the CPU.
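The 10-second polling can also be scripted from a management host. The Python sketch below shows only the scheduling logic: send_command is a hypothetical callable that the reader must supply (for example, a wrapper around an existing SSH session), and nothing here is a Fortinet API. The two commands are taken from the batch-mode examples above.

```python
import time

# Batch-mode commands from this article; adjust the list as needed.
COMMANDS = [
    "execute top -b -n 1",
    "execute iotop -b -n 5",
]


def collect(send_command, commands, rounds, interval_seconds=10.0, sleep=time.sleep):
    """Run every command once per round and return (round, command, output) tuples.

    send_command is a hypothetical callable: it takes a CLI string and returns
    the text output. The pause between rounds defaults to 10 seconds.
    """
    results = []
    for round_index in range(rounds):
        for command in commands:
            results.append((round_index, command, send_command(command)))
        if round_index < rounds - 1:
            sleep(interval_seconds)
    return results
```

Keeping the round number with each output makes it easier to line up the top and iotop results taken at the same moment.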

 

These commands help identify every process utilizing the CPU and show whether the cause is buggy behavior or simply limited resources.

 

If resources appear to be limited, search the output of the TAC report for the 'OOM-kill' keyword. Its presence indicates a memory shortage (in this case, memory resources should be doubled).
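Searching a saved TAC report for the keyword can be done with a few lines of Python, sketched below. It assumes the report has been saved as plain text and loaded into a string. The function name is hypothetical, and the match is case-insensitive so that variants such as 'oom-kill' are also found.

```python
def find_oom_kill_lines(report_text: str, keyword: str = "OOM-kill"):
    """Return (line_number, line) pairs for every line containing the keyword."""
    needle = keyword.lower()
    return [
        (number, line)
        for number, line in enumerate(report_text.splitlines(), start=1)
        if needle in line.lower()
    ]
```

An empty result means the keyword does not appear in the report, so other causes should be investigated.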

 

Related articles:

Technical Tip: How to improve FortiAnalyzer performances when FortiSIEM module is not needed

Technical Tip: Continuous Debug Monitoring with Bash and Crontab - PART I

Technical Tip: FortiManager/FortiAnalyzer monitoring script

Troubleshooting Tip: DMworker high CPU troubleshooting under FortiManager

Technical Tip: Customizing CLI command 'execute top' to add column to its display output

Troubleshooting Tip: FortiAnalyzer performance is slow due capacity limits for physical appliances