lol
Staff
Article Id 357435
Description This article provides CLI commands for methodical initial debugging of memory-related issues on FortiGate.
Scope FortiGate.
Solution

FortiGate memory troubleshooting can be difficult.

This article provides a simplified and structured method to collect relevant debug outputs for the initial troubleshooting.

It focuses on the most important commands, whose output helps Technical Support resolve the issue.

 

 

  1. Memory areas:

 

 

The Linux kernel distinguishes between several areas in which memory is allocated.
For details, refer to the Linux kernel documentation file proc.txt and search for '/proc/meminfo'.

 

On FortiGate, most memory-related issues are observed in the following areas:

  • Cached - memory allocated for disk I/O
  • Active - memory allocated for recently active processes
  • Shmem - shared memory for different processes accessing the same memory
  • Slab - kernel allocated memory

 

  2. General FortiGate commands to check in which area memory is allocated.
On FortiGate, the command 'get hardware memory' shows where the kernel has allocated memory.
The commands 'fnsysctl cat /proc/meminfo' and 'diagnose hardware sysinfo memory' show the same data.
 
To troubleshoot a memory issue, collect the data while the memory is still allocated.
As a general guideline, this means at 75% memory usage or above, or while the unit is in conserve mode.
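
The 75% guideline can be checked against the 'Memory:' line of 'get system performance status'. The following is a minimal Python sketch, assuming the line has the layout shown in this article's example output; the function name and the constant are illustrative and not part of FortiOS.

```python
import re

# Guideline from this article: collect data at 75% memory usage or above.
COLLECT_THRESHOLD_PCT = 75.0

def memory_used_pct(perf_status_output: str) -> float:
    """Return used memory in percent, computed from the 'Memory:' line."""
    match = re.search(r"Memory:\s*(\d+)k total,\s*(\d+)k used", perf_status_output)
    if match is None:
        raise ValueError("no 'Memory:' line found in the output")
    total_kb, used_kb = int(match.group(1)), int(match.group(2))
    return 100.0 * used_kb / total_kb

# Line copied from the example output in this article.
sample = ("Memory: 3701384k total, 2951756k used (79.7%), "
          "548876k free (14.8%), 200752k freeable (5.5%)")

pct = memory_used_pct(sample)
print(f"{pct:.1f}% used, collect now: {pct >= COLLECT_THRESHOLD_PCT}")
# prints: 79.7% used, collect now: True
```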
 
Collect a debug report (the same as 'execute tac report'):
 
diag debug report
 
Alternatively, run the following commands. They are already included in a debug report, so a debug report is preferred:
 
get sys stat
get sys perf stat
get hardware memory
 
Example output with interesting memory areas highlighted:
 
get system status
Version: FortiGate-100F v7.6.0,build3401,240724 (GA.F)
Serial-Number: FG100FTK12345678
Hostname: firewall02
Current HA mode: a-p, secondary
Cluster uptime: 533 days, 0 hours, 50 minutes, 55 seconds
Cluster state change time: 2024-08-30 09:47:00
System time: Fri Sep  6 00:01:05 2024
 
get system performance status
Memory: 3701384k total, 2951756k used (79.7%), 548876k free (14.8%), 200752k freeable (5.5%) <--- 79.7% memory usage
Uptime: 6 days,  14 hours,  16 minutes
 
get hardware memory
MemTotal:        3701664 kB <-----
MemFree:         2518100 kB <-----
Buffers:           14444 kB
Cached:           341552 kB <-----
SwapCached:            0 kB
Active:           724644 kB <-----
Inactive:          54092 kB
Active(anon):     449044 kB
Inactive(anon):     8616 kB
Active(file):     275600 kB
Inactive(file):    45476 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        422820 kB
Mapped:            76144 kB
Shmem:             34880 kB <-----
Slab:             152464 kB <-----
SReclaimable:       9348 kB
SUnreclaim:       143116 kB
KernelStack:        3488 kB
PageTables:        20992 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1850832 kB
Committed_AS:   10386140 kB
VmallocTotal:   260046784 kB
VmallocUsed:       75272 kB
VmallocChunk:   259873640 kB
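
To see at a glance which of the four areas holds the most memory, the output of 'get hardware memory' can be parsed offline. The sketch below is one possible way to do this in Python, assuming the 'Name: value kB' layout shown above; the function names are illustrative. The areas can overlap in the kernel's accounting (for example, Shmem is also counted in Cached), so the shares are a rough indication only.

```python
# Areas this article names as the usual sources of memory issues.
AREAS = ("Cached", "Active", "Shmem", "Slab")

def parse_meminfo(text):
    """Map each 'Name: <value> kB' line to {name: value_in_kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values

def rank_areas(meminfo):
    """Return (area, percent of MemTotal) pairs, largest first."""
    total = meminfo["MemTotal"]
    shares = {a: 100.0 * meminfo[a] / total for a in AREAS if a in meminfo}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)

# Values copied from the example output above (shortened).
sample = """\
MemTotal:        3701664 kB
Cached:           341552 kB
Active:           724644 kB
Active(anon):     449044 kB
Shmem:             34880 kB
Slab:             152464 kB
"""

for area, share in rank_areas(parse_meminfo(sample)):
    print(f"{area:<8}{share:5.1f}%")
```

For the sample values, Active ranks first at roughly 19.6% of MemTotal, followed by Cached, Slab and Shmem.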
 
If the unit entered conserve mode in the past, the output of 'get hardware memory' was written to the crashlog at the time of the event.
This helps to understand where memory was allocated when the issue occurred.
The data can be read with the command 'diag debug crashlog read', or found in a 'diag debug report', which already includes this command.
The output below is filtered to show only the most relevant areas.
 
diag debug report
...
diagnose debug crashlog read
10: 2023-10-12 11:09:05 service=kernel conserve=on total="24140 MB" used="21247 MB" red="21243 MB"
11: 2023-10-12 11:09:05 green="19795 MB" msg="Kernel enters memory conserve mode"
12: 2023-10-12 11:09:07 MemTotal: 24720008 kB
13: 2023-10-12 11:09:07 MemFree: 1725984 kB
16: 2023-10-12 11:09:07 Cached: 1270152 kB
18: 2023-10-12 11:09:07 Active: 15681444 kB <----- Most memory was allocated in active mem.
32: 2023-10-12 11:09:08 Shmem: 593924 kB
33: 2023-10-12 11:09:08 Slab: 1459632 kB
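
The same comparison can be made from crashlog entries. The following Python sketch assumes the '<index>: <date> <time> <Name>: <value> kB' layout shown in the excerpt above; the regular expression and function name are illustrative. If the crashlog contains several conserve mode events, later values overwrite earlier ones in this simple version.

```python
import re

# Matches e.g. "18: 2023-10-12 11:09:07 Active: 15681444 kB"
MEMINFO_LINE = re.compile(r"^\d+:\s+\S+\s+\S+\s+([\w()]+):\s+(\d+)\s+kB")

def meminfo_from_crashlog(text):
    """Collect the meminfo values (in kB) that were written to the crashlog."""
    values = {}
    for line in text.splitlines():
        match = MEMINFO_LINE.match(line.strip())
        if match:
            values[match.group(1)] = int(match.group(2))
    return values

# Lines copied from the crashlog excerpt above.
crashlog = """\
10: 2023-10-12 11:09:05 service=kernel conserve=on total="24140 MB" used="21247 MB" red="21243 MB"
11: 2023-10-12 11:09:05 green="19795 MB" msg="Kernel enters memory conserve mode"
12: 2023-10-12 11:09:07 MemTotal: 24720008 kB
13: 2023-10-12 11:09:07 MemFree: 1725984 kB
16: 2023-10-12 11:09:07 Cached: 1270152 kB
18: 2023-10-12 11:09:07 Active: 15681444 kB
32: 2023-10-12 11:09:08 Shmem: 593924 kB
33: 2023-10-12 11:09:08 Slab: 1459632 kB
"""

info = meminfo_from_crashlog(crashlog)
areas = {name: info[name] for name in ("Cached", "Active", "Shmem", "Slab") if name in info}
top = max(areas, key=areas.get)
print(f"{top}: {100.0 * areas[top] / info['MemTotal']:.1f}% of MemTotal")
# prints: Active: 63.4% of MemTotal
```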
 

 

  3. Depending on the area with high memory usage, collect the output of further commands for Technical Support to troubleshoot.

 

After the area(s) with the most memory usage have been isolated, further commands should be used to help find the cause.

Make sure to also share the output of the general commands from the previous step, i.e. the output of 'diag debug report'.

 

The simplified steps for each memory area are:

 

 

  • If memory is high in Cached memory, collect data about files on the disk.

 

fnsysctl df -h
fnsysctl du -d 1 -a
fnsysctl du -alLH /

 

  • If memory is high in Active memory, collect data about active user space processes.

 

diagnose sys top-mem 99
diagnose sys top 1 99 5

 

  • If memory is high in Shmem shared memory, collect data about shared memory and files on the disk.

 

diagnose hardware sysinfo shm
fnsysctl ls -al /dev/shm
fnsysctl du -d 1 -a /dev/shm/
fnsysctl ls -al /tmp
fnsysctl du -d 1 -a /tmp
fnsysctl cat /proc/sysvipc/shm

fnsysctl df -h
fnsysctl du -d 1 -a /
fnsysctl du -alLH /

 

  • If memory is high in kernel slabs (Slab), collect the slab information.

 

diag hardware sysinfo slab

 

This is the same as the following command:

 

fnsysctl cat /proc/slabinfo

 

The command is also included in a debug report, so a debug report is preferred as it contains additional details.

 

diag debug report
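
Once the slab output has been collected, the largest kernel caches can be ranked offline. The Python sketch below assumes the output follows the standard Linux /proc/slabinfo layout (name, active_objs, num_objs, objsize, ...), which matches the statement above that both commands show the same data. The per-cache size is approximated as num_objs multiplied by objsize, and the sample values are invented for illustration.

```python
def top_slab_caches(slabinfo_text, count=3):
    """Return the largest caches as (name, approximate size in KiB) pairs."""
    caches = []
    for line in slabinfo_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and the two header lines.
        if not stripped or stripped.startswith(("slabinfo", "#")):
            continue
        fields = stripped.split()
        if len(fields) < 4 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        caches.append((name, num_objs * objsize // 1024))
    return sorted(caches, key=lambda cache: cache[1], reverse=True)[:count]

# Hypothetical sample in /proc/slabinfo layout; the numbers are made up.
sample = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-1024   2000   2048 1024  4 1 : tunables 0 0 0 : slabdata  512  512 0
inode_cache   50000  51200  640 12 2 : tunables 0 0 0 : slabdata 4267 4267 0
dentry        30000  32000  192 21 1 : tunables 0 0 0 : slabdata 1524 1524 0
"""

for name, size_kib in top_slab_caches(sample):
    print(f"{name:<14}{size_kib:>8} KiB")
```

A cache that keeps growing between two collections taken some time apart is usually the more useful finding than a single snapshot.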

 

With these initial details collected while the memory is allocated (75% usage or above), the root cause can usually be isolated much faster.

Note that base memory consumption on smaller devices with 2 GB of memory or less can be quite high, at times close to the 75% threshold mentioned above. For such devices, it is best to first implement the memory optimizations described in the KB article 'Technical Tip: Free up memory to avoid conserve mode'. This way, more memory has to be wrongly allocated before FortiOS reaches the 75% threshold, which makes the problem easier to identify.
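
The mapping from memory area to follow-up commands described in this article can be summarized as a small lookup table. The Python sketch below is for illustration only: the dictionary and function names are hypothetical, and the commands are copied unchanged from the steps above.

```python
# Follow-up commands per memory area, copied from this article.
FOLLOW_UP = {
    "Cached": [
        "fnsysctl df -h",
        "fnsysctl du -d 1 -a",
        "fnsysctl du -alLH /",
    ],
    "Active": [
        "diagnose sys top-mem 99",
        "diagnose sys top 1 99 5",
    ],
    "Shmem": [
        "diagnose hardware sysinfo shm",
        "fnsysctl ls -al /dev/shm",
        "fnsysctl du -d 1 -a /dev/shm/",
        "fnsysctl ls -al /tmp",
        "fnsysctl du -d 1 -a /tmp",
        "fnsysctl cat /proc/sysvipc/shm",
        "fnsysctl df -h",
        "fnsysctl du -d 1 -a /",
        "fnsysctl du -alLH /",
    ],
    "Slab": [
        "diag hardware sysinfo slab",
    ],
}

def commands_for(area):
    """Return the debug report command plus the area-specific commands."""
    return ["diag debug report"] + FOLLOW_UP[area]

for command in commands_for("Active"):
    print(command)
```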