Knowing how much memory a process or service is using is a core requirement for the following tasks:
- assessing the memory pressures on a host and how much a given process may be contributing to that pressure
- identifying appropriate values when setting the process's resource limits.
An active FortiCNAPP Linux agent will always show at least two, and often more than two, separate processes. These processes have the word 'datacollector' in their process name. A typical process listing filtered for 'datacollector' looks like this:
ps -ef | grep -i datacollector
root 867 1 0 10:39 ? 00:00:03 /var/lib/lacework/datacollector
root 2902 867 1 10:39 ? 00:01:09 /var/lib/lacework/datacollector -r=collector
root 2943 2902 0 10:39 ? 00:00:02 /var/lib/lacework/datacollector -r=collector --processisolation
This output shows the following:
The first process (pid 867) is the agent's controller process. The second process (pid 2902) is the agent's collector process, as identified by the '-r=collector' flag; note that its parent pid is the controller process. The third process (pid 2943) is another instance of the collector running in its own separate process space; this is used to collect data on containers running on the host, and the number of these additional processes scales with the number of containers running at the time. Note that the additional collectors are child processes of the initial collector. Determining the overall memory used therefore requires collecting memory statistics for all three processes.
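One way to confirm these parent/child relationships is a tree view using standard 'ps' options; here the '-C' flag selects processes by the executable name 'datacollector' (the PIDs on any given host will of course differ from those above):
ps --forest -o pid,ppid,stime,cmd -C datacollector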
- 'top'.
By default, the 'top' command reports metrics on all processes that the logged-in user has permissions to view. It can also be filtered to specific processes via the '-p' flag:
top -p867 -p2902 -p2943 -o RES
top - 12:48:23 up 2:08, 2 users, load average: 0.02, 0.05, 0.01
Tasks: 3 total, 0 running, 3 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.0 sy, 0.0 ni, 99.8 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 5920.3 total, 3420.2 free, 718.0 used, 1782.1 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 4900.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2902 root 20 0 3699784 198540 86888 S 0.0 3.3 1:17.69 datacollector
867 root 20 0 1964028 53936 32700 S 0.0 0.9 0:03.50 datacollector
2943 root 20 0 1963464 51472 30492 S 0.0 0.8 0:03.13 datacollector
The sixth column, 'RES', details the amount of resident ('physical') memory used by each process. This shows the collector (pid 2902) using the most memory, at roughly 200 MB. The controller and child collector processes each show roughly 50 MB, making a total of ~300 MB.
- This value of ~300 MB is very typical usage for a newly started agent on a 'quiet' host (i.e. there is currently little network traffic for the agent to keep track of).
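Where a one-off, non-interactive snapshot of the same view is needed (for example, when recording usage from a script), 'top' can be run in batch mode; the PIDs below are those from the listing above and will differ on other hosts:
top -b -n 1 -p 867,2902,2943 -o RES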
- 'ps' command.
While 'top' continually updates its metrics until the program is exited, the 'ps' command displays an instant snapshot of the latest usage and exits immediately.
ps aux | grep -e lacework -e RSS
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 867 0.0 0.8 1964028 53936 ? Ssl 10:39 0:03 /var/lib/lacework/datacollector
root 2902 0.9 3.3 3699784 202480 ? Sl 10:39 1:23 /var/lib/lacework/datacollector -r=collector
root 2943 0.0 0.8 1963464 51472 ? Sl 10:39 0:03 /var/lib/lacework/datacollector -r=collector --processisolation
The 'aux' flags add additional columns to the output and, just like 'top', the sixth column shows the resident memory for each process (RSS stands for 'Resident Set Size'). These values are very similar to those obtained via 'top'.
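Because 'ps' exits immediately, it is also convenient for totalling the resident memory of all the agent's processes in one step. The following is a minimal sketch using 'awk'; RSS is reported in KiB, so the sum is converted to MiB:
ps -C datacollector -o rss= | awk '{sum+=$1} END {printf "%.1f MiB\n", sum/1024}'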
- 'systemctl status'.
Many recent Linux distributions use systemd to manage the system services, and on such systems the FortiCNAPP Linux agent is installed as a systemd service.
The 'systemctl status' command provides an overview of the current state of the service, including some high level metrics.
Note that the term used by systemd is 'service' (a collection of processes) rather than an individual 'process'. Running 'systemctl status' against the datacollector service shows references to the same three processes seen before; however, the memory value reported is significantly different:
systemctl status datacollector.service
● datacollector.service - Lacework agent
Loaded: loaded (/lib/systemd/system/datacollector.service; enabled; vendor preset: enabled)
Drop-In: /run/systemd/system.control/datacollector.service.d
└─50-CPUQuota.conf, 50-MemoryMax.conf, 50-MemorySwapMax.conf
Active: active (running) since Wed 2025-09-10 10:39:43 UTC; 22h ago
Main PID: 867 (datacollector)
Tasks: 51 (limit: 7057)
Memory: 708.5M (max: 750.0M swap max: 0B available: 41.4M)
CPU: 13min 48.899s
CGroup: /system.slice/datacollector.service
├─ 867 /var/lib/lacework/datacollector
├─2902 /var/lib/lacework/datacollector -r=collector
└─2943 /var/lib/lacework/datacollector -r=collector --processisolation
(There is also a systemd equivalent to 'top', 'systemd-cgtop'.)
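The memory figure that systemd reports can also be read in a script-friendly form with 'systemctl show'; on most recent systemd versions the 'MemoryCurrent' and 'MemoryMax' properties (reported in bytes) correspond to the 'Memory' and 'max' values shown above:
systemctl show datacollector.service -p MemoryCurrent -p MemoryMax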
This difference in reported size is due to how systemd counts memory usage, and in particular, how shared memory is accounted for against individual processes and services. Shared memory is where multiple individual processes wish to access (and therefore load into memory) the same object; this could be shared library files, program code, a log file, etc. Instead of allocating a copy of the object for each process, the kernel gives all the processes a pointer to a single copy of that object. The memory pages used for that object remain loaded in memory until the last process using it releases its access.
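The per-mapping breakdown for an individual process can be inspected with 'pmap'; in the extended output, file-backed mappings such as shared libraries contribute their resident pages to the RSS of every process that maps them, even though only one copy exists in physical memory (the PID here is the collector from the earlier listing):
pmap -x 2902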
- 'smem'.
To get better insight into the amount of memory that might be released by stopping a running agent, the 'smem' tool (apt install smem / dnf install smem) can be used. This provides separate metrics for the Unique Set Size (USS), Proportional Set Size (PSS) and the Resident Set Size (RSS).
The RSS is the same value used by 'top' and 'ps'. The USS excludes any shared memory used by the process. The PSS counts the shared memory, but divides it evenly between the processes sharing it; as an illustration, if a 30 MB shared library were resident and mapped by all three agent processes, each process's RSS would include the full 30 MB, its USS none of it, and its PSS 10 MB. The PSS proves to be a reasonable estimate of the amount that will be released when a process ends. This can be demonstrated with some example processes again:
There is 909MB free with the agent running:
free -m
total used free shared buff/cache available
Mem: 5920 754 909 2 4256 4863
Swap: 0 0 0
smem -k | grep datacol | grep -v grep
524761 root /var/lib/lacework/datacolle 0 21.5M 31.1M 51.2M
524739 root /var/lib/lacework/datacolle 0 21.2M 31.7M 52.9M
524751 root /var/lib/lacework/datacolle 0 151.9M 178.9M 216.9M
The last three columns are USS, PSS and RSS respectively (the header row is filtered out by the grep). The USS column adds up to ~195 MB, and the PSS column adds up to ~240 MB.
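smem can also produce these totals directly: '-t' appends a totals row and '-P' filters on a process-name regular expression, avoiding the manual addition above (note that the smem process itself may appear in the output, since its own command line matches the filter):
smem -k -t -P datacollector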
After stopping the agent processes via 'systemctl stop datacollector.service':
systemctl stop datacollector.service
free -m
total used free shared buff/cache available
Mem: 5920 539 1132 2 4247 5077
Swap: 0 0 0
Used memory has dropped by 215 MB and free memory has increased by 223 MB; both values fall somewhere between the USS and PSS totals.
Summary:
There is no one 'correct' way to count and attribute these memory pages between the many processes accessing them; instead, each method is useful for different use cases:
- The systemd metric is the value that is compared against the systemd cgroup's memlimit for the service. If this value exceeds the cgroup memlimit, the kernel will OOM-kill the agent.
- However, because of the difference in shared memory counting, the actual amount of memory freed up by stopping or killing the service will always be somewhat less than the reported value that triggered it.
- Use 'systemctl status' when monitoring and tuning the cgroup memlimit for the agent (a sketch of adjusting this limit follows this list).
- Use the 'USS' and / or 'PSS' metrics to estimate the memory overhead that would be freed if the agent were stopped.
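As a sketch of how that limit might be adjusted on a systemd host: 'systemctl set-property' (with '--runtime' for a non-persistent change) writes drop-ins such as the 50-MemoryMax.conf file seen in the status output earlier. The 900M value below is purely illustrative, and this should only be done where the limit is not already being managed by the agent's own configuration:
systemctl set-property --runtime datacollector.service MemoryMax=900M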