Solution
The steps below can be followed when any of these symptoms appears:
- The GUI is generally slow.
- The page is frozen at 'Initializing...' after the login page.
- The Super-Worker connections show latency.
- Some incidents do not display event details.
- Queries take a long time to return a result.
- Check from a web browser in the same local network. When connecting to the FortiSIEM, the web browser may go through long-distance links, possibly including internal or external proxies.
- Connect from a web browser in the same local network as the FortiSIEM Super.
- If the FortiSIEM behaves better from this local web browser, investigate the network path (VPN, proxy, internet connection, etc.) for a link issue.
- If nothing changes, continue with the other checks in this document.
- Check for major system issues. From an SSH session on the FortiSIEM Super as root, run the following commands:
journalctl -k --no-pager
- Check for major kernel errors. These can be caused by the underlying hardware or by the VM platform hosting FortiSIEM.
get-fsm-health.py --local -o /tmp/fsm-health.log
cat /tmp/fsm-health.log
- This command displays an overview of the health of the FortiSIEM node.
- The 'Health Assessment' section hints at where to investigate first.
- 'App Server Exceptions' and 'Backend Errors' list the most frequently repeated errors. Fixing those top errors first will help.
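The kernel log check above can produce a lot of output. As a minimal sketch, the helper below keeps only lines that match a few common kernel trouble signatures. The function name `filter_kernel_trouble` and the pattern list are illustrative assumptions, not an official Fortinet list; `journalctl -p err` is the standard systemd priority filter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only kernel log lines matching a few well-known
# trouble signatures. The pattern list is an assumption; extend it as needed.
filter_kernel_trouble() {
    grep -Ei 'i/o error|out of memory|soft lockup|blocked for more than|call trace'
}

# Usage on the Super (as root):
#   journalctl -k -p err --no-pager                  # priority "err" and worse only
#   journalctl -k --no-pager | filter_kernel_trouble
```

An empty result does not prove the kernel log is clean; it only means none of the listed patterns matched.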
- Check CPU and memory stats. From an SSH session on the FortiSIEM Super as root, run the following command:
phstatus.py -a
- Check the user and system CPU usage (us and sy).
- If those values are high and %id is low, look at the CPU% column in the lower table and check whether a process is constantly using CPU.
- If one or several ph processes are constantly high, examine the activity of those processes with the following command (replace process1 and process2 with the names of the processes with high CPU usage) and watch for explicit errors:
tail -f /opt/phoenix/log/phoenix.log | egrep -i 'process1|process2'
- If none of the listed processes is high, run the following command to identify which process at the top of the list is using CPU, then investigate further:
top
- Check memory and swap usage: Mem should have some free memory left and no swap should be in use.
- Check the memory usage of the processes. Large values are normal for phFortiInsightAI and AppSvr (if the UEBA feature is not and will not be used, the corresponding service can be deactivated).
- Check processes for abnormal activity with the tail command given above.
- If a lot of swap is in use, add memory.
- Heavy swap usage can also be caused by poor storage performance (see the next point).
- Note that the recommended resources are 32 cores and 32 GB of memory. See the following guide:
FortiSIEM Sizing Guide
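The CPU and swap checks above can also be read off numerically. The sketch below assumes the procps `top` summary line (`%Cpu(s): ... id`) and the standard `/proc/meminfo` fields `SwapTotal` and `SwapFree`. The function names `cpu_idle_pct` and `swap_used_kb` are hypothetical.

```shell
#!/usr/bin/env bash
# Print the idle CPU percentage from top-style summary text read on stdin.
# Assumes a procps "%Cpu(s): ... id" line.
cpu_idle_pct() {
    awk '/^%Cpu/ {
        gsub(/,/, " ")                    # "ni,100.0 id" -> "ni 100.0 id"
        for (i = 2; i <= NF; i++)
            if ($i == "id") { print $(i - 1); exit }
    }'
}

# Print the swap in use, in kB, from /proc/meminfo-style text read on stdin.
swap_used_kb() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { free = $2 }
         END { print total - free }'
}

# Usage on the Super:
#   top -b -n 1 | cpu_idle_pct
#   swap_used_kb < /proc/meminfo
```

A persistently low idle value or a non-zero swap figure points back to the per-process checks described above.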
- Check disks and online storage stats. For local disks:
- In the GUI under Analytics, run any query, and while it runs, run the following from the FortiSIEM console:
iostat -dhxm 5 5
- Check the r_await, w_await, aqu-sz, and %util values for each disk device. They should be as close to 0 as possible. Higher values mean the system has to wait before reading or writing, which should be avoided.
iotop
- This shows which processes are generating a lot of I/O.
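To spot slow disks in the iostat output without reading every column, a sketch like the following flags devices whose r_await or w_await exceeds a limit. It looks columns up by header name, so it expects the plain `iostat -dx` layout with the device name in the first column (run without `-h`, which moves the device name). The function name `slow_devices` and the default limit of 10 ms are illustrative assumptions, not Fortinet thresholds.

```shell
#!/usr/bin/env bash
# Flag devices whose r_await or w_await exceeds a limit (milliseconds).
# Reads "iostat -dx" text on stdin. The 10 ms default is an assumption.
slow_devices() {
    awk -v limit="${1:-10}" '
        $1 ~ /^Device/ {                  # header row: map column names to positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("r_await" in col) && ("w_await" in col) && NF > 1 {
            r = $(col["r_await"]) + 0
            w = $(col["w_await"]) + 0
            if (r > limit || w > limit)
                printf "%s r_await=%s w_await=%s\n", $1, r, w
        }'
}

# Usage (LC_ALL=C avoids decimal commas in some locales):
#   LC_ALL=C iostat -dxm 5 5 | slow_devices 10
```

The first iostat report shows averages since boot, so the later reports are the ones that reflect the query being run.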
When online storage is on NFS:
- In the GUI under Analytics, run any query, and while it runs, run the following from the FortiSIEM console:
nfsiostat 2 3 > /tmp/nfsio_query.txt
cat /tmp/nfsio_query.txt
- Check the latencies in the avg RTT, avg exe, and avg queue columns. If any of those values is higher than 10 ms, the NFS server setup needs to be reviewed.
- The NFS server needs to be close to the FortiSIEM.
- Review the hardware and resources of the server: interface rates, CPU, RAM, and disk type.
- Dedicating an NFS server to FortiSIEM is strongly recommended.
- Another load test can be applied:
echo 3 > /proc/sys/vm/drop_caches
dd if=/dev/zero of=/data/test_file.bin bs=1M count=2000 conv=fsync & nfsiostat 2 3 > /tmp/nfsio_write.txt
echo 3 > /proc/sys/vm/drop_caches
dd if=/data/test_file.bin of=/dev/zero & nfsiostat 2 3 > /tmp/nfsio_read.txt
rm -rf /data/test_file.bin
cat /tmp/nfsio_write.txt
cat /tmp/nfsio_read.txt
Using high-performance NVMe SSDs for online data is strongly recommended.
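The 10 ms latency rule for NFS can be applied to the saved nfsiostat files with a small filter. This is a sketch under assumptions: it relies on the value line that follows each `read:` or `write:` header, with avg RTT in field 6, avg exe in field 7, and avg queue in field 8 when the installed nfs-utils version prints that column. The function name `nfs_slow_ops` is hypothetical, and the field positions should be verified against the output on the system being checked.

```shell
#!/usr/bin/env bash
# Report NFS read/write latencies above a limit (ms) from nfsiostat text on stdin.
# Assumed field positions on the value line: 6 = avg RTT, 7 = avg exe,
# 8 = avg queue (only when the errors columns are present, i.e. NF >= 10).
nfs_slow_ops() {
    awk -v limit="${1:-10}" '
        /^(read|write):/ { op = $1; sub(/:$/, "", op); want = 1; next }
        want {
            want = 0
            rtt = $6 + 0
            exe = $7 + 0
            queue = (NF >= 10) ? $8 + 0 : 0
            if (rtt > limit || exe > limit || queue > limit)
                printf "%s rtt=%s exe=%s queue=%s\n", op, rtt, exe, queue
        }'
}

# Usage:
#   nfs_slow_ops 10 < /tmp/nfsio_query.txt
#   nfs_slow_ops 10 < /tmp/nfsio_write.txt
```

Each printed line names the operation (read or write) that exceeded the limit in one of the samples.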
- VM platform usage (FortiSIEM-VM only). A FortiSIEM-VM is hosted on a machine whose hardware resources (CPU, RAM, disk) are shared with other VMs.
If those VMs are loaded and using a lot of resources, this can affect the FortiSIEM performance and stability.
- Check the resource usage of the host and the other VMs, and rebalance the allocation in the VM management platform.
- Isolate the FortiSIEM VM on one dedicated host without any other VMs and check the behavior.
- If there are several hosts in cluster mode, migrate the VM to another host and check again.
- Error logs related to performance. In /opt/glassfish/domains/domain1/logs/server.log:
[2024-05-28T02:00:01.184+0200] [glassfish 5.1] [ERROR] [] [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] [tid: _ThreadID=331 _ThreadName=PHScheduler_Worker-20] [timeMillis: 1716854401184] [levelValue: 1000] [[
ERROR: deadlock detected
Detail: Process 85521 waits for ShareLock on transaction 152825958; blocked by process 85522.
Process 85522 waits for ShareLock on transaction 152825964; blocked by process 85521.
Hint: See server log for query details.
Where: while deleting tuple (0,60) in relation "ph_change_set"]]
To fix this, move the VM to a host with faster disks (especially for the /cmdb disk) and increase the number of CPUs.
In /opt/phoenix/log/phoenix.log on worker or collector:
2024-05-28T01:03:25.248477+02:00 machine phEventHandler: [PH_EVT_HANDLER_ERR]:[eventSeverity]=PHL_ERROR,[procName]=phEventHandler,[fileName]=phHttpRequestHandler.cpp,[lineNumber]=313,[phLogDetail]=Server is not running or congested, reject upload request
2024-05-28T01:01:09.649952+02:00 collector phEventPackager[90872]: [PH_HTTP_RESPONSE_FAILURE]:[eventSeverity]=PHL_WARNING,[procName]=phEventPackager,[fileName]=phHttpClient.cpp,[lineNumber]=614,[errorNo]=500,[phLogDetail]=HTTP response code failure
To fix this, check the FortiSIEM Super node: the Application Server there is in a bad state.
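To see how often the error signatures quoted above occur, a minimal sketch such as the following counts them across one or more log files. The function name `count_perf_signatures` is hypothetical; the three search strings are taken from the log excerpts above.

```shell
#!/usr/bin/env bash
# Count occurrences of the performance-related signatures quoted above
# in the given log files (one total per signature across all files).
count_perf_signatures() {
    local sig
    [ "$#" -gt 0 ] || { echo "usage: count_perf_signatures FILE..." >&2; return 1; }
    for sig in 'deadlock detected' \
               'reject upload request' \
               'PH_HTTP_RESPONSE_FAILURE'; do
        # -F treats the signature as a fixed string, -c counts matching lines
        printf '%s: %s\n' "$sig" "$(cat -- "$@" 2>/dev/null | grep -c -F -- "$sig")"
    done
}

# Usage:
#   count_perf_signatures /opt/glassfish/domains/domain1/logs/server.log
#   count_perf_signatures /opt/phoenix/log/phoenix.log
```

Counts that keep growing between two runs suggest the condition is ongoing rather than a one-off event.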