FortiSIEM
FortiSIEM provides Security Information and Event Management (SIEM) and User and Entity Behavior Analytics (UEBA)
mbenvenuti
Staff
Article Id 317859
Description This article describes how to troubleshoot collectors.
Scope FortiSIEM Collector node.
Solution

In some cases, a collector that was registered successfully in the past starts experiencing issues: it shows a critical health status, or it appears to no longer receive or transmit events.

Follow the steps below to identify the source of the issue:

 

  1. Check the health status seen by the super.

Use the GUI to access Admin -> Health -> Collector Health.

 

collector_health.png

 

To get status details, hover the mouse over the collector status and read the popup. It makes it easy to focus on what needs attention, such as:

  • Processes being down.
  • Cache usage.
  • RAM / CPU usage.
  • Disk issues.

Check the 'Last Status Updated' and 'Last File Received' values to see whether the status is up to date and when the issue started. The timing can be correlated with external changes such as a new firewall policy or a link disconnection.

 

  2. Check the last collected events.

Perform a basic check from the GUI, using the Analytics menu, to make sure that the collector is sending events to the super/workers.

  • Run a Real-time query with filter 'Collector ID = 1000x' using the proper collector ID.
  • If no events are displayed within 5 minutes, run a historical search with the same filter to find out when the last events were received.

 

  3. Check the running processes.

All ph services are down:

  • The collector might not have been registered correctly. The collector must be registered with the steps outlined in Register Collectors.

phParser process shows high CPU usage:

  • Many unknown events are being received. This can be checked under Analytics by running a query with the filter 'event type : Unknown_EventType'.
  • A new device may have been configured to send logs to FortiSIEM although it is not supported, meaning it is not listed in the External Systems Configuration Guide. To fix this, either define a dedicated parser or remove the logging configuration on the source device.
  • Event dropping rules are configured with complex regular expressions. Rules to drop events are defined in Admin -> Settings -> Event Handling -> Dropping. It is recommended to reduce the logging configuration on the source device as much as possible and to use the CONTAIN operator instead of REGEX wherever possible. The REGEX operator is computationally intensive, especially at high EPS.

 

  4. Check disk usage.

Run the following command on the collector as root to check disk space:

df -h

 

If the / root disk needs to be checked:

mount -o bind / /mnt

du -h --max-depth=1 /mnt/ | sort -rh

umount /mnt #After checking directories

 

If /opt disk needs to be checked:

du -h --max-depth=1 /opt | sort -rh

 

Identify the directories that are taking up space. Common locations are:

  • /var/log and /var/log/httpd (seen as /mnt/var/log and /mnt/var/log/httpd while / is bind-mounted on /mnt), /root/*.log, or /tmp, where some cleanup may be needed.
  • /opt/phoenix/log/ - FortiSIEM system logs.
  • /opt/phoenix/cache/parser/upload/ - the location of the events collected and waiting to be sent to the super/workers.
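
As a minimal sketch of the disk check above, the function below reads `df -P` output and prints the mount points whose usage is at or above a threshold. The name `fs_over_threshold` and the 80% default are assumptions chosen for illustration, not FortiSIEM values.

```shell
# Hypothetical helper: print mount points whose usage is at or above a threshold.
# Reads POSIX-format df output (df -P) on stdin; the threshold (percent) is the first argument.
# Mount points containing spaces are not handled by this sketch.
fs_over_threshold() {
    threshold="${1:-80}"   # 80 is an illustrative default, not a FortiSIEM value
    awk -v t="$threshold" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "91%" -> "91"
            if (use + 0 >= t + 0) print $6, use "%"
        }'
}

# Usage on a collector (as root):
#   df -P | fs_over_threshold 80
```

Any mount point it reports can then be inspected with the du commands shown earlier in this step.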

 

  5. Check for incoming events.

Check for SYSLOG packets:

tcpdump udp and port 514 -vvv -w /tmp/collector.pcap

 

Check for SNMP trap packets:

tcpdump udp and port 162 -vvv -w /tmp/collector.pcap

 

Check for Flow packets:

tcpdump 'udp and (port 2055 or port 6343)' -vvv -w /tmp/collector.pcap

 

Be aware that not all Flows are supported. See Flow Support.

If the source device is sending NetFlow to FortiSIEM but FortiSIEM cannot decode it, make sure that the source device sends the NetFlow template regularly.

 

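If the tcpdump command shows Got 0, no packets have been captured: review the network or device configuration.

The capture file /tmp/collector.pcap can be used for further analysis. The commands above all write to the same file, so copy it elsewhere before starting another capture.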

 

Make sure phParser is listening on those ports to process the UDP packets:

 

netstat -tulpn | egrep '514|162|2055|6343'
tcp6 0 0 :::514 :::* LISTEN 14161/phParser
tcp6 0 0 :::6514 :::* LISTEN 14161/phParser
udp 0 0 0.0.0.0:162 0.0.0.0:* 14161/phParser
udp6 0 0 :::514 :::* 14161/phParser
udp6 0 0 :::2055 :::* 14161/phParser
udp6 0 0 :::6343 :::* 14161/phParser
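
The listing above can also be checked in a scripted way. The sketch below reads `netstat -tulpn` output and reports, for each of the four UDP ports used in this step, whether a phParser UDP socket is present. The function name `check_phparser_udp_ports` is hypothetical, and the field layout is assumed to match the sample output above.

```shell
# Hypothetical helper: for each expected UDP port, report whether phParser owns a UDP socket on it.
# Reads `netstat -tulpn` output on stdin.
check_phparser_udp_ports() {
    awk -v ports="514 162 2055 6343" '
        $1 ~ /^udp/ && $NF ~ /\/phParser$/ {
            n = split($4, parts, ":")      # local address, e.g. 0.0.0.0:162 or :::514
            seen[parts[n]] = 1             # the last piece is the port
        }
        END {
            m = split(ports, want, " ")
            for (i = 1; i <= m; i++) {
                status = (want[i] in seen) ? "listening" : "MISSING"
                print want[i], status
            }
        }'
}

# Usage on a collector (as root):
#   netstat -tulpn | check_phparser_udp_ports
```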

 

Check the EPS count on the collector:

 

tail -f /opt/phoenix/log/phoenix.log | grep PH_SYSTEM_PERF_EVENTS_PER_SEC
2024-06-17T11:49:50.600701+02:00 fsmCol714 phParser[14161]: [PH_SYSTEM_PERF_EVENTS_PER_SEC]:[eventSeverity]=PHL_INFO,[procName]=phParser,[fileName]=LicenseEnforce.cpp,[lineNumber]=575,[eventsPerSec]=0.011111,[peakEventsPerSec]=0.050000,[phLogDetail]=
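
To read only the figures of interest from these lines, a small filter can help. The sketch below extracts the `eventsPerSec` and `peakEventsPerSec` values, assuming the log line format shown in the sample above; `eps_from_log` is a hypothetical name.

```shell
# Hypothetical helper: extract current and peak EPS from PH_SYSTEM_PERF_EVENTS_PER_SEC lines on stdin.
# Assumes the attribute order of the sample line ([eventsPerSec] directly followed by [peakEventsPerSec]).
eps_from_log() {
    sed -n 's/.*\[PH_SYSTEM_PERF_EVENTS_PER_SEC].*\[eventsPerSec]=\([0-9.]*\),\[peakEventsPerSec]=\([0-9.]*\).*/eps=\1 peak=\2/p'
}

# Usage on a collector:
#   grep PH_SYSTEM_PERF_EVENTS_PER_SEC /opt/phoenix/log/phoenix.log | tail -n 5 | eps_from_log
```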

 

  6. Check for communication with super and workers.

 

Check for cluster configuration on super under Admin -> Settings -> Cluster Config:

 

AdminSettingsClusterConfig.png

 

Check that the configuration is in sync on the collector:

cat /opt/phoenix/config/phoenix_config.txt | egrep 'APP_SERVER_HOST'
APP_SERVER_HOST=10.5.8.35

 

 

Make sure that the IP addresses or FQDNs listed as Event Upload Workers are reachable without filtering, proxy, or SSL inspection.

 

From the collector CLI:

 

curl -vk https://super_or_worker_address
* Rebuilt URL to: https://super_or_worker_address/
* Trying super_or_worker_address...
* TCP_NODELAY set
* Connected to super_or_worker_address (super_or_worker_address) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=US; ST=CA; L=SunnyVale; O=Fortinet; CN=localhost
* start date: May 16 10:07:44 2024 GMT
* expire date: May 14 10:07:44 2034 GMT
* issuer: C=US; ST=CA; L=SunnyVale; O=Fortinet; CN=localhost
* SSL certificate verify result: self signed certificate (18), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET / HTTP/1.1
> Host: super_or_worker_address
> User-Agent: curl/7.61.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 200 OK
< Date: Wed, 12 Jun 2024 11:24:23 GMT
< Server: Apache
< X-XSS-Protection: 1; mode=block
< x-Frame-Options: SAMEORIGIN
< X-Content-Type-Options: nosniff
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< Content-Security-Policy: default-src 'self' https://*.duosecurity.com https://*.googleapis.com https://*.gstatic.com; img-src 'self' data: https://maps.googleapis.com https://maps.gstatic.com; script-src 'self' 'unsafe-inline' 'unsafe-eval' https://*.googleapis.com; style-src 'self' 'unsafe-inline' https://*.googleapis.com;
< Referrer-Policy: no-referrer-when-downgrade
< Last-Modified: Mon, 26 Feb 2024 14:01:19 GMT
< ETag: "120-612495807c093"
< Accept-Ranges: bytes
< Content-Length: 288
< Content-Type: text/html; charset=UTF-8
<
<html>
<head>
<title></title>
<script type="text/javascript">
var hst = location.hostname
document.write("<meta HTTP-EQUIV='REFRESH' content='0; url=https://"+hst+"/phoenix' >")
</script>

</head>
<body>
<b> <h2 align="center"> </h2> </b>

<br><br><br><br><br><br>

</body>
</html>

* Connection #0 to host super_or_worker_address left intact
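
When only the outcome matters, the verbose output can be reduced to the HTTP status code. The sketch below reads the host from APP_SERVER_HOST in the collector configuration and prints the status code returned over HTTPS. The function names are hypothetical. It assumes the `APP_SERVER_HOST=` line starts at the beginning of a line, as in the sample above. It only covers the host stored in that setting, so the addresses listed as Event Upload Workers still have to be passed to `https_status` one by one.

```shell
# Hypothetical helper: read APP_SERVER_HOST from the collector configuration file.
app_server_host() {
    conf="${1:-/opt/phoenix/config/phoenix_config.txt}"
    sed -n 's/^APP_SERVER_HOST=//p' "$conf" | head -n 1
}

# Hypothetical helper: print only the HTTP status code returned by a node over HTTPS.
# -k skips certificate verification, as in the curl -vk check; --max-time bounds the wait.
# curl prints 000 when no HTTP response was received at all.
https_status() {
    curl -sk -o /dev/null --max-time 10 -w '%{http_code}\n' "https://$1"
}

# Usage on the collector:
#   https_status "$(app_server_host)"
#   https_status worker_address
```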

 

If an SSL certificate is configured on the super and workers, make sure the common name matches the Fully Qualified Domain Name of the machine.

 

Check that events are arriving on the super or worker. From the super or worker CLI:

tail -f /var/log/httpd/ssl_request_log | grep evthandler | grep Collector_IP_or_Collector_ID
10.5.8.94 - 10001 [17/Jun/2024:12:03:53 +0200] "PUT //evthandler2?10001 HTTP/1.1" 200 - "-" "-"

 

If the HTTP code is 200, the events have been received.

If it is another code such as 401 or 403, the collector has authorization issues. Renew the registration by running the phProvision script on the collector with the --update option.
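
To see at a glance whether a collector gets mostly 200 responses or a mix of error codes, the log lines can be summarized. The sketch below counts evthandler requests per HTTP status code for one collector ID. It assumes the field layout of the sample line above (third field is the collector ID, ninth field is the status code), and `evthandler_status_counts` is a hypothetical name.

```shell
# Hypothetical helper: count evthandler requests per HTTP status code for one collector.
# Reads ssl_request_log lines on stdin; the collector ID is the first argument.
# Assumes the layout of the sample line: $3 = collector ID, $9 = HTTP status.
evthandler_status_counts() {
    awk -v id="$1" '
        /evthandler/ && $3 == id { count[$9]++ }
        END { for (code in count) print code, count[code] }' | sort
}

# Usage on the super or worker:
#   tail -n 1000 /var/log/httpd/ssl_request_log | evthandler_status_counts 10001
```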

 

  7. Check for NO_EVENT_FILE_UPLOAD or NO_SVN_FILE_UPLOAD.

For safety reasons, the system may block itself from processing events. On each FortiSIEM node, run the following command:

ls -l /opt/phoenix/cache/NO_*

 

If /opt/phoenix/cache/NO_EVENT_UPLOAD_FILE or NO_SVN_FILE_UPLOAD is present, it acts as a flag: as long as the file exists, the corresponding processing stays blocked.

Check for explicit errors in the logs at /opt/phoenix/log/phoenix.log.

List the .err files under /opt/phoenix/cache/parser/upload/svn and remove them:

cd /opt/phoenix/cache/parser/upload/svn
ls -l
rm -rfv *.err

 

Remove the flag files with the following commands:

cd /opt/phoenix/cache/
rm -rfv NO_EVENT_UPLOAD_FILE NO_SVN_FILE_UPLOAD
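
To verify afterwards that no flag remains, or to run the same check on several nodes from a script, the `ls` check can be wrapped so that it returns a usable exit status. This is a sketch only, and `list_upload_flags` is a hypothetical name.

```shell
# Hypothetical helper: list flag files (NO_*) in a cache directory.
# Returns 0 if at least one flag is present, 1 otherwise.
list_upload_flags() {
    dir="${1:-/opt/phoenix/cache}"
    found=1
    for f in "$dir"/NO_*; do
        [ -e "$f" ] || continue     # the glob stays literal when nothing matches
        ls -l "$f"
        found=0
    done
    return "$found"
}

# Usage on each FortiSIEM node:
#   if list_upload_flags; then echo "flag files present"; else echo "no flag files"; fi
```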

 

  8. Check for cached events.

Events in the collector cache can be counted with this command from the collector CLI as root:

find /opt/phoenix/cache/parser -type f | wc -l

 

This result should be close to 0. If the figure keeps growing, either the connection to the super/worker is failing or it is too slow for the volume of incoming events.

Details of events in the cache can be checked with the following command:

find /opt/phoenix/cache/parser -type f
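
Because the trend matters more than a single value, it can help to sample the count a few times. The sketch below repeats the count at a fixed interval and prints one timestamped line per sample. The name `cache_trend` and the default of 5 samples every 60 seconds are assumptions for illustration.

```shell
# Hypothetical helper: sample the number of cached files several times to see the trend.
# Arguments: directory, number of samples, seconds between samples (defaults are illustrative).
cache_trend() {
    dir="${1:-/opt/phoenix/cache/parser}"
    samples="${2:-5}"
    interval="${3:-60}"
    i=1
    while [ "$i" -le "$samples" ]; do
        count=$(find "$dir" -type f | wc -l | tr -d ' ')
        echo "$(date '+%H:%M:%S') $count"
        if [ "$i" -lt "$samples" ]; then sleep "$interval"; fi
        i=$((i + 1))
    done
}

# Usage on the collector (as root):
#   cache_trend /opt/phoenix/cache/parser 5 60
```

A count that rises from one sample to the next points to the failing or slow connection described above.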

 

  9. Check for errors in the logs.

In general, ongoing errors can be checked on the node with the following command from the collector CLI:

tail -f /opt/phoenix/log/phoenix.log | grep PHL_ERROR
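
When many errors scroll by, a summary per error type shows which one dominates. The sketch below counts PHL_ERROR lines per event tag, assuming the `[PH_...]:` tag format visible in the log samples of this article; `error_summary` is a hypothetical name.

```shell
# Hypothetical helper: count PHL_ERROR lines per event tag (for example PH_PARSER_FILE_STAT_FAILURE).
# Reads phoenix.log lines on stdin and prints "count tag", most frequent first.
error_summary() {
    grep 'PHL_ERROR' |
        sed -n 's/.*\[\(PH_[A-Z0-9_]*\)]:.*/\1/p' |
        sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Usage on the collector:
#   tail -n 5000 /opt/phoenix/log/phoenix.log | error_summary
```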

 

Other known errors:

 

PH_HTTP_RESPONSE_FAILURE

2024-05-28T01:01:09.649952+02:00 collector phEventPackager[90872]: [PH_HTTP_RESPONSE_FAILURE]:[eventSeverity]=PHL_WARNING,[procName]=phEventPackager,[fileName]=phHttpClient.cpp,[lineNumber]=614,[errorNo]=500,[phLogDetail]=HTTP response code failure
 
Reason: The collector is not able to send the events to the worker or the super.
Resolution: Check the Application Server process on the super.
 
PH_PARSER_FILE_STAT_FAILURE
2023-06-16T08:49:54.775570+03:00 collector phParser[4251]: [PH_PARSER_FILE_STAT_FAILURE]:[eventSeverity]=PHL_ERROR,[procName]=phParser,[fileName]=phAgentEventProcessor.cpp,[lineNumber]=401,[filePath]=/opt/phoenix/cache/parser/upload/win/TL5tpQ.gzs,[errorNoInt]=2,[phLogDetail]=Failed to stat file
 
Reason: The parser process does not have the correct permissions to process the file.
Resolution: Restore proper file permissions with the following commands from the collector CLI as root:
 
find /opt/phoenix/cache/parser/ -ls
chown -R admin:admin /opt/phoenix/cache/parser/
chmod -R 755 /opt/phoenix/cache/parser
chmod -R 700 /opt/phoenix/cache/parser/fwdupload/
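
Before changing ownership recursively, it can be useful to see which entries are actually affected. The sketch below lists everything under a directory that is not owned by a given user; `not_owned_by` is a hypothetical name, and the `admin` user in the usage line is taken from the chown command above.

```shell
# Hypothetical helper: list files and directories under a path that are NOT owned by the given user.
# Arguments: directory, user name. Prints nothing when ownership is already consistent.
not_owned_by() {
    find "$1" ! -user "$2" -print
}

# Usage on the collector (as root):
#   not_owned_by /opt/phoenix/cache/parser/ admin
```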
 
PH_UTIL_XML_HANDLING_ERROR
2024-03-08T01:19:00.818830+00:00 collector phParser[100925]: [PH_UTIL_XML_HANDLING_ERROR]:[eventSeverity]=PHL_ERROR,[procName]=phParser,[fileName]=phBaseXmlParser.cpp,[lineNumber]=332,[errReason]=Exception: Expected end of tag 'data', Level: Fatal error, Line No: 1, Column No: 10668,[phLogDetail]=Failed to handle XML
 
Reason: An incorrect parser definition has been applied.
Resolution: In the FortiSIEM GUI, go to Admin -> Device Support -> Parsers, review the most recently changed parser, deactivate it, and select the 'Apply' button to sync with the collectors. Then correct its definition until the parser can be activated again, and sync with 'Apply'.