Description | This article describes how to troubleshoot collectors. |
Scope | FortiSIEM Collector node. |
Solution |
In some cases, a collector that registered successfully in the past later shows a critical health status, or appears to have stopped receiving or transmitting events. Follow the steps below to identify the source of the issue:
Use the GUI to access Admin -> Health -> Collector Health.
To get status details, hover over the collector status and review the pop-up. It highlights what needs attention, such as:
Check 'Last Status Updated' and 'Last File Received' to see whether the status is up to date and when the issue started. The timing can be correlated with external changes such as a new firewall policy or link disconnections.
From the Analytics menu in the GUI, perform a basic check to confirm that the collector is sending events to the super/workers.
All ph services are down:
Run the following command on the collector as root to check disk space:
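As a minimal sketch, assuming the standard `df` and `awk` utilities are available on the collector, a check along these lines lists the filesystems at or above a usage threshold. The function name `disk_usage_over` and the 80% default are illustrative assumptions, not part of FortiSIEM.

```shell
# List filesystems whose usage is at or above a threshold (default: 80%).
# The function name and the default threshold are illustrative assumptions.
disk_usage_over() {
  threshold="${1:-80}"
  # -P forces the POSIX one-line-per-filesystem format so that awk can
  # rely on fixed column positions (5 = capacity, 6 = mount point).
  df -P | awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 >= t)
      printf "%s is %s%% full (mounted on %s)\n", $1, use, $6
  }'
}

# Example: human-readable usage for the two filesystems discussed here.
# df -h / /opt
```

Running `disk_usage_over 90` prints only the filesystems that are at least 90% full, which helps decide whether / or /opt needs a closer look.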
If the / root disk needs to be checked (the umount step implies that / was mounted on /mnt beforehand):
du -h --max-depth=1 /mnt/ | sort -rh
umount /mnt #After checking directories
If /opt disk needs to be checked:
Identify the directories that are taking up space. It could be because of system logs at:
Check for SYSLOG packets:
Check for SNMP packets:
Check for Flow packets:
Be aware that not all Flow types are supported. See Flow Support. If the source device is sending NetFlow to FortiSIEM but FortiSIEM cannot decode it, make sure that the source device sends the NetFlow template regularly.
If the tcpdump command shows Got 0, review the network or device configuration. /tmp/collector.pcap can be checked for further analysis.
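The packet checks above can be sketched as follows. This is an illustrative example rather than the exact commands of this article: the helper names `capture_filter` and `capture_packets` are hypothetical, and the UDP ports are assumed to be the ones checked with netstat in the next step (514 for syslog, 162 for SNMP traps, 2055 and 6343 for flows).

```shell
# Map a traffic type to a tcpdump filter expression.
# Ports are assumptions taken from the netstat check in this article.
capture_filter() {
  case "$1" in
    syslog) echo "udp port 514" ;;
    snmp)   echo "udp port 162" ;;
    flow)   echo "udp port 2055 or udp port 6343" ;;
    *)      echo "unknown traffic type: $1" >&2; return 1 ;;
  esac
}

# Capture matching packets to /tmp/collector.pcap (run as root).
# $1 = traffic type, $2 = interface (default: any).
# With -w and -v, tcpdump periodically reports "Got N" on stderr.
capture_packets() {
  filter=$(capture_filter "$1") || return 1
  tcpdump -i "${2:-any}" -nn -v -w /tmp/collector.pcap "$filter"
}
```

For example, `capture_packets syslog` captures syslog traffic on all interfaces until interrupted with Ctrl+C.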
Make sure phParser is listening on those ports to process the UDP packets:
netstat -tulpn | egrep '514|162|2055|6343'
Check the EPS count on the collector:
tail -f /opt/phoenix/log/phoenix.log | grep PH_SYSTEM_PERF_EVENTS_PER_SEC
Check for cluster configuration on super under Admin -> Settings -> Cluster Config:
Check for configuration in sync in the collector:
Make sure that IPs or FQDN listed as Event Upload Workers are reachable without filtering, proxy or SSL inspection.
From the collector CLI:
curl -vk https://super_or_worker_address
A successful connection ends with output similar to:
</head> <br><br><br><br><br><br> </body>
* Connection #0 to host super_or_worker_address left intact
If an SSL certificate is configured on the super and workers, make sure the common name matches with the Fully Qualified Domain Name of the machine.
Check that events are arriving on the super or worker. From the super or worker CLI:
tail -f /var/log/httpd/ssl_request_log | grep evthandler | grep Collector_IP_or_Collector_ID
If the HTTP code is 200, the events have been received. Any other code, such as 401 or 403, indicates that the collector has authorization issues. Renew the registration with the phProvision script on the collector, using the --update option.
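The interpretation of the HTTP codes can be summarized in a small helper. This is only a sketch of the rule stated above; the function name `classify_http_code` and the wording of its messages are illustrative.

```shell
# Translate an HTTP status code seen for a collector into the action
# suggested in this article. The name and messages are illustrative.
classify_http_code() {
  case "$1" in
    200)     echo "received" ;;
    401|403) echo "authorization issue: renew registration with phProvision --update" ;;
    *)       echo "other: investigate code $1" ;;
  esac
}
```

For example, `classify_http_code 403` points to the registration renewal step.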
From the GUI at Admin -> Settings -> Cluster Config, note the list of supervisor addresses. From the GUI at Admin -> Health -> Collector, note the collector ID. Then, for each supervisor address, run the following from the collector CLI as root with the noted collector ID (for example, 10001):
srvpass=`phLicenseTool --showServicePassword`
Check the command output and /tmp/systemConfig.xml. Each super address should be resolvable, the command should not return a 4xx or 5xx code, and /tmp/systemConfig.xml should contain configuration data from the super. If it is an authentication issue, renew the registration using the phProvision script on the collector with the --update option.
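A quick sanity check of the downloaded file could look like the following sketch. It only verifies that the file is non-empty and, when `xmllint` happens to be installed, that it is well-formed XML; it does not validate the configuration itself. The function name `check_system_config` is hypothetical.

```shell
# Basic sanity check of the file downloaded from the super.
# $1 = file to check (default: /tmp/systemConfig.xml).
check_system_config() {
  file="${1:-/tmp/systemConfig.xml}"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file"
    return 1
  fi
  # Optional well-formedness check, skipped when xmllint is not installed.
  if command -v xmllint >/dev/null 2>&1 && ! xmllint --noout "$file" 2>/dev/null; then
    echo "not well-formed XML: $file"
    return 1
  fi
  echo "ok: $file"
}
```

An empty file or an HTML error page in place of the configuration usually means that the request to the super failed.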
As a safety measure, the system may stop processing events on its own. On each FortiSIEM node, run the following command:
If /opt/phoenix/cache/NO_EVENT_UPLOAD_FILE or NO_SVN_FILE_UPLOAD is present, it acts as a flag: as long as the file exists, the corresponding upload remains blocked.
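A minimal sketch for spotting these flag files is shown below. It assumes that NO_SVN_FILE_UPLOAD lives in the same /opt/phoenix/cache directory as NO_EVENT_UPLOAD_FILE, which this article does not state explicitly; the function name `list_upload_flags` is hypothetical.

```shell
# Report the upload flag files found in the cache directory.
# $1 = directory to inspect (default: /opt/phoenix/cache).
# Assumption: both flag files are stored in the same directory.
list_upload_flags() {
  dir="${1:-/opt/phoenix/cache}"
  found=0
  for flag in NO_EVENT_UPLOAD_FILE NO_SVN_FILE_UPLOAD; do
    if [ -e "$dir/$flag" ]; then
      ls -l "$dir/$flag"   # the timestamp shows when the block started
      found=1
    fi
  done
  [ "$found" -eq 1 ] || echo "no flag files in $dir"
}
```

The modification time of a flag file gives a starting point for searching phoenix.log for the error that triggered it.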
Check for explicit errors in the logs at /opt/phoenix/log/phoenix.log. Check for .err files under: /opt/phoenix/cache/parser/upload/svn:
Remove those files with the following commands:
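As a hedged sketch, the .err files can be listed and then removed with `find`. The helper names `list_err_files` and `remove_err_files` are hypothetical, the default path is the one given above, and `-delete` assumes GNU find as shipped on Linux. Review the listing before removing anything.

```shell
# List .err files under the SVN upload cache.
# $1 = directory to inspect (default: the path named in this article).
list_err_files() {
  find "${1:-/opt/phoenix/cache/parser/upload/svn}" -type f -name '*.err'
}

# Remove the same files, printing each path as it is deleted.
remove_err_files() {
  find "${1:-/opt/phoenix/cache/parser/upload/svn}" -type f -name '*.err' -print -delete
}
```

Run `list_err_files` first, check phoenix.log for the matching errors, and only then run `remove_err_files` as root.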
Events in the collector cache can be counted with this command from the collector CLI as root:
This result should be close to 0. If the figure keeps growing, either the connection to the super/worker is failing or it is too slow for the volume of incoming events. Details of the events in the cache can be checked with the following command:
find /opt/phoenix/cache/parser -type f
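One way to count the cached files and to see whether the backlog is growing is sketched below. It builds on the `find` command above; the function names `count_cached_events` and `cache_growth` and the 60-second default interval are illustrative assumptions.

```shell
# Count the files currently held in the collector cache.
# $1 = cache directory (default: /opt/phoenix/cache/parser).
count_cached_events() {
  find "${1:-/opt/phoenix/cache/parser}" -type f | wc -l | tr -d ' '
}

# Take two samples and report the difference.
# $1 = cache directory, $2 = seconds between samples (default: 60).
cache_growth() {
  dir="${1:-/opt/phoenix/cache/parser}"
  interval="${2:-60}"
  before=$(count_cached_events "$dir")
  sleep "$interval"
  after=$(count_cached_events "$dir")
  echo "before=$before after=$after delta=$((after - before))"
}
```

A delta that stays positive over several samples suggests that events arrive faster than they are uploaded.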
In general, ongoing errors can be checked on the node with the next command from the collector CLI: tail -f /opt/phoenix/log/phoenix.log | grep PHL_ERROR
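To see which errors dominate rather than watching them scroll by, the error lines can be grouped by event type. The sketch below assumes the log format shown in the samples in this article, where the event type appears as a bracketed, upper-case `[PH_...]` token; the function name `summarize_errors` is hypothetical.

```shell
# Count PHL_ERROR lines per event type, most frequent first.
# $1 = log file (default: /opt/phoenix/log/phoenix.log).
summarize_errors() {
  log="${1:-/opt/phoenix/log/phoenix.log}"
  grep 'PHL_ERROR' "$log" \
    | grep -o '\[PH_[A-Z0-9_]*\]' \
    | sort | uniq -c | sort -rn
}
```

The most frequent event types can then be looked up in the list of known errors.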
Other known errors:
PH_HTTP_RESPONSE_FAILURE
2024-05-28T01:01:09.649952+02:00 collector phEventPackager[90872]: [PH_HTTP_RESPONSE_FAILURE]:[eventSeverity]=PHL_WARNING,[procName]=phEventPackager,[fileName]=phHttpClient.cpp,[lineNumber]=614,[errorNo]=500,[phLogDetail]=HTTP response code failure
Reason: The collector is not able to send the events to the worker or the super.
Resolution: Check the Application Server process on the super.
PH_PARSER_FILE_STAT_FAILURE
2023-06-16T08:49:54.775570+03:00 collector phParser[4251]: [PH_PARSER_FILE_STAT_FAILURE]:[eventSeverity]=PHL_ERROR,[procName]=phParser,[fileName]=phAgentEventProcessor.cpp,[lineNumber]=401,[filePath]=/opt/phoenix/cache/parser/upload/win/TL5tpQ.gzs,[errorNoInt]=2,[phLogDetail]=Failed to stat file
Reason: The parser process does not have the correct permissions to process the file.
Resolution: Restore the proper file permissions with the following commands from the collector CLI as root:
find /opt/phoenix/cache/parser/ -ls
chown -R admin:admin /opt/phoenix/cache/parser/
chmod -R 755 /opt/phoenix/cache/parser
chmod -R 700 /opt/phoenix/cache/parser/fwdupload/
PH_UTIL_XML_HANDLING_ERROR
2024-03-08T01:19:00.818830+00:00 collector phParser[100925]: [PH_UTIL_XML_HANDLING_ERROR]:[eventSeverity]=PHL_ERROR,[procName]=phParser,[fileName]=phBaseXmlParser.cpp,[lineNumber]=332,[errReason]=Exception: Expected end of tag 'data', Level: Fatal error, Line No: 1, Column No: 10668,[phLogDetail]=Failed to handle XML
Reason: An incorrect parser definition has been applied.
Resolution: In the FortiSIEM GUI, go to Admin -> Device Support -> Parsers, identify the most recently changed parser, deactivate it, and select 'Apply' to sync with the collectors. Then correct its definition, reactivate the parser, and select 'Apply' again to sync.
|
Copyright 2024 Fortinet, Inc. All Rights Reserved.