Technical Tip: Performance issue and some general recommendations

ebilcari · 07-05-2023

Description

This article provides some general recommendations for improving FortiNAC's day-to-day operations.

This includes disabling forgotten debugs, checking for enabled logs, and reviewing specific configurations that can add extra load to FortiNAC.

These recommendations are based on previous experience with similar cases and numerous internal discussions surrounding these topics.

It is important to note that these recommendations may not apply to every case, as each situation can have its unique considerations and factors to take into account.

Scope

FortiNAC v9.x.

Solution

For VM deployments, the resources need to be allocated according to the Data Sheet.

Ports in the Network	Target Environment	vCPU Qty	Memory (GB)	Disk (GB)
Up to 2 000	Small	8	16*	100
Up to 15 000	Medium	24	32	100
Up to 25 000	Large	32	96	100

* Minimum of 16 GB of RAM.

Plugin debugs left enabled from a previous troubleshooting session.

How to find them:

How to disable them:

> nacdebug -name SnmpV1 false

Setting SnmpV1 debug to false

* A reboot of the system will disable all these debugs.

Debugs set at the network device level:

These debugs appear in the device configuration as added attributes:

> device -all | grep DEBUG
Name = DEBUG value = TelnetServer ForwardingInterface length = 32

Each line returned means that a device in the list has debugs enabled. To disable them, first find the device that currently has them enabled. The output of '> device -all' can be copied to an external editor and searched manually for devices that have a similar line, or it can be filtered in the CLI:

> device -all | grep DEBUG -B 60 | grep "IP\|DEBUG"
IP = 10.0.0.1
Name = DEBUG value = TelnetServer length = 12

To remove these debugs, run this command:

> device -ip 10.0.0.1 -delAttr -name DEBUG

* A reboot of the system will not disable debugs at the device level.
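The lookup above relies on the IP line appearing within 60 lines before the DEBUG attribute. As a minimal sketch of an alternative, assuming the 'device -all' output keeps the 'IP = ...' and 'Name = DEBUG value = ...' line formats shown above and that the IP line precedes the attributes of the same device, a hypothetical helper (list_debug_devices is not a FortiNAC command) can pair each DEBUG attribute with the most recent IP:

```shell
# Hypothetical helper, not part of FortiNAC.
# Reads 'device -all' style output on stdin and prints "<ip>: <debug names>".
# Assumes the line formats shown in the example output above.
list_debug_devices() {
    awk '
        # Remember the address from the most recent "IP = ..." line.
        /IP = /         { sub(/.*IP = /, ""); ip = $1; next }
        # For a DEBUG attribute, keep only the debug names and print them with that IP.
        /Name = DEBUG / { sub(/.*value = /, ""); sub(/ length = .*/, ""); print ip ": " $0 }
    '
}
```

It would be used as '> device -all | list_debug_devices', and each reported IP can then be passed to the -delAttr command shown above.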

RADIUS logs enabled from GUI.

When no troubleshooting is in progress, the RADIUS logs should be set to Low or, better, disabled completely.

Leaving these logs on for a long time will cause authentication latency, high disk usage, and a very large /var/log/radius/radius.log file.

Check the L2/L3 polling to ensure they are not set too intensively.

The default scheduled L2 polling interval is 10 minutes for wireless devices and 60 minutes for wired devices. The default L3 polling interval is 30 minutes.

During normal operation, FortiNAC triggers polling automatically whenever it is needed. The main purpose of scheduled polling is to correlate information already gathered from other sources. Lowering the polling interval can give the impression that FortiNAC is more responsive, but in deployments with many network devices it interferes with the polling that is triggered on demand, which can delay host status updates.

On switches that support SNMP MAC notification traps, the best practice is to use these traps instead of the standard linkUp and linkDown traps. When MAC Notification traps are implemented, FortiNAC does not have to read the forwarding tables (L2 polling) of the switches each time a host connects to or disconnects from the network. See: Configuring Traps for MAC Notification.

Integration guides with the recommended configurations for third-party device integration with FortiNAC can be found here: FortiNAC 9.4.

LDAP recommendation and synchronization.

Search Branches should not have an entry that is the base domain. This is easy to set up for testing purposes, but it will not work well in production. Scheduled synchronization does not have to be frequent: 1 day is the default and the recommended value.

Manual synchronization can be run at any time if changes are made in AD. This syncs the user attributes and user groups. User passwords are checked in real time, so they do not need synchronization.

For deployments with multiple domain controllers for a single domain, it is suggested to configure only one directory with one primary and one secondary server.

More info can be found here:

Technical Note: Best practices for LDAP configuration

Loggers.

Same as with plugin debugs, loggers can be enabled and forgotten. Related information can be found here:

Technical Tip: NACDebug logger types

Aging.

Depending on the setup requirements and user/host dynamics, in high-turnover setups it is suggested to set Aging values for Inactive Hosts and Users.

More info can be found here: Technical Note: Modify aging of hosts and users.

Database and older backups.

Some options help manage the information stored and archived in the DB.

The default settings should be fine for most types of deployments. If they need to be increased, the change should be planned carefully.

FortiNAC takes a backup of the DB every day and cleans up backup files that are more than 90 days old.

In cases where these files get too big and start to occupy the disk, they can be deleted manually from the CLI in this path:

> ll /bsc/campusMgr/master_loader/mysql/backup
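Before deleting anything manually, it can help to list only the files that are older than the retention period. The following is a minimal sketch using standard find and ls; list_old_files is a hypothetical helper, not a FortiNAC command, and it assumes the backups are plain files under the directory passed to it:

```shell
# Hypothetical helper, not part of FortiNAC.
# Lists regular files under directory $1 whose age exceeds $2 days.
# find rounds file age down to whole days, so +90 matches files
# that are at least 91 days old.
list_old_files() {
    find "$1" -type f -mtime +"$2" -exec ls -lh {} +
}
```

For example, 'list_old_files /bsc/campusMgr/master_loader/mysql/backup 90' would show the backups past the 90-day mark along with their sizes. Nothing is deleted by this helper.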

This is another example of a manual cleanup of the DB file for a specific input type via the CLI:

Technical Tip: Database tables growing large due to constant port changes

Disk utilization and cleanup.

Disk utilization can be checked from the GUI, in the System Performance widget of the Dashboard. In case of high disk usage, an alarm will also be generated.

Usually, the cause of high disk utilization is big log files as a result of unnecessary debugs, database backups, message files, etc.

Searching for big files:

> df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 17M 3.9G 1% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/mapper/centos-root 100G 3.7G 96G 4% /
/dev/sda1 509M 287M 222M 57% /boot
tmpfs 783M 0 783M 0% /run/user/0

Change the directory to root, and recursively check within any bigger folder that is listed:

> cd /
> du -sch *
.
600K tmp
2.1G usr
90.6G var
.
> cd var
> du -sch *
.
462M lib
88G log
.
> cd log
> du -sch *
.
10.4G messages.1
1.3M mysqld.log
20.5G radius
.
> cd radius

When a big file is found (this applies to log/text files only), for example /var/log/messages.1 or /var/log/radius/radius.log.1, it can be easily emptied with the following command:

> echo > radius.log.1

Usually, these log files can be emptied and will not cause any undesired behavior.
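The directory-by-directory walk above can also be shortened with find. This is a minimal sketch using standard find, du, and sort; list_big_files is a hypothetical helper, not a FortiNAC command, and the threshold is given in bytes:

```shell
# Hypothetical helper, not part of FortiNAC.
# Lists regular files under directory $1 that are larger than $2 bytes,
# largest first, as "<disk usage in KiB> <path>".
list_big_files() {
    find "$1" -type f -size +"$2"c -exec du -k {} + | sort -rn
}
```

For example, 'list_big_files /var/log 1073741824' would list the files under /var/log that are larger than 1 GiB. The helper only lists files; emptying a log is still done as shown above.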

Resource utilization.

This can be checked from the CLI using the sar utility, which is already integrated into FortiNAC. It shows values at 10-minute intervals for the last 24 hours, for example for CPU, memory, and storage IOPS:

sar -u
sar -r
sar -b

Output example:

> sar -u

09:30:01 AM CPU %user %nice %system %iowait %steal %idle
09:40:01 AM all 0.98 0.00 0.57 0.05 0.00 98.40
09:50:01 AM all 1.12 0.00 0.49 0.02 0.00 98.37
10:00:01 AM all 0.80 0.00 0.50 0.02 0.00 98.69
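To spot busy intervals without reading the whole table, the sar output can be filtered. This is a minimal sketch that assumes the 12-hour timestamp layout of the example above, so that 'all' is the third field and %idle is the last column; busy_intervals is a hypothetical helper, not a FortiNAC command:

```shell
# Hypothetical helper, not part of FortiNAC.
# Reads 'sar -u' output on stdin and prints the intervals whose %idle
# is below the threshold given as $1.
# Assumes the layout shown above: time, AM/PM, "all", ..., %idle last.
busy_intervals() {
    awk -v min_idle="$1" '$3 == "all" && $NF + 0 < min_idle + 0 { print $1, $2, "idle=" $NF }'
}
```

It would be used as '> sar -u | busy_intervals 20' to list the intervals where less than 20% of the CPU was idle. If sar prints 24-hour timestamps, the field positions differ and the filter needs to be adjusted.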

Network devices are not properly configured in the Network Inventory.

In deployments involving multiple network devices managed by different teams, changes made to these devices often do not get reflected in FortiNAC configurations. This can lead to delays in regular operational processes for other managed devices, such as polling.

These warnings can be checked in the output.master file after changing to the logs directory with '> logs'. The commands below go through the log file and count how many times FortiNAC has tried and failed to communicate with each device:

These devices are not properly configured under network inventory:

grep "CLI credentials not filled in for device model" output.master | cut -d ' ' -f 14-30 | sort | uniq -c
752 CLI credentials not filled in for device model, cannot telnet/SSH to device, IP = 10.6.20.43
951 CLI credentials not filled in for device model, cannot telnet/SSH to device, IP = 10.8.10.14

The user configured under credentials may have limited rights:

grep "failed to execute CLI commands" output.master | cut -d ' ' -f 10-30 | sort | uniq -c
9 failed to execute CLI commands for Router_f1 at 10.6.20.44
7 failed to execute CLI commands for SW_Floor1 at 10.8.10.15

The SSH credentials may have changed, a new SSH fingerprint may have been generated, or a device may have been replaced after an RMA:

grep "failed to create an SSH2 session" output.master | cut -d ' ' -f 10-30 | sort | uniq -c
130 failed to create an SSH2 session for Router_f2 at 10.6.20.45
789 failed to create an SSH2 session for SW_Floor2 at 10.8.10.16


grep "Warning: failed to connect to" output.master | cut -d ' ' -f 9-19 | sort | uniq -c
129 Warning: failed to connect to Router_f3/10.6.20.46 No more authentication methods available. code = (14)
154 Warning: failed to connect to SW_Floor3/10.8.10.17 No more authentication methods available. code = (14)

If these errors occur only a few times, which indicates delayed communication or a device that was offline for a certain period, no action needs to be taken. Otherwise, the configuration needs to be corrected accordingly.
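For a quick overview before drilling down per device, the four messages searched for above can be counted in one pass. This is a minimal sketch using standard grep; count_failures is a hypothetical helper, not a FortiNAC command, and it takes the path to the log file as its only argument:

```shell
# Hypothetical helper, not part of FortiNAC.
# Prints "<count><TAB><message>" for each of the four failure messages
# used in the grep commands above, counted in the log file given as $1.
count_failures() {
    for pattern in \
        "CLI credentials not filled in for device model" \
        "failed to execute CLI commands" \
        "failed to create an SSH2 session" \
        "Warning: failed to connect to"
    do
        # grep -c prints the number of matching lines, 0 if there are none.
        printf '%s\t%s\n' "$(grep -c -- "$pattern" "$1")" "$pattern"
    done
}
```

Running 'count_failures output.master' from the logs directory would show which type of failure dominates, and the matching per-device command above can then be used to find the affected devices.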

Multiple Network Devices sending Syslog Events to FortiNAC:

When FortiNAC receives a large number of Syslog events that it is not able to parse, it will consume a lot of CPU and memory resources.

Debug needed to verify the Syslog messages received by FortiNAC:

> nacdebug -name SyslogServer true

FortiNAC will print in output.master the following messages:

yams.SecurityEventManager WARNING :: 2023-08-28 16:29:38:332 :: #734 :: Invalid Filter Tag/Column

In such cases, it is necessary to stop the network devices from sending the Syslog IDs that FortiNAC does not need.

Taking the FortiGate VPN integration as an example, only specific Syslog IDs need to be sent to FortiNAC:

https://fortinetweb.s3.amazonaws.com/docs.fortinet.com/v2/attachments/0948ecde-090c-11ed-bb32fa163e1...

The FortiGate section 'config log syslogd filter' specifies which logs to forward.

'logids' 0101039947,0101039948,0101037129,0101037134 specify the tunnel up/down messages.

Related documents for Log messages:
Event
Log ID numbers