sfernando
Staff
Article Id 356739
Description

This article describes how to troubleshoot issues where a device shows high resource utilization: IPS fail-open messages in crash logs, high CPU usage, high SoftIRQ on some or all vCPU cores, slow responses for traffic, and so on.

Because these symptoms have multiple possible causes, this article outlines simple troubleshooting steps that can be used to identify the root cause.

Scope
FortiGate.
Solution

In these situations, start by collecting the following outputs.

 

get system performance status


CPU states: 4% user 0% system 0% nice 53% idle 0% iowait 0% irq 43% softirq
CPU0 states: 0% user 0% system 0% nice 0% idle 0% iowait 0% irq 100% softirq
CPU1 states: 1% user 0% system 0% nice 99% idle 0% iowait 0% irq 0% softirq
CPU2 states: 0% user 0% system 0% nice 1% idle 0% iowait 0% irq 99% softirq
CPU3 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU4 states: 3% user 0% system 0% nice 5% idle 0% iowait 0% irq 92% softirq
CPU5 states: 4% user 1% system 0% nice 95% idle 0% iowait 0% irq 0% softirq
CPU6 states: 4% user 0% system 0% nice 8% idle 0% iowait 0% irq 88% softirq
CPU7 states: 15% user 2% system 0% nice 83% idle 0% iowait 0% irq 0% softirq
CPU8 states: 7% user 1% system 0% nice 21% idle 0% iowait 0% irq 71% softirq
CPU9 states: 7% user 1% system 0% nice 92% idle 0% iowait 0% irq 0% softirq
CPU10 states: 4% user 0% system 0% nice 26% idle 0% iowait 0% irq 70% softirq
CPU11 states: 8% user 2% system 0% nice 90% idle 0% iowait 0% irq 0% softirq
Memory: 16432164k total, 9682308k used (58.9%), 6115808k free (37.2%), 634048k freeable (3.9%)
Average network usage: 8685743 / 8610623 kbps in 1 minute, 8904241 / 8814890 kbps in 10 minutes, 8821108 / 8735910 kbps in 30 minutes
Maximal network usage: 10888036 / 11340676 kbps in 1 minute, 10888036 / 11340676 kbps in 10 minutes, 11270637 / 11340676 kbps in 30 minutes
Average sessions: 786108 sessions in 1 minute, 772779 sessions in 10 minutes, 777503 sessions in 30 minutes
Maximal sessions: 788393 sessions in 1 minute, 788393 sessions in 10 minutes, 817579 sessions in 30 minutes
Average session setup rate: 5923 sessions per second in last 1 minute, 5885 sessions per second in last 10 minutes, 5946 sessions per second in last 30 minutes
Maximal session setup rate: 6270 sessions per second in last 1 minute, 7830 sessions per second in last 10 minutes, 8926 sessions per second in last 30 minutes
Average NPU sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Maximal NPU sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Average nTurbo sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Maximal nTurbo sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Virus caught: 0 total in 1 minute
IPS attacks blocked: 0 total in 1 minute
Uptime: 258 days, 16 hours, 8 minutes

 

In the above output, it is observable that there are no NPU sessions even though more than 700K sessions are active. This raises the question of why all the sessions are handled by the CPU instead of the NPU.
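
 

To confirm that the CPU time is spent in softirq (kernel network processing) rather than in a specific user-space daemon, the process monitor can also be checked. The delay and line-count arguments below are illustrative:

diagnose sys top 2 20

If no single process accounts for the CPU usage while the softirq percentages remain high, the load is coming from kernel packet processing rather than a daemon.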

 

The following NPU session counters provide further confirmation.

 

diagnose npu np6 session-stats 0

qid ins44 ins46 del4 ins64 ins66 del6
ins44_e ins46_e del4_e ins64_e ins66_e del6_e
---------------- ---------- ---------- ---------- ---------- ----------
0 0 0 0 0 0 0
0 0 0 0 0 0
1 0 0 0 0 0 0
0 0 0 0 0 0
2 2 0 1 0 0 0
0 0 0 0 0 0
3 0 0 0 0 0 0
0 0 0 0 0 0
4 0 0 0 0 0 0
0 0 0 0 0 0
5 0 0 0 0 0 0
0 0 0 0 0 0
---------------- ---------- ---------- ---------- ---------- ----------
Total 2 0 1 0 0 0
0 0 0 0 0 0
---------------- ---------- ---------- ---------- ---------- ----------

diagnose npu np6 session-stats 1

qid ins44 ins46 del4 ins64 ins66 del6
ins44_e ins46_e del4_e ins64_e ins66_e del6_e
---------------- ---------- ---------- ---------- ---------- ----------
0 4 0 4 0 0 0
0 0 0 0 0 0
1 28 0 27 0 0 0
0 0 0 0 0 0
2 3 0 1 0 0 0
0 0 0 0 0 0
3 2 0 2 0 0 0
0 0 0 0 0 0
4 34 0 33 0 0 0
0 0 0 0 0 0
5 7 0 7 0 0 0
0 0 0 0 0 0
---------------- ---------- ---------- ---------- ---------- ----------
Total 78 0 74 0 0 0
0 0 0 0 0 0

 

From the outputs above, it is observable that SoftIRQ is high and, notably, that there are no NPU sessions.

Having no NPU sessions means all traffic is handled by the kernel, and this load on the kernel is what causes the symptoms described above.

 

In these cases, investigate why the traffic is not being offloaded to the NPU.

The best way to do this is to get the session list and check the no-offload reason (the no_ofld_reason field).
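
 

A convenient way to inspect a specific session is to set a session filter before listing. The filter values below are illustrative; adjust them to match the traffic of interest:

diagnose sys session filter dst 203.0.113.10
diagnose sys session filter dport 22
diagnose sys session list
diagnose sys session filter clear

Clearing the filter afterwards avoids accidentally filtering later session listings.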

 

In the following example, sFlow caused the issue, as sFlow had been applied on many interfaces.

 

session info: proto=6 proto_state=01 duration=58046 expire=3598 timeout=3600 flags=00000000 socktype=0 sockport=0 av_idx=0 use=3
origin-shaper=
reply-shaper=
per_ip_shaper=
class_id=0 ha_id=0 policy_dir=0 tunnel=/ vlan_cos=0/255
state=may_dirty netflow-origin netflow-reply
statistic(bytes/packets/allow_err): org=40171092/250588/1 reply=18432661/246730/1 tuples=2
tx speed(Bps/kbps): 15/0 rx speed(Bps/kbps): 15/0
orgin->sink: org pre->post, reply pre->post dev=69->84/84->69 gwy=x.y.z.p/x.x.x.x
hook=pre dir=org act=noop p.q.r.s:7554->p.p.p.p:22(0.0.0.0:0)
hook=post dir=reply act=noop p.p.p.p:22->p.q.r.s:7554(0.0.0.0:0)
pos/(before,after) 0/(0,0), 0/(0,0)
dst_mac=00:xx:yy:zz:rr:ss
misc=0 policy_id=1019 pol_uuid_idx=15814 auth_info=0 chk_client_info=0 vd=2
serial=00268b69 tos=ff/ff app_list=0 app=0 url_cat=0
rpdb_link_id=00000000 ngfwid=n/a
npu_state=0x020008
no_ofld_reason: sflow      <- Reason why NPU offloading is not done.

 

Note: When sFlow is applied to an interface, traffic on that interface is not handled by the NPU; all of it is handled by the kernel instead. If sFlow is applied to all interfaces, no traffic is offloaded to the NPU at all. This makes the kernel very busy and causes high CPU, IPS fail-open events, and other resource-related issues.

 

Hence, apply sFlow only on interfaces where it is necessary, and be especially cautious on interfaces carrying very high traffic.
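
 

To check which interfaces currently have the sampler enabled, the configuration can be searched with the CLI grep pipe (the -f flag prints the surrounding edit block so the interface name is visible):

show system interface | grep -f sflow-sampler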

 

Disable sFlow as shown below. The current interface configuration is:

 

config system interface

edit "ToInternet"

set vdom "root"

set ip x.y.z.r 255.255.255.248
set allowaccess ping
set sflow-sampler enable    <- Disable this setting.
set alias "Northbound Transit"
set device-identification enable
set monitor-bandwidth enable
set role lan
set snmp-index 56
set interface "port33"
set vlanid 234

next
end
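
 

The sampler can then be turned off on that interface; only the relevant line changes (interface name taken from the example above):

config system interface
    edit "ToInternet"
        set sflow-sampler disable
    next
end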

 

Afterwards, traffic will be offloaded to the NPU and resource utilization will ease, with lower CPU consumption and no further fail-open events.

 

Alternatively, consider using NetFlow, as NetFlow-sampled traffic still supports NPU offloading.
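
 

As a minimal sketch, NetFlow can be pointed at a collector and then enabled per interface. The collector address and port below are illustrative, and the exact syntax may vary between FortiOS versions:

config system netflow
    set collector-ip 192.0.2.10
    set collector-port 2055
end
config system interface
    edit "ToInternet"
        set netflow-sampler both
    next
end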

 

Related document:

sFlow and NetFlow and hardware acceleration