In these situations, collect the following outputs.
get system performance status
CPU states: 4% user 0% system 0% nice 53% idle 0% iowait 0% irq 43% softirq
CPU0 states: 0% user 0% system 0% nice 0% idle 0% iowait 0% irq 100% softirq
CPU1 states: 1% user 0% system 0% nice 99% idle 0% iowait 0% irq 0% softirq
CPU2 states: 0% user 0% system 0% nice 1% idle 0% iowait 0% irq 99% softirq
CPU3 states: 0% user 0% system 0% nice 100% idle 0% iowait 0% irq 0% softirq
CPU4 states: 3% user 0% system 0% nice 5% idle 0% iowait 0% irq 92% softirq
CPU5 states: 4% user 1% system 0% nice 95% idle 0% iowait 0% irq 0% softirq
CPU6 states: 4% user 0% system 0% nice 8% idle 0% iowait 0% irq 88% softirq
CPU7 states: 15% user 2% system 0% nice 83% idle 0% iowait 0% irq 0% softirq
CPU8 states: 7% user 1% system 0% nice 21% idle 0% iowait 0% irq 71% softirq
CPU9 states: 7% user 1% system 0% nice 92% idle 0% iowait 0% irq 0% softirq
CPU10 states: 4% user 0% system 0% nice 26% idle 0% iowait 0% irq 70% softirq
CPU11 states: 8% user 2% system 0% nice 90% idle 0% iowait 0% irq 0% softirq
Memory: 16432164k total, 9682308k used (58.9%), 6115808k free (37.2%), 634048k freeable (3.9%)
Average network usage: 8685743 / 8610623 kbps in 1 minute, 8904241 / 8814890 kbps in 10 minutes, 8821108 / 8735910 kbps in 30 minutes
Maximal network usage: 10888036 / 11340676 kbps in 1 minute, 10888036 / 11340676 kbps in 10 minutes, 11270637 / 11340676 kbps in 30 minutes
Average sessions: 786108 sessions in 1 minute, 772779 sessions in 10 minutes, 777503 sessions in 30 minutes
Maximal sessions: 788393 sessions in 1 minute, 788393 sessions in 10 minutes, 817579 sessions in 30 minutes
Average session setup rate: 5923 sessions per second in last 1 minute, 5885 sessions per second in last 10 minutes, 5946 sessions per second in last 30 minutes
Maximal session setup rate: 6270 sessions per second in last 1 minute, 7830 sessions per second in last 10 minutes, 8926 sessions per second in last 30 minutes
Average NPU sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Maximal NPU sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Average nTurbo sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Maximal nTurbo sessions: 0 sessions in last 1 minute, 0 sessions in last 10 minutes, 0 sessions in last 30 minutes
Virus caught: 0 total in 1 minute
IPS attacks blocked: 0 total in 1 minute
Uptime: 258 days, 16 hours, 8 minutes
In the above output, note that there are zero NPU sessions even though there are over 700K total sessions. This raises the question of why all sessions are being handled by the CPU instead of the NPU.
The following NPU session counters provide further evidence.
diagnose npu np6 session-stats 0
qid     ins44   ins46   del4    ins64   ins66   del6    ins44_e ins46_e del4_e  ins64_e ins66_e del6_e
------------------------------------------------------------------------------------------------------
0       0       0       0       0       0       0       0       0       0       0       0       0
1       0       0       0       0       0       0       0       0       0       0       0       0
2       2       0       1       0       0       0       0       0       0       0       0       0
3       0       0       0       0       0       0       0       0       0       0       0       0
4       0       0       0       0       0       0       0       0       0       0       0       0
5       0       0       0       0       0       0       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------
Total   2       0       1       0       0       0       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------
diagnose npu np6 session-stats 1
qid     ins44   ins46   del4    ins64   ins66   del6    ins44_e ins46_e del4_e  ins64_e ins66_e del6_e
------------------------------------------------------------------------------------------------------
0       4       0       4       0       0       0       0       0       0       0       0       0
1       28      0       27      0       0       0       0       0       0       0       0       0
2       3       0       1       0       0       0       0       0       0       0       0       0
3       2       0       2       0       0       0       0       0       0       0       0       0
4       34      0       33      0       0       0       0       0       0       0       0       0
5       7       0       7       0       0       0       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------
Total   78      0       74      0       0       0       0       0       0       0       0       0
The outputs above show high softirq CPU usage and, notably, almost no NPU sessions.
Having no NPU sessions means all traffic is handled by the kernel, and the resulting load on the kernel is what causes these issues.
In these cases, investigate the reason why the traffic is not offloaded to the NPU.
The best way to do this is to get the session list and check the no_ofld_reason field.
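For example, the session list can be narrowed with a session filter before inspecting the no_ofld_reason field. The filter values below (destination port 22, matching the SSH session shown later) are placeholders; adjust them to the traffic being investigated:

```
diagnose sys session filter clear
diagnose sys session filter dport 22
diagnose sys session list
```

Without a filter, `diagnose sys session list` prints every session in the table, which is impractical with 700K+ sessions.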
In the following example, sFlow caused the issue, as it had been applied on many interfaces.
session info: proto=6 proto_state=01 duration=58046 expire=3598 timeout=3600 flags=00000000 socktype=0 sockport=0 av_idx=0 use=3
origin-shaper=
reply-shaper=
per_ip_shaper=
class_id=0 ha_id=0 policy_dir=0 tunnel=/ vlan_cos=0/255
state=may_dirty netflow-origin netflow-reply
statistic(bytes/packets/allow_err): org=40171092/250588/1 reply=18432661/246730/1 tuples=2
tx speed(Bps/kbps): 15/0 rx speed(Bps/kbps): 15/0
orgin->sink: org pre->post, reply pre->post dev=69->84/84->69 gwy=x.y.z.p/x.x.x.x
hook=pre dir=org act=noop p.q.r.s:7554->p.p.p.p:22(0.0.0.0:0)
hook=post dir=reply act=noop p.p.p.p:22->p.q.r.s:7554(0.0.0.0:0)
pos/(before,after) 0/(0,0), 0/(0,0)
dst_mac=00:xx:yy:zz:rr:ss
misc=0 policy_id=1019 pol_uuid_idx=15814 auth_info=0 chk_client_info=0 vd=2
serial=00268b69 tos=ff/ff app_list=0 app=0 url_cat=0
rpdb_link_id=00000000 ngfwid=n/a
npu_state=0x020008
no_ofld_reason: sflow  <- Reason why NPU offloading is not done.
Note: When sFlow is applied on an interface, traffic on that interface is not handled by the NPU; the kernel handles all of it. If sFlow is applied on every interface, no traffic at all is offloaded to the NPU and the kernel processes everything. This keeps the kernel very busy and leads to high CPU usage, fail-open events, and other resource-related issues.
Hence, apply sFlow only on interfaces where it is truly needed, and be especially cautious when traffic volume is very high.
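To find which interfaces currently have sFlow sampling enabled, the configuration can be searched from the CLI. The `-f` option (available on recent FortiOS releases) prints the surrounding `edit` block so the interface name is visible; if it is not supported on the firmware in use, plain `grep` still lists the matching lines:

```
show system interface | grep -f sflow-sampler
```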
Disable sFlow on the interface as follows. The current configuration looks like this:
config system interface
    edit "ToInternet"
        set vdom "root"
        set ip x.y.z.r 255.255.255.248
        set allowaccess ping
        set sflow-sampler enable   <- Disable this.
        set alias "Northbound Transit"
        set device-identification enable
        set monitor-bandwidth enable
        set role lan
        set snmp-index 56
        set interface "port33"
        set vlanid 234
    next
end
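The change itself is a single setting, sketched here using the interface name from the example above:

```
config system interface
    edit "ToInternet"
        set sflow-sampler disable
    next
end
```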
Afterwards, traffic is offloaded to the NPU, resource utilization eases, CPU consumption drops, and fail-open events stop.
Alternatively, consider using NetFlow, since NetFlow is compatible with NPU hardware acceleration.
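As a sketch, NetFlow is configured with a global collector and then enabled per interface. The collector IP, port, and interface name below are placeholders, and the exact syntax varies by FortiOS release (newer releases move the collector settings into a `config collectors` subtable under `config system netflow`):

```
config system netflow
    set collector-ip 192.0.2.10
    set collector-port 2055
end
config system interface
    edit "ToInternet"
        set netflow-sampler both
    next
end
```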
Related document:
sFlow and NetFlow and hardware acceleration