Description
This article describes a behavior where NPU offloading does not work and the session table shows the non-offload reason 'non-npu-intf', even though the interfaces involved do support hardware acceleration. This happens on some FortiGate models that include two NPU processors.
Scope
FortiGate models that include two NPU processors, such as the FortiGate 200E.
Solution
Topology: PC (10.30.1.2) -> port1 (10.30.1.96) FortiGate 200E port14 (10.61.1.96) -> Server (10.61.1.12).
The firewall policy has been configured as normal: 'auto-asic-offload' is enabled in the policy settings by default, and no changes have been made to the NPU fast path settings.
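For reference, a minimal policy matching this topology could look like the sketch below (the policy name and address objects are assumptions for illustration; 'set auto-asic-offload enable' is shown explicitly even though it is the default):

config firewall policy
    edit 2
        set name "PC-to-Server"          <----- assumed name for illustration
        set srcintf "port1"
        set dstintf "port14"
        set srcaddr "all"
        set dstaddr "all"
        set action accept
        set schedule "always"
        set service "ALL"
        set nat enable
        set auto-asic-offload enable     <----- enabled by default
    next
end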
However, when looking at the session table for the HTTPS traffic between the PC and the server on this FortiGate 200E, the session has not been offloaded by the NPU. The no-offload reason is 'non-npu-intf'.
session info: proto=6 proto_state=01 duration=39 expire=3560 timeout=3600 flags=00000000 socktype=0 sockport=0 av_idx=0 use=3
origin-shaper=
reply-shaper=
per_ip_shaper=
class_id=0 ha_id=0 policy_dir=0 tunnel=/ vlan_cos=0/255
state=log may_dirty f00
statistic(bytes/packets/allow_err): org=92/2/1 reply=52/1/1 tuples=2
tx speed(Bps/kbps): 2/0 rx speed(Bps/kbps): 1/0
orgin->sink: org pre->post, reply pre->post dev=19->10/10->19 gwy=10.61.1.12/10.30.1.2
hook=post dir=org act=snat 10.30.1.2:55358->10.61.1.12:443(10.61.1.96:55358)
hook=pre dir=reply act=dnat 10.61.1.12:443->10.61.1.96:55358(10.30.1.2:55358)
pos/(before,after) 0/(0,0), 0/(0,0)
misc=0 policy_id=2 pol_uuid_idx=540 auth_info=0 chk_client_info=0 vd=1
serial=001c7676 tos=ff/ff app_list=0 app=0 url_cat=0
rpdb_link_id=00000000 ngfwid=n/a
npu_state=0x040000
no_ofld_reason: non-npu-intf
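The session entry above can be isolated with the session filter commands below (a sketch; the filter values are assumptions based on this topology, and 'diagnose netlink interface list' can be used to map the dev index values, 19 and 10 here, to interface names):

diagnose sys session filter clear
diagnose sys session filter dst 10.61.1.12
diagnose sys session filter dport 443
diagnose sys session list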
Looking at the np6lite port list, both port1 and port14 are part of the list and should therefore support hardware acceleration, so why does the no-offload reason in the session table report them as 'non-npu-intf'?
FortiGate-200E (global) # diagnose npu np6lite port-list
Chip        XAUI Ports   Max   Cross-chip
                         Speed offloading
------      ---- ------- ----- ----------
np6lite_0   2    port9   1000M NO
            1    port10  1000M NO
            4    port11  1000M NO
            3    port12  1000M NO
            6    port13  1000M NO
            5    port14  1000M NO
            9    port15  1000M NO
            10   port16  1000M NO
            8    port17  1000M NO
            7    port18  1000M NO
np6lite_1   2    wan1    1000M NO
            1    wan2    1000M NO
            4    port1   1000M NO
            3    port2   1000M NO
            6    port3   1000M NO
            5    port4   1000M NO
            8    port5   1000M NO
            7    port6   1000M NO
            10   port7   1000M NO
            9    port8   1000M NO
This is because traffic on port1 is handled by the NPU chip np6lite_1, while traffic on port14 is handled by the other NPU chip, np6lite_0.
According to the FortiGate 200E fast path architecture, traffic is only offloaded if it enters and exits the FortiGate 200E on interfaces connected to the same NP6Lite processor, since the interconnection between the two NPU chips goes through the CPU.
Because of this, offloading traffic between the two chipsets (also called cross-chip offloading) is not supported on the FortiGate 200E, which is why sessions between these two interfaces are not offloaded.
The FortiGate 200E fast path architecture is shown in the reference below:
Reference link: FortiGate 200E and 201E fast path architecture
If the outgoing interface is changed to wan1, which is on the same chipset (np6lite_1) as the incoming interface port1, the NPU offloading is successful, as shown by the 'npu' flag in the session state and the 'ofld-O ofld-R' flags in npu_state below.
session info: proto=6 proto_state=01 duration=18 expire=3581 timeout=3600 flags=00000000 socktype=0 sockport=0 av_idx=0 use=3
origin-shaper=
reply-shaper=
per_ip_shaper=
class_id=0 ha_id=0 policy_dir=0 tunnel=/ vlan_cos=0/255
state=log may_dirty npu f00
statistic(bytes/packets/allow_err): org=92/2/1 reply=52/1/1 tuples=2
tx speed(Bps/kbps): 0/0 rx speed(Bps/kbps): 0/0
orgin->sink: org pre->post, reply pre->post dev=19->17/17->19 gwy=10.56.243.254/10.30.1.2
hook=post dir=org act=snat 10.30.1.2:55619->172.217.24.35:80(10.56.241.96:55619)
hook=pre dir=reply act=dnat 172.217.24.35:80->10.56.241.96:55619(10.30.1.2:55619)
pos/(before,after) 0/(0,0), 0/(0,0)
misc=0 policy_id=2 pol_uuid_idx=540 auth_info=0 chk_client_info=0 vd=1
serial=001c92c1 tos=ff/ff app_list=0 app=0 url_cat=0
rpdb_link_id=00000000 ngfwid=n/a
npu_state=0x4000c00 ofld-O ofld-R
npu info: flag=0x81/0x81, offload=8/8, ips_offload=0/0, epid=76/78, ipid=78/76, vlan=0x0000/0x0000
vlifid=78/76, vtag_in=0x0000/0x0000 in_npu=2/2, out_npu=2/2, fwd_en=0/0, qid=1/0
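When re-testing after changing the route or the outgoing interface, the existing session can be cleared first so that offloading is re-evaluated for the new session (a sketch; with a session filter set, only the matching sessions are cleared):

diagnose sys session filter clear
diagnose sys session filter src 10.30.1.2
diagnose sys session clear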
The same behavior can also occur with the NPU VDOM links, the inter-VDOM links that support NPU offloading.
On the FortiGate 200E, there are two NPU VDOM links (npu0_vlink and npu1_vlink). As the names suggest, npu0_vlink is handled by the NPU chipset np6lite_0, while npu1_vlink is handled by np6lite_1.
For example, if the incoming traffic arrives on port1, which is handled by np6lite_1, and the outgoing interface is the NPU VDOM link interface npu0_vlink0, the traffic is not offloaded either, for the same reason.
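To keep the offloading in this example, the inter-VDOM link handled by the same chipset as port1 (npu1_vlink) can be used instead. A minimal sketch, assuming a multi-VDOM setup with VDOM names 'root' and 'DMZ' used purely for illustration:

config global
    config system interface
        edit npu1_vlink0
            set vdom root       <----- assumed VDOM name
        next
        edit npu1_vlink1
            set vdom DMZ        <----- assumed VDOM name
        next
    end
end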
Cross-chip offloading is supported on some higher-end models, such as the FortiGate 3300E, because they have an Integrated Switch Fabric (ISF) that connects all front panel data interfaces to all of the NP6 processors.
The FortiGate 3300E fast path architecture can be found in the following reference: FortiGate 3300E and 3301E fast path architecture
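On those NP6 platforms, the per-port NPU mapping and the 'Cross-chip offloading' column can be checked with the equivalent command (the exact output differs per model):

diagnose npu np6 port-list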
In conclusion, when designing the traffic topology, it is recommended to take the NPU fast path architecture into account so that as many sessions as possible benefit from hardware acceleration. This helps increase network performance and avoids unnecessary CPU usage.
The latest hardware acceleration architecture for each FortiGate model can be found here: