GeorgeZhong
Staff
Article Id 338313
Description

This article describes a behavior where NPU offloading does not work and the session table reports the non-offload reason ‘non-npu-intf’, even though the interfaces involved support hardware acceleration. This happens on some FortiGate models that include two NPU processors.

Scope

FortiGate models that include two NPU processors, such as the FortiGate 200E.
Solution

Topology:

PC (10.30.1.2)  ->  port1 (10.30.1.96)  FortiGate 200E  port14 (10.61.1.96)  ->  Server (10.61.1.12)

 

The firewall policy has been configured as normal: ‘auto-asic-offload’ is enabled in the policy settings by default, and no changes have been made to the NPU fast path settings.
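
Because ‘auto-asic-offload’ is enabled by default, it is not listed by a plain ‘show’. A minimal sketch of verifying it from the CLI (policy ID 2 is the policy seen in the session output further below):

config firewall policy
    edit 2
        show full-configuration | grep auto-asic-offload    <- should print: set auto-asic-offload enable
    next
end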

 

However, when looking at the session table for the HTTPS traffic between the PC and the server on this FortiGate 200E, the session has not been offloaded by the NPU; the non-offload reason is ‘non-npu-intf’.
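
The output below can be reproduced with the standard session filter commands, a sketch using the values from the topology above:

diagnose sys session filter dst 10.61.1.12    <- server IP from the topology
diagnose sys session filter dport 443         <- HTTPS
diagnose sys session list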

 

session info: proto=6 proto_state=01 duration=39 expire=3560 timeout=3600 flags=00000000 socktype=0 sockport=0 av_idx=0 use=3

origin-shaper=

reply-shaper=

per_ip_shaper=

class_id=0 ha_id=0 policy_dir=0 tunnel=/ vlan_cos=0/255

state=log may_dirty f00

statistic(bytes/packets/allow_err): org=92/2/1 reply=52/1/1 tuples=2

tx speed(Bps/kbps): 2/0 rx speed(Bps/kbps): 1/0

orgin->sink: org pre->post, reply pre->post dev=19->10/10->19 gwy=10.61.1.12/10.30.1.2

hook=post dir=org act=snat 10.30.1.2:55358->10.61.1.12:443(10.61.1.96:55358)

hook=pre dir=reply act=dnat 10.61.1.12:443->10.61.1.96:55358(10.30.1.2:55358)

pos/(before,after) 0/(0,0), 0/(0,0)

misc=0 policy_id=2 pol_uuid_idx=540 auth_info=0 chk_client_info=0 vd=1

serial=001c7676 tos=ff/ff app_list=0 app=0 url_cat=0

rpdb_link_id=00000000 ngfwid=n/a

npu_state=0x040000

no_ofld_reason:  non-npu-intf

 

Looking at the np6lite port list, both port1 and port14 are on the list, so both should support hardware acceleration. Why, then, does the session table show ‘non-npu-intf’ as the non-offload reason?

 

FortiGate-200E (global) # diagnose npu np6lite port-list

Chip   XAUI Ports            Max   Cross-chip

                             Speed offloading

------ ---- -------          ----- ----------

np6lite_0

       2    port9            1000M          NO

       1    port10           1000M          NO

       4    port11           1000M          NO

       3    port12           1000M          NO

       6    port13           1000M          NO

       5    port14           1000M          NO

       9    port15           1000M          NO

       10   port16           1000M          NO

       8    port17           1000M          NO

       7    port18           1000M          NO

np6lite_1

       2    wan1             1000M          NO

       1    wan2             1000M          NO

       4    port1            1000M          NO

       3    port2            1000M          NO

       6    port3            1000M          NO

       5    port4            1000M          NO

       8    port5            1000M          NO

       7    port6            1000M          NO

       10   port7            1000M          NO

       9    port8            1000M          NO

 

This is because traffic on port1 is handled by the NPU chip np6lite_1, while traffic on port14 is handled by the other NPU chip, np6lite_0.

 

According to the FortiGate 200E fast path architecture, traffic is only offloaded if it enters and exits the FortiGate 200E on interfaces connected to the same NP6Lite processor, because the interconnection between the two NPU chips goes through the CPU.

 

Because of this, offloading traffic between the two chipsets (also called cross-chip offloading) is not supported on the FortiGate 200E, which is why sessions between these two interfaces are not offloaded.

 

The FortiGate 200E fast path architecture is below:

 

[Diagram: FortiGate 200E fast path architecture]

 

Reference link:

FortiGate 200E and 201E fast path architecture

 

If the outgoing interface is changed to wan1, which is under the same chipset (np6lite_1) as the incoming interface port1, NPU offloading succeeds.
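
A minimal sketch of that change, assuming the same policy ID 2 as above (any routing changes required to send the traffic out wan1 are omitted):

config firewall policy
    edit 2
        set dstintf wan1    <- wan1 is on np6lite_1, the same chip as port1
    next
end

The session then carries the ‘npu’ flag in its state line, and npu_state shows ‘ofld-O ofld-R’: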

 

session info: proto=6 proto_state=01 duration=18 expire=3581 timeout=3600 flags=00000000 socktype=0 sockport=0 av_idx=0 use=3

origin-shaper=

reply-shaper=

per_ip_shaper=

class_id=0 ha_id=0 policy_dir=0 tunnel=/ vlan_cos=0/255

state=log may_dirty npu f00

statistic(bytes/packets/allow_err): org=92/2/1 reply=52/1/1 tuples=2

tx speed(Bps/kbps): 0/0 rx speed(Bps/kbps): 0/0

orgin->sink: org pre->post, reply pre->post dev=19->17/17->19 gwy=10.56.243.254/10.30.1.2

hook=post dir=org act=snat 10.30.1.2:55619->172.217.24.35:80(10.56.241.96:55619)

hook=pre dir=reply act=dnat 172.217.24.35:80->10.56.241.96:55619(10.30.1.2:55619)

pos/(before,after) 0/(0,0), 0/(0,0)

misc=0 policy_id=2 pol_uuid_idx=540 auth_info=0 chk_client_info=0 vd=1

serial=001c92c1 tos=ff/ff app_list=0 app=0 url_cat=0

rpdb_link_id=00000000 ngfwid=n/a

npu_state=0x4000c00 ofld-O ofld-R

npu info: flag=0x81/0x81, offload=8/8, ips_offload=0/0, epid=76/78, ipid=78/76, vlan=0x0000/0x0000

vlifid=78/76, vtag_in=0x0000/0x0000 in_npu=2/2, out_npu=2/2, fwd_en=0/0, qid=1/0

 

The same behavior can also occur on NPU VDOM links, the inter-VDOM links that support NPU offloading.

 

On the FortiGate 200E, there are two NPU VDOM links (npu0_vlink and npu1_vlink). As the names indicate, npu0_vlink is handled by the NPU chipset np6lite_0, while npu1_vlink is handled by np6lite_1.

 

For example, if the incoming traffic arrives on port1, which is handled by np6lite_1, while the outgoing interface is the NPU VDOM link interface npu0_vlink0, which is handled by np6lite_0, the session is not offloaded either, for the same reason.
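
In that situation, choosing the vlink pair that belongs to the same chip as the ingress interface restores offloading. A sketch, with VDOM-A and VDOM-B as placeholder VDOM names: traffic entering on port1 (np6lite_1) should cross VDOMs over npu1_vlink0/npu1_vlink1 instead of npu0_vlink0/npu0_vlink1.

config system interface
    edit npu1_vlink0
        set vdom VDOM-A    <- placeholder VDOM name; npu1_vlink is on np6lite_1, same chip as port1
    next
    edit npu1_vlink1
        set vdom VDOM-B    <- placeholder VDOM name
    next
end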

 

Cross-chip offloading is supported on some higher-end models, such as the FortiGate 3300E, because they include an Integrated Switch Fabric that connects all front-panel data interfaces and all of the NP6 processors.

 

The FortiGate 3300E fast path architecture is below:

Reference link:
FortiGate 3300E and 3301E fast path architecture

 

[Diagram: FortiGate 3300E fast path architecture]


In conclusion, when designing the traffic topology, it is recommended to consider the NPU fast path architecture so that as many sessions as possible benefit from hardware acceleration. This helps to increase network performance and avoid unnecessary CPU usage.

 

The latest hardware acceleration architecture for each FortiGate model can be found here:

Hardware acceleration