Table of Contents:
Introduction
NP7 DCE drop counters
Further Troubleshooting
1. NPU packet sniffers
2. NPU performance monitors
3. Packet Buffer Allocator (PBA) Health check
4. NPU IPSec offload failures
5. NPU session statistics
6. NPU Host Interface statistics
Additional NP7-related CLI outputs
Common issues and troubleshooting tips
Related documents
Introduction:
The Network Processor Units (NPUs) in FortiGates provide hardware acceleration: a session can be offloaded from the CPU to the NPU when it is created. There have been several generations of NPUs, and current FortiGate models use NP7, NP7Lite, NP6, NP6XLite, and NP6Lite. The F and G series FortiGates use the NP7 network processor, for example the 400F, 401F, 600F, 601F, 2600F, 2601F, 900G, and 901G (see: FortiGate NP7 architectures).
When troubleshooting packet drops on a FortiGate, the usual troubleshooting tools such as 'diagnose debug flow' do not suffice for hardware-accelerated traffic. The following is a list of NP7 diagnostic commands for analyzing the drop counters, along with an explanation of what each drop counter corresponds to.
NP7 DCE drop counters:
The primary NP7 drop counter diagnostic CLI command is the dce-drop-all command, which shows the list of all drops by the various sub-modules in NP7. If a packet is dropped for any reason in one of the Packet Descriptor Queues (PDQ - used to transfer packets between different sub-modules in NP7) or at any other stage of packet processing in NP7, it will be tracked and accounted for by the Drop Counter Engine (DCE), which can be displayed using the following CLI command.
FortiGate# diagnose npu np7 dce-drop-all all
The dce-drop-all option includes outputs of counters for several NP7 sub-modules - all in one CLI command. Here are the drop types tracked under this CLI command:
FortiGate# diagnose npu np7 dce?
dce-drop-all Show/clear all drop counters. [Take 0-2 arg(s)]
dce-eif-drop Show/clear EIF IHP drop counters. [Take 0-2 arg(s)]
dce-htx-drop Show/clear HTX IHP drop counters. [Take 0-2 arg(s)]
dce-ipti-drop Show/clear IPTI IHP drop counters. [Take 0-2 arg(s)]
dce-l2ti-drop Show/clear L2TI IHP drop counters. [Take 0-2 arg(s)]
dce-dfr-drop Show/clear DFR IHP drop counters. [Take 0-2 arg(s)]
dce-xhp-drop Show/clear XHP IHP drop counters. [Take 0-2 arg(s)]
dce-l2p-drop Show/clear L2P IHP drop counters. [Take 0-2 arg(s)]
dce-hif-drop Show/clear HIF drop counters. [Take 0-2 arg(s)]
dce-sse-drop Show/clear SSE drop counters. [Take 0-2 arg(s)]
dce-ipsec-drop Show/clear IPSec drop counters. [Take 0-2 arg(s)]
dsw-drop-all Show/clear DSW drop counters. [Take 0-2 arg(s)]
When the CPU hands a session over to NP7 for hardware acceleration, packets enter the network processor and flow through a set of NP7 sub-modules, depending on the packet type and the processing it requires. Packets can be dropped at any stage in this flow. This CLI command helps trace where the drops occur and what triggers them, determine whether they have a valid cause (such as sanity check failures), and decide whether further troubleshooting is needed. Explanations of the drop counters for the various NP7 sub-modules are given below.
Note:
Run this CLI command several times at intervals of a few minutes and review which counters are consistently increasing. A drop counter with a low value that does not increase over time may not indicate a persistent issue, so it is important to check whether the counter in question keeps increasing across multiple runs. The outputs below were taken with the verbose (v) level and trimmed to illustrate the most useful drop counters for troubleshooting. Descriptions have been added where relevant to explain the counter type and provide context.
FortiGate# diagnose npu np7 dce-drop-all all
<EIF drop counters>
"EIF refers to the Ethernet Interface pipelines in NP7 (a total of eight EIFs in NP7, as indicated by the output below: EIF_0 to EIF_7) over which packets are distributed. The counters below indicate possible packet drops on these EIFs. Typical reasons for drops tracked here are sanity check failures, protocol anomalies, checksum errors, minimum expected length errors, etc."
[NP7_0]
Counter EIF_0 EIF_1 EIF_2 EIF_3 EIF_4 EIF_5 EIF_6 EIF_7 Total
------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ------------
[1]l3l4_parse 1113 1135 1111 1168 1609 1649 1550 1634 10969
[5]ipv4_csum 23 23 31 24 0 0 0 0 101
[27]tcp_csum 17337 17457 17272 17393 16592 16357 16426 16480 135314
[29]tcp_synoptpar 1083 1128 1083 1094 1189 1107 1081 1142 8907
[30]tcp_mssopt 586 633 637 645 624 609 570 633 4937
[32]udp_csum 70 77 74 79 111 109 103 112 735
[35]udp_plen 3744 3761 3766 3860 3306 3386 3280 3358 28461
[38]icmp_csum 371 375 363 364 270 269 240 283 2535
[48]vxlan_minlen 4 2 1 2 0 0 0 0 9
------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ------------
Total_drop : 191968
* EIF_0 to EIF_7: Eight EIF Ethernet Interface pipelines in NP7
* l3l4_parse: L3/L4 packet parsing errors, protocol sanity check failures
* tcp_csum: TCP checksum failures
* ipv4_csum: IPv4 sanity check failures, L3 checksum value error
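The per-pipeline counter rows above can be checked and ranked with a short script. The following is a minimal sketch, assuming the table has been saved as text and that each data row has the shape `[id]name v0 ... v7 total` as in the sample. The function name `parse_counter_rows` is hypothetical and not part of any Fortinet tooling.

```python
import re

# Rows copied from the sample EIF output above.
SAMPLE = """\
[1]l3l4_parse 1113 1135 1111 1168 1609 1649 1550 1634 10969
[5]ipv4_csum 23 23 31 24 0 0 0 0 101
[27]tcp_csum 17337 17457 17272 17393 16592 16357 16426 16480 135314
[35]udp_plen 3744 3761 3766 3860 3306 3386 3280 3358 28461
"""

# "[27]tcp_csum 1 2 3 ..." -> counter id, counter name, remaining numbers
ROW = re.compile(r"^\[(\d+)\](\S+)\s+([\d\s]+)$")


def parse_counter_rows(text):
    """Return {counter_name: (per_pipeline_values, reported_total)}."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip headers, separators, and "None" rows
        numbers = [int(token) for token in match.group(3).split()]
        rows[match.group(2)] = (numbers[:-1], numbers[-1])
    return rows


rows = parse_counter_rows(SAMPLE)
# Largest totals first, with a sanity check that each row adds up.
for name, (values, total) in sorted(rows.items(), key=lambda kv: -kv[1][1]):
    status = "ok" if sum(values) == total else "row total mismatch"
    print(f"{name:<12} total={total:>7} {status}")
```

Sorting by the reported total puts the dominant drop reason first (tcp_csum in this sample), which is usually the counter to watch across repeated runs.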
<HTX drop counters>
"HTX refers to the Host Transmit module, which handles the traffic from the CPU to NP7. Drops here correspond to packet format issues like checksum failures, anomalies in packet flow, etc."
[NP7_0]
Counter HTX_0 HTX_1 HTX_2 HTX_3 Total
------------------------- ---------- ---------- ---------- ---------- ------------
[31]udp_ulite_minlen 9 12 15 11 47
[32]udp_csum 1673 1592 1626 1660 6551
------------------------- ---------- ---------- ---------- ---------- ------------
Total_drop : 6598
* HTX 0-3: Four Host Transmit Engines on this platform, could vary depending on the FortiGate model type.
* udp_ulite_minlen: UDP min length verification failures
* udp_csum: UDP checksum failures
<DFR drop counters>
"DFR refers to the Data Fragmentation & Reassembly, or Defragmentation, module. Drops here could mean issues with fragmentation or reassembly."
[NP7_0]
Counter DFR
------------------------- ----------
None
------------------------- ----------
Total_drop : 0
<IPTI drop counters>
"IPTI refers to IP Tunnel Inbound module. Look for packet parsing errors in this module like ESP or GRE minlen check failure errors, ttl errors, jumbo packet errors, etc"
[NP7_0]
Counter IPTI_0 IPTI_1 IPTI_2 IPTI_3 Total
------------------------- ---------- ---------- ---------- ---------- ------------
None
------------------------- ---------- ---------- ---------- ---------- ------------
Total_drop : 0
* l2_parse: Layer 2 frame parsing errors
* l3l4_parse: L3/L4 packet parsing errors
* esp_minlen: ESP packet minimum expected length errors
* gre_minlen: GRE packet minimum expected length errors
* ipv4_ttl: TTL errors
* ipv4_land: IP Land related errors
* ipv4_proto: Protocol errors
<L2TI drop counters>
"L2TI refers to the L2 Tunnel Inbound module, which corresponds to any drops with CAPWAP L2 tunnel traffic, VxLAN traffic, etc"
[NP7_0]
Counter L2TI_0 L2TI_1 L2TI_2 L2TI_3 Total
------------------------- ---------- ---------- ---------- ---------- ------------
None
------------------------- ---------- ---------- ---------- ---------- ------------
Total_drop : 0
<XHP drop counters>
"XHP refers to the Extensible Header Processing module for IPSec Inbound traffic, which performs sanity checks on packets after IPSec processing. Check this section for any IP parse errors, minlen errors, checksum validation failure errors, land errors, etc."
[NP7_0]
Counter XHP_0 XHP_1 XHP_2 XHP_3 Total
------------------------- ---------- ---------- ---------- ---------- ------------
None
------------------------- ---------- ---------- ---------- ---------- ------------
Total_drop : 0
<L2P drop counters>
"L2P refers to the Layer 2 Protocol ingress/egress processing module. Check for drops due to any L2 issues like Ethertype, MAC filter, congestion, etc."
[NP7_0]
Module SSE_TPE EGR_FLOW IGR_FLOW TPE MAC_FILTER ETH_ACT TGT_ACT SRC_ACT Total
--------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -----------
eif_0 0 0 0 0 190887 0 0 0 190887
eif_1 0 0 0 0 185916 0 0 0 185916
sse_0 16 0 0 0 0 0 0 0 16
sse_1 15 0 0 0 0 0 0 0 15
sse_2 35 0 0 0 0 0 0 0 35
sse_3 4 0 0 0 0 0 0 0 4
--------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -----------
Total_drop : 376873
* SSE_TPE: Session-based traffic-policing drop.
* EGR_FLOW: Egress congestion drop.
* IGR_FLOW: Ingress congestion drop.
* TPE: Generic traffic-policing drop.
* MAC_FILTER: MAC-filter drop, related to missing or corrupt MAC addresses for destinations.
* ETH_ACT: Ethertype action drop.
* TGT_ACT: Target action drop.
* SRC_ACT: Source action drop.
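To see which L2P drop reason dominates, the per-module rows can be summed by column. The following is a minimal sketch, assuming the column order shown in the sample header. The function name `drops_by_reason` is hypothetical.

```python
# Column order as shown in the sample L2P header (excluding Module and Total).
COLUMNS = ["SSE_TPE", "EGR_FLOW", "IGR_FLOW", "TPE",
           "MAC_FILTER", "ETH_ACT", "TGT_ACT", "SRC_ACT"]

# Header and rows copied from the sample L2P output above.
SAMPLE = """\
Module SSE_TPE EGR_FLOW IGR_FLOW TPE MAC_FILTER ETH_ACT TGT_ACT SRC_ACT Total
eif_0 0 0 0 0 190887 0 0 0 190887
eif_1 0 0 0 0 185916 0 0 0 185916
sse_0 16 0 0 0 0 0 0 0 16
sse_1 15 0 0 0 0 0 0 0 15
sse_2 35 0 0 0 0 0 0 0 35
sse_3 4 0 0 0 0 0 0 0 4
"""


def drops_by_reason(text):
    """Sum each drop-reason column across all module rows."""
    totals = dict.fromkeys(COLUMNS, 0)
    for line in text.splitlines():
        fields = line.split()
        # A data row is a module name, eight counters, and a row total.
        is_data_row = (len(fields) == len(COLUMNS) + 2
                       and all(field.isdigit() for field in fields[1:]))
        if not is_data_row:
            continue  # header or separator line
        for column, value in zip(COLUMNS, fields[1:-1]):
            totals[column] += int(value)
    return totals


totals = drops_by_reason(SAMPLE)
dominant = max(totals, key=totals.get)
print(dominant, totals[dominant])
```

In the sample, nearly all L2P drops fall under MAC_FILTER, so that column is the one to investigate first.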
<HIF drop counters>
"HIF refers to the Host Interface module, which is used to exchange data with the kernel. Check for drops due to congestion in the RX/TX queues, host interface buffer full errors, etc. Use the hif-stats command under 'NPU Host Interface statistics' in the Further Troubleshooting section to get a more detailed HIF drop counter output."
[NP7_0]
Qid DSWH_DTS HRX_NOBD HTX2DSWH HTL2DSWH
--- ---------- ---------- ---------- ----------
None
--- ---------- ---------- ---------- ----------
Total_drop : 0
<IPSec drop counters>
"The IPSec module handles encryption, decryption, and all other IPSec-related functions in hardware. Look for continuous increments of the drop counters to check whether packets are dropped here and for what reason: packet check errors, auth errors, anti-replay check errors, etc."
[NP7_0]
Counter Value
----------------- ----------
ipsec_enc_chk 0
ipsec_dec_auth 0
ipsec_dec_chk_ar 0
----------------- ----------
Total_drop(enc/dec): 0/0
* ipsec_enc_chk: Encryption check failure
* ipsec_dec_auth: Decryption authentication failure
* ipsec_dec_chk_ar: Anti-replay check failure
<SSE drop counters>
"SSE refers to the Session Search Engine. This output tracks counters for session management in NP7, such as session insertion, update, and search, along with any errors. Use the sse-stats command under 'NPU session statistics' in the Further Troubleshooting section to display SSE stats in more detail."
[NP7_0]
Counter Value
--------------- ----------
sse0_s_cfg 16
sse1_s_cfg 15
sse2_s_cfg 35
sse3_s_cfg 4
--------------- ----------
Total_drop 70
--------------- ----------
* s_cfg: Drop by session action flag (act_drop = 1)
* ttl_f: TTL check failure (TTL = 0 or TTL = 1)
* nss_f: NSS session search miss
* if_mis: Ingress interface mismatch
* nhi_null: NHI is NULL
* nhi_miss: NHI search miss
* nhi_cfg: Drop by NHI action
* nhi_null/nhi_miss/nhi_cfg are not real drops
<DSW drop counters>
"DSW refers to the Descriptor Switch that transfers packets between the various sub-modules. Possible packet drops during this transfer are recorded in the counters below. Analyzing this section indicates which modules to focus on when troubleshooting the packet drops."
[NP7_0]
SRC_mod -> DST_mod Drop
---------- ---------- ----------
EIF0 -> EIF0 0
EIF0 -> EIF1 0
EIF1 -> IPSECI 0
EIF1 -> IPSECO 0
EIF7 -> TSK 0
EIF7 -> QTM 0
HTX1 -> EIF0 0
HTX1 -> EIF1 0
SSE3 -> VEP0 0
SSE3 -> VEP2 0
DFR -> EIF7 0
DFR -> HRX 0
IPSECI -> SSE2 0
IPSECI -> SSE3 0
IPTI -> EIF3 0
IPTI -> EIF4 0
VEP0 -> PLE 0
VEP0 -> SYNK 0
VEP4 -> IVS 0
VEP4 -> L2TI1 0
PLE -> PLE 0
PLE -> SYNK 0
SPATH -> NSS 0
SPATH -> TSK 0
QTM -> RLT 0
QTM -> DFR 0
---------- ---------- ----------
Total_drop : 0
* SRC_mod: Source module
* DST_mod: Destination module
* IPSecI: IPSec engine Inbound
* IPSecO: IPsec engine Outbound
* QTM: Queuing based Traffic management
* SPATH: Slow Path Processing Module
* DFR: Defragmentation module
To clear these dce-drop-all counters, use the below syntax:
FortiGate # diagnose npu np7 dce-drop-all all clear
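The advice earlier in this section, to sample the counters several times and focus on those that keep increasing, can be sketched as a comparison of snapshots. This is a minimal sketch: the first snapshot uses totals from the sample EIF output, while the later two snapshots are hypothetical values invented for illustration. The function name `consistently_increasing` is also hypothetical.

```python
def consistently_increasing(snapshots):
    """Return {counter: growth} for counters that grew in every interval.

    snapshots is a list of {counter_name: value} dicts, oldest first.
    """
    if len(snapshots) < 2:
        return {}
    growing = {}
    for name in snapshots[-1]:
        series = [snapshot.get(name, 0) for snapshot in snapshots]
        if all(later > earlier for earlier, later in zip(series, series[1:])):
            growing[name] = series[-1] - series[0]
    return growing


snapshots = [
    # Totals taken from the sample EIF output.
    {"tcp_csum": 135314, "udp_plen": 28461, "vxlan_minlen": 9},
    # Hypothetical later samples, a few minutes apart.
    {"tcp_csum": 135990, "udp_plen": 28461, "vxlan_minlen": 9},
    {"tcp_csum": 136702, "udp_plen": 28470, "vxlan_minlen": 9},
]

print(consistently_increasing(snapshots))
```

Here only tcp_csum grows in every interval. udp_plen increases once and vxlan_minlen stays flat, so neither is reported as a persistent drop.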
Further Troubleshooting:
Based on the information gathered from the drop counters, narrow down which sub-module could be related to the packet drops. Use the relevant additional CLI commands shared below to trace the issue, along with the option of enabling specific (filtered) debugs. 'diagnose debug flow' does not work for hardware-accelerated traffic.
- NPU packet sniffers: NPU packet sniffers help with capturing packets without having to disable asic-offload. Use with caution & make the filter very specific, since this activity can be resource-intensive with a generic filter.
diagnose npu sniffer filter
diagnose npu sniffer filter intf port15
diagnose npu sniffer filter dir 2
diagnose npu sniffer filter srcip 172.16.10.10
diagnose npu sniffer filter protocol 6
diagnose npu sniffer start
diagnose sniffer packet npudbg <----- Displays the captured packets. Verbosity can also be added by appending an empty filter and the usual options, for example: diagnose sniffer packet npudbg '' 6 0 a
More details on all available filter options are in this KB article: Troubleshooting Tip: Collecting NP7 packet capture without disabling offload.
Note: To verify that the NPU sniffers are working correctly, check whether the sniffer output correlates with the increase in the RX and TX counters of the NP7-enabled interface, using the command "fnsysctl ifconfig portXX". One or both directions might be abnormal when there is an issue. For example, if the RX counters are increasing but the TX counters have stopped increasing, the FortiGate may have stopped responding or forwarding packets on that interface in the egress direction.
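The RX/TX comparison described in the note can be sketched as follows. This is a minimal sketch that assumes the RX and TX packet counts have been read by hand from successive "fnsysctl ifconfig portXX" outputs. The function name `stalled_directions` and the sample values are hypothetical.

```python
def stalled_directions(samples):
    """Return the directions whose packet counter did not move.

    samples is a list of (rx_packets, tx_packets) tuples, oldest first.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compare")
    rx_delta = samples[-1][0] - samples[0][0]
    tx_delta = samples[-1][1] - samples[0][1]
    stalled = []
    if rx_delta == 0:
        stalled.append("RX")
    if tx_delta == 0:
        stalled.append("TX")
    return stalled


# Hypothetical readings: RX keeps growing while TX is frozen.
samples = [(1000, 900), (1500, 900), (2100, 900)]
print(stalled_directions(samples))
```

A result containing only "TX" matches the scenario in the note, where traffic is received but nothing leaves the interface.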
- NPU performance monitors:
The resource utilization of each sub-module can be displayed using the pmon CLI command. Check whether any of the sub-modules show a high usage percentage, which could indicate a possible issue.
FortiGate# diagnose npu np7 pmon 0
[NP7_0]
EIF0_IGR EIF1_IGR EIF1_EGR EIF1_EGR HRX HTX DFR
-------- -------- -------- -------- -------- -------- -------- --------
Usage% 1 1 1 1 1 1 0
-------- -------- -------- -------- -------- -------- -------- --------
SSE0 SSE1 SSE2 SSE3
-------- -------- -------- -------- --------
Usage% 1 1 1 1
-------- -------- -------- -------- --------
IPSEC IPTI IPTO L2TI L2TO VEP IVS
-------- -------- -------- -------- -------- -------- -------- --------
Usage% 0 0 0 1 0 0 1
-------- -------- -------- -------- -------- -------- -------- --------
PLE MSE SYNK DSE NSS
-------- -------- -------- -------- -------- --------
Usage% 0 0 0 11 0
-------- -------- -------- -------- -------- --------
* EIFx_IGR: EIF ingress, EIFx_EGR: EIF egress
* Usage%: "Resource usage of the corresponding sub-module, verify if any of them are close to 100 which could indicate an issue."
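Scanning the pmon output for busy sub-modules can be automated. The following is a minimal sketch, assuming each block consists of a header row of module names followed by a "Usage%" row, as in the sample. The function name `parse_pmon` and the threshold value are assumptions. The article only says to look for values close to 100.

```python
# Sample pmon output from above, with the separator lines removed for brevity.
SAMPLE = """\
[NP7_0]
         EIF0_IGR EIF1_IGR EIF1_EGR EIF1_EGR HRX HTX DFR
Usage%   1 1 1 1 1 1 0
         SSE0 SSE1 SSE2 SSE3
Usage%   1 1 1 1
         IPSEC IPTI IPTO L2TI L2TO VEP IVS
Usage%   0 0 0 1 0 0 1
         PLE MSE SYNK DSE NSS
Usage%   0 0 0 11 0
"""


def parse_pmon(text):
    """Return a list of (module_name, usage_percent) pairs."""
    usage = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or set(fields[0]) == {"-"}:
            continue  # blank or separator line
        if fields[0] == "Usage%":
            if header:
                values = [int(value) for value in fields[1:]]
                usage.extend(zip(header, values))
            header = None
        elif not fields[0].startswith("["):
            header = fields  # row of module names for the next Usage% row
    return usage


THRESHOLD = 10  # assumed low value so the sample produces a hit
hot = [(name, pct) for name, pct in parse_pmon(SAMPLE) if pct >= THRESHOLD]
print(hot)
```

A list of pairs is used instead of a dict because the sample header repeats a column name. On a live system a threshold nearer 100 would match the guidance above.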
- Packet Buffer Allocator (PBA) Health Check:
Packets entering NP7 for hardware acceleration should either be transmitted or dropped, but if they stay in the memory for too long, it could indicate a possible PBA leak. Check the status using the CLI command below, and see if the 'Leak' message persists over time.
FortiGate# diagnose npu np7 pba
[NP7_0]
normal current Delta Empty
pba 00003f7c 00003f71 11 0
dba 00001ddf 00001dde 1
hba 00000ff5 00000ff5 0
!!!Leak!!! -------> "look for this entry and see if the leak persists over time"
"Delta should ideally be 0, indicating no packets are left in the NP7 memory for too long."
- NPU IPSec offload failures:
IPSec SAs created after successful VPN tunnel creation would be offloaded to NP7. Check the below CLI command to review if any of the SAs did not get offloaded or any other related drops.
FortiGate# diagnose npu np7 session-offload-stats all
[NP7_0]
Name Count
----------------------- -----------
pushed 6489660
IPSec-enc-SA-not-offloaded 10
IPSec-dec-SA-not-offloaded 5
shaper_fail 0
----------------------- -----------
"Check whether any of the not-offloaded or fail counters are non-zero."
- NPU Session statistics:
Use the below CLI command to review the session statistics of the SSE Engines, and check for any of the fail counters incrementing over time.
FortiGate# diagnose npu np7 sse-stats all
[NP7_0]
Counters SSE0 SSE1 SSE2 SSE3 Total
--------------- --------------- --------------- --------------- --------------- ---------------
entcnt 41613 41792 41114 41281 165800
inssucc 2756441 2758330 2754911 2758230 11027912
insfail 0 0 0 0 0
updsucc 150840855 155233591 141994735 257962638 706031819
delsucc 2714828 2716538 2713797 2716949 10862112
delfail 0 1 0 0 1
depfail 0 0 0 0 0
srhsucc 156858808 161781004 146914188 277080141 742634141
srhfail 20377068 20368585 21180871 20659337 82585861
agesucc 0 0 0 0 0
oftfcnt 8388533 8388518 8388501 8388507
--------------- --------------- --------------- --------------- --------------- ---------------
* entcnt: Session count
* inssucc: Insertion success, insfail: Insertion failure
* delsucc: Deletion success, delfail: Deletion failure
* updsucc: Update success, depfail: Depth failure
* srhsucc: Search success, srhfail: Search failure
* agesucc: Aging success
* oftfcnt: Overflow table entry count
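Two quick checks on the sse-stats output are sketched below. In the sample, the session count (entcnt) of each SSE equals insertion successes minus deletion successes. That relationship is an observation from the sample output rather than a documented guarantee, and the function names are hypothetical.

```python
# Per-SSE values (SSE0..SSE3) copied from the sample sse-stats output above.
SSE_SAMPLE = {
    "entcnt":  [41613, 41792, 41114, 41281],
    "inssucc": [2756441, 2758330, 2754911, 2758230],
    "insfail": [0, 0, 0, 0],
    "delsucc": [2714828, 2716538, 2713797, 2716949],
    "delfail": [0, 1, 0, 0],
    "depfail": [0, 0, 0, 0],
    "srhfail": [20377068, 20368585, 21180871, 20659337],
}


def entry_count_mismatches(stats):
    """List SSEs where entcnt differs from inssucc - delsucc."""
    mismatches = []
    for index, count in enumerate(stats["entcnt"]):
        expected = stats["inssucc"][index] - stats["delsucc"][index]
        if expected != count:
            mismatches.append((f"SSE{index}", count, expected))
    return mismatches


def nonzero_failures(stats):
    """Return the fail counters that are non-zero on at least one SSE."""
    return {name: values for name, values in stats.items()
            if name.endswith("fail") and any(values)}


print(entry_count_mismatches(SSE_SAMPLE))
print(sorted(nonzero_failures(SSE_SAMPLE)))
```

A non-zero fail counter is not a problem in itself. As with the drop counters, what matters is whether it keeps incrementing across repeated runs.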
- NPU Host Interface statistics:
The CLI command below provides details on drop types for the HIF interfaces. Check for large and persistent increments in any of the error counters.
FortiGate# diagnose npu np7 hif-stats
[NP7_0]
RX pkts msg ipsec ipt cwp dvlif e_nlif e_len e_nomem e_ipsec e_ipt e_cwp t_lpbk t_drop sg e_sg e_hairpin
TX pkts cmd clean ips_ofld frags npu_proc e_pkt_full e_cmd_full e_headroom e_frag intr pad e_pad e_nturbo
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
tx0 0 80309 80309 0 0 0 0 0 0 0 80310 0 0
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
Total_Queue:1
RX PKTS :0
TX PKTS :0
RX MSG :0
TX CMD :80309
Additional NP7-related CLI outputs:
Here is a summary of some additional CLI commands that would be useful to collect (under the guidance of Fortinet TAC) if the issue persists, for further analysis.
execute tac report
diagnose npu np7 dce-drop-all all
diagnose npu np7 sse-stats all
diagnose npu np7 pdq
diagnose npu np7 pmon
diagnose npu np7 hif-stats
diagnose npu np7 cgmac-stats all
diagnose sniffer packet port1 "specific filter" 6 0 a
fnsysctl cat /proc/net/np7/np7_0/tbl/dce_dce0
fnsysctl cat /proc/net/np7/np7_0/tbl/dce_dce2
fnsysctl cat /proc/net/np7/np7_0/hif_que
fnsysctl cat /proc/net/np7/np7_0/hif_intc
fnsysctl cat /proc/net/np7/np7_0/hif_intr
fnsysctl cat /proc/net/np7/np7_0/hif_stats
fnsysctl cat /proc/net/np7/np7_0/msg
fnsysctl cat /proc/net/np7/qtm
fnsysctl cat /proc/net/np7/qtm_stat
fnsysctl cat /proc/net/np7/tpe_stats
Common issues and Troubleshooting tips:
Here are some common NP7-related issues and their solutions. It is always recommended to be on the latest interim version of the FortiOS release (refer to: Technical Tip: Recommended Release for FortiOS).
- Review the release notes of the FortiOS version: Refer to the known issues section of the release notes for the FortiOS version running on the FortiGate (for example, 7.4.8 Known issues in Release Notes ) for any existing NP7 defects, and contact Fortinet TAC for any questions.
- Verify whether the session is offloaded to NP7: Even if asic-offload is enabled, there are several valid reasons why a session might not be offloaded for hardware acceleration. In the output of 'diagnose sys session list' (with a filter applied for the specific traffic that has the issue), check the status of the 'npu info: offload' field and look for the 'no_ofld_reason:' entry. The offload field values are described in: Explaining the NPU offload fields. The list of reasons for no offload is in: no_ofld_reason field reasons.
- Isolate if NP7 is the issue: Disable asic-offload (hardware acceleration) for a specific firewall policy and then retest the traffic to confirm if the issue persists. If it does persist even without asic-offload, then it is unlikely to be a hardware acceleration issue and requires troubleshooting with standard packet sniffers, diag debug flow with filters, etc.
- Verify which shaping engine is active: For the latest FortiOS versions, the recommended shaping engine is TPE instead of QTM. Verify which shaping engine is enabled and change it to policing if necessary (will trigger a reboot, so change it only in a maintenance window). Check this KB article: Technical Tip: FortiGate QoS traffic shaping considerations for NP7 platforms - QTM vs TPE engines for more details.
Related documents: