Under certain circumstances, FortiGate platforms with NP7 processors may experience a PBA leak. The following commands can be used to check for it:
diagnose npu np7 pba all
diagnose npu np7 pmon all
diagnose npu np7 pdq
For example:
FGT1800F # diagnose npu np7 pba all
[NP7_0]
    normal current Delta Empty
pba 00003f7c 0000267e 6398 0
dba 00001ddf 00001494 2379
hba 00000ff5 00000ff5 0
!!!Leak!!! < ----
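The counters in the output above can be cross-checked with a short script. The following is a minimal sketch in Python, not a Fortinet tool: it assumes the 'normal' and 'current' columns are hexadecimal and that 'Delta' is their difference in decimal, which matches the sample (0x3f7c - 0x267e = 6398). The function name parse_pba is hypothetical.

```python
import re

# Sample rows copied from the 'diagnose npu np7 pba all' output above.
SAMPLE = """\
[NP7_0]
    normal current Delta Empty
pba 00003f7c 0000267e 6398 0
dba 00001ddf 00001494 2379
hba 00000ff5 00000ff5 0
"""

# Row name, two 8-digit hex counters, then a decimal delta.
ROW = re.compile(r"^(pba|dba|hba)\s+([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})\s+(\d+)")


def parse_pba(text):
    """Return {name: (normal, current, delta)} with all counters as integers."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            name, normal, current, delta = match.groups()
            rows[name] = (int(normal, 16), int(current, 16), int(delta))
    return rows


rows = parse_pba(SAMPLE)
for name, (normal, current, delta) in rows.items():
    # Assumption: Delta should equal normal - current.
    consistent = (normal - current) == delta
    print(f"{name}: normal={normal} current={current} delta={delta} consistent={consistent}")
```

Collecting the output several times and comparing the 'current' values between runs can show whether the buffer count keeps dropping.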
The NP7 may become stuck, affecting traffic through the FortiGate interfaces. For example, an LACP aggregate interface may go down even though its physical member interfaces are up.
FGT1800F # diagnose npu np7 pmon all
[NP7_0]
EIF0_IGR EIF1_IGR EIF0_EGR EIF1_EGR HRX HTX DFR
-------- -------- -------- -------- -------- -------- -------- --------
Usage% 0 0 0 0 0 100 0
-------- -------- -------- -------- -------- -------- -------- --------
SSE0 SSE1 SSE2 SSE3
-------- -------- -------- -------- --------
Usage% 100 100 100 100 < ---
-------- -------- -------- -------- --------
IPSEC IPTI IPTO L2TI L2TO VEP IVS
-------- -------- -------- -------- -------- -------- -------- --------
Usage% 0 0 0 25 0 0 12
-------- -------- -------- -------- -------- -------- -------- --------
PLE MSE SYNK DSE NSS
-------- -------- -------- -------- -------- --------
Usage% 0 0 0 12 0
-------- -------- -------- -------- -------- --------
* EIFx_IGR: EIF ingress, EIFx_EGR: EIF egress
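When the pmon output is collected repeatedly, the saturated modules can be picked out with a script. The following is a minimal sketch in Python that assumes the layout shown above (a line of module names, a separator, then a 'Usage%' line). The function name parse_pmon and the 100% threshold are illustrative choices, not part of FortiOS.

```python
# Two sections copied from the 'diagnose npu np7 pmon all' output above.
SAMPLE = """\
[NP7_0]
EIF0_IGR EIF1_IGR EIF0_EGR EIF1_EGR HRX HTX DFR
-------- -------- -------- -------- -------- -------- -------- --------
Usage% 0 0 0 0 0 100 0
-------- -------- -------- -------- -------- -------- -------- --------
SSE0 SSE1 SSE2 SSE3
-------- -------- -------- -------- --------
Usage% 100 100 100 100 < ---
-------- -------- -------- -------- --------
"""


def parse_pmon(text):
    """Return {module_name: usage_percent} by pairing name lines with Usage% lines."""
    usage = {}
    names = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, separators, the [NP7_x] tag, and the footnote.
        if not tokens or tokens[0].startswith(("-", "[", "*")):
            continue
        if tokens[0] == "Usage%":
            # Keep numeric fields only; this drops annotations such as '< ---'.
            values = [int(t) for t in tokens[1:] if t.isdigit()]
            usage.update(zip(names, values))
        else:
            names = tokens
    return usage


usage = parse_pmon(SAMPLE)
saturated = sorted(name for name, pct in usage.items() if pct >= 100)
print("Modules at 100%:", ", ".join(saturated))
```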
FortiGate1800F # diagnose npu np7 pdq
[NP7_0]
Name WPCnt RPCnt WBCnt RBCnt Delta-p Delta-b Stuck?
----------------------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
eif[0].ehp.tunobip_ipdq 3848191412 3848191412 375204932 375204932 0 0 No
eif[0].ehp.quad_eg_ipdq 2507309921 2507309921 1497964509 1497964509 0 0 No
eif[1].ehp.tunobip_ipdq 3848046284 3848046284 374192309 374192309 0 0 No
eif[2].ehp.tunobip_ipdq 3848102457 3848102457 374922694 374922694 0 0 No
eif[3].ehp.tunobip_ipdq 3847871656 3847871656 373644574 373644574 0 0 No
eif[4].ehp.tunobip_ipdq 25972296 25972296 67919811 67919811 0 0 No
eif[4].ehp.quad_eg_ipdq 101111740 101111740 262900913 262900913 0 0 No
eif[5].ehp.tunobip_ipdq 25930808 25930808 67738360 67738360 0 0 No
eif[6].ehp.tunobip_ipdq 24689761 24689761 64103404 64103404 0 0 No
eif[7].ehp.tunobip_ipdq 24518875 24518875 63139338 63139338 0 0 No
sse[0].pdq 252161747 252161719 3155571431 3155571367 28 64 Yes
sse[1].pdq 147218542 147218514 2871406521 2871406455 28 66 Yes
sse[2].pdq 3174424944 3174424915 4025955903 4025955837 29 66 Yes
sse[3].pdq 3888526567 3888526539 1423333281 1423333216 28 65 Yes
hrx.ipdq 587270431 587270431 1755681627 1755681627 0 0 No
hrx.dswh_ipdq 587270431 587270431 1755681627 1755681627 0 0 No
hrx.dswh_opdq[0] 587270431 587270431 1755681627 1755681627 0 0 No
hrx.tunpdq[0] 148911722 148911722 445051451 445051451 0 0 No
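The stuck queues in the pdq output above can be extracted automatically. The following is a minimal sketch in Python, assuming the eight-column layout shown above and assuming that Delta-p and Delta-b are the write counter minus the read counter (WPCnt - RPCnt and WBCnt - RBCnt), which matches the sample rows. The function name stuck_queues is hypothetical.

```python
# A few rows copied from the 'diagnose npu np7 pdq' output above.
SAMPLE = """\
Name WPCnt RPCnt WBCnt RBCnt Delta-p Delta-b Stuck?
----------------------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
eif[0].ehp.tunobip_ipdq 3848191412 3848191412 375204932 375204932 0 0 No
sse[0].pdq 252161747 252161719 3155571431 3155571367 28 64 Yes
sse[1].pdq 147218542 147218514 2871406521 2871406455 28 66 Yes
hrx.ipdq 587270431 587270431 1755681627 1755681627 0 0 No
"""


def stuck_queues(text):
    """Return [(name, pending_packets, pending_buffers)] for rows marked 'Yes'."""
    stuck = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 fields and end in Yes/No; this skips header and separator.
        if len(fields) != 8 or fields[-1] not in ("Yes", "No"):
            continue
        wp, rp, wb, rb = map(int, fields[1:5])
        if fields[-1] == "Yes":
            # Assumption: the deltas are write count minus read count.
            stuck.append((fields[0], wp - rp, wb - rb))
    return stuck


for name, packets, buffers in stuck_queues(SAMPLE):
    print(f"{name}: {packets} packets / {buffers} buffers not read")
```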
FortiGate1800F # diagnose netlink aggregate name LAG_CORE
LACP flags: (A|P)(S|F)(A|I)(I|O)(E|D)(E|D)
(A|P) - LACP mode is Active or Passive
(S|F) - LACP speed is Slow or Fast
(A|I) - Aggregatable or Individual
(I|O) - Port In sync or Out of sync
(E|D) - Frame collection is Enabled or Disabled
(E|D) - Frame distribution is Enabled or Disabled
status: down
npu: n
flush: n
asic helper: y
npu-grp: 0(HS:1)
ports: 2
link-up-delay: 50ms
min-links: 1
ha: backup
distribution algorithm: L4
LACP mode: active
LACP speed: slow
LACP HA: enable
aggregator ID: 2
actor key: 33
actor MAC address: 48:3a:01:6d:22:26
partner key: 33
partner MAC address: 78:18:ec:fc:55:13
member: port34
index: 0
link status: up < ---
link failure count: 0
permanent MAC addr: 48:3a:01:6d:22:26
LACP state: negotiating
LACPDUs RX/TX: 2659/3104
actor state: ASAIDD
actor port number/key/priority: 1 33 255
partner state: ASAODD
partner port number/key/priority: 65 33 255
partner system: 65535 78:18:ec:fc:55:13
aggregator ID: 2
speed/duplex: 10000 1
RX state: CURRENT 6
MUX state: ATTACHED 3
member: port35
index: 1
link status: up < ---
link failure count: 0
permanent MAC addr: 48:3a:01:6d:22:27
LACP state: negotiating
LACPDUs RX/TX: 2654/3103
actor state: ASAIDD
actor port number/key/priority: 2 33 255
partner state: ASAODD
partner port number/key/priority: 1 33 255
partner system: 65535 78:18:ec:fc:55:13
aggregator ID: 2
speed/duplex: 10000 1
RX state: CURRENT 6
MUX state: ATTACHED 3
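The six-letter actor and partner state strings can be decoded with the flag legend printed at the top of the output above. The following is a minimal sketch in Python; the field labels and the function name decode_lacp_state are illustrative, while the letter meanings are taken from that legend.

```python
# One entry per flag position, in the order given by the legend:
# (A|P)(S|F)(A|I)(I|O)(E|D)(E|D)
LEGEND = [
    ("LACP mode", {"A": "Active", "P": "Passive"}),
    ("LACP speed", {"S": "Slow", "F": "Fast"}),
    ("aggregation", {"A": "Aggregatable", "I": "Individual"}),
    ("port sync", {"I": "In sync", "O": "Out of sync"}),
    ("frame collection", {"E": "Enabled", "D": "Disabled"}),
    ("frame distribution", {"E": "Enabled", "D": "Disabled"}),
]


def decode_lacp_state(flags):
    """Translate a state string such as 'ASAIDD' into a readable dict."""
    if len(flags) != len(LEGEND):
        raise ValueError(f"expected {len(LEGEND)} flags, got {len(flags)}")
    decoded = {}
    for char, (field, meanings) in zip(flags, LEGEND):
        if char not in meanings:
            raise ValueError(f"unexpected flag {char!r} for {field}")
        decoded[field] = meanings[char]
    return decoded


actor = decode_lacp_state("ASAIDD")      # actor state from the sample
partner = decode_lacp_state("ASAODD")    # partner state from the sample
print("actor:  ", actor)
print("partner:", partner)
```

For the sample above, both sides report frame collection and distribution as disabled and the partner reports the port as out of sync, which is consistent with the 'negotiating' LACP state on links that are physically up.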
As a workaround, rebooting the device recovers the services.
To prevent the issue from recurring, apply the following changes to the FortiGate NPU settings:
config system npu
    set dedicated-management-cpu enable
    set dedicated-lacp-queue enable
    set htab-msg-queue dedicated
    set vlan-lookup-cache disable
end
Note: Setting 'vlan-lookup-cache disable' forces the FortiGate to reboot, so it should be applied during a maintenance window. In an HA setup, both devices (primary and secondary) reboot at the same time.
The VLAN lookup cache has an 8K SPV/TPV entry limit. Since the SPV/TPV table is shared by VLAN and other virtual interfaces (e.g., IPsec), exceeding this limit can cause a PBA leak.
If only VLAN interfaces are used, the maximum supported number is 8K VLANs.
set dedicated-management-cpu enable
set dedicated-lacp-queue enable
The two commands above will help assign a dedicated queue and CPU resource for LACP processing.
set htab-msg-queue dedicated
This setting helps reduce the impact of htab (hash table) messages, which in turn reduces congestion on the NP7.
set vlan-lookup-cache disable
This setting helps prevent a PBA leak if the SPV/TPV limit is reached.
The following commands can be run periodically to see the behavior of SPV/TPV:
fnsysctl cat proc/net/np7/np7_0/tbl/cdb_tpv_htab_csr_info
fnsysctl cat proc/net/np7/np7_0/tbl/cdb_spv_htab_csr_info
Starting with versions 7.6.5 and 8.0.0, those settings are disabled by default:
'1138921 - Suggest to change the default NPU setting to reduce the high-frequent of spv/tpv table messages'.