Created on
‎02-12-2025
01:25 AM
Edited on
‎02-13-2025
08:01 AM
By
Jean-Philippe_P
Description |
This article describes Fortinet SD-WAN Remote SLAs and is divided into 2 parts:
|
Scope | FortiGate. |
Solution |
Performance SLAs (health-check in the CLI) are a pillar of Fortinet SD-WAN: they are used to monitor the status of SD-WAN members. The most recent official documentation on them is here: Performance SLA.
As described in that document, a Performance SLA can work in 5 ways. This article covers a Performance SLA configured in remote mode, where the link health information is obtained from the remote peers via ICMP probe packets. These are usually called Remote SLAs.
It is first necessary to know how to configure them, which is explained here: Embedded SD-WAN SLA information in ICMP probes - FortiOS 7.2.1.
After that, it is necessary to know that:
Suppose a Remote SLA is configured on an SD-WAN Hub like this:
config health-check
    edit "REMOTE_SLA_T1"
        set detect-mode remote
        set sla-id-redistribute 1
        set members 3
        config sla
            edit 1
                set link-cost-factor latency
                set latency-threshold 100
                set priority-in-sla 10
                set priority-out-sla 20
            next
The expectation is that all routes coming through SD-WAN member 3 have a priority of 10 or 20.
For that condition to be satisfied, it is necessary that:
The first, second, and fifth activities are performed by the Hub daemon called lnkmtd, the third by the Spokes, and the fourth by another Hub daemon called lnkmt_passive.
In a Fortinet SD-WAN Hub and Spokes deployment with BGP on loopback (as explained here: BGP on Loopback) and with the additional paths feature configured, the Hub should have, for each subnet advertised by each Spoke, one route associated with each SD-WAN member usable to forward the traffic.
To verify the third and fourth conditions, it is possible to use this command:
diag sys sdwan health-check remote REMOTE_SLA_T1
Remote Health Check: REMOTE_SLA_T1(3)
Passive remote statistics of Hub_T1(20):
Hub_T1_0(10.0.0.3): timestamp=01-17 17:42:52, latency=0.515, jitter=0.084, pktloss=0.000%, SLA id=1, pass
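When many tunnels are listed, it can help to turn each per-tunnel line of this output into structured data. The following is a minimal Python sketch, not an official tool: the regular expression assumes the field order and names seen in the sample output above, and the function name `parse_remote_sla_line` is hypothetical.

```python
import re

# Assumption: every per-tunnel line follows the layout of the sample output
# in this article (tunnel name, peer IP, timestamp, latency, jitter,
# packet loss, SLA id, result).
LINE_RE = re.compile(
    r"(?P<tunnel>\S+)\((?P<peer>[\d.]+)\):\s*"
    r"timestamp=(?P<timestamp>\S+ \S+),\s*"
    r"latency=(?P<latency>[\d.]+),\s*"
    r"jitter=(?P<jitter>[\d.]+),\s*"
    r"pktloss=(?P<pktloss>[\d.]+)%,\s*"
    r"SLA id=(?P<sla_id>\d+),\s*"
    r"(?P<status>\w+)"
)


def parse_remote_sla_line(line):
    """Return a dict for a per-tunnel statistics line, or None for other lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "tunnel": m["tunnel"],
        "peer": m["peer"],
        "timestamp": m["timestamp"],
        "latency": float(m["latency"]),
        "jitter": float(m["jitter"]),
        "pktloss_pct": float(m["pktloss"]),
        "sla_id": int(m["sla_id"]),
        # Only the literal "pass" seen in the sample is treated as a pass.
        "sla_pass": m["status"] == "pass",
    }


if __name__ == "__main__":
    sample = (
        "Hub_T1_0(10.0.0.3): timestamp=01-17 17:42:52, latency=0.515, "
        "jitter=0.084, pktloss=0.000%, SLA id=1, pass"
    )
    print(parse_remote_sla_line(sample))
```

Header lines such as "Passive remote statistics of Hub_T1(20):" do not match the pattern and yield `None`, so the function can be applied to every line of the output.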
To verify the second condition, it is necessary to analyze the routing table. The command to display it is:
get router info routing-table all
Suppose the routing table contains a route for a subnet announced by a Spoke through the tunnel Hub_T1_0, which is configured as SD-WAN member 3 and is therefore monitored by the Remote SLA REMOTE_SLA_T1 shown at the beginning of the article, and that this route has the default priority 1, as shown here:
B 10.200.2.0/24 [200/0] via 10.150.1.2 (recursive via Hub_T1 tunnel 10.0.0.1 [1]), 00:21:53
This is incorrect behavior on the Hub: the route should have priority 10 if the measured performance is below the threshold, and priority 20 otherwise.
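The check described here can be sketched in Python: compute the priority the route should carry from the measured latency, and compare it with the priority shown in square brackets at the end of the recursive next hop. This is an illustrative sketch only. The function names are hypothetical, the default values are taken from the example configuration in this article, and treating a latency exactly equal to the threshold as in-SLA is an assumption.

```python
import re

# Assumption: the route priority is the bracketed number that follows the
# tunnel address, as in "... tunnel 10.0.0.1 [1]), 00:21:53".
ROUTE_PRIORITY_RE = re.compile(r"tunnel\s+\S+\s+\[(\d+)\]")


def expected_priority(latency, latency_threshold=100,
                      priority_in_sla=10, priority_out_sla=20):
    """Priority the route should have for a given measured latency.

    Defaults mirror the example health-check in this article. Whether the
    boundary case (latency == threshold) counts as in-SLA is an assumption.
    """
    if latency <= latency_threshold:
        return priority_in_sla
    return priority_out_sla


def route_priority(route_line):
    """Extract the priority from a routing-table line, or None if absent."""
    m = ROUTE_PRIORITY_RE.search(route_line)
    return int(m.group(1)) if m else None


def priority_matches(route_line, latency):
    """True if the priority in the routing table is the expected one."""
    actual = route_priority(route_line)
    if actual is None:
        raise ValueError("no priority found in route line")
    return actual == expected_priority(latency)


if __name__ == "__main__":
    route = ("B 10.200.2.0/24 [200/0] via 10.150.1.2 "
             "(recursive via Hub_T1 tunnel 10.0.0.1 [1]), 00:21:53")
    # Latency 0.515 is within the threshold, so priority 10 is expected,
    # but the route still carries the default priority 1.
    print(route_priority(route), expected_priority(0.515),
          priority_matches(route, 0.515))
```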
To verify the first condition, it is necessary to understand with which Spoke the tunnel is established. Assuming a dialup IPSec tunnel is configured on the Hub, one way to find the name of the tunnel created with a specific Spoke is to go, in the Hub GUI, to Dashboard -> IPSec Monitor and filter, for example, on the Remote Gateway column, specifying the IP of the interface used by that IPSec tunnel on the Spoke.
Here is an explanation of how to add the IPSec Monitor page in the GUI: Adding Fortiview Widgets.
Returning to the wrong behavior observed: it could be caused by bug 1109286, which is fixed starting from FortiOS v7.6.3 and affects all FortiOS v7.2 and v7.4 patches and the first 2 patches of the minor release v7.6 (see FortiOS firmware version terminology).
The trigger condition of the bug is a crash of the Hub daemon iked, which causes a rekey of all IPSec tunnels. To see all daemon crashes, execute from the CLI:
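The affected version ranges stated above can be expressed as a small Python helper, for example to check an inventory of Hubs. This is a sketch based only on the ranges given in this article. The function names are hypothetical, v7.6.0 is assumed to be affected together with the first 2 patches because the fix starts at v7.6.3, and branches the article does not mention return `None` rather than a guess.

```python
def parse_version(text):
    """Turn "v7.6.3" or "7.2.5" into a tuple of integers such as (7, 6, 3)."""
    parts = text.strip().lstrip("vV").split(".")
    return tuple(int(p) for p in parts)


def affected_by_1109286(version):
    """Return True/False per the ranges in this article, None if not stated.

    Assumption: v7.6.0 is treated as affected, like v7.6.1 and v7.6.2,
    since the fix is stated to start from v7.6.3.
    """
    major, minor, patch = parse_version(version)
    if (major, minor) in ((7, 2), (7, 4)):
        return True           # all v7.2 and v7.4 patches
    if (major, minor) == (7, 6):
        return patch < 3      # fixed starting from v7.6.3
    return None               # branch not covered by the article


if __name__ == "__main__":
    for v in ("v7.2.5", "7.4.8", "7.6.2", "7.6.3"):
        print(v, affected_by_1109286(v))
```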
diagnose debug crashlog read
If there was an iked crash, rows similar to this one will be listed:
755: 2024-12-03 20:25:04 <00565> firmware FortiGate-3400E v7.2.5,build1517b1517,230606 (GA.F) (Release)
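On a Hub with a long crash log, the rows can be filtered offline once the output has been saved to a file. The Python sketch below splits a row into the parts visible in the sample above (row index, timestamp, the number in angle brackets, and the message) and keeps rows whose message mentions a given daemon name. It assumes all rows share the prefix layout of the sample row. The names `parse_crashlog_row` and `rows_mentioning` are hypothetical, and the field name `tag` is only a label chosen for this sketch.

```python
import re

# Assumption: each row starts like the sample in this article:
# "<index>: <YYYY-MM-DD> <HH:MM:SS> <NNNNN> <message>".
CRASHLOG_RE = re.compile(
    r"^(?P<index>\d+):\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"<(?P<tag>\d+)>\s+(?P<message>.*)$"
)


def parse_crashlog_row(row):
    """Split one crash log row into its parts, or return None if it differs."""
    m = CRASHLOG_RE.match(row.strip())
    if m is None:
        return None
    return {
        "index": int(m["index"]),
        "timestamp": f'{m["date"]} {m["time"]}',
        "tag": m["tag"],
        "message": m["message"],
    }


def rows_mentioning(rows, daemon):
    """Parsed rows whose message contains the daemon name, e.g. "iked"."""
    parsed = (parse_crashlog_row(r) for r in rows)
    return [p for p in parsed if p is not None and daemon in p["message"]]


if __name__ == "__main__":
    sample = ("755: 2024-12-03 20:25:04 <00565> firmware FortiGate-3400E "
              "v7.2.5,build1517b1517,230606 (GA.F) (Release)")
    print(parse_crashlog_row(sample))
```

A plain substring match is used on purpose: it makes no assumption about how the daemon name appears inside the message.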
There are multiple possible causes for an iked crash, and they need to be investigated with a ticket to Fortinet TAC. One of them: on an SD-WAN Hub built with a FortiGate cluster and with more than 3000 Spokes, a crash with signal 11 can be caused by bug 0951667, resolved in FortiOS 7.2.11, 7.4.2, and all newer releases.
After the crash, and until the entire cluster (not only one unit) is rebooted, the priorities will no longer be updated for many routes.
A workaround is to kill the process of the lnkmtd daemon. How to kill a process is explained here: Find and restart/kill a process on a FortiGate.
Related documents:
Troubleshooting Tip: SD-WAN performance SLA down
Technical Tip: Explaining the SD-WAN rule matching process
Technical Tip: Fortinet's Secure SD-WAN Resource List
|