Technical Tip: Fortinet SD-WAN Remote SLAs

stroia · ‎02-12-2025

Description

This article describes Fortinet SD-WAN Remote SLAs and is divided into 2 parts:

First part: How Remote SLAs work.
Second part: Remote SLA Troubleshooting guide.

Scope

FortiGate.

Solution

How Remote SLAs work.

Performance SLAs (Health-Check in the CLI) are a pillar of Fortinet SD-WAN: they are used to monitor the status of SD-WAN members.

Here is the most recent official documentation regarding them: Performance SLA.

As described in that document, a Performance SLA can work in 5 ways. This article covers a Performance SLA configured in remote mode, where the link health information is obtained from the remote peers via ICMP probe packets. These are usually called Remote SLAs.

First, it is necessary to know how to configure them, which is explained here: Embedded SD-WAN SLA information in ICMP probes - FortiOS 7.2.1.

After that, it is necessary to know that:

This feature is available from the FortiOS firmware release v7.2.1.
It is typically configured on FortiGates acting as an SD-WAN Hub, to choose the best SD-WAN member for sending traffic without actively monitoring every path to every Spoke.
Remote SLAs influence the FortiGate's choice of SD-WAN member by updating the priorities of the related routes.
How route priorities influence FortiGate decisions is explained here: Routing behavior depending on distance and priority for static routes, and Policy Based Routes.
To take advantage of the feature, the additional paths feature must also be configured. How to configure it is explained here: How to apply additional path.
For 'return traffic', a FortiGate such as an SD-WAN Hub uses the reply direction of the related session (here is how to read the session's information: FortiGate session table information). For example, consider an application Server reachable via the Hub LAN: the Hub sends the Server's responses to clients connected to a Branch LAN using the reply direction of the session created for the related client request, without an SD-WAN/routing lookup.
Up to and including FortiOS v7.4 (all patches), an SD-WAN Spoke with a Performance SLA configured can transmit only the measured values in its ICMP probe packets, and the Hub compares those values with its own local thresholds. Starting from FortiOS v7.6.0, a FortiGate can also transmit whether a member is in SLA or out of SLA, so the Hub's route priorities are updated according to each Spoke's local thresholds, which can differ from Spoke to Spoke, as explained here: Embedded SD-WAN SLA status in ICMP probes - FortiOS 7.6.
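
The difference between the two behaviors can be illustrated with a short Python sketch. It is only a model of the decision described above, not FortiOS code: the function name, its parameters, and the treatment of a value exactly equal to the threshold as "in SLA" are assumptions made for illustration.

```python
from typing import Optional


def route_priority(latency_ms: float,
                   hub_latency_threshold_ms: float,
                   priority_in_sla: int,
                   priority_out_sla: int,
                   spoke_reported_in_sla: Optional[bool] = None) -> int:
    """Model the priority a Hub gives to routes learned over one member.

    spoke_reported_in_sla is None when the Spoke embeds only measured
    values (behavior up to v7.4). From v7.6.0 the Spoke can embed its own
    in-SLA / out-of-SLA verdict, which this model then uses instead of
    the Hub's local threshold.
    """
    if spoke_reported_in_sla is None:
        # Hub-side comparison; "<=" is an assumption of this sketch.
        in_sla = latency_ms <= hub_latency_threshold_ms
    else:
        # Spoke-side verdict, based on the Spoke's own thresholds.
        in_sla = spoke_reported_in_sla
    return priority_in_sla if in_sla else priority_out_sla
```

With a Hub threshold of 100 ms, a measured latency of 50 ms yields the in-SLA priority when only values are embedded, but the out-of-SLA priority if the Spoke itself reports the member as out of SLA.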

Remote SLA Troubleshooting guide.

Suppose a Remote SLA is configured on an SD-WAN Hub as follows:

config system sdwan
    config health-check
        edit "REMOTE_SLA_T1"
            set detect-mode remote
            set sla-id-redistribute 1
            set members 3
            config sla
                edit 1
                    set link-cost-factor latency
                    set latency-threshold 100
                    set priority-in-sla 10
                    set priority-out-sla 20
                next
            end
        next
    end
end

For this Remote SLA to work as intended, all of the following conditions must be met:

1. The member is UP and monitored.
2. The routes coming through the member are associated with it.
3. The member is periodically measured, and the measurements are received by the Hub.
4. The Hub reads the measurements received.
5. The Hub updates the priorities of the associated routes.

The first, second, and fifth activities are performed by the Hub daemon called lnkmtd, the third by the Spokes, and the fourth by another Hub daemon called lnkmt_passive.

In a Fortinet SD-WAN Hub and Spokes deployment with BGP on loopback (as explained here: BGP on Loopback) and with the additional paths feature configured, the Hub should have, for each subnet advertised by each Spoke, one route associated with each SD-WAN member that can be used to forward the traffic.

To verify the third and fourth conditions, use this command:

diag sys sdwan health-check remote REMOTE_SLA_T1

Remote Health Check: REMOTE_SLA_T1(3)

Passive remote statistics of Hub_T1(20):

Hub_T1_0(10.0.0.3): timestamp=01-17 17:42:52, latency=0.515, jitter=0.084, pktloss=0.000%, SLA id=1, pass
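
When many Spokes are involved, it can help to parse these statistics lines programmatically. The following Python sketch is modeled only on the sample output line shown above; the function name and the returned field names are hypothetical, and lines in any other layout are simply returned as None. Only the status word "pass" is known from this article, so any other word is treated as not in SLA.

```python
import re

# Pattern modeled on the single sample line shown in this article.
STAT_RE = re.compile(
    r"^(?P<member>\S+)\((?P<peer>[\d.]+)\): "
    r"timestamp=(?P<timestamp>\S+ \S+), "
    r"latency=(?P<latency>[\d.]+), "
    r"jitter=(?P<jitter>[\d.]+), "
    r"pktloss=(?P<pktloss>[\d.]+)%, "
    r"SLA id=(?P<sla_id>\d+), "
    r"(?P<status>\w+)$"
)


def parse_remote_stat(line):
    """Return a dict for one passive remote statistics line, or None."""
    m = STAT_RE.match(line.strip())
    if m is None:
        return None
    return {
        "member": m.group("member"),
        "peer": m.group("peer"),
        "timestamp": m.group("timestamp"),
        "latency": float(m.group("latency")),
        "jitter": float(m.group("jitter")),
        "pktloss_percent": float(m.group("pktloss")),
        "sla_id": int(m.group("sla_id")),
        # Only "pass" appears in the article; anything else counts as not in SLA.
        "in_sla": m.group("status") == "pass",
    }
```

A line that parses correctly and carries a recent timestamp suggests that the Spoke is sending measurements and that the Hub is reading them.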

To verify the second condition, it is necessary to analyze the routing table. The command to display it is:

get router info routing-table all

Suppose the routing table contains a route for a subnet announced by a Spoke through the tunnel Hub_T1_0, which is configured as SD-WAN member 3 and is therefore monitored by the Remote SLA REMOTE_SLA_T1 shown at the beginning of this guide, and that this route has the default priority 1, as shown here:

B 10.200.2.0/24 [200/0] via 10.150.1.2 (recursive via Hub_T1 tunnel 10.0.0.1 [1]), 00:21:53

This is incorrect behavior on the Hub: the route should have priority 10 if the measured performance is within the threshold, and priority 20 otherwise.
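
The same check can be sketched in Python for a routing table with many Spoke routes. The pattern is modeled only on the sample route line above and, following this article, reads the number in the last square brackets as the route priority. The function name and returned field names are hypothetical.

```python
import re

# Pattern modeled on the recursive BGP route line shown in this article.
ROUTE_RE = re.compile(
    r"^(?P<code>\S+)\s+(?P<prefix>[\d.]+/\d+)\s+"
    r"\[(?P<distance>\d+)/(?P<metric>\d+)\]\s+via\s+(?P<next_hop>[\d.]+)\s+"
    r"\(recursive via (?P<interface>\S+) tunnel (?P<tunnel_ip>[\d.]+) "
    r"\[(?P<priority>\d+)\]\)"
)


def check_route_priority(line, allowed_priorities):
    """Return the parsed route and whether its priority is an allowed one."""
    m = ROUTE_RE.match(line.strip())
    if m is None:
        return None
    priority = int(m.group("priority"))
    return {
        "prefix": m.group("prefix"),
        "interface": m.group("interface"),
        "priority": priority,
        # Allowed values would be priority-in-sla and priority-out-sla.
        "as_expected": priority in allowed_priorities,
    }
```

Called with the sample line and the allowed set {10, 20}, the sketch reports priority 1 as unexpected, which matches the incorrect behavior described here.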

For the first condition, it is necessary to identify which Spoke the tunnel is established with. Assuming a dialup IPSec tunnel is configured on the Hub, one way to find the name of the tunnel created with a specific Spoke is to go, in the Hub GUI, to Dashboard -> IPSec Monitor and filter, for example, the Remote Gateway column by the IP of the interface used by that IPSec tunnel on the Spoke.

Here is an explanation of how to add the IPSec Monitor page in the GUI: Adding Fortiview Widgets.

Returning to the incorrect behavior observed, it could be caused by bug 1109286, which is fixed starting from FortiOS v7.6.3 and affects all patches of FortiOS v7.2 and v7.4 and the first 2 patches of the minor release v7.6 (FortiOS firmware version terminology).
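
As a quick aid, the affected range can be expressed as a version comparison. This Python sketch is hypothetical and only restates the ranges given in this article. It assumes that every release from v7.2.1 (the first with Remote SLAs) up to, but not including, v7.6.3 is exposed; the authoritative source remains the FortiOS release notes.

```python
def exposed_to_bug_1109286(version: str) -> bool:
    """Return True if a FortiOS version falls in the range described above.

    Accepts strings such as "v7.4.5" or "7.6.3".
    """
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    # Assumed range: Remote SLAs exist from 7.2.1; the fix arrives in 7.6.3.
    return (7, 2, 1) <= (major, minor, patch) < (7, 6, 3)
```

For example, exposed_to_bug_1109286("v7.4.5") evaluates to True, while exposed_to_bug_1109286("v7.6.3") evaluates to False.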

The trigger condition of the bug is a crash of the Hub daemon iked which causes a rekey of all IPSec tunnels.

To see all daemon crashes, run the following from the CLI:

diagnose debug crashlog read

If there was an iked crash, rows similar to the following will be listed:

755: 2024-12-03 20:25:04 <00565> firmware FortiGate-3400E v7.2.5,build1517b1517,230606 (GA.F) (Release)
756: 2024-12-03 20:25:04 <00565> application iked
757: 2024-12-03 20:25:04 <00565> *** signal 11 (Segmentation fault) received ***
758: 2024-12-03 20:25:04 <00565> Register dump:
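
Crash logs can be long, so a saved copy of the output can be filtered with a small script. The following Python sketch is modeled only on the sample rows above. The function name and field names are hypothetical, and the number in angle brackets is treated simply as a tag that groups the rows of one crash.

```python
import re

APP_RE = re.compile(
    r"^\d+: (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"<(?P<tag>\d+)> application (?P<app>\S+)"
)
SIG_RE = re.compile(
    r"^\d+: \S+ \S+ <(?P<tag>\d+)> \*\*\* signal (?P<sig>\d+) "
    r"\((?P<name>[^)]+)\) received"
)


def find_crashes(crashlog_text, daemon="iked"):
    """Return one dict per crash of the given daemon found in the text."""
    crashes = []
    current = None
    for raw in crashlog_text.splitlines():
        line = raw.strip()
        m = APP_RE.match(line)
        if m:
            current = {"timestamp": m.group("ts"), "tag": m.group("tag"),
                       "application": m.group("app"),
                       "signal": None, "signal_name": None}
            if current["application"] == daemon:
                crashes.append(current)
            continue
        m = SIG_RE.match(line)
        # Attach the signal to the most recent application row with the same tag.
        if m and current is not None and current["tag"] == m.group("tag"):
            current["signal"] = int(m.group("sig"))
            current["signal_name"] = m.group("name")
    return crashes
```

An empty result for iked suggests that the trigger condition described above did not occur in the period covered by the log.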

There are multiple possible causes of an iked crash, and they need to be investigated by opening a ticket with Fortinet TAC. One of them is the following: on an SD-WAN Hub built as a FortiGate Cluster with more than 3000 Spokes, a crash with signal 11 can be caused by bug 0951667, resolved in FortiOS v7.2.11, v7.4.2, and all newer releases.

After the crash, and until the entire Cluster (not just one unit) is rebooted, the priorities of many routes are no longer updated.

A workaround is to kill the process created by the lnkmtd daemon. How to kill a process is explained here: Find and restart/kill a process on a FortiGate.

Related documents:

Documents and articles regarding Fortinet SD-WAN Troubleshooting:

Troubleshooting Tip: SD-WAN performance SLA down

Technical Tip: Explaining the SD-WAN rule matching process

Troubleshooting SD-WAN

Technical Tip: Fortinet's Secure SD-WAN Resource List

Articles regarding IPSec Troubleshooting on FortiGate:

Troubleshooting Tip: IPsec VPN tunnels

Technical Tip: FortiGate IPSec VPN resource list
