Technical Tip: SD-WAN Performance SLA with multiple servers
Description
This article describes how an SD-WAN Performance SLA functions when multiple servers are configured.
Scope
FortiGate.
Solution
This article will use the example of two server IPs configured under the 'SLA_Internet' health check:
config system sdwan
    config health-check
        edit "SLA_Internet"
            set server "198.18.21.2" "198.18.22.2"
            set update-static-route disable
            set members 1 2
            config sla
                edit 1
                    set link-cost-factor latency
                    set latency-threshold 300
                next
            end
        next
    end
end
Refer to the snippet below for configuring via GUI:

As of FortiOS v7.4.5 (the latest version at the time of writing), when a health check has two servers, it takes its SLA metrics only from the first server as long as at least one member can reach that server. It uses the second server only when all members fail to reach the first server.
Every 500 ms (set interval 500), both SLA servers are probed over the SD-WAN member interfaces (set members 1 2):
diagnose sniffer packet any 'host 198.18.21.2 or host 198.18.22.2' 4
Using Original Sniffing Mode
interfaces=[any]
filters=[host 198.18.21.2 or host 198.18.22.2]
0.239732 port1 out 198.18.11.1 -> 198.18.21.2: icmp: echo request
0.239915 port1 out 198.18.11.1 -> 198.18.22.2: icmp: echo request
0.240035 port2 out 198.18.12.1 -> 198.18.21.2: icmp: echo request
0.240136 port2 out 198.18.12.1 -> 198.18.22.2: icmp: echo request
0.242943 port2 in 198.18.22.2 -> 198.18.12.1: icmp: echo reply
0.242982 port1 in 198.18.22.2 -> 198.18.11.1: icmp: echo reply
0.423018 port1 in 198.18.21.2 -> 198.18.11.1: icmp: echo reply
0.423811 port2 in 198.18.21.2 -> 198.18.12.1: icmp: echo reply
0.735073 port1 out 198.18.11.1 -> 198.18.21.2: icmp: echo request
0.735210 port1 out 198.18.11.1 -> 198.18.22.2: icmp: echo request
0.735266 port2 out 198.18.12.1 -> 198.18.21.2: icmp: echo request
0.735356 port2 out 198.18.12.1 -> 198.18.22.2: icmp: echo request
0.739354 port2 in 198.18.22.2 -> 198.18.12.1: icmp: echo reply
0.739393 port1 in 198.18.22.2 -> 198.18.11.1: icmp: echo reply
0.917637 port1 in 198.18.21.2 -> 198.18.11.1: icmp: echo reply
0.918004 port2 in 198.18.21.2 -> 198.18.12.1: icmp: echo reply
Both SLA servers must fail to consider the SD-WAN member as dead. If either of them is reachable, the member is considered alive.
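For reference, the probe cadence seen in the capture above and the dead/alive decision are governed by timing options on the health check. The fragment below is a minimal sketch that sets them explicitly. The values shown are assumed to be the FortiOS defaults and are not taken from the example unit, so verify them with show full-configuration before relying on them.

```
config system sdwan
    config health-check
        edit "SLA_Internet"
            # Assumed default values; confirm on the unit in use.
            set protocol ping
            # Time between probes, in milliseconds.
            set interval 500
            # Consecutive failed probes before a server is considered lost.
            set failtime 5
            # Consecutive successful probes before a server is considered recovered.
            set recoverytime 5
        next
    end
end
```

With two servers configured, a member is marked dead only after both servers have been declared lost on that member.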
Although two SLA servers are configured, the health-check commands show only a single value per SLA metric (latency, jitter, loss). They do not show separate values for each SLA server.
This is by design. FortiOS measures the quality of the link itself and does not measure the quality of each SLA server.
For example, the first server '198.18.21.2' is experiencing higher latency than the second server '198.18.22.2':
execute ping 198.18.21.2
PING 198.18.21.2 (198.18.21.2): 56 data bytes
64 bytes from 198.18.21.2: icmp_seq=0 ttl=254 time=183.3 ms
64 bytes from 198.18.21.2: icmp_seq=1 ttl=254 time=182.8 ms
64 bytes from 198.18.21.2: icmp_seq=2 ttl=254 time=182.6 ms
64 bytes from 198.18.21.2: icmp_seq=3 ttl=254 time=183.2 ms
64 bytes from 198.18.21.2: icmp_seq=4 ttl=254 time=183.4 ms
--- 198.18.21.2 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 182.6/183.0/183.4 ms
execute ping 198.18.22.2
PING 198.18.22.2 (198.18.22.2): 56 data bytes
64 bytes from 198.18.22.2: icmp_seq=0 ttl=254 time=3.9 ms
64 bytes from 198.18.22.2: icmp_seq=1 ttl=254 time=2.0 ms
64 bytes from 198.18.22.2: icmp_seq=2 ttl=254 time=2.2 ms
64 bytes from 198.18.22.2: icmp_seq=3 ttl=254 time=5.8 ms
64 bytes from 198.18.22.2: icmp_seq=4 ttl=254 time=1.8 ms
--- 198.18.22.2 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 1.8/3.1/5.8 ms
As long as the first server in the list is alive, its SLA metrics can be viewed with diagnose sys sdwan health-check status:
diagnose sys sdwan health-check status SLA_Internet
Health Check(SLA_Internet):
Seq(1 port1): state(alive), packet-loss(0.000%) latency(183.431), jitter(0.809), mos(4.220), bandwidth-up(9999995), bandwidth-dw(9999994), bandwidth-bi(19999989) sla_map=0x1
Seq(2 port2): state(alive), packet-loss(0.000%) latency(183.223), jitter(1.106), mos(4.219), bandwidth-up(9999996), bandwidth-dw(9999996), bandwidth-bi(19999992) sla_map=0x1

When the first server becomes unreachable over all members, the health check reports the metrics of the second server instead:
diagnose sys sdwan health-check status SLA_Internet
Health Check(SLA_Internet):
Seq(1 port1): state(alive), packet-loss(0.000%) latency(3.500), jitter(1.355), mos(4.401), bandwidth-up(9999996), bandwidth-dw(9999996), bandwidth-bi(19999992) sla_map=0x1
Seq(2 port2): state(alive), packet-loss(0.000%) latency(3.433), jitter(1.207), mos(4.401), bandwidth-up(9999996), bandwidth-dw(9999996), bandwidth-bi(19999992) sla_map=0x1

If both servers are probed over multiple links, the SLA metrics of the second server are only used if the first server is unreachable through all links.
The SLA metrics of the first server (198.18.21.2) are used as long as there are probe responses from 198.18.21.2 over member 1 or member 2.
The SLA metrics of the second server (198.18.22.2) are used only if there are no probe responses from the first server 198.18.21.2 over member 1 and member 2.
Note: A single performance SLA with two servers behaves differently from a dual performance SLA setup in which each SLA contains one server.
Single performance SLA with two servers.
- The SLA will consider the state of both servers together. This means that both servers must fail for the SLA to trigger a failure state.
- This setup is essentially an 'AND' circuit, where both servers need to be unreachable for the rule to consider the link as down.
- This configuration is useful when both servers must be confirmed down before traffic is rerouted, which might be suitable for services that are mirrored or load-balanced across multiple servers.
Dual performance SLA with each SLA containing one server.
- Each SLA operates independently. If one server fails, the corresponding SLA will trigger a failure state for that server.
- This setup allows for more granular control and can provide quicker failover since each server is monitored separately.
- This configuration is beneficial when you want to ensure high availability and quick failover.
- If one server goes down, traffic can be rerouted based on the status of the other SLA.
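The dual performance SLA setup described above could be configured along the following lines. This is a minimal sketch that reuses the servers, members, and latency threshold from the example in this article. The health-check names SLA_Server1 and SLA_Server2 are hypothetical.

```
config system sdwan
    config health-check
        # Health-check names are hypothetical.
        edit "SLA_Server1"
            set server "198.18.21.2"
            set members 1 2
            config sla
                edit 1
                    set link-cost-factor latency
                    set latency-threshold 300
                next
            end
        next
        edit "SLA_Server2"
            set server "198.18.22.2"
            set members 1 2
            config sla
                edit 1
                    set link-cost-factor latency
                    set latency-threshold 300
                next
            end
        next
    end
end
```

Each health check then reports its own metrics, which can be viewed separately with diagnose sys sdwan health-check status followed by the health-check name.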