Description
This article explains how an SD-WAN Performance SLA behaves when multiple servers are configured.
Scope
FortiGate.
Solution
This article will use the example of two server IPs configured under the 'SLA_Internet' health check:
# config system sdwan
config health-check
edit "SLA_Internet"
set server "198.18.21.2" "198.18.22.2"
set update-static-route disable
set members 1 2
config sla
edit 1
set link-cost-factor latency
set latency-threshold 300
next
end
next
end
end
As of the time of writing (FortiOS 7.4.5, the latest version at that time), when two servers are configured in a health check, the health check uses only the first server's measurements as long as that server is reachable through at least one member. The second server's measurements are used only when all members fail to reach the first server.
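The selection rule described above can be sketched as a small Python model. This is only an illustration of the documented behavior, not FortiOS code; the function name `reporting_server` and its arguments are hypothetical.

```python
from typing import Dict, Tuple

# Server order as configured with 'set server' in the example health check.
SERVERS: Tuple[str, str] = ("198.18.21.2", "198.18.22.2")


def reporting_server(servers: Tuple[str, str],
                     first_reachable_by_member: Dict[int, bool]) -> str:
    """Return the server whose metrics the health check would report.

    first_reachable_by_member maps an SD-WAN member ID to whether that
    member currently receives probe replies from the FIRST server.
    """
    first, second = servers
    # The first server is used while at least one member can reach it.
    if any(first_reachable_by_member.values()):
        return first
    # Only when every member has lost the first server is the second used.
    return second


if __name__ == "__main__":
    print(reporting_server(SERVERS, {1: True, 2: False}))
    print(reporting_server(SERVERS, {1: False, 2: False}))
```

In this model, losing the first server on a single member does not switch the reported metrics; the switch happens only when both members lose it.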
Both SLA servers are probed every 500 ms (set interval 500) over the SD-WAN member interfaces (set members 1 2):
# diag sniffer packet any 'host 198.18.21.2 or host 198.18.22.2' 4
Using Original Sniffing Mode
interfaces=[any]
filters=[host 198.18.21.2 or host 198.18.22.2]
0.239732 port1 out 198.18.11.1 -> 198.18.21.2: icmp: echo request
0.239915 port1 out 198.18.11.1 -> 198.18.22.2: icmp: echo request
0.240035 port2 out 198.18.12.1 -> 198.18.21.2: icmp: echo request
0.240136 port2 out 198.18.12.1 -> 198.18.22.2: icmp: echo request
0.242943 port2 in 198.18.22.2 -> 198.18.12.1: icmp: echo reply
0.242982 port1 in 198.18.22.2 -> 198.18.11.1: icmp: echo reply
0.423018 port1 in 198.18.21.2 -> 198.18.11.1: icmp: echo reply
0.423811 port2 in 198.18.21.2 -> 198.18.12.1: icmp: echo reply
0.735073 port1 out 198.18.11.1 -> 198.18.21.2: icmp: echo request
0.735210 port1 out 198.18.11.1 -> 198.18.22.2: icmp: echo request
0.735266 port2 out 198.18.12.1 -> 198.18.21.2: icmp: echo request
0.735356 port2 out 198.18.12.1 -> 198.18.22.2: icmp: echo request
0.739354 port2 in 198.18.22.2 -> 198.18.12.1: icmp: echo reply
0.739393 port1 in 198.18.22.2 -> 198.18.11.1: icmp: echo reply
0.917637 port1 in 198.18.21.2 -> 198.18.11.1: icmp: echo reply
0.918004 port2 in 198.18.21.2 -> 198.18.12.1: icmp: echo reply
An SD-WAN member is considered dead only if both SLA servers fail to respond through it. If either server is reachable, the member is considered alive.
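As a rough illustration of this rule, the following Python sketch enumerates the four reachability combinations for a single member. The function name `member_state` is hypothetical, and the model deliberately ignores timing details such as how many consecutive probes must fail before a state change.

```python
from typing import Dict


def member_state(server_reachable: Dict[str, bool]) -> str:
    """Return 'alive' or 'dead' for ONE SD-WAN member.

    server_reachable maps each SLA server IP to whether it answers
    probes sent through this member.
    """
    # The member is dead only when no configured server answers.
    return "alive" if any(server_reachable.values()) else "dead"


if __name__ == "__main__":
    for first in (True, False):
        for second in (True, False):
            state = member_state({"198.18.21.2": first, "198.18.22.2": second})
            print(f"first={first!s:5} second={second!s:5} -> {state}")
```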
Although two SLA servers are configured, the health-check commands show only a single value per SLA metric (latency, jitter, loss). They do not show separate values for each SLA server.
This is by design. FortiOS measures the quality of the link itself and does not measure the quality of each individual SLA server.
For example, the first server '198.18.21.2' is experiencing higher latency than the second server '198.18.22.2':
# exe ping 198.18.21.2
PING 198.18.21.2 (198.18.21.2): 56 data bytes
64 bytes from 198.18.21.2: icmp_seq=0 ttl=254 time=183.3 ms
64 bytes from 198.18.21.2: icmp_seq=1 ttl=254 time=182.8 ms
64 bytes from 198.18.21.2: icmp_seq=2 ttl=254 time=182.6 ms
64 bytes from 198.18.21.2: icmp_seq=3 ttl=254 time=183.2 ms
64 bytes from 198.18.21.2: icmp_seq=4 ttl=254 time=183.4 ms
--- 198.18.21.2 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 182.6/183.0/183.4 ms
# exe ping 198.18.22.2
PING 198.18.22.2 (198.18.22.2): 56 data bytes
64 bytes from 198.18.22.2: icmp_seq=0 ttl=254 time=3.9 ms
64 bytes from 198.18.22.2: icmp_seq=1 ttl=254 time=2.0 ms
64 bytes from 198.18.22.2: icmp_seq=2 ttl=254 time=2.2 ms
64 bytes from 198.18.22.2: icmp_seq=3 ttl=254 time=5.8 ms
64 bytes from 198.18.22.2: icmp_seq=4 ttl=254 time=1.8 ms
--- 198.18.22.2 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 1.8/3.1/5.8 ms
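The same per-server difference can be estimated from the sniffer capture shown earlier, which is useful because the health-check output does not break metrics down by server. The Python sketch below pairs each echo request with the next echo reply on the same interface and server. The pairing is first-in, first-out, which is an assumption (verbosity 4 does not print ICMP sequence numbers), and the helper name `rtt_by_probe` is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# First probe cycle copied from the capture in this article.
CAPTURE = """\
0.239732 port1 out 198.18.11.1 -> 198.18.21.2: icmp: echo request
0.239915 port1 out 198.18.11.1 -> 198.18.22.2: icmp: echo request
0.240035 port2 out 198.18.12.1 -> 198.18.21.2: icmp: echo request
0.240136 port2 out 198.18.12.1 -> 198.18.22.2: icmp: echo request
0.242943 port2 in 198.18.22.2 -> 198.18.12.1: icmp: echo reply
0.242982 port1 in 198.18.22.2 -> 198.18.11.1: icmp: echo reply
0.423018 port1 in 198.18.21.2 -> 198.18.11.1: icmp: echo reply
0.423811 port2 in 198.18.21.2 -> 198.18.12.1: icmp: echo reply
"""

LINE_RE = re.compile(
    r"^(?P<ts>\d+\.\d+)\s+(?P<intf>\S+)\s+(?P<dir>in|out)\s+"
    r"(?P<src>[\d.]+)\s+->\s+(?P<dst>[\d.]+):\s+icmp:\s+echo\s+"
    r"(?P<kind>request|reply)$"
)


def rtt_by_probe(capture: str) -> Dict[Tuple[str, str], List[float]]:
    """Return RTTs in milliseconds keyed by (interface, server IP)."""
    pending: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    rtts: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for raw in capture.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip sniffer header lines and anything unexpected
        ts = float(m["ts"])
        if m["dir"] == "out" and m["kind"] == "request":
            pending[(m["intf"], m["dst"])].append(ts)
        elif m["dir"] == "in" and m["kind"] == "reply":
            key = (m["intf"], m["src"])
            if pending[key]:
                # Assumption: replies arrive in the order requests were sent.
                rtts[key].append((ts - pending[key].pop(0)) * 1000.0)
    return dict(rtts)


if __name__ == "__main__":
    for (intf, server), values in sorted(rtt_by_probe(CAPTURE).items()):
        print(intf, server, [round(v, 3) for v in values])
```

For the sample cycle, this gives roughly 183 ms to 198.18.21.2 and roughly 3 ms to 198.18.22.2 on both ports, in line with the ping results.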
As long as the first server in the list is reachable, its SLA metrics are the ones shown by diagnose sys sdwan health-check status:
# diagnose sys sdwan health-check status SLA_Internet
Health Check(SLA_Internet):
Seq(1 port1): state(alive), packet-loss(0.000%) latency(183.431), jitter(0.809), mos(4.220), bandwidth-up(9999995), bandwidth-dw(9999994), bandwidth-bi(19999989) sla_map=0x1
Seq(2 port2): state(alive), packet-loss(0.000%) latency(183.223), jitter(1.106), mos(4.219), bandwidth-up(9999996), bandwidth-dw(9999996), bandwidth-bi(19999992) sla_map=0x1
When the first server becomes unreachable through all members, the health check reports the metrics of the second server instead:
# diagnose sys sdwan health-check status SLA_Internet
Health Check(SLA_Internet):
Seq(1 port1): state(alive), packet-loss(0.000%) latency(3.500), jitter(1.355), mos(4.401), bandwidth-up(9999996), bandwidth-dw(9999996), bandwidth-bi(19999992) sla_map=0x1
Seq(2 port2): state(alive), packet-loss(0.000%) latency(3.433), jitter(1.207), mos(4.401), bandwidth-up(9999996), bandwidth-dw(9999996), bandwidth-bi(19999992) sla_map=0x1
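To compare such outputs programmatically, the status lines can be parsed and checked against the configured latency threshold (300 ms in this example). The Python sketch below handles only the line format of alive members as shown in this article; the function names are hypothetical. Reading sla_map as a bitmask of met SLA targets is an assumption here, although it is consistent with both members staying below the threshold and showing sla_map=0x1.

```python
import re
from typing import Dict, List

# First output from this article (first server reachable).
SAMPLE = """\
Health Check(SLA_Internet):
Seq(1 port1): state(alive), packet-loss(0.000%) latency(183.431), jitter(0.809), mos(4.220), bandwidth-up(9999995), bandwidth-dw(9999994), bandwidth-bi(19999989) sla_map=0x1
Seq(2 port2): state(alive), packet-loss(0.000%) latency(183.223), jitter(1.106), mos(4.219), bandwidth-up(9999996), bandwidth-dw(9999996), bandwidth-bi(19999992) sla_map=0x1
"""

SEQ_RE = re.compile(
    r"Seq\((?P<seq>\d+) (?P<intf>[^)\s]+)\): state\((?P<state>\w+)\), "
    r"packet-loss\((?P<loss>[\d.]+)%\) latency\((?P<latency>[\d.]+)\), "
    r"jitter\((?P<jitter>[\d.]+)\).*?sla_map=(?P<sla_map>0x[0-9a-fA-F]+)"
)


def parse_health_check(output: str) -> List[Dict[str, object]]:
    """Extract per-member metrics from health-check status output."""
    members: List[Dict[str, object]] = []
    for line in output.splitlines():
        m = SEQ_RE.search(line)
        if not m:
            continue  # header line, or a format not covered by this sketch
        members.append({
            "seq": int(m["seq"]),
            "intf": m["intf"],
            "state": m["state"],
            "loss": float(m["loss"]),
            "latency": float(m["latency"]),
            "jitter": float(m["jitter"]),
            "sla_map": int(m["sla_map"], 16),
        })
    return members


def within_latency(member: Dict[str, object], threshold_ms: float = 300.0) -> bool:
    """True if the member's reported latency is at or below the threshold."""
    return float(member["latency"]) <= threshold_ms


if __name__ == "__main__":
    for member in parse_health_check(SAMPLE):
        print(member["intf"], member["latency"], within_latency(member))
```

A sudden change in the reported latency of every member at once, such as from about 183 ms to about 3.5 ms in the outputs above, suggests that the health check has switched to reporting the other server.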
If both servers are probed over multiple links, the SLA metrics of the second server are only used if the first server is unreachable through all links.
The SLA metrics of the first server (198.18.21.2) are used as long as there are probe responses from 198.18.21.2 over member 1 or member 2.
The SLA metrics of the second server (198.18.22.2) are used only if there are no probe responses from the first server (198.18.21.2) over either member 1 or member 2.