Created on 08-15-2023 01:49 AM Edited on 10-17-2024 10:25 PM By Jean-Philippe_P
This article describes the recommended configuration steps for an SD-WAN design that uses two or more links. One primary link carries all internet traffic while the secondary links remain in a 'standby' state. The design must monitor the availability and reliability of the links, remove failed links, and allow sessions to move over to the secondary ISP.
The design keeps monitoring a failed link until it is back online, at which point sessions can be set to fail back or not, depending on the needs of the environment.
The caveats of this design are addressed later in this article. In particular, TLS sessions may become invalid if the source NAT IP changes.
Scope
FortiGate with SD-WAN.
Solution
Given the topology below, this example will use two distinct ISP links, one connected to port2 and another connected to port10.
In this example, a new SD-WAN Zone called 'Internet' is created, intended to receive more ISP links and make the design easily scalable.
Make sure the ISP interfaces have no existing configuration or child 'references'; otherwise, they may not be available for selection.
This tutorial assumes a new configuration.
exec ping-options interface port2
exec ping 198.51.100.17
PING 198.51.100.17 (198.51.100.17): 56 data bytes
64 bytes from 198.51.100.17: icmp_seq=0 ttl=255 time=0.2 ms
64 bytes from 198.51.100.17: icmp_seq=1 ttl=255 time=0.4 ms
64 bytes from 198.51.100.17: icmp_seq=2 ttl=255 time=0.1 ms
64 bytes from 198.51.100.17: icmp_seq=3 ttl=255 time=0.4 ms
64 bytes from 198.51.100.17: icmp_seq=4 ttl=255 time=0.3 ms
--- 198.51.100.17 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.2/0.4 ms
Step 3: Create the SD-WAN zone.
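The zone and its members can also be created from the CLI. The fragment below is a minimal sketch rather than the exact configuration used in this article: the member sequence numbers are arbitrary, the port2 gateway is assumed to be the address pinged in the connectivity test above, and the port10 gateway is a hypothetical placeholder. Lines starting with '#' are annotations only.

```
config system sdwan
    set status enable
    config zone
        # Zone name taken from this article
        edit "Internet"
        next
    end
    config members
        edit 1
            set interface "port2"
            set zone "Internet"
            # Assumption: gateway is the address pinged through port2 above
            set gateway 198.51.100.17
        next
        edit 2
            set interface "port10"
            set zone "Internet"
            # Hypothetical placeholder; replace with the real ISP gateway
            set gateway 203.0.113.1
        next
    end
end
```

Additional ISP links can later be added as new members of the same zone without changing the policies that reference it.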
Testing and validating the design.
The easiest way to check the outgoing interface of traffic in real time is to use a sniffer. This test uses an IP (8.8.4.4) that is not a probe target of the performance SLA, to avoid seeing duplicate packets from the probes.
This example also introduces a 'logical' failure on the link being tested: traffic stops, but the interface is not disconnected. This is very important to the test because only the probe should fail. The fiber is disconnected on the primary ISP's side, so the link state of port2 and port10 remains unchanged.
diag sniffer packet any "icmp and host 8.8.4.4" 4
From the 'Workstation' in the Topology diagram at the very top of this article, a continuous ping to 8.8.4.4 was started and received successful replies.
The sniffer results below show the moment of transition from port10 to port2 after the fiber cable was pulled from the port10 ISP's router.
It is also possible to use an SD-WAN health check to verify the 'current' status of those links and probes:
diag sys sdwan health-check status <name or Enter>
Note that the port10 state is 'dead': the link is in a 'logical' failed state.
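Two further diagnostic commands can complement the health-check status when confirming which member is currently selected. This is a hedged suggestion: the exact command names and output vary between FortiOS versions (for example, newer releases use 'service4' in place of 'service'), so check what is available on the running firmware.

```
# List SD-WAN members with their interface and gateway
diagnose sys sdwan member
# Show SD-WAN rules and the member currently selected by each one
diagnose sys sdwan service
```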
The option 'update-static-route' causes routes through the failed link to be removed from the FIB.
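The option described above is set per performance SLA (health check). The fragment below is a minimal sketch under stated assumptions: the health-check name, the probe server, and the member numbers are hypothetical and are not taken from this article, and the failtime and recoverytime values are illustrative. Lines starting with '#' are annotations only.

```
config system sdwan
    config health-check
        # Hypothetical name
        edit "Internet_SLA"
            # Hypothetical probe target; must differ from the test IP 8.8.4.4
            set server "8.8.8.8"
            set protocol ping
            # Assumption: members 1 and 2 are the two ISP links
            set members 1 2
            # Consecutive failed probes before a member is marked dead
            set failtime 5
            # Consecutive successful probes before it is marked alive again
            set recoverytime 5
            # Withdraw routes through a member while it is dead
            set update-static-route enable
        next
    end
end
```

With this option enabled, the routing table output is expected to show routes only through the members that are currently alive.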
get router info routing-table all
Note that the session 'serial' (or ID) has changed even though the 5-tuple information is the same (original source port 3511). After the failure, the session egresses the FortiGate through a different dev (interface) and gateway.
The setting 'snat-route-change' mostly controls the fail-back behavior. Sessions are flushed upon link failure if the SNAT IP changes. This is expected behavior: it allows a new route lookup to take place and the secondary interface to be chosen. A new firewall policy lookup is also required to determine the new SNAT address. Without it, the 5-tuple can no longer be considered 'valid' or consistent.
For this reason, TCP sessions and applications should rely on reconnect or Layer-7 resume mechanisms.
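The session details discussed above can be listed from the CLI by filtering on the test destination. This is a minimal sketch that assumes the 8.8.4.4 test traffic from this article; add further filter criteria as needed.

```
# Remove any filter left over from a previous check
diagnose sys session filter clear
# Match only sessions toward the test destination
diagnose sys session filter dst 8.8.4.4
# Print matching sessions, including serial, dev and gateway
diagnose sys session list
```

Running the list before and after the failover makes it easy to compare the serial, interface, and gateway values.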
snat-route-change enable <- Sessions will fail back instantly.
snat-route-change disable <- Sessions will keep using the current link for the duration of the session. New sessions use the currently active link.
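Both values above belong to a single global setting. A minimal sketch of where it is configured, assuming a FortiGate without multiple VDOMs (with VDOMs enabled, enter the global context first):

```
config system global
    # enable: sessions fail back when the route changes
    # disable: existing sessions stay on the current link
    set snat-route-change enable
end
```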
If the environment matches the exception mentioned in Step 7, where the same public IP moves between different links, the session is not flushed, as demonstrated in the screenshot below. Instead, upon a failover event, the session is marked 'dirty' and updated. Note that this is an advanced setup that requires multiple additional steps and testing. Remember that the public IP can only exist in one place at a time, so synchronized updates must take place for this to work smoothly.
Optionally, review the history of each link status probe individually:
diag sys sdwan sla-log status-probe 1
diag sys sdwan sla-log status-probe 2
If a longer history is required, consult Log & Report -> System Events -> SD-WAN Events.
Conclusion:
By applying the settings above, the desired SD-WAN failover and fail-back can be achieved as shown in the logs and results.
The newly created SD-WAN zone called 'Internet' can receive other links in the future and is scalable.
The 'all Internet IP' object allows better compatibility with policies using RFC1918 private IPs if a VPN zone is added to the SD-WAN in the future.
Because of the nature of TCP, TLS, and other 'session' oriented protocols, sessions may become invalid if the source public IP changes.