FortiGate
lcamilo
Staff
Article Id 268524
Description

 

This article describes the recommended configuration steps for an SD-WAN design that uses two or more links. One primary link carries all internet traffic while the secondary links remain in a 'standby' state. The design must monitor the availability and reliability of the links, remove failed links, and allow sessions to move over to the secondary ISP.

The design keeps monitoring a failed link until it is back online. At that point, sessions can be configured to fall back to it or not, depending on the needs of the environment.

 

The caveats of this design are addressed later in this article. In particular, TLS sessions may become invalid in certain circumstances if the source NAT IP changes.

 

Scope

 

FortiGate with SD-WAN.

 

Solution

 

Given the topology below, this example will use two distinct ISP links, one connected to port2 and another connected to port10.
In this example, a new SD-WAN Zone called 'Internet' is created, intended to receive more ISP links and make the design easily scalable. 

sdwan isp failover.png


Make sure the ISP interfaces have no existing configuration or child 'references'; otherwise, they may not be available for selection as SD-WAN members.
This tutorial assumes a new configuration.

 
Step 1: Configure the ISP interfaces. 
 
Under Network -> Interfaces, edit port2 and port10 to set their static public IP and network masks. Add 'Alias' as required and set the Role as 'WAN'.
 
step1_interface.png
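For reference, the same interface settings can be sketched in the CLI. This is a minimal example, not the exact configuration from the screenshot: the IP addresses, network masks, and alias names below are placeholders chosen for illustration (only the port2 gateway 198.51.100.17 appears later in this article), so replace them with the values assigned by each ISP.

```
config system interface
    edit "port2"
        set mode static
        # Placeholder address and mask, assumed for illustration
        set ip 198.51.100.18 255.255.255.240
        # Hypothetical alias
        set alias "ISP_Secondary"
        set role wan
    next
    edit "port10"
        set mode static
        # Placeholder address and mask, assumed for illustration
        set ip 203.0.113.2 255.255.255.252
        # Hypothetical alias
        set alias "ISP_Primary"
        set role wan
    next
end
```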

 

Step 2: Test the P2P connection.
 
With only this setting, it should be possible to ping the gateway's IP, as it is in the same subnet (broadcast domain) as the interface.
Use 'exec ping' to test the gateway's reachability before adding any further settings. Use ping-options to define the source interface for the test.
 

exec ping-options interface port2

exec ping 198.51.100.17
PING 198.51.100.17 (198.51.100.17): 56 data bytes
64 bytes from 198.51.100.17: icmp_seq=0 ttl=255 time=0.2 ms
64 bytes from 198.51.100.17: icmp_seq=1 ttl=255 time=0.4 ms
64 bytes from 198.51.100.17: icmp_seq=2 ttl=255 time=0.1 ms
64 bytes from 198.51.100.17: icmp_seq=3 ttl=255 time=0.4 ms
64 bytes from 198.51.100.17: icmp_seq=4 ttl=255 time=0.3 ms

 

--- 198.51.100.17 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.2/0.4 ms

 

Step 3: Create the SD-WAN zone.

 

The SD-WAN zone provides logical segmentation, which is especially useful when links serve different purposes, such as MPLS, IPsec, Internet, LAN-to-LAN, etc. Keeping in mind the static routes associated with each link makes it easier to design zones when required.
Navigate to Network -> SD-WAN -> SD-WAN Zones -> Create New -> SD-WAN Zone.
 
step3_sdwan_zone.png
 
  • Give it a name. 'Internet' will be used for this example.
  • Select 'Interface members' and another menu will pop up. Notice it does not contain the new ISP yet.
  • Select 'Create' and a new 'Edit SD-WAN Member' menu will pop up.

step3_sdwan_members.png

 

  • Select the Interface (port2 and port10 in this case).
  • SD-WAN Zone = Internet.
  • Gateway = Add the Gateway here.
  • Select Ok to save.
Repeat for additional ISPs / Interfaces and make sure to add them to the same zone. 
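The zone and member steps above correspond roughly to the following CLI sketch. The member sequence numbers (1 for port10, 2 for port2) and the port10 gateway 203.0.113.1 are assumptions made for this example; the port2 gateway is the one tested in Step 2.

```
config system sdwan
    set status enable
    config zone
        edit "Internet"
        next
    end
    config members
        # Member 1: primary ISP (sequence number assumed)
        edit 1
            set interface "port10"
            set zone "Internet"
            # Placeholder gateway, assumed for illustration
            set gateway 203.0.113.1
        next
        # Member 2: secondary ISP (sequence number assumed)
        edit 2
            set interface "port2"
            set zone "Internet"
            set gateway 198.51.100.17
        next
    end
end
```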
 
Step 4: Adjust the Performance SLAs.
 
Delete or disable entries that are not necessary. They may otherwise contribute to false positives if not properly set or tested. 
To disable an entry, simply edit it, select Participants -> Specify, and then leave the list empty. 
This way the entry will not contribute to interface metrics and statuses. 
 
step4_adjust_probe.png
 
Keep in mind that Link Status and SLA Target are two distinct options that trigger different outcomes, even though they work together.
This design relies on the Link Status probes and the 'Update static route' setting. When a failure occurs, the link is put in a 'logical' down state and its route is removed from the FIB, which in turn triggers other actions such as dirtying sessions and changing the source NAT address.
 
When multiple servers are configured on the same probe, they behave as a logical 'AND': both must fail for the rule to trigger a 'failure' state.
When more than one Performance SLA rule is configured, all rules influence the participant interfaces, and they must all fail at the same time for the link to be considered down.
 
In this design, the plan is to have two Performance SLA rules (Default_Office_365 and status_probe), where Default_Office_365 uses an HTTP probe and status_probe uses Ping. In most cases, this gives better accuracy when determining whether a link is down and when it is back online. In this setup, the 'Check interval', 'Failures before inactive', and 'Restore link after' fields should have the same values in both rules.
 
step4_main_probe.png
 
It is typically good practice to set a high value in the 'Restore link after' field to prevent flaps. Assume an 'unhealthy' link should be avoided for some time until it is ensured to be reliable again. 
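As an illustration, the two rules could look like the sketch below in the CLI. The probe servers, the timer values, and the member numbers (1 and 2) are assumptions made for this example and are not taken from the screenshots. 'Check interval', 'Failures before inactive', and 'Restore link after' map to interval (in milliseconds), failtime, and recoverytime respectively.

```
config system sdwan
    config health-check
        edit "status_probe"
            # Hypothetical ping targets; both must fail to mark the link dead
            set server "1.1.1.1" "8.8.8.8"
            set protocol ping
            set interval 500
            set failtime 5
            # Higher restore value to reduce flapping (illustrative)
            set recoverytime 30
            set update-static-route enable
            set members 1 2
        next
        edit "Default_Office_365"
            # Assumed probe server
            set server "www.office.com"
            set protocol http
            # Same timers as status_probe, as recommended above
            set interval 500
            set failtime 5
            set recoverytime 30
            set update-static-route enable
            set members 1 2
        next
    end
end
```

With these example values, a link would be declared dead after roughly 5 x 500 ms = 2.5 seconds of consecutive probe failures, and restored only after 30 consecutive successful probes (about 15 seconds).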
 
Step 5: Create one SD-WAN Rule.
 
The rule uses a 'reverse' version of the RFC1918 private ranges as its destination, so that it matches only public Internet ranges. This prevents traffic destined for private addresses from matching the rule and egressing through the ISP links.
Optionally, use the public_range script attached to this article to create those objects, or use 'All' as the destination.
 
step5_rule_match_criteria.png
 
This policy uses a 'manual' strategy to provide control over which interface is primary by setting it at the top of the list. Notice port10 was selected first and is at the top of the list, making it the 'primary' interface during use.
After defining these settings, select Ok to save. 
 
step5_action.png
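A CLI sketch of such a rule is shown below. It takes the simpler option of 'all' as the destination rather than the public range objects, and it assumes that port10 is SD-WAN member 1 and port2 is member 2; the rule name is hypothetical. In manual mode, the order of priority-members defines which member is preferred.

```
config system sdwan
    config service
        edit 1
            # Hypothetical rule name
            set name "Internet_Failover"
            set mode manual
            set src "all"
            set dst "all"
            # Member 1 (port10, assumed) is listed first, so it is primary
            set priority-members 1 2
        next
    end
end
```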

 

Step 6: Static Routes / FIB.
 
Next, tell the system to use the SD-WAN zone called 'Internet' as a 0.0.0.0/0 default route. 
Under Network -> Static Routes -> New static route, create a new entry, set the SD-WAN zone 'Internet' as the interface, and save it.
 
step6_static_route.png
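In the CLI, the equivalent route can be sketched as follows, assuming FortiOS 7.0 or later (older releases use a different option for SD-WAN routes) and that entry ID 1 is free. The destination is left at its default of 0.0.0.0/0.

```
config router static
    # Entry ID assumed to be unused
    edit 1
        set sdwan-zone "Internet"
    next
end
```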
 
Step 7: Firewall policy.
 
The firewall policy must allow sessions from the LAN to the 'Internet' zone. 
Navigate to Policy & Objects -> Firewall Policy -> Create New Firewall Policy
 
step7_firewall_match.png
 
Make sure to enable NAT and set it as 'Use Outgoing Interface Address'.
 
step7_firewall_nat.png
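A minimal CLI sketch of this policy follows. The LAN interface name (port1), the policy name, and the use of 'all' and 'ALL' for addresses and services are assumptions made for illustration; tighten them to match the environment.

```
config firewall policy
    # 0 creates the policy with the next available ID
    edit 0
        # Hypothetical policy name
        set name "LAN_to_Internet"
        # Assumed LAN interface
        set srcintf "port1"
        set dstintf "Internet"
        set action accept
        set srcaddr "all"
        set dstaddr "all"
        set schedule "always"
        set service "ALL"
        # NAT to the outgoing interface address
        set nat enable
    next
end
```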
 
If the ISP provides transit networks where the public IP Block fluctuates over BGP, use 'IP Pool' on this step. 
With this setting, it is possible to achieve FULL SD-WAN Transitions with minor or close to zero impact on any active sessions. 
See this article for more details. In this case, make sure to enable 'auxiliary-session'.
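For that IP pool variant, the relevant settings can be sketched as below. The pool name, the address range, and the policy ID are hypothetical and only illustrate where each option lives; the design itself still requires the additional steps covered in the linked article.

```
config system settings
    set auxiliary-session enable
end

config firewall ippool
    # Hypothetical pool name and placeholder public range
    edit "Transit_Public_Pool"
        set startip 192.0.2.10
        set endip 192.0.2.10
    next
end

config firewall policy
    # Hypothetical ID of the LAN-to-Internet policy
    edit 1
        set nat enable
        set ippool enable
        set poolname "Transit_Public_Pool"
    next
end
```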
 
Step 8: Fine-tuning.
 
Finally, it is recommended to fine-tune a few settings to make sure failover occurs smoothly.
 
Optionally, enable snat-route-change to force a 'fallback' of sessions when the primary link is healthy again. 
Whether this option is suitable depends on the goals of the setup. In many cases it is preferable to leave it disabled and let sessions keep using their current ISP until they finish, which keeps the number of transitions to a minimum.
 
The default value is 'disable':

config system global
    set snat-route-change disable
end
 
Make sure all ISP interfaces have 'preserve-session-route' disabled. It is disabled by default, but it is good practice to verify it.
With this setting disabled, sessions have their current routing information cleared and are marked as 'dirty' after a routing change, meaning a new egress interface can be chosen.
 
config system interface
    edit <interface>
        set preserve-session-route disable
    next
end

 

Testing and validating the design.

 

The easiest way to check the outgoing interface of traffic in real time is to use a sniffer. This test uses an IP (8.8.4.4) that is not a probe target in any Performance SLA, so that probe packets do not show up in the capture.

 

This example also introduces a 'logical' failure on the link being tested, so that traffic stops but the FortiGate interface is not disconnected. This is important because the test should make only the probe fail. The fiber is disconnected on the primary ISP's router, so port2 and port10 on the FortiGate remain physically up.

 

diag sniffer packet any "icmp and host 8.8.4.4" 4

 

From the 'Workstation' in the Topology diagram at the very top of this article, a constant ping was started to 8.8.4.4 which received successful replies.

The sniffer results below show the moment of transition from port10 to port2 after the fiber cable was pulled from the port10 ISP's router.

 

test_001.png

 

It is also possible to use an SD-WAN health check to verify the 'current' status of those links and probes:

 

diag sys sdwan health-check status <name or Enter>

 

Note that port10 state is 'dead' and it is in a 'logical' failed state. 

 

test_002.png

 

The 'Update static route' option causes the route through the failed link to be removed from the FIB.

 

get router info routing-table all

 

test_003.png

 

Note that the session 'serial' (or ID) has changed even though the 5-tuple information is the same (original source port 3511). After the failure, the new session egresses the FortiGate through a different dev (interface) and gateway.
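One way to inspect these session details from the CLI is to filter the session table on the test destination. This is a sketch using the 8.8.4.4 test address from this article; it is not necessarily the exact command used to produce the screenshot.

```
diag sys session filter clear
diag sys session filter dst 8.8.4.4
diag sys session list
```

Running the list before and after the failover makes it possible to compare the serial, dev, and gateway values of the session.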

 

The 'snat-route-change' setting mostly controls the fallback behavior. Sessions are flushed upon link failure if the SNAT IP changes. This is expected behavior: it allows a new route lookup to take place and the secondary interface to be chosen. A new firewall policy lookup is also required to determine the new SNAT address. Without the flush, the 5-tuple could no longer be considered 'valid' or consistent.

However, TCP and applications should rely on re-connect or Layer-7 resume mechanisms. 

 

test_004.png

 

snat-route-change enable <- Sessions will fail back instantly.

snat-route-change disable <- Sessions will keep using the current link for the duration of the session. New sessions use the currently active link.

 

If the setup is the exception mentioned in Step 7, where the public IP moves between links, the session is not flushed, as demonstrated in the screenshot below. Upon a failover event, the session is 'dirtied' and updated. Note that this is an advanced setup that requires multiple additional steps and testing. Remember that the public IP can only exist in one place at a time, so synchronized updates must take place for this to work smoothly.

 

test_005.png

 

Optionally, consult the historical information of the Link Status Probes individually.

If a longer history is required, then consult Log & Report -> System Events -> SD-WAN Events.

 

diag sys sdwan sla-log status-probe 1

diag sys sdwan sla-log status-probe 2

 

test_006.png

 

 

Conclusion: 

By applying the settings above, the desired SD-WAN failover and fail-back can be achieved as shown in the logs and results. 

The newly created SD-WAN zone called 'Internet' can receive other links in the future and is scalable. 

The 'all Internet IP' object allows better compatibility with policies using RFC1918 private IPs if a VPN zone is added to the SD-WAN in the future. 

The nature of TCP, TLS, and other 'session' oriented protocols may invalidate sessions if the source public IP changes.

 
