In some cases, a few packet drops can trigger an SLA failover on FortiSASE, which in turn interrupts some TCP connections.
When packet drops are observed from FortiSASE to any destination on the hub side, collect the following from the hub FortiGate by following these steps:
- Open a PuTTY session to the hub FortiGate and run the following command:
diag sniffer packet any "host x.x.x.x and icmp" 4 0 l
- Then start a ping from the client side. After multiple drops are seen, stop the ping and check whether the packets arrive over the same tunnel or a different one. If packets first come in over one tunnel and, after some time, start coming in over another tunnel, this article can be followed to overcome the issue; an optional cross-check on the hub is shown below.
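As an optional cross-check on the hub (a sketch only: replace 10.255.18.53 with the actual client source IP and DC-YYY-SASE2 with the dialup tunnel name seen in the sniffer output), confirm which tunnel currently holds the return route to the client and whether its IKE gateway is up:

get router info routing-table details 10.255.18.53
diagnose vpn ike gateway list name DC-YYY-SASE2

If the route or gateway flips from one SASE tunnel to the other while the ping is running, it confirms the failover seen in the sniffer capture.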
The results below from the hub FortiGate are an example of traffic switching to another tunnel because the SLA was marked dead on FortiSASE.
36.287308 DC-YYY-SASE2 in 10.255.18.53 -> 172.10.0.15: icmp: echo request
36.287319 LAN_DC out 10.255.18.53 -> 172.10.0.15: icmp: echo request
36.287320 x5 out 10.255.18.53 -> 172.10.0.15: icmp: echo request
36.287610 x5 in 172.10.0.15 -> 10.255.18.53: icmp: echo reply
36.287610 LAN_DC in 172.10.0.15 -> 10.255.18.53: icmp: echo reply
36.287616 DC-YYY-SASE2 out 172.10.0.15 -> 10.255.18.53: icmp: echo reply
37.482760 DC-YYY-SASE2 in 10.255.18.53 -> 172.10.0.15: icmp: echo request
37.482768 LAN_DC out 10.255.18.53 -> 172.10.0.15: icmp: echo request
37.482769 x5 out 10.255.18.53 -> 172.10.0.15: icmp: echo request
37.483038 x5 in 172.10.0.15 -> 10.255.18.53: icmp: echo reply
37.483039 LAN_DC in 172.10.0.15 -> 10.255.18.53: icmp: echo reply
37.483047 DC-YYY-SASE2 out 172.10.0.15 -> 10.255.18.53: icmp: echo reply
52.270352 DC-XXX-SASE1 in 10.255.98.37 -> 172.10.0.15: icmp: echo request ---------------------> Traffic came from different tunnel at this point because of Performance SLA dead
52.270370 LAN_DC out 10.255.98.37 -> 172.10.0.15: icmp: echo request
52.270371 x5 out 10.255.98.37 -> 172.10.0.15: icmp: echo request
52.270703 x5 in 172.10.0.15 -> 10.255.98.37: icmp: echo reply
52.270704 LAN_DC in 172.10.0.15 -> 10.255.98.37: icmp: echo reply
52.270709 DC-XXX-SASE1 out 172.10.0.15 -> 10.255.98.37: icmp: echo reply
53.274523 DC-XXX-SASE1 in 10.255.98.37 -> 172.10.0.15: icmp: echo request
53.274528 LAN_DC out 10.255.98.37 -> 172.10.0.15: icmp: echo request
53.274529 x5 out 10.255.98.37 -> 172.10.0.15: icmp: echo request
53.274781 x5 in 172.10.0.15 -> 10.255.98.37: icmp: echo reply
53.274781 LAN_DC in 172.10.0.15 -> 10.255.98.37: icmp: echo reply
53.274785 DC-XXX-SASE1 out 172.10.0.15 -> 10.255.98.37: icmp: echo reply
Below are the SLA failures from FortiSASE:
interface="hub2" member="102" serviceid=1000 service="to_hub" gateway=213.42.76.130 metric="latency" msg="Member link is available. Start forwarding traffic. " date=2024-11-08 time=05:46:41 eventtime=1731044801654312378 tz="+0000" logid="0113022933" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN SLA notification" eventtype="Health Check" healthcheck="sdwan_hub_hc" interface="hub2" probeproto="ping" oldvalue="dead" newvalue="alive" msg="SD-WAN health-check member changed state." date=2024-11-08 time=05:46:40 eventtime=1731044799658319676 tz="+0000" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Service" serviceid=1000 service="to_hub" seq="101" msg="Service prioritized by SLA will be redirected in sequence order." date=2024-11-08 time=05:46:40 eventtime=1731044799658310365 tz="+0000" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Service" interface="hub2" member="102" serviceid=1000 service="to_hub" gateway=213.42.76.130 metric="latency" msg="Member link is unreachable or miss threshold. Stop forwarding traffic. " date=2024-11-08 time=05:46:40 eventtime=1731044799658257691 tz="+0000" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Health Check" healthcheck="sdwan_hub_hc" slatargetid=1 oldvalue="2" newvalue="1" msg="Number of pass member changed."
The SLA thresholds can be adjusted on each PoP. Follow the steps below to edit the SLA thresholds: increase the packet-loss threshold to 5% and monitor. If the issue persists, open a ticket with Fortinet Support to increase the health-check interval to 5000 ms.
Go to Network -> Secure Private Access -> Service Connections. Select a service connection, select Health as shown below, choose a PoP, and then select Edit to update the SLA thresholds.
![Health and VPN Tunnel status.png](/t5/image/serverpage/image-id/66638i58EA590783611119/image-dimensions/890x285/is-moderation-mode/true?v=v2)
![Edit SLA Thresholds.png](/t5/image/serverpage/image-id/66637i6FE31B059A46D396/image-dimensions/903x292/is-moderation-mode/true?v=v2)
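For reference, the thresholds edited in the portal correspond to FortiOS SD-WAN health-check SLA settings. The sketch below is illustrative only, assuming the health-check name sdwan_hub_hc seen in the logs above; the latency and jitter values are placeholders. On FortiSASE, only the SLA thresholds are editable from the portal, and the probe interval is changed by Fortinet Support:

config system sdwan
    config health-check
        edit "sdwan_hub_hc"
            set interval 5000                 <- probe interval in milliseconds (set by Fortinet Support on FortiSASE)
            config sla
                edit 1
                    set latency-threshold 250     <- illustrative value
                    set jitter-threshold 50       <- illustrative value
                    set packetloss-threshold 5    <- packet-loss threshold raised to 5%
                next
            end
        next
    end
end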