Created on 01-07-2026 11:59 PM
Edited on 01-12-2026 10:13 PM
By Jean-Philippe_P
Description:
This article describes the states that an SD-WAN member interface can be in (according to Performance SLAs) and the timers that determine how long the transitions from Dead to Alive and from Out-of-SLA to In-SLA take. These states determine whether an SD-WAN member is used to handle outgoing traffic, and the timers determine how much time must pass before a recovering link is used again.
Important: the calculation of these transition timers differs significantly between FortiOS v7.6.3 and earlier and FortiOS v7.6.4 and later. In particular, the time required to transition from Out-of-SLA to In-SLA changes significantly, as discussed further below.
Scope: FortiGate (v7.6.3 and earlier, v7.6.4 and later), SD-WAN.
Solution:
SD-WAN rules on the FortiGate use two separate sets of states to determine whether a member interface is eligible for traffic forwarding: Dead/Alive and Out-of-SLA/In-SLA. Both sets of states are determined by SD-WAN Performance SLAs (also known as health checks), but each is evaluated by a different mechanism within the health check:
Dead/Alive indicates the state of the SD-WAN member's upstream network connectivity, according to the probes sent by the Performance SLA/health check. If several consecutive probes receive no response, the interface is set to the Dead state; otherwise, it is considered Alive.
Out-of-SLA/In-SLA indicates whether an SD-WAN member exceeds the acceptable performance thresholds set for latency, jitter, and/or packet-loss (which are measured using the probes sent by the Performance SLA). If these metrics exceed the SLA threshold, the interface is considered Out-of-SLA (i.e., link quality is poorer than expected); if they are within the threshold, it is considered In-SLA.
Important: each SD-WAN member has an independent state in each set. For example, an interface can be both Alive and Out-of-SLA if the upstream health check server is reachable but the measured latency exceeds the threshold set in the Performance SLA. Additionally:
Calculating state transition time: For reference, state transition timers are based on the following key parameters, which are configured in the SD-WAN Performance SLA settings:
config system sdwan
    config health-check
        edit <name>
            set server <IP/FQDN>
            set interval <20 - 3600000 milliseconds, default = 500>
            set failtime <1 - 3600, default = 5>
            set recoverytime <1 - 3600, default = 5>
        next
    end
end
Important: in FortiOS v7.6.3 and earlier, the interval and recoverytime settings affect both the Dead to Alive transition timer and the Out-of-SLA to In-SLA delay timer. Be cautious when setting these to high values, as they can add significant delays before an SD-WAN interface fully recovers to the Alive + In-SLA state.
The following state transition timers utilize a combination of the above factors, as described by the following sections:
The following calculation can be used to determine the number of probe failures and the period of time that must pass for an SD-WAN member to be marked as Dead. Note that the failures must be consecutive to trigger this transition:
Alive to Dead Timer (in seconds) = failtime * interval/1000
To transition back from the Dead state to the Alive state, use the following calculation. Just as the Alive to Dead transition requires consecutive failures, the FortiGate must receive consecutive successful probe responses to trigger this transition:
Dead to Alive Timer (in seconds) = recoverytime * interval/1000
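The consecutive-probe counting behind the two formulas above can be sketched as a small state machine. This is a simplified Python model, not FortiOS internals: the class MemberLiveness and the helper timer_seconds are hypothetical names, each probe is reduced to a success/failure boolean delivered once per interval, and a probe with the opposite outcome is assumed to reset the current streak (which is what "consecutive" implies).

```python
class MemberLiveness:
    """Hypothetical model of one SD-WAN member's Dead/Alive state.

    Assumes on_probe() is called once per health-check interval with
    ok=True for a probe response and ok=False for a missed probe.
    """

    def __init__(self, failtime=5, recoverytime=5):
        self.failtime = failtime
        self.recoverytime = recoverytime
        self.alive = True
        self.consecutive_failures = 0
        self.consecutive_successes = 0

    def on_probe(self, ok):
        """Record one probe result and return the resulting Alive state."""
        if ok:
            self.consecutive_successes += 1
            self.consecutive_failures = 0  # a success breaks any failure streak
            if not self.alive and self.consecutive_successes >= self.recoverytime:
                self.alive = True
        else:
            self.consecutive_failures += 1
            self.consecutive_successes = 0  # a failure breaks any success streak
            if self.alive and self.consecutive_failures >= self.failtime:
                self.alive = False
        return self.alive


def timer_seconds(probe_count, interval_ms):
    """failtime (or recoverytime) * interval/1000, in seconds."""
    return probe_count * interval_ms / 1000


member = MemberLiveness(failtime=2, recoverytime=60)
member.on_probe(False)  # 1 failure: still Alive
member.on_probe(False)  # 2 consecutive failures: Dead
states = [member.on_probe(True) for _ in range(60)]
print(states[58], states[59])  # False True (Alive only on the 60th success)
print(timer_seconds(2, 2000), timer_seconds(60, 2000))  # 4.0 120.0
```

With failtime 2, recoverytime 60, and interval 2000, the counters reproduce the same 4-second and 120-second figures as the formulas.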
To transition from In-SLA to Out-of-SLA, the SD-WAN member interface must a) meet/exceed the configured SLA threshold (i.e., too much latency, jitter, and/or packet-loss), and also b) continue to exceed the SLA threshold for an additional period of time (as opposed to briefly exceeding and then falling below the threshold). Note that this scenario applies to cases where the member interface does not go to the Dead state.
First, the FortiGate must calculate a measurement for the latency, jitter, and packet-loss metrics. Each metric is calculated over a sliding window of the most recent health check probes.
The window size for latency and jitter is adjustable via the probe-count option in the Performance SLA settings (see config system sdwan for more information).
To exceed the SLA threshold, the FortiGate must receive enough poor probe results to push the measurement past the threshold. For latency and jitter, this is difficult to calculate exactly, since it depends on how far the measurements move from the baseline; packet-loss is much simpler, since it is measured purely in terms of successful vs. failed probe responses:
Time to reach packet-loss threshold (in seconds) = |packetloss-threshold - CPL| * interval/1000, where CPL = Current Packet Loss measurement
Once the SLA threshold is initially exceeded, it must continue to be exceeded for a period approximately equal to the following calculation:
SLA threshold-exceeded timer (in seconds) = failtime * interval/1000
Note: the 'Time to reach packet-loss threshold' calculation uses the absolute value of packetloss-threshold minus the current packet-loss measurement, so it applies whether packet-loss is increasing or decreasing toward the set threshold.
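As a rough sketch, the calculations above can be combined to estimate how long a member that stays Alive takes to go from In-SLA to Out-of-SLA on packet-loss. The function names are hypothetical, and the model assumes every probe moves measured packet-loss by exactly one percentage point (true for a 100-probe window when each new probe result differs from the one it displaces).

```python
def seconds_to_reach_loss_threshold(threshold_pct, current_loss_pct, interval_ms):
    """|packetloss-threshold - CPL| * interval/1000.

    Assumes every probe moves measured packet-loss by one percentage point.
    """
    return abs(threshold_pct - current_loss_pct) * interval_ms / 1000


def seconds_threshold_exceeded(failtime, interval_ms):
    """SLA threshold-exceeded timer: failtime * interval/1000."""
    return failtime * interval_ms / 1000


def seconds_in_sla_to_out_of_sla(threshold_pct, current_loss_pct, failtime, interval_ms):
    """Approximate In-SLA to Out-of-SLA time for a member that stays Alive."""
    return (seconds_to_reach_loss_threshold(threshold_pct, current_loss_pct, interval_ms)
            + seconds_threshold_exceeded(failtime, interval_ms))


# Default interval (500 ms) and failtime (5); the 15% threshold is an example value.
print(seconds_in_sla_to_out_of_sla(15, 0, failtime=5, interval_ms=500))  # 10.0
```

With the default settings, a loss-free link needs about 7.5 seconds to reach a 15% threshold plus the 2.5-second threshold-exceeded timer.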
Additionally, the packet-loss percentage is calculated from the results of the past 100 health check probes, so 'bad' probe results must be fully replaced with 'good' results for the packet-loss percentage to decrease. To transition from non-zero packet-loss to 0% packet-loss, use the following calculation:
Non-zero to 0% packet-loss (in seconds) = 100 * interval/1000
For example, if 4 probe failures are received on a previously loss-free link, packet-loss registers at 4% and remains at this level until 96 probe successes have been received; after that, packet-loss ticks downward 1% with each success until it reaches 0% after 100 probe successes in total. Note also that probes are always sent regardless of whether the interface is Alive or Dead, so measured packet-loss can start to decrease even while the interface is in the Dead state.
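The sliding-window behavior in this example can be reproduced with a fixed-length deque. This is a sketch assuming a 100-probe window that starts full of successes; it models only the percentage calculation, not how FortiOS actually stores probe results.

```python
from collections import deque

WINDOW = 100  # packet-loss is calculated over the most recent 100 probes


def packet_loss_pct(window):
    """Percentage of failed probes (False) in the current window."""
    return 100 * window.count(False) / len(window)


# Start from a loss-free link: the window is full of successful probes.
window = deque([True] * WINDOW, maxlen=WINDOW)

# 4 probe failures displace the 4 oldest successes.
for _ in range(4):
    window.append(False)
print(packet_loss_pct(window))  # 4.0

# Now every probe succeeds; record packet-loss after each success.
history = []
for _ in range(100):
    window.append(True)
    history.append(packet_loss_pct(window))

print(history[95], history[96], history[99])  # 4.0 3.0 0.0
```

The first 96 successes only push out older successes, so the 4 failures stay in the window; from the 97th success onward, each new success evicts one failure.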
Important: in FortiOS v7.6.3 and earlier, a separate timer delay applies when transitioning from Out-of-SLA to In-SLA, running in parallel with the calculation of the SLA metrics themselves. This timer is similar to, but fully separate from, the Dead to Alive timer described above, and is calculated as follows:
Out-of-SLA to In-SLA delay timer (in seconds) = recoverytime * interval/1000
This timer starts when the member interface is Alive and its SLA metrics are below the thresholds; its purpose is to prevent the SD-WAN member from flapping between SLA states. It is not typically an issue with relatively short recoverytime and interval values, but setting these values too high can result in excessive delays (see the recoverytime 90 scenario in the Diagrams and Visualization section below).
Starting with FortiOS v7.6.4 (Change #1142171), this timer was optimized so that a member that has transitioned from Dead to Alive goes to In-SLA immediately once its metrics fall below the threshold, rather than also waiting for the Out-of-SLA to In-SLA delay timer.
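The version difference can be captured in a small closed-form helper. This is a sketch under stated assumptions: seconds_until_in_sla is a hypothetical name, both input times are measured from the same starting point, and the v7.6.4+ branch covers only a member recovering from Dead, which is the case described above. The example values are taken from the demonstration that follows.

```python
def seconds_until_in_sla(t_alive, t_below_threshold, recoverytime, interval_ms,
                         delay_timer=True):
    """Approximate time at which a member recovering from Dead becomes In-SLA.

    t_alive: seconds until the member is Alive again (Dead to Alive timer).
    t_below_threshold: seconds until the measured SLA metrics are back
        below their thresholds.
    Both are measured from the same (hypothetical) reference point, e.g.
    the first successful probe after the outage.
    delay_timer: True models FortiOS v7.6.3 and earlier; False models
        v7.6.4 and later for a member that has just returned from Dead.
    """
    # The member must be Alive *and* below the SLA thresholds.
    ready = max(t_alive, t_below_threshold)
    if not delay_timer:
        return ready  # v7.6.4+: In-SLA immediately
    # v7.6.3 and earlier: the Out-of-SLA to In-SLA delay timer starts at `ready`.
    return ready + recoverytime * interval_ms / 1000


# Demonstration settings: Alive after 120 s, packet-loss at threshold after 170 s.
print(seconds_until_in_sla(120, 170, 60, 2000))                     # 290.0
print(seconds_until_in_sla(120, 170, 60, 2000, delay_timer=False))  # 170
```

Taking the later of the two events reflects the requirement that the member be both Alive and below the thresholds before the delay timer can start.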
Demonstration: The following example will demonstrate the timers involved with an interface transitioning through the following states: Alive -> Dead -> Alive + Out-of-SLA -> Alive + In-SLA.
The following Performance SLA configuration will be used as an example to demonstrate how all of the above timers function. Assume that packet-loss goes to 100% during the initial Alive to Dead transition:
config system sdwan
    config health-check
        edit 'Example_SLA'
            set server '8.8.8.8'
            set interval 2000
            set failtime 2
            set recoverytime 60
            config sla
                edit 1
                    set link-cost-factor packet-loss
                    set packetloss-threshold 15
                next
            end
        next
    end
end
With a failtime of 2 and an interval of 2000, it takes 2 consecutive probe failures over a 4-second period for the interface to be considered Dead:
Alive to Dead Timer (in seconds) = failtime * interval/1000 = 2 * 2000/1000 = 4 seconds
With a recoverytime of 60 and an interval of 2000, it will take 60 consecutive probe successes in a 120-second-long period for the interface to transition from Dead to Alive + Out-of-SLA:
Dead to Alive Timer (in seconds) = recoverytime * interval/1000 = 60 * 2000/1000 = 120 seconds
To transition from Alive + Out-of-SLA to Alive + In-SLA, the SLA metric (in this case, packet-loss) must first fall below the configured threshold of 15%. Notably, packet-loss is always being measured (even during the Dead state), so the measured packet-loss already ticks downward during the preceding Dead to Alive step and could drop below the SLA threshold before that step completes, depending on the starting conditions and the configured settings.
With an interval of 2000 ms and a packetloss-threshold of 15, it takes 85 consecutive probe successes (170 seconds total) for packet-loss to fall from 100% to the 15% threshold:
Time to reach packet-loss threshold (in seconds) = |packetloss-threshold - CPL| * interval/1000 = |15 - 100| * 2000/1000 = 85 * 2 = 170 seconds
As soon as the measured packet-loss falls below the SLA threshold, the 'Out-of-SLA to In-SLA delay timer' starts. As noted above, a recoverytime of 60 and an interval of 2000 result in a flat 120-second timer that must expire before the interface can be marked In-SLA (v7.6.3 and earlier):
Out-of-SLA to In-SLA delay timer (in seconds) = recoverytime * interval/1000 = 60 * 2000/1000 = 120 seconds
Diagrams and Visualization: The following diagram displays the timers in sequence using the same settings from the above demonstration:
As described above, the interface transitions from Alive to Dead after 2 failed probes (4 seconds), and it recovers from Dead to Alive + Out-of-SLA after roughly 120 seconds. Manual and Best Quality rules may use this interface since it is Alive, but Lowest Cost (SLA) rules generally would not, since it is Out-of-SLA (packet-loss is still above the SLA threshold).
Once the measured packet-loss is below the SLA threshold, the 'Out-of-SLA to In-SLA delay timer' starts. This results in a 120-second period in which packet-loss is below the SLA threshold, but the member interface is not considered In-SLA from an SD-WAN rules perspective until the timer has elapsed.
For comparison, consider this alternative scenario that uses the same settings except for recoverytime, which is increased to a value of 90 instead of 60:
With an increased recoverytime, the 'Dead to Alive Timer' and the 'Out-of-SLA to In-SLA delay timer' both increase to 180 seconds. Notably, packet-loss in this scenario drops below the 15% threshold while the interface is still Dead, so the 'Out-of-SLA to In-SLA delay timer' starts as soon as the interface transitions from Dead to Alive + Out-of-SLA. This is important to understand, because setting recoverytime and interval to excessively large values can result in scenarios where SLA metrics are well below the thresholds, yet the interface cannot be used by Lowest Cost (SLA) rules because it is not yet considered In-SLA.
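The demonstration and the recoverytime 90 scenario above can also be checked with a probe-by-probe simulation. This is a hypothetical Python model (simulate_recovery and its parameters are illustrative names, not FortiOS internals) that assumes the member starts Dead with 100% packet-loss, every subsequent probe succeeds, and packet-loss counts as within SLA only when strictly below the threshold. Because of that strict comparison, it reports 172 s and 292 s where the article's formula, which measures the time to reach the threshold, gives 170 s and 290 s; the recoverytime 90 result (360 s) is unaffected because the Dead to Alive timer is the later event there.

```python
from collections import deque


def simulate_recovery(interval_ms=2000, recoverytime=60, loss_threshold=15,
                      delay_timer=True, window_size=100, max_probes=1000):
    """Probe-by-probe sketch of Dead -> Alive + Out-of-SLA -> Alive + In-SLA.

    Hypothetical model, not FortiOS internals:
    - the member starts Dead with 100% packet-loss (window full of failures);
    - every probe from t = interval onward succeeds;
    - packet-loss is within SLA only when strictly below loss_threshold;
    - delay_timer=True models v7.6.3 and earlier, False models v7.6.4+.
    Returns the time (seconds) of each milestone reached.
    """
    step = interval_ms / 1000
    window = deque([False] * window_size, maxlen=window_size)
    successes = 0
    alive = False
    events = {}
    delay_left = None  # probes remaining on the Out-of-SLA to In-SLA delay timer

    for n in range(1, max_probes + 1):
        t = n * step
        window.append(True)
        successes += 1
        loss = 100 * window.count(False) / window_size

        if not alive and successes >= recoverytime:  # Dead to Alive timer
            alive = True
            events["alive"] = t
        if "below_threshold" not in events and loss < loss_threshold:
            events["below_threshold"] = t

        # Loss only falls in this model, so the delay timer never needs resetting.
        if alive and loss < loss_threshold:
            if not delay_timer:
                events["in_sla"] = t  # v7.6.4+: no extra wait
                break
            if delay_left is None:
                delay_left = recoverytime  # delay timer starts now
            else:
                delay_left -= 1
            if delay_left == 0:
                events["in_sla"] = t
                break
    return events


print(simulate_recovery())
# {'alive': 120.0, 'below_threshold': 172.0, 'in_sla': 292.0}
print(simulate_recovery(recoverytime=90))
# {'below_threshold': 172.0, 'alive': 180.0, 'in_sla': 360.0}
```

Passing delay_timer=False to the first call shows the v7.6.4+ behavior, where the member becomes In-SLA as soon as packet-loss is below the threshold.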
One more alternative scenario to consider is the case of an SD-WAN member staying Alive but transitioning from In-SLA to Out-of-SLA and then back (this is a scenario where the 'threshold-exceeded timer' comes into play). The following scenario has a significantly different set of settings, with interval=2000, failtime increased to 20, recoverytime set back to 60, and packetloss-threshold decreased to 5%:
Note how measured packet-loss exceeds the 5% threshold for a significant period of time (40 seconds) before the interface is marked Out-of-SLA, due to the 'threshold-exceeded timer'. As a reminder, this timer is failtime * interval/1000, so setting an excessively high failtime can delay the transition from In-SLA to Out-of-SLA.
Reminder regarding FortiOS v7.6.4 and later: these versions have removed the 'Out-of-SLA to In-SLA delay timer' for members transitioning back from Dead to Alive + Out-of-SLA, so as soon as such an interface is Alive and its measured SLA metrics (latency, jitter, packet-loss) are below the configured thresholds, it immediately transitions to Alive + In-SLA.
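The threshold-exceeded timer in this scenario can be sketched as follows, operating on a series of packet-loss measurements rather than on raw probes. This is a hypothetical model: out_of_sla_time is an illustrative name, the member is assumed to stay Alive throughout, a measurement equal to the threshold is treated as exceeding it (matching the meet/exceed wording earlier in this article), and dropping back below the threshold is assumed to restart the timer.

```python
def out_of_sla_time(loss_series, threshold, failtime, interval_ms):
    """Seconds until a member that stays Alive is marked Out-of-SLA, or None.

    loss_series: measured packet-loss (%) after each probe, one value per
        interval, starting at t = interval.
    Hypothetical model of the threshold-exceeded timer: the metric must keep
    meeting or exceeding the threshold for failtime * interval/1000 seconds;
    dropping back below the threshold restarts the timer.
    """
    step = interval_ms / 1000
    hold = failtime * step
    exceeded_since = None  # time at which the current exceedance began
    for n, loss in enumerate(loss_series, start=1):
        t = n * step
        if loss >= threshold:
            if exceeded_since is None:
                exceeded_since = t
            if t - exceeded_since >= hold:
                return t
        else:
            exceeded_since = None  # brief exceedance: timer restarts
    return None


# Scenario settings: interval 2000, failtime 20, packetloss-threshold 5.
rising = list(range(1, 9)) + [8] * 40  # loss climbs 1% per probe, then holds at 8%
print(out_of_sla_time(rising, threshold=5, failtime=20, interval_ms=2000))  # 50.0

brief = [6] * 10 + [2] * 40  # exceeds for only 10 probes, then recovers
print(out_of_sla_time(brief, threshold=5, failtime=20, interval_ms=2000))  # None
```

With the rising series, the threshold is reached at 10 seconds and the member goes Out-of-SLA 40 seconds later, in line with the time to reach the threshold plus failtime * interval/1000.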
Copyright 2026 Fortinet, Inc. All Rights Reserved.