akileshc
Staff
February 26, 2026

Troubleshooting Tip: SD-WAN health check failure due to high latency or large probe sequence number

Description This article describes a scenario where the SD-WAN health check is marked as down due to high latency and excessively large probe sequence numbers, even though probe packets are successfully received and no packet loss is observed. This occurs when probe responses exceed the configured timeout threshold or sequence handling affects probe validation, causing the link to be incorrectly marked as down.
Scope FortiGate.
Solution

Symptoms:

  • SD-WAN health check intermittently shows 'down' status.
  • Latency is significantly high (for example, ~500 ms or higher).
  • No probe packet loss is observed.
  • Fail count may increment unexpectedly.
  • Debug logs show delayed probe responses.

 

Example debug log:

 

--
2024-11-21 10:35:51 lnkmtd::monitor_proto_peer_send_request(638): ---> HUB-13-VIRTUAL_WAN_LINK-13(10.117.141.1:ping) send probe packet, fail count(4)
2024-11-21 10:35:51 lnkmtd::ping_send_msg(435): ---> ping 10.117.141.1 seq_no=55331, icmp id=5348, send 20 bytes
--
2024-11-21 10:35:51 lnkmtd::ping_do_addr_up(136): ---> HUB-13-VIRTUAL_WAN_LINK-13->10.117.141.1(10.117.141.1), rcvd
2024-11-21 10:35:51 lnkmtd::monitor_peer_recv(2152): ---> can not find probe for monitor HUB-13-VIRTUAL_WAN_LINK-13, seq_num 686.
--
2024-11-21 10:35:52 lnkmtd::monitor_ppeer_fail(1847): ---> HUB-13-VIRTUAL_WAN_LINK-13(10.117.141.1 ping) is dead.
2024-11-21 10:35:52 lnkmtd::monitor_proute_cmdb_set(1121): ---> policy routes or internet service routes related to the monitor(HUB-13-VIRTUAL_WAN_LINK-13) may be removed

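This indicates that probe replies are received ('rcvd') but with high latency (~519 ms). The monitor can no longer match the reply to an outstanding probe ('can not find probe'), so the fail count keeps incrementing until the link is declared dead.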
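
When the debug output is long, it can help to summarize it offline. The following is a minimal Python sketch, not a Fortinet tool: the regular expressions are modelled only on the sample lines above, so other FortiOS builds may word these messages differently, and the function name `summarize` is hypothetical.

```python
import re

# Patterns modelled on the sample debug lines in this article (assumption:
# other firmware versions may format these messages differently).
SENT_RE = re.compile(r"ping (\S+) seq_no=(\d+)")
MISS_RE = re.compile(r"can not find probe for monitor (\S+), seq_num (\d+)")
DEAD_RE = re.compile(r"---> ([^\s(]+)\(.*\) is dead")


def summarize(log_text):
    """Collect sent sequence numbers, unmatched replies, and 'dead' events."""
    summary = {"sent": [], "unmatched": [], "dead": []}
    for line in log_text.splitlines():
        m = SENT_RE.search(line)
        if m:
            summary["sent"].append(int(m.group(2)))
            continue
        m = MISS_RE.search(line)
        if m:
            summary["unmatched"].append((m.group(1), int(m.group(2))))
            continue
        m = DEAD_RE.search(line)
        if m:
            summary["dead"].append(m.group(1))
    return summary
```

A capture with many 'unmatched' entries but no loss in the packet capture points to replies arriving after their probes were discarded, rather than to real packet loss.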

 

Root cause:

 

The SD-WAN health check mechanism validates probe packets using sequence numbers and a timeout. If any of the following conditions applies:

  • The probe latency exceeds the configured 'probe-timeout'.
  • The probe sequence number grows excessively large (greater than 32768).
  • The network latency exceeds acceptable thresholds.

 

The health check may mark the link as down even though responses are received.

This occurs because the probe response is considered invalid if received outside the configured timeout window.
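
The timeout window can be illustrated with a toy model. This is only a sketch of the behavior described above, not the actual link-monitor implementation: the class `ProbeTracker` and its methods are hypothetical, and it assumes that a probe is discarded once it is older than the timeout, so a later reply has nothing to match.

```python
class ProbeTracker:
    """Toy model: a reply only counts if its probe is still outstanding."""

    def __init__(self, probe_timeout_ms=500):
        self.probe_timeout_ms = probe_timeout_ms
        self.outstanding = {}  # sequence number -> send time in ms

    def send(self, seq, now_ms):
        """Record a probe as sent at now_ms."""
        self.outstanding[seq] = now_ms

    def expire(self, now_ms):
        """Drop probes older than the timeout; return how many were dropped."""
        expired = [s for s, sent in self.outstanding.items()
                   if now_ms - sent > self.probe_timeout_ms]
        for s in expired:
            del self.outstanding[s]
        return len(expired)

    def receive(self, seq, now_ms):
        """Return True if the reply matches a probe still inside the window."""
        self.expire(now_ms)
        return self.outstanding.pop(seq, None) is not None


# A reply arriving 519 ms after the probe misses a 500 ms window.
tracker = ProbeTracker(probe_timeout_ms=500)
tracker.send(seq=1, now_ms=0)
late_reply_ok = tracker.receive(seq=1, now_ms=519)  # False in this model
```

In this model the same reply is accepted when the window is wider than the round-trip time, which is the reasoning behind increasing 'probe-timeout' in Step 4.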

 

Note: The ping sequence number is a 16-bit integer. Once it increments past 32767, a signed interpretation of the value becomes negative (for example, 32768 is read as -32768).
This causes the sequence to be considered outdated, and the system mistakenly treats the valid reply as lost.
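
The arithmetic behind this note can be checked with a short Python sketch. It only demonstrates how an unsigned 16-bit value reads when reinterpreted as signed; how FortiOS compares sequence numbers internally is not shown here, and the helper name `as_signed_16` is hypothetical.

```python
def as_signed_16(seq):
    """Reinterpret an unsigned 16-bit value as a signed 16-bit integer."""
    seq &= 0xFFFF                      # keep only the low 16 bits
    return seq - 0x10000 if seq >= 0x8000 else seq


for seq in (686, 32767, 32768, 55331):
    print(seq, "->", as_signed_16(seq))
# 686   -> 686
# 32767 -> 32767
# 32768 -> -32768
# 55331 -> -10205
```

The sequence number 55331 from the example debug log falls in the range that reads as negative under a signed interpretation.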

 

Solution:

 

Step 1: Verify SD-WAN health check status.

 

Run a health-check status check as described in SD-WAN related diagnose commands.

 

Check for:

  • Packet loss.
  • Latency.
  • Jitter.
  • Health status.

 

Step 2: Verify probe packet transmission and reception.

 

Run packet capture:

 

diagnose sniffer packet <interface_name> "host x.x.x.x and icmp" 4 0 l

Verify:

  • ICMP echo request is sent.
  • ICMP echo reply is received.
  • No packet drops.

 

Step 3: Check probe sequence behavior.

 

Enable debugging:

 

diagnose debug reset
diagnose debug console timestamp enable

diagnose debug application sdwan -1
diagnose debug application link-monitor -1
diagnose debug enable

Check sequence numbers and probe timing.

If sequence numbers exceed 32768 or probe replies are delayed beyond the configured 'probe-timeout', the health check may fail.

 

Disable debugging after collecting logs:

 

diagnose debug disable
diagnose debug reset

 

Step 4: Increase probe timeout values.

 

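If latency is high, increase 'probe-timeout' (in milliseconds) to a value above the observed latency. 'failtime' can be increased as well so that more consecutive failed probes are tolerated before the link is marked down: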


config system sdwan
    config health-check
        edit "health_check_name"
            set probe-timeout <timeout_in_ms>    <----- Default: 500.
            set failtime <failure_count>
        next
    end
end
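
Before choosing a new value, it can be useful to check how many observed round-trip times would still fall outside a candidate timeout. The sketch below is an offline aid under stated assumptions: the latency samples are hypothetical values, not measurements from the log above, and the function name `late_fraction` is illustrative.

```python
def late_fraction(rtts_ms, probe_timeout_ms):
    """Fraction of samples whose round-trip time exceeds the candidate timeout."""
    if not rtts_ms:
        raise ValueError("need at least one RTT sample")
    late = sum(1 for rtt in rtts_ms if rtt > probe_timeout_ms)
    return late / len(rtts_ms)


# Hypothetical latency samples in milliseconds, for illustration only.
samples = [480, 519, 530, 495, 610]

for candidate in (500, 750, 1000):
    print(candidate, "ms ->", late_fraction(samples, candidate))
```

A candidate timeout that leaves a non-zero fraction of late replies is likely to keep incrementing the fail count on that link.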

Verification:

 

Run the following command:

 

diagnose sys sdwan health-check

Expected result:

  • Health check status: alive.
  • Fail count: 0.
  • Latency visible but within acceptable timeout range.

 

Example expected behavior:

  • Probe replies received.
  • Fail count not increasing.
  • Link remains operational despite high latency.

 

SD-WAN health checks may fail due to high latency exceeding configured probe timeout values, even when probe responses are received. Increasing probe-timeout ensures reliable health check operation in high-latency environments.