Troubleshooting Tip: SD-WAN health check failure due to high latency or large probe sequence number
| Description | This article describes a scenario where the SD-WAN health check marks a member as down because of high latency or very large probe sequence numbers. This happens even though probe replies are received and no packet loss is observed. The link is incorrectly marked as down when probe replies arrive later than the configured timeout, or when sequence number handling causes probes to fail validation. |
| Scope | FortiGate. |
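| Solution | Symptoms: the SD-WAN health check marks the member as down, while probe replies are still received and no packet loss is observed.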
Example debug log:
This indicates that probe replies are received, but with high latency (about 519 ms).
Root cause:
The SD-WAN health check mechanism validates probe packets using sequence numbers and a timeout. The health check may mark the link as down even though responses are received in either of these cases:
- probe replies arrive later than the configured probe-timeout, or
- sequence number handling causes a probe to fail validation.
A probe response received outside the configured timeout window is treated as invalid and counted as lost.
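The timeout rule can be illustrated with a simplified model. This is a sketch and not FortiOS's actual implementation. It makes two assumptions: a reply counts as lost when its round-trip time exceeds probe-timeout, and the member is declared dead after `failtime` consecutive lost probes. The function name, the 500 ms timeout and the RTT values are hypothetical, and the values were chosen to match the ~519 ms latency in the example log.

```python
def member_state(rtts_ms, probe_timeout_ms, failtime):
    """Return 'alive' or 'dead' for a sequence of probe round-trip times.

    rtts_ms: RTT of each probe in milliseconds, or None if no reply arrived.
    A reply slower than probe_timeout_ms is treated exactly like a lost probe.
    Simplified model: recovery logic is ignored.
    """
    consecutive_failures = 0
    for rtt in rtts_ms:
        if rtt is None or rtt > probe_timeout_ms:
            consecutive_failures += 1
            if consecutive_failures >= failtime:
                return "dead"
        else:
            consecutive_failures = 0  # any timely reply resets the counter
    return "alive"


# Every probe is answered, but at about 519 ms.
replies = [519, 521, 518, 520, 519]
print(member_state(replies, probe_timeout_ms=500, failtime=5))   # dead
print(member_state(replies, probe_timeout_ms=1000, failtime=5))  # alive
```

In this model every probe gets a reply, so the link is healthy in practice. It is still declared dead only because each reply lands outside the timeout window.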
Note: The ping sequence number is a 16-bit field. If it is handled as a signed 16-bit integer, every value from 32768 upward is interpreted as a negative number. For example, 32768 becomes -32768.
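The effect of reading the 16-bit field as signed can be shown with Python's standard `struct` module. This only demonstrates what a signed 16-bit interpretation does to the value. It does not reproduce how FortiOS itself parses the field, and the helper name is hypothetical.

```python
import struct


def as_signed16(seq):
    """Reinterpret an unsigned 16-bit ICMP sequence number as a signed 16-bit value."""
    # Pack as unsigned short ('H'), then unpack the same two bytes as signed short ('h').
    return struct.unpack("!h", struct.pack("!H", seq))[0]


for seq in (1, 32767, 32768, 65535):
    print(seq, "->", as_signed16(seq))
# 1 -> 1
# 32767 -> 32767
# 32768 -> -32768
# 65535 -> -1
```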
Solution:
Step 1: Verify SD-WAN health check status.
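Check the health-check status as described in the article 'SD-WAN related diagnose commands' (for example, with diagnose sys sdwan health-check).
Check whether the member is reported as dead while packet loss is 0% and latency is high.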
Step 2: Verify probe packet transmission and reception.
Run a packet capture:
diagnose sniffer packet <interface_name> "host x.x.x.x and icmp" 4 0 l
Verify that each ICMP echo request receives an echo reply, and note the delay between them.
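Measuring the request-to-reply delay by hand is tedious on a long capture. The following sketch computes it from saved sniffer output. It assumes that each packet line starts with a local timestamp in the form `YYYY-MM-DD HH:MM:SS.ffffff` (the `l` option) and contains `echo request` or `echo reply`. It also assumes that replies arrive in order and that only one probe target is captured. Adjust the format string if your output differs. The function name, addresses and timestamps below are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


def rtts_from_sniffer(lines):
    """Pair ICMP echo requests with replies in order; return (rtts_ms, unanswered)."""
    pending = deque()  # send times of requests still waiting for a reply
    rtts_ms = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            ts = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # header or other non-packet line
        if "echo request" in line:
            pending.append(ts)
        elif "echo reply" in line and pending:
            sent = pending.popleft()
            rtts_ms.append((ts - sent) / timedelta(milliseconds=1))
    return rtts_ms, len(pending)


capture = [
    "interfaces=[port1]",
    "2024-05-10 10:15:30.000000 port1 out 10.0.0.1 -> 192.0.2.10: icmp: echo request",
    "2024-05-10 10:15:30.519000 port1 in 192.0.2.10 -> 10.0.0.1: icmp: echo reply",
    "2024-05-10 10:15:31.000000 port1 out 10.0.0.1 -> 192.0.2.10: icmp: echo request",
]
print(rtts_from_sniffer(capture))  # ([519.0], 1)
```

An unanswered request at the very end of the capture may simply have been in flight when the capture stopped. Unanswered requests earlier in the capture point to real loss rather than latency.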
Step 3: Check probe sequence behavior.
Enable debugging:
diagnose debug reset
diagnose debug application sdwan -1
diagnose debug enable
Check the sequence numbers and probe timing. If sequence numbers reach 32768 or higher, or if probe replies are delayed beyond the probe timeout, the health check may fail.
Disable debugging after collecting logs:
diagnose debug disable
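After collecting the logs, the two conditions from Step 3 can be checked against the observed probes. The sketch below takes (sequence number, RTT) pairs that were read from the debug output by hand or with your own parsing. It does not parse the log itself, because the debug format is not reproduced here. The tuple layout, the function name and the sample values are hypothetical.

```python
SEQ_SIGNED_LIMIT = 32768  # first value that turns negative when read as signed 16-bit


def flag_probe_samples(samples, probe_timeout_ms):
    """Flag probe samples that could make the health check fail.

    samples: iterable of (sequence_number, rtt_ms) pairs; rtt_ms is None
    if no reply was seen. Returns a list of (sequence_number, reason) tuples.
    """
    findings = []
    for seq, rtt in samples:
        if seq >= SEQ_SIGNED_LIMIT:
            findings.append((seq, "sequence number >= 32768"))
        if rtt is None:
            findings.append((seq, "no reply"))
        elif rtt > probe_timeout_ms:
            findings.append(
                (seq, f"reply after {rtt} ms exceeds probe-timeout {probe_timeout_ms} ms")
            )
    return findings


for finding in flag_probe_samples([(100, 120), (101, 519), (32768, 130)], 500):
    print(finding)
# (101, 'reply after 519 ms exceeds probe-timeout 500 ms')
# (32768, 'sequence number >= 32768')
```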
Step 4: Increase probe timeout values.
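If latency is high, increase probe-timeout and, if needed, failtime:
config system sdwan
    config health-check
        edit <health_check_name>
            set probe-timeout <milliseconds>
            set failtime <number_of_failures>
        next
    end
end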
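Choosing values involves a trade-off. The sketch below expresses one possible rule of thumb: set probe-timeout comfortably above the worst observed RTT, and estimate how much failtime delays detection of a real outage. The 1.5x margin and the 500 ms floor are hypothetical choices, not Fortinet guidance. The detection estimate assumes one probe per health-check `interval` (500 ms is assumed here) and `failtime` consecutive failures.

```python
import math


def suggest_probe_timeout(rtts_ms, margin=1.5, minimum_ms=500):
    """Return a candidate probe-timeout (ms) above the worst observed RTT.

    margin and minimum_ms are illustrative choices, not vendor recommendations.
    """
    if not rtts_ms:
        raise ValueError("at least one RTT sample is required")
    return max(minimum_ms, math.ceil(max(rtts_ms) * margin))


def approx_detection_time_ms(interval_ms, failtime):
    """Rough time to declare a member dead: failtime failed probes, one per interval."""
    return interval_ms * failtime


print(suggest_probe_timeout([519, 521, 518]))  # 782
print(approx_detection_time_ms(500, 5))        # 2500
print(approx_detection_time_ms(500, 10))       # 5000
```

Raising probe-timeout addresses late replies directly. Raising failtime also tolerates them, but in this rough estimate doubling failtime doubles the time needed to detect a genuine outage.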
Verification:
Run the following command:
diagnose sys sdwan health-check
Expected result: the affected member is reported as alive, with no packet loss, even though latency remains high.
SD-WAN health checks can fail when latency exceeds the configured probe-timeout, even though probe replies are received. Increasing probe-timeout (and, where appropriate, failtime) keeps health checks reliable in high-latency environments. |
