Technical Tip: Failover to Standby ZTNA Real Server Fails When Blackhole Route is Configured to Real Servers
| Description | This article describes an issue where failover to a standby real server fails when one of the active real servers goes down. The issue occurs when a blackhole route is configured for the real server's IP address. |
| Scope | FortiGate v7.4.4, v7.4.5 |
| Solution | A ZTNA server is configured with multiple real servers, each with a health check enabled, and a blackhole static route is applied to a real server IP (172.16.200.207 in this example), as shown in the configuration below:

config firewall access-proxy
    edit "ztna_server_http"
        set vip "ztna_server_http"
        config api-gateway
            edit 1
                config realservers
                    edit 1
                        set ip 172.16.200.207
                        set health-check enable
                    next
                    edit 2
                        set ip 172.16.200.209
                        set status standby
                        set health-check enable
                        set holddown-interval disable
                    next
                end
            next
        end
    next
end

config router static
    edit 0
        set status enable
        set dst 172.16.200.207 255.255.255.255
        set blackhole enable
    next
end

When the link monitor detects that the primary real server (172.16.200.207) is 'dead', failover to the standby real server (172.16.200.209) does not take place. Client connections fail, and the FortiGate returns an HTTP 504 (Gateway Timeout) replacement message.

The health check marks the primary real server as dead:

diag wad access-proxy health-check status

Link Monitor: AP-1-1-1, Status: dead, Server num(1), cfg_version=0
Flags=0x1 init, Create time: Wed Apr 10 15:56:30 2024
VRF: 0
Interval: 1000 ms
Service-detect: disable
Diffservcode: 000000
Class-ID: 0
Transport-Group: 0
Class-ID: 0
  Peer: 172.16.200.207(172.16.200.207)
    protocol: ping(443), state: dead
      Packet lost: 15.000%
      MOS: 4.397
      Number of out-of-sequence packets: 0
      Recovery times(0/5) Fail Times(1/5)
      Packet sent: 178, received: 156, Sequence(sent/rcvd/exp): 179/162/163

Link Monitor: AP-1-1-2, Status: alive, Server num(1), cfg_version=0
Flags=0x1 init, Create time: Wed Apr 10 15:56:30 2024
VRF: 0
Interval: 1000 ms
Service-detect: disable
Diffservcode: 000000
Class-ID: 0
Transport-Group: 0
Class-ID: 0
  Peer: 172.16.200.209(172.16.200.209)
    protocol: ping(443), state: alive
      Latency(Min/Max/Avg): 0.106/0.155/0.131 ms
      Jitter(Min/Max/Avg): 0.001/0.028/0.009 ms
      Packet lost: 0.000%
      MOS: 4.404
      Number of out-of-sequence packets: 0
      Fail Times(0/5)
      Packet sent: 178, received: 178, Sequence(sent/rcvd/exp): 179/179/180

However, the WAD debug output still reports the operational status of 172.16.200.207 as 'alive' (both old and new oper status are 'alive'). WAD therefore continues to forward requests to 172.16.200.207:443 and generates the 504 replacement message:

wad_ui_config_update_global_pre_vd(worker-handle) vd='' global gen=5 flags=AccessXHealth| [0x00000000000000000000000000000000100000]
[I][p:4075] wad_vs_server_oper_status_set :6505 1:ztna_server_http:1: server 172.16.200.207:443 old oper status alive, new oper status alive <<<<<<<<<<<
[I][p:4075] wad_vs_server_oper_status_set :6505 1:ztna_server_http:1: server 172.16.200.209:443 old oper status alive, new oper status alive
[I][p:4075] wad_vs_gwy_get_servers_nop :3429 1:ztna_server_http:4294967295: trace
wad_ui_update_vd(worker-handle) vd=root gen=0 flags= [0x00000000000000000000000000000000000000]
wad_ui_update_vd(worker-handle) vd=vdom1 gen=0 flags= [0x00000000000000000000000000000000000000]
[V][p:4075] wad_worker_handle_config_change :1188 WadTest@WorkerConfDone
[V][p:4079][s:130][r:184549378] wad_fw_policy_match_dev_grp :5034 pol_id = 1 matched dev id = 5
[V][p:4079][s:130][r:184549378] wad_fw_policy_match_dev :5095 pol_id = 1 matched = 1
[I][p:4079][s:130][r:184549378] wad_fw_policy_async_match :5315 pol_ctx:th|A|7|=d
[I][p:4079][s:130][r:184549378] wad_http_req_policy_set :11177 match policy-id=1(pol_ctx:th|A|7|=d) vd=1(ses_ctx:ct|Pv|Me|H|C|A1|O) (10.1.100.11:37402@10 -> 172.16.200.207:443@-1) <<<<<<<
[I][p:4079][s:130][r:184549378] __wad_http_build_replmsg_resp :757 Generating replacement message. 504 error repmsg_id 7

This issue is resolved in FortiOS v7.6.0.

Logs required by FortiGate TAC for investigation:

diagnose wad filter src <source_IP>
diagnose wad debug enable all   <----- WAD debugs must not be run without a filter.
diagnose wad debug enable category all
diagnose wad debug enable level verbose
diagnose wad access-proxy health-check status
diagnose debug enable

After the data has been captured while the issue is occurring, reset the debugs with the command 'diagnose debug reset' and collect the TAC report with the command 'execute tac report'. |
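The trigger condition is a static blackhole route whose destination covers a real server IP. The sketch below flags affected real servers using Python's standard `ipaddress` module. It assumes the static routes and real-server addresses have already been copied out of the configuration by hand. The `StaticRoute` type and the `blackholed_real_servers` helper are hypothetical illustrations, not a FortiOS API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class StaticRoute:
    # Hypothetical model of one "config router static" entry.
    dst: str         # "address netmask", as written in "set dst"
    blackhole: bool  # True when "set blackhole enable" is configured


def blackholed_real_servers(real_servers, routes):
    """Return the real-server IPs that fall inside any blackhole route destination."""
    # "172.16.200.207 255.255.255.255" -> "172.16.200.207/255.255.255.255";
    # strict=False tolerates host bits set in the address part.
    networks = [
        ipaddress.ip_network("/".join(r.dst.split()), strict=False)
        for r in routes
        if r.blackhole
    ]
    return [
        ip for ip in real_servers
        if any(ipaddress.ip_address(ip) in net for net in networks)
    ]


if __name__ == "__main__":
    # Values taken from the example configuration in this article.
    routes = [StaticRoute(dst="172.16.200.207 255.255.255.255", blackhole=True)]
    servers = ["172.16.200.207", "172.16.200.209"]
    print(blackholed_real_servers(servers, routes))  # ['172.16.200.207']
```

A containment check is used instead of an exact string match, so a wider blackhole prefix (for example, a /24) that covers a real server is also caught.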
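A simplified model helps show why a stale 'alive' status defeats failover. In this model, traffic goes to active real servers that are operationally alive, and standby servers are used only when no active server is alive. This is an illustrative assumption about the expected behavior, not FortiOS's actual WAD selection code. `RealServer` and `pick_servers` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RealServer:
    ip: str
    standby: bool     # True for "set status standby"
    oper_alive: bool  # operational status as WAD sees it (not the health check)


def pick_servers(servers):
    """Simplified model: prefer alive active servers, else fall back to alive standbys."""
    active = [s.ip for s in servers if not s.standby and s.oper_alive]
    if active:
        return active
    return [s.ip for s in servers if s.standby and s.oper_alive]


if __name__ == "__main__":
    # Expected: the dead primary is marked down, so the standby is selected.
    healthy_view = [RealServer("172.16.200.207", False, False),
                    RealServer("172.16.200.209", True, True)]
    print(pick_servers(healthy_view))  # ['172.16.200.209']

    # Observed: the primary's oper status stays 'alive', so the standby is never used.
    stale_view = [RealServer("172.16.200.207", False, True),
                  RealServer("172.16.200.209", True, True)]
    print(pick_servers(stale_view))  # ['172.16.200.207']
```

In the stale case, requests keep going to the blackholed address, which matches the 504 seen in the debug output.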
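When reviewing saved `diagnose wad access-proxy health-check status` output, a small parser can map each peer to its link-monitor state. This sketch assumes the `Peer: <ip>(...)` and `state: <word>` layout shown in the sample output of this article; other builds may format the output differently. `parse_health_check` is a hypothetical helper.

```python
import re

# Non-greedy match from each "Peer: <ip>(" to the first "state: <word>" after it.
# re.DOTALL lets the gap span line breaks, so both the multi-line layout and a
# flattened single-line copy of the output are handled. "Status:" on the
# "Link Monitor" line is not matched because the pattern is case-sensitive.
PEER_STATE_RE = re.compile(r"Peer:\s*([0-9.]+)\(.*?state:\s*(\w+)", re.DOTALL)


def parse_health_check(output: str) -> dict[str, str]:
    """Return {peer_ip: state} from health-check status text."""
    return {ip: state for ip, state in PEER_STATE_RE.findall(output)}


if __name__ == "__main__":
    sample = """Link Monitor: AP-1-1-1, Status: dead, Server num(1)
  Peer: 172.16.200.207(172.16.200.207)
    protocol: ping(443), state: dead
Link Monitor: AP-1-1-2, Status: alive, Server num(1)
  Peer: 172.16.200.209(172.16.200.209)
    protocol: ping(443), state: alive
"""
    print(parse_health_check(sample))
    # {'172.16.200.207': 'dead', '172.16.200.209': 'alive'}
```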
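The symptom is a disagreement between two sources: the link monitor reports the primary as dead, while the `wad_vs_server_oper_status_set` trace keeps its operational status at 'alive'. The sketch below flags such servers. It assumes the health-check states have already been collected into a dictionary and the WAD debug has been saved as text. `find_stale_servers` is a hypothetical helper.

```python
import re

# Matches the wad_vs_server_oper_status_set trace, e.g.
# "server 172.16.200.207:443 old oper status alive, new oper status alive"
OPER_RE = re.compile(
    r"server\s+([0-9.]+):(\d+)\s+old oper status\s+(\w+),\s+new oper status\s+(\w+)"
)


def find_stale_servers(health_states, wad_debug):
    """Return IPs the health check reports as dead but WAD still marks alive."""
    wad_status = {}
    for ip, _port, _old, new in OPER_RE.findall(wad_debug):
        wad_status[ip] = new  # keep the most recent status seen for each server
    return sorted(
        ip for ip, state in health_states.items()
        if state == "dead" and wad_status.get(ip) == "alive"
    )


if __name__ == "__main__":
    health = {"172.16.200.207": "dead", "172.16.200.209": "alive"}
    debug = (
        "[I][p:4075] wad_vs_server_oper_status_set :6505 1:ztna_server_http:1: "
        "server 172.16.200.207:443 old oper status alive, new oper status alive\n"
        "[I][p:4075] wad_vs_server_oper_status_set :6505 1:ztna_server_http:1: "
        "server 172.16.200.209:443 old oper status alive, new oper status alive\n"
    )
    print(find_stale_servers(health, debug))  # ['172.16.200.207']
```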
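WAD debugs must be filtered, so it can help to generate the capture sequence from a source IP; the filter line then cannot be left out. This minimal sketch only assembles, as strings, the CLI commands listed in this article, and it validates the address first. `tac_debug_commands` and `tac_cleanup_commands` are hypothetical helpers.

```python
import ipaddress


def tac_debug_commands(source_ip: str) -> list[str]:
    """Return the WAD capture commands in order, with the source filter first."""
    # Raises ValueError for a malformed address, so no unfiltered sequence is produced.
    ipaddress.ip_address(source_ip)
    return [
        f"diagnose wad filter src {source_ip}",
        "diagnose wad debug enable all",
        "diagnose wad debug enable category all",
        "diagnose wad debug enable level verbose",
        "diagnose wad access-proxy health-check status",
        "diagnose debug enable",
    ]


def tac_cleanup_commands() -> list[str]:
    """Commands to run once the issue has been captured."""
    return ["diagnose debug reset", "execute tac report"]


if __name__ == "__main__":
    # 10.1.100.11 is the client address seen in the WAD debug above.
    print("\n".join(tac_debug_commands("10.1.100.11")))
```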