ssanga
Staff & Editor
October 21, 2024

Technical Tip: Failover to Standby ZTNA Real Server Fails When Blackhole Route is Configured to Real Servers

Description
This article describes an issue where failover to a standby real server fails when one of the active real servers goes down. The issue occurs when a blackhole route is configured for the real server's IP address.
Scope
FortiGate v7.4.4, v7.4.5.
Solution
The ZTNA server is configured with multiple real servers, each with health check enabled, and a blackhole route is configured for the real server IP address, as shown in the configuration below:
 
config firewall access-proxy
    edit "ztna_server_http"
        set vip "ztna_server_http"
        config api-gateway
            edit 1
                config realservers
                    edit 1
                        set ip 172.16.200.207
                        set health-check enable
                    next
                    edit 2
                        set ip 172.16.200.209
                        set status standby
                        set health-check enable
                        set holddown-interval disable
                    next
                end
            next
        end
    next
end
 
config router static
    edit 0
        set status enable
        set dst 172.16.200.207 255.255.255.255
        set blackhole enable
    next
end
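
The triggering condition (a real server IP covered by an enabled blackhole route) can be checked offline. The following is a minimal Python sketch, not a Fortinet tool: the function name `blackholed_real_servers` and the tuple layout used for routes are assumptions made for illustration, and the values are copied by hand from the configuration above.

```python
import ipaddress

def blackholed_real_servers(real_servers, static_routes):
    """Return the real server IPs that fall inside a blackhole route.

    real_servers:  list of IP strings (hypothetical input format)
    static_routes: list of (dst, netmask, blackhole_enabled) tuples
    """
    hits = []
    for ip in real_servers:
        addr = ipaddress.ip_address(ip)
        for dst, mask, blackhole in static_routes:
            # strict=False tolerates a dst that has host bits set
            net = ipaddress.ip_network(f"{dst}/{mask}", strict=False)
            if blackhole and addr in net:
                hits.append(ip)
                break  # one matching blackhole route is enough
    return hits

# Values taken from the configuration in this article
real_servers = ["172.16.200.207", "172.16.200.209"]
static_routes = [("172.16.200.207", "255.255.255.255", True)]

print(blackholed_real_servers(real_servers, static_routes))
# prints ['172.16.200.207']
```

Any address the function returns is a real server that matches the condition described in this article on the affected versions.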
 
When the link monitor detects that the primary real server (172.16.200.207) is in a 'dead' state, failover to the standby real server (172.16.200.209) does not occur. Requests are still sent to the dead server, and the FortiGate returns a 504 replacement message to the client.
 
Although the health check marks the real server as dead, the WAD debugs still show its operational status as 'alive':
 
diag wad access-proxy health-check status
Link Monitor: AP-1-1-1, Status: dead, Server num(1), cfg_version=0 Flags=0x1 init, Create time: Wed Apr 10 15:56:30 2024
VRF: 0
Interval: 1000 ms
Service-detect: disable
Diffservcode: 000000
Class-ID: 0
Transport-Group: 0
Class-ID: 0
  Peer: 172.16.200.207(172.16.200.207)
        protocol: ping(443), state: dead
                Packet lost: 15.000%
                MOS: 4.397
                Number of out-of-sequence packets: 0
                Recovery times(0/5) Fail Times(1/5)
                Packet sent: 178, received: 156, Sequence(sent/rcvd/exp): 179/162/163
 
Link Monitor: AP-1-1-2, Status: alive, Server num(1), cfg_version=0 Flags=0x1 init, Create time: Wed Apr 10 15:56:30 2024
VRF: 0
Interval: 1000 ms
Service-detect: disable
Diffservcode: 000000
Class-ID: 0
Transport-Group: 0
Class-ID: 0
  Peer: 172.16.200.209(172.16.200.209)
        protocol: ping(443), state: alive
                Latency(Min/Max/Avg): 0.106/0.155/0.131 ms
                Jitter(Min/Max/Avg): 0.001/0.028/0.009 ms
                Packet lost: 0.000%
                MOS: 4.404
                Number of out-of-sequence packets: 0
                Fail Times(0/5)
                Packet sent: 178, received: 178, Sequence(sent/rcvd/exp): 179/179/180
 
wad_ui_config_update_global_pre_vd(worker-handle) vd='' global gen=5 flags=AccessXHealth| [0x00000000000000000000000000000000100000]
[I][p:4075] wad_vs_server_oper_status_set :6505 1:ztna_server_http:1: server 172.16.200.207:443 old oper status alive, new oper status alive <<<<<<<<<<<
[I][p:4075] wad_vs_server_oper_status_set :6505 1:ztna_server_http:1: server 172.16.200.209:443 old oper status alive, new oper status alive
[I][p:4075] wad_vs_gwy_get_servers_nop :3429 1:ztna_server_http:4294967295: trace
wad_ui_update_vd(worker-handle) vd=root gen=0 flags= [0x00000000000000000000000000000000000000]
wad_ui_update_vd(worker-handle) vd=vdom1 gen=0 flags= [0x00000000000000000000000000000000000000]
[V][p:4075] wad_worker_handle_config_change :1188 WadTest@WorkerConfDone
 
[V][p:4079][s:130][r:184549378] wad_fw_policy_match_dev_grp :5034 pol_id = 1 matched dev id = 5
[V][p:4079][s:130][r:184549378] wad_fw_policy_match_dev :5095 pol_id = 1 matched = 1
[I][p:4079][s:130][r:184549378] wad_fw_policy_async_match :5315 pol_ctx:th|A|7|=d
[I][p:4079][s:130][r:184549378] wad_http_req_policy_set :11177 match policy-id=1(pol_ctx:th|A|7|=d) vd=1(ses_ctx:ct|Pv|Me|H|C|A1|O) (10.1.100.11:37402@10 -> 172.16.200.207:443@-1) <<<<<<<
[I][p:4079][s:130][r:184549378] __wad_http_build_replmsg_resp :757 Generating replacement message. 504 error repmsg_id 7
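
The mismatch shown above (link monitor state 'dead', WAD operational status 'alive') can be spotted automatically in saved output. The following is a minimal Python sketch under stated assumptions: the regular expressions are written only against the line formats shown in this article, the function names are hypothetical, and other FortiOS builds may format these lines differently.

```python
import re

# Patterns derived from the sample output in this article (assumed formats)
PEER_RE = re.compile(r"Peer:\s*([\d.]+)\(")
STATE_RE = re.compile(r"protocol:.*state:\s*(\w+)")
OPER_RE = re.compile(
    r"server ([\d.]+):\d+ old oper status \w+, new oper status (\w+)"
)

def monitor_states(text):
    """Map each link-monitor peer IP to its reported state."""
    states, peer = {}, None
    for line in text.splitlines():
        m = PEER_RE.search(line)
        if m:
            peer = m.group(1)
            continue
        m = STATE_RE.search(line)
        if m and peer:
            states[peer] = m.group(1)
            peer = None
    return states

def wad_states(text):
    """Map each real server IP to the last 'new oper status' seen in WAD debugs."""
    return {ip: status for ip, status in OPER_RE.findall(text)}

def mismatches(monitor_text, wad_text):
    """Return the IPs whose link-monitor state differs from the WAD oper status."""
    mon, wad = monitor_states(monitor_text), wad_states(wad_text)
    return sorted(ip for ip in mon if ip in wad and mon[ip] != wad[ip])

# Excerpts copied from the output in this article
monitor_text = """\
  Peer: 172.16.200.207(172.16.200.207)
        protocol: ping(443), state: dead
  Peer: 172.16.200.209(172.16.200.209)
        protocol: ping(443), state: alive
"""
wad_text = """\
[I][p:4075] wad_vs_server_oper_status_set :6505 1:ztna_server_http:1: server 172.16.200.207:443 old oper status alive, new oper status alive
[I][p:4075] wad_vs_server_oper_status_set :6505 1:ztna_server_http:1: server 172.16.200.209:443 old oper status alive, new oper status alive
"""

print(mismatches(monitor_text, wad_text))
# prints ['172.16.200.207']
```

A non-empty result means at least one real server is marked dead by the health check while WAD still treats it as alive, which is the symptom described in this article.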
 
This issue is resolved in v7.6.0.
 
Logs required by FortiGate TAC for investigation:
 
diagnose wad filter src <source_IP>
diagnose wad debug enable all                    <----- WAD debugs must not be run without a filter.
diagnose wad debug enable category all
diagnose wad debug enable level verbose
diagnose wad access-proxy health-check status
diagnose debug enable
 
After capturing the data while the issue is occurring, reset the debugs with the command 'diagnose debug reset' and collect the TAC report with the command 'execute tac report'.