| Solution |
Topology:

In this topology, GRE tunnel failover between the FortiGate HA cluster and the AWS Transit Gateway is designed as follows:
- When FGT01 becomes the primary, the GRE tunnel tgwc will be active to route traffic between FortiGate and the application subnet 192.168.50.0/24.
- When FGT02 becomes the primary, the GRE tunnel tgw1 will be active to route traffic between FortiGate and the application subnet 192.168.50.0/24.
- Server Load Balancing was configured on FortiGate to perform DNAT to internal/application servers:
The connection works as expected while FGT02 is the primary unit. The following output shows the routing table, GRE tunnel status, and virtual server status when FGT02 acts as the primary:
FGTVM-Connect-2 # get router info routing-table all
Codes: K - kernel, C - connected, S - static, R - RIP, B - BGP
O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
V - BGP VPNv4
* - candidate default
Routing table for VRF=0
S* 0.0.0.0/0 [10/0] via 10.0.0.65, port1, [1/0]
S 1.0.0.0/8 [10/0] via 10.0.0.81, port2, [1/0]
C 10.0.0.64/28 is directly connected, port1
C 10.0.0.80/28 is directly connected, port2
C 169.254.102.0/29 is directly connected, tgw1
C 169.254.102.1/32 is directly connected, tgw1
C 169.254.120.0/29 is directly connected, tgwc
C 169.254.120.1/32 is directly connected, tgwc
S *> 172.20.1.0/24 [10/0] via IPSec1 tunnel x.x.x.x, [1/0]
S *> 172.20.2.0/24 [10/0] via IPSec2 tunnel y.y.y.y, [1/0]
B 192.168.50.0/24 [20/100] via 169.254.102.2 (recursive is directly connected, tgw1), 00:01:02, [1/0]
B 192.168.100.0/24 [20/100] via 169.254.102.2 (recursive is directly connected, tgw1), 00:01:02, [1/0]
FGTVM-Connect-2 # diagnose sys gre list
IPv4:
vd=0 devname=tgw1 devindex=4 ifindex=31 saddr=10.0.0.86 daddr=1.0.0.69 rpdb=0 ref=0
key=0/0 flags=0/0 dscp-copy=0 diffservcode=000000
RX bytes:30217 (29.5 kb) TX bytes:14937 (14.5 kb);
RX packets:348, TX packets:174, TX carrier_err:0 collisions:0
npu-info: asic_offload=0, enc/dec=0/0, enc_bk=0/0/0/0, dec_bk=0/0/0/0
rpdb-ver: ffffffff rpdb-gwy: 0.0.0.0 rpdb-oif: 0
vd=0 devname=tgwc devindex=4 ifindex=15 saddr=10.0.0.21 daddr=1.0.0.68 rpdb=0 ref=0
key=0/0 flags=0/0 dscp-copy=0 diffservcode=000000
RX bytes:200059 (195.3 kb) TX bytes:55811 (54.5 kb);
RX packets:2358, TX packets:667, TX carrier_err:4 collisions:0
npu-info: asic_offload=0, enc/dec=0/0, enc_bk=0/0/0/0, dec_bk=0/0/0/0
rpdb-ver: ffffffff rpdb-gwy: 0.0.0.0 rpdb-oif: 0
total tunnel = 2
FGTVM-Connect-2 # diagnose firewall vip realserver list
alloc=2
------------------------------
vf=0 name=VS_UBUNTU/1 class=4 type=0 192.168.64.43:(22-22), protocol=6
total=1 alive=1 power=1 ptr=635488
ip=192.168.50.177-192.168.50.177/22 adm_status=0 holddown_interval=300 max_connections=0 weight=1 option=01
alive=1 total=1 enable=00000001 alive=00000001 power=1
src_sz=0
id=0 status=up ks=0 us=0 events=5 bytes=0 rtt=0
However, when FGT01 takes over the primary role, the traffic fails and no response is observed from the server:
FGTVM-Connect # diagnose sniffer packet any 'host 192.168.64.43' 4 0 l
Using Original Sniffing Mode
interfaces=[any]
filters=[host 192.168.64.43]
2025-11-07 01:22:06.887152 IPSec1 in 172.20.1.1.55608 -> 192.168.64.43.22: syn 312950751
2025-11-07 01:22:08.634742 IPSec2 in 172.20.2.1.55371 -> 192.168.64.43.22: syn 1710839447
2025-11-07 01:22:09.907403 IPSec1 in 172.20.1.1.55608 -> 192.168.64.43.22: syn 312950751
2025-11-07 01:22:11.647050 IPSec2 in 172.20.2.1.55371 -> 192.168.64.43.22: syn 1710839447
The debug flow output shows that traffic from the IPsec tunnel enters the FortiGate without issue. However, when FGT01 holds the primary role, the DNAT check returns ret-no-match, so FortiGate does not perform the DNAT and the packet is dropped by the local-in handler:
2025-11-07 01:06:20 id=65308 trace_id=98 func=print_pkt_detail line=6005 msg="vd-root:0 received a packet(proto=6, 172.20.1.1:54955->192.168.64.43:22) tun_id=x.x.x.x from IPSec1. flag [S], seq 707309299, ack 0, win 62727"
2025-11-07 01:06:20 id=65308 trace_id=98 func=ipsec_spoofed4 line=243 msg="src ip 172.20.1.1 match selector 0 range 0.0.0.0-255.255.255.255"
2025-11-07 01:06:20 id=65308 trace_id=98 func=init_ip_session_common line=6204 msg="allocate a new session-009af359"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_dnat_check line=5481 msg="in-[IPSec1], out-[]"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_dnat_tree_check line=824 msg="len=1"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check_one_dnat_policy line=5346 msg="checking gnum-100000 policy-16064"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_dnat_check line=5506 msg="result: skb_flags-02000008, vid-16064, ret-no-match, act-accept, flag-00000100"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__vf_ip_route_input_rcu line=1989 msg="find a route: flag=80000000 gw-0.0.0.0 via root"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_access_proxy_check line=458 msg="in-[IPSec1], out-[], skb_flags-02000008, vid-16064"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check line=2404 msg="gnum-100017, check-ffffffffa002cb97"
...
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check_one_policy line=2140 msg="checked gnum-10000e policy-4294967295, ret-no-match, act-accept"
...
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check_one_policy line=2374 msg="policy-4294967295 is matched, act-drop"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check line=2421 msg="gnum-10000e check result: ret-matched, act-drop, flag-00000000, flag2-00000000"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_policy_group_check line=4903 msg="after check: ret-matched, act-drop, flag-00000000, flag2-00000000"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check line=2404 msg="gnum-10000f, check-ffffffffa002cb97"
...
2025-11-07 01:06:21 id=65308 trace_id=98 func=__iprope_check_one_policy line=2374 msg="policy-4294967295 is matched, act-drop"
2025-11-07 01:06:21 id=65308 trace_id=98 func=__iprope_check line=2421 msg="gnum-10000f check result: ret-matched, act-drop, flag-00000800, flag2-00000000"
2025-11-07 01:06:21 id=65308 trace_id=98 func=iprope_policy_group_check line=4903 msg="after check: ret-matched, act-drop, flag-00000800, flag2-00000000"
2025-11-07 01:06:21 id=65308 trace_id=98 func=fw_local_in_handler line=620 msg="iprope_in_check() check failed on policy 0, drop"
Further investigation shows that the real server is marked down on the FortiGate when FGT01 takes the primary role:
FGTVM-Connect # diagnose firewall vip realserver list
alloc=2
------------------------------
vf=0 name=VS_UBUNTU/1 class=4 type=0 192.168.64.43:(22-22), protocol=6
total=1 alive=0 power=0 ptr=536555
ip=192.168.50.177-192.168.50.177/22 adm_status=0 holddown_interval=300 max_connections=0 weight=1 option=01
alive=0 total=1 enable=00000001 alive=00000000 power=0
src_sz=0
id=0 status=down ks=0 us=0 events=14 bytes=0 rtt=0
The health check configured for the VIP has a fixed source IP set:
config firewall ldb-monitor
edit "Ubuntu"
set type tcp
set port 22
set src-ip 10.0.0.86
next
end
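The IP 10.0.0.86 is the local gateway address of the GRE tunnel on FGT02, so the health check fails when FGT01 holds the primary role and the real server is marked down. Removing the src-ip setting from the ldb-monitor configuration resolves the issue.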
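As a minimal sketch of that change, assuming the health-check monitor is named "Ubuntu" as in the configuration above and that the change is made on the current primary unit, the source IP can be cleared with the generic unset command:

```
config firewall ldb-monitor
    edit "Ubuntu"
        unset src-ip
    next
end
```

After the change, diagnose firewall vip realserver list can be run again on whichever unit is primary to confirm that the real server reports status=up.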
It is also recommended to use a single GRE tunnel interface name on both units and to exclude system.gre-tunnel from HA synchronization, so that each unit keeps its own tunnel endpoints and failover is seamless:
FGTVM-Connect-2 # show system gre-tunnel
config system gre-tunnel
edit "tgwc"
set interface "port2"
set remote-gw 1.0.0.69
set local-gw 10.0.0.86
next
end
FGTVM-Connect # show system gre-tunnel
config system gre-tunnel
edit "tgwc"
set interface "port2"
set remote-gw 1.0.0.68
set local-gw 10.0.0.21
next
end
FGTVM-Connect # show system vdom-exception
config system vdom-exception
edit 1
set object system.interface
next
edit 2
set object router.static
next
edit 3
set object router.bgp
next
edit 4
set object system.gre-tunnel
next
end
When src-ip is not specified, the health check connection is initiated from the interface IP configured on the GRE tunnel. In this demonstration, the source IP used for the health check depends on which FortiGate holds the primary role:
FGT01: 169.254.120.1:
FGTVM-Connect # show system interface tgwc
config system interface
edit "tgwc"
set vdom "root"
set ip 169.254.120.1 255.255.255.255
set type tunnel
set remote-ip 169.254.120.2 255.255.255.248
set snmp-index 8
set interface "port2"
next
end
FGT02: 169.254.102.1:
FGTVM-Connect-2 # show system interface tgwc
config system interface
edit "tgwc"
set vdom "root"
set ip 169.254.102.1 255.255.255.255
set type tunnel
set remote-ip 169.254.102.2 255.255.255.248
set snmp-index 8
set interface "port2"
next
end
|