FortiGate
kcheng
Staff & Editor
Article Id 418007
Description This article describes the troubleshooting steps for a FortiGate high availability (HA) issue observed after a firmware upgrade, where the new primary unit fails to forward traffic to the virtual server after a failover.
Scope FortiGate-VM on public cloud (AWS/Azure).
Solution

Topology:

[Topology diagram: aws_lb.png]
In this topology, the GRE tunnel failover design between the FortiGate HA cluster and the AWS Transit Gateway is as follows:

  • When FGT01 becomes the primary, the GRE tunnel tgw will be active to route traffic between FortiGate and the application subnet 192.168.50.0/24.
  • When FGT02 becomes the primary, the GRE tunnel tgw1 will be active to route traffic between FortiGate and the application subnet 192.168.50.0/24.
  • Server load balancing is configured on the FortiGate to perform DNAT to the internal/application servers.
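
For reference, the virtual server in this example resembles the following configuration. This is a minimal sketch reconstructed from the diagnostics shown later; the external interface and port values are assumptions:

config firewall vip
    edit "VS_UBUNTU"
        set type server-load-balance
        set extip 192.168.64.43
        set extintf "any"
        set server-type tcp
        set extport 22
        set monitor "Ubuntu"
        config realservers
            edit 1
                set ip 192.168.50.177
                set port 22
            next
        end
    next
end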

The connection worked fine when FGT02 remained the primary unit. The following statistics show the routing table, GRE tunnel status, and virtual server status when FGT02 acts as the primary:


FGTVM-Connect-2 # get router info routing-table all
Codes: K - kernel, C - connected, S - static, R - RIP, B - BGP
O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
V - BGP VPNv4
* - candidate default
Routing table for VRF=0
S* 0.0.0.0/0 [10/0] via 10.0.0.65, port1, [1/0]
S 1.0.0.0/8 [10/0] via 10.0.0.81, port2, [1/0]
C 10.0.0.64/28 is directly connected, port1
C 10.0.0.80/28 is directly connected, port2
C 169.254.102.0/29 is directly connected, tgw1
C 169.254.102.1/32 is directly connected, tgw1
C 169.254.120.0/29 is directly connected, tgwc
C 169.254.120.1/32 is directly connected, tgwc
S *> 172.20.1.0/24 [10/0] via IPSec1 tunnel x.x.x.x, [1/0]
S *> 172.20.2.0/24 [10/0] via IPSec2 tunnel y.y.y.y, [1/0]
B 192.168.50.0/24 [20/100] via 169.254.102.2 (recursive is directly connected, tgw1), 00:01:02, [1/0]
B 192.168.100.0/24 [20/100] via 169.254.102.2 (recursive is directly connected, tgw1), 00:01:02, [1/0]

FGTVM-Connect-2 # diagnose sys gre list

 

IPv4:

vd=0 devname=tgw1 devindex=4 ifindex=31
saddr=10.0.0.86 daddr=1.0.0.69 rpdb=0 ref=0
key=0/0 flags=0/0 dscp-copy=0 diffservcode=000000
RX bytes:30217 (29.5 kb) TX bytes:14937 (14.5 kb);
RX packets:348, TX packets:174, TX carrier_err:0 collisions:0
npu-info: asic_offload=0, enc/dec=0/0, enc_bk=0/0/0/0, dec_bk=0/0/0/0
rpdb-ver: ffffffff rpdb-gwy: 0.0.0.0 rpdb-oif: 0

vd=0 devname=tgwc devindex=4 ifindex=15
saddr=10.0.0.21 daddr=1.0.0.68 rpdb=0 ref=0
key=0/0 flags=0/0 dscp-copy=0 diffservcode=000000
RX bytes:200059 (195.3 kb) TX bytes:55811 (54.5 kb);
RX packets:2358, TX packets:667, TX carrier_err:4 collisions:0
npu-info: asic_offload=0, enc/dec=0/0, enc_bk=0/0/0/0, dec_bk=0/0/0/0
rpdb-ver: ffffffff rpdb-gwy: 0.0.0.0 rpdb-oif: 0
total tunnel = 2

FGTVM-Connect-2 # diagnose firewall vip realserver list
alloc=2
------------------------------
vf=0 name=VS_UBUNTU/1 class=4 type=0 192.168.64.43:(22-22), protocol=6
total=1 alive=1 power=1 ptr=635488
ip=192.168.50.177-192.168.50.177/22 adm_status=0 holddown_interval=300 max_connections=0 weight=1 option=01
alive=1 total=1 enable=00000001 alive=00000001 power=1
src_sz=0
id=0 status=up ks=0 us=0 events=5 bytes=0 rtt=0

However, when FGT01 takes over the primary role, traffic fails and no response is observed from the server:


FGTVM-Connect # diagnose sniffer packet any 'host 192.168.64.43' 4 0 l
Using Original Sniffing Mode
interfaces=[any]
filters=[host 192.168.64.43]
2025-11-07 01:22:06.887152 IPSec1 in 172.20.1.1.55608 -> 192.168.64.43.22: syn 312950751
2025-11-07 01:22:08.634742 IPSec2 in 172.20.2.1.55371 -> 192.168.64.43.22: syn 1710839447
2025-11-07 01:22:09.907403 IPSec1 in 172.20.1.1.55608 -> 192.168.64.43.22: syn 312950751
2025-11-07 01:22:11.647050 IPSec2 in 172.20.2.1.55371 -> 192.168.64.43.22: syn 1710839447


In the flow debug log capture, it was observed that traffic from the IPSec tunnel enters the FortiGate without issue; however, FortiGate is not performing the DNAT when FGT01 takes over the primary role:
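
The flow trace below was collected with the standard debug flow commands. The filter value is an example matching this case:

diagnose debug flow filter addr 192.168.64.43
diagnose debug flow show function-name enable
diagnose debug flow trace start 100
diagnose debug enable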


2025-11-07 01:06:20 id=65308 trace_id=98 func=print_pkt_detail line=6005 msg="vd-root:0 received a packet(proto=6, 172.20.1.1:54955->192.168.64.43:22) tun_id=x.x.x.x from IPSec1. flag [S], seq 707309299, ack 0, win 62727"
2025-11-07 01:06:20 id=65308 trace_id=98 func=ipsec_spoofed4 line=243 msg="src ip 172.20.1.1 match selector 0 range 0.0.0.0-255.255.255.255"
2025-11-07 01:06:20 id=65308 trace_id=98 func=init_ip_session_common line=6204 msg="allocate a new session-009af359"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_dnat_check line=5481 msg="in-[IPSec1], out-[]"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_dnat_tree_check line=824 msg="len=1"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check_one_dnat_policy line=5346 msg="checking gnum-100000 policy-16064"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_dnat_check line=5506 msg="result: skb_flags-02000008, vid-16064, ret-no-match, act-accept, flag-00000100"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__vf_ip_route_input_rcu line=1989 msg="find a route: flag=80000000 gw-0.0.0.0 via root"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_access_proxy_check line=458 msg="in-[IPSec1], out-[], skb_flags-02000008, vid-16064"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check line=2404 msg="gnum-100017, check-ffffffffa002cb97"
...
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check_one_policy line=2140 msg="checked gnum-10000e policy-4294967295, ret-no-match, act-accept"
...
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check_one_policy line=2374 msg="policy-4294967295 is matched, act-drop"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check line=2421 msg="gnum-10000e check result: ret-matched, act-drop, flag-00000000, flag2-00000000"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_policy_group_check line=4903 msg="after check: ret-matched, act-drop, flag-00000000, flag2-00000000"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check line=2404 msg="gnum-10000f, check-ffffffffa002cb97"
...
2025-11-07 01:06:21 id=65308 trace_id=98 func=__iprope_check_one_policy line=2374 msg="policy-4294967295 is matched, act-drop"
2025-11-07 01:06:21 id=65308 trace_id=98 func=__iprope_check line=2421 msg="gnum-10000f check result: ret-matched, act-drop, flag-00000800, flag2-00000000"
2025-11-07 01:06:21 id=65308 trace_id=98 func=iprope_policy_group_check line=4903 msg="after check: ret-matched, act-drop, flag-00000800, flag2-00000000"
2025-11-07 01:06:21 id=65308 trace_id=98 func=fw_local_in_handler line=620 msg="iprope_in_check() check failed on policy 0, drop"

Further investigation shows that the real server is showing as down on the FortiGate when FGT01 takes the primary role:


FGTVM-Connect # diagnose firewall vip realserver list
alloc=2
------------------------------
vf=0 name=VS_UBUNTU/1 class=4 type=0 192.168.64.43:(22-22), protocol=6
total=1 alive=0 power=0 ptr=536555
ip=192.168.50.177-192.168.50.177/22 adm_status=0 holddown_interval=300 max_connections=0 weight=1 option=01
alive=0 total=1 enable=00000001 alive=00000000 power=0
src_sz=0
id=0 status=down ks=0 us=0 events=14 bytes=0 rtt=0

Further checking of the health check (ldb-monitor) configured for the VIP shows that a source IP (src-ip) was configured:

 

config firewall ldb-monitor
    edit "Ubuntu"
        set type tcp
        set port 22
        set src-ip 10.0.0.86
    next
end

The IP 10.0.0.86 is the local gateway of FGT02's GRE tunnel and does not exist on FGT01. When FGT01 becomes the primary, the TCP health-check probes are still sourced from 10.0.0.86, so they receive no reply and the real server is marked as down, which in turn stops the virtual server from performing DNAT. Removing the src-ip setting from the ldb-monitor configuration resolves the issue.
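
The fix, in CLI form (the monitor name follows the example above):

config firewall ldb-monitor
    edit "Ubuntu"
        unset src-ip
    next
end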

 

It is also recommended to use a single GRE tunnel interface on each unit and to exclude system.gre-tunnel from HA configuration synchronization (via system vdom-exception), so that each unit keeps its own local and remote gateway settings for seamless failover:

FGTVM-Connect-2 # show system gre-tunnel
config system gre-tunnel
    edit "tgwc"
        set interface "port2"
        set remote-gw 1.0.0.69
        set local-gw 10.0.0.86
    next
end

FGTVM-Connect # show system gre-tunnel
config system gre-tunnel
    edit "tgwc"
        set interface "port2"
        set remote-gw 1.0.0.68
        set local-gw 10.0.0.21
    next
end


FGTVM-Connect # show system vdom-exception
config system vdom-exception
    edit 1
        set object system.interface
    next
    edit 2
        set object router.static
    next
    edit 3
        set object router.bgp
    next
    edit 4
        set object system.gre-tunnel
    next
end
