kcheng
Staff & Editor
Article Id 418007
Description This article describes the troubleshooting steps for a FortiGate high availability (HA) issue after a firmware upgrade, where the FortiGate fails to forward traffic to the virtual server after becoming the primary unit.
Scope FortiGate Cloud (AWS/Azure).
Solution

Topology:


(Topology diagram: aws_lb.png)

 

In this topology, the GRE tunnel failover between the FortiGate HA cluster and the AWS Transit Gateway is designed as follows:

  • When FGT01 becomes the primary, the GRE tunnel tgwc becomes active and routes traffic between the FortiGate and the application subnet 192.168.50.0/24.
  • When FGT02 becomes the primary, the GRE tunnel tgw1 becomes active and routes traffic between the FortiGate and the application subnet 192.168.50.0/24.
  • Server load balancing is configured on the FortiGate to perform DNAT to the internal/application servers.
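
A virtual server of this kind can be configured as in the following sketch (the name, addresses, port, and health-check monitor are taken from the diagnostic outputs later in this article; adjust them to the actual environment):

    config firewall vip
        edit "VS_UBUNTU"
            set type server-load-balance
            set extip 192.168.64.43
            set extintf "any"
            set server-type tcp
            set extport 22
            set monitor "Ubuntu"
            config realservers
                edit 1
                    set ip 192.168.50.177
                    set port 22
                next
            end
        next
    end

A firewall policy referencing this VIP is also required for traffic to be accepted.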

The connection worked fine while FGT02 remained the primary unit. The following output shows the routing table, GRE tunnel status, and virtual server status while FGT02 acts as the primary:


FGTVM-Connect-2 # get router info routing-table all
Codes: K - kernel, C - connected, S - static, R - RIP, B - BGP
O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
V - BGP VPNv4
* - candidate default
Routing table for VRF=0
S* 0.0.0.0/0 [10/0] via 10.0.0.65, port1, [1/0]
S 1.0.0.0/8 [10/0] via 10.0.0.81, port2, [1/0]
C 10.0.0.64/28 is directly connected, port1
C 10.0.0.80/28 is directly connected, port2
C 169.254.102.0/29 is directly connected, tgw1
C 169.254.102.1/32 is directly connected, tgw1
C 169.254.120.0/29 is directly connected, tgwc
C 169.254.120.1/32 is directly connected, tgwc
S *> 172.20.1.0/24 [10/0] via IPSec1 tunnel x.x.x.x, [1/0]
S *> 172.20.2.0/24 [10/0] via IPSec2 tunnel y.y.y.y, [1/0]
B 192.168.50.0/24 [20/100] via 169.254.102.2 (recursive is directly connected, tgw1), 00:01:02, [1/0]
B 192.168.100.0/24 [20/100] via 169.254.102.2 (recursive is directly connected, tgw1), 00:01:02, [1/0]

FGTVM-Connect-2 # diagnose sys gre list

 

IPv4:

vd=0 devname=tgw1 devindex=4 ifindex=31
saddr=10.0.0.86 daddr=1.0.0.69 rpdb=0 ref=0
key=0/0 flags=0/0 dscp-copy=0 diffservcode=000000
RX bytes:30217 (29.5 kb) TX bytes:14937 (14.5 kb);
RX packets:348, TX packets:174, TX carrier_err:0 collisions:0
npu-info: asic_offload=0, enc/dec=0/0, enc_bk=0/0/0/0, dec_bk=0/0/0/0
rpdb-ver: ffffffff rpdb-gwy: 0.0.0.0 rpdb-oif: 0

vd=0 devname=tgwc devindex=4 ifindex=15
saddr=10.0.0.21 daddr=1.0.0.68 rpdb=0 ref=0
key=0/0 flags=0/0 dscp-copy=0 diffservcode=000000
RX bytes:200059 (195.3 kb) TX bytes:55811 (54.5 kb);
RX packets:2358, TX packets:667, TX carrier_err:4 collisions:0
npu-info: asic_offload=0, enc/dec=0/0, enc_bk=0/0/0/0, dec_bk=0/0/0/0
rpdb-ver: ffffffff rpdb-gwy: 0.0.0.0 rpdb-oif: 0
total tunnel = 2

FGTVM-Connect-2 # diagnose firewall vip realserver list
alloc=2
------------------------------
vf=0 name=VS_UBUNTU/1 class=4 type=0 192.168.64.43:(22-22), protocol=6
total=1 alive=1 power=1 ptr=635488
ip=192.168.50.177-192.168.50.177/22 adm_status=0 holddown_interval=300 max_connections=0 weight=1 option=01
alive=1 total=1 enable=00000001 alive=00000001 power=1
src_sz=0
id=0 status=up ks=0 us=0 events=5 bytes=0 rtt=0

However, when FGT01 takes over the primary role, traffic fails and no response is observed from the server:


FGTVM-Connect # diagnose sniffer packet any 'host 192.168.64.43' 4 0 l
Using Original Sniffing Mode
interfaces=[any]
filters=[host 192.168.64.43]
2025-11-07 01:22:06.887152 IPSec1 in 172.20.1.1.55608 -> 192.168.64.43.22: syn 312950751
2025-11-07 01:22:08.634742 IPSec2 in 172.20.2.1.55371 -> 192.168.64.43.22: syn 1710839447
2025-11-07 01:22:09.907403 IPSec1 in 172.20.1.1.55608 -> 192.168.64.43.22: syn 312950751
2025-11-07 01:22:11.647050 IPSec2 in 172.20.2.1.55371 -> 192.168.64.43.22: syn 1710839447


In the flow debug capture, traffic from the IPsec tunnel enters the FortiGate without issue; however, the FortiGate does not perform DNAT when FGT01 holds the primary role:
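
A flow trace of this kind can be captured with the standard debug flow commands (the filter values here match this scenario and are illustrative):

    diagnose debug flow filter daddr 192.168.64.43
    diagnose debug flow filter port 22
    diagnose debug console timestamp enable
    diagnose debug flow trace start 100
    diagnose debug enable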


2025-11-07 01:06:20 id=65308 trace_id=98 func=print_pkt_detail line=6005 msg="vd-root:0 received a packet(proto=6, 172.20.1.1:54955->192.168.64.43:22) tun_id=x.x.x.x from IPSec1. flag [S], seq 707309299, ack 0, win 62727"
2025-11-07 01:06:20 id=65308 trace_id=98 func=ipsec_spoofed4 line=243 msg="src ip 172.20.1.1 match selector 0 range 0.0.0.0-255.255.255.255"
2025-11-07 01:06:20 id=65308 trace_id=98 func=init_ip_session_common line=6204 msg="allocate a new session-009af359"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_dnat_check line=5481 msg="in-[IPSec1], out-[]"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_dnat_tree_check line=824 msg="len=1"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check_one_dnat_policy line=5346 msg="checking gnum-100000 policy-16064"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_dnat_check line=5506 msg="result: skb_flags-02000008, vid-16064, ret-no-match, act-accept, flag-00000100"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__vf_ip_route_input_rcu line=1989 msg="find a route: flag=80000000 gw-0.0.0.0 via root"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_access_proxy_check line=458 msg="in-[IPSec1], out-[], skb_flags-02000008, vid-16064"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check line=2404 msg="gnum-100017, check-ffffffffa002cb97"
...
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check_one_policy line=2140 msg="checked gnum-10000e policy-4294967295, ret-no-match, act-accept"
...
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check_one_policy line=2374 msg="policy-4294967295 is matched, act-drop"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check line=2421 msg="gnum-10000e check result: ret-matched, act-drop, flag-00000000, flag2-00000000"
2025-11-07 01:06:20 id=65308 trace_id=98 func=iprope_policy_group_check line=4903 msg="after check: ret-matched, act-drop, flag-00000000, flag2-00000000"
2025-11-07 01:06:20 id=65308 trace_id=98 func=__iprope_check line=2404 msg="gnum-10000f, check-ffffffffa002cb97"
...
2025-11-07 01:06:21 id=65308 trace_id=98 func=__iprope_check_one_policy line=2374 msg="policy-4294967295 is matched, act-drop"
2025-11-07 01:06:21 id=65308 trace_id=98 func=__iprope_check line=2421 msg="gnum-10000f check result: ret-matched, act-drop, flag-00000800, flag2-00000000"
2025-11-07 01:06:21 id=65308 trace_id=98 func=iprope_policy_group_check line=4903 msg="after check: ret-matched, act-drop, flag-00000800, flag2-00000000"
2025-11-07 01:06:21 id=65308 trace_id=98 func=fw_local_in_handler line=620 msg="iprope_in_check() check failed on policy 0, drop"

Further investigation shows that the real server is marked as down on the FortiGate when FGT01 takes the primary role:


FGTVM-Connect # diagnose firewall vip realserver list
alloc=2
------------------------------
vf=0 name=VS_UBUNTU/1 class=4 type=0 192.168.64.43:(22-22), protocol=6
total=1 alive=0 power=0 ptr=536555
ip=192.168.50.177-192.168.50.177/22 adm_status=0 holddown_interval=300 max_connections=0 weight=1 option=01
alive=0 total=1 enable=00000001 alive=00000000 power=0
src_sz=0
id=0 status=down ks=0 us=0 events=14 bytes=0 rtt=0

Checking the health check (ldb-monitor) configured for the VIP shows that a source IP was explicitly set:

 

config firewall ldb-monitor
    edit "Ubuntu"
        set type tcp
        set port 22
        set src-ip 10.0.0.86
    next
end

The IP 10.0.0.86 is the local gateway IP configured for FGT02's GRE tunnel. When FGT01 takes over as the primary, the health-check probes are still sourced from 10.0.0.86, so they fail and the real server is marked as down. Removing the src-ip setting from the ldb-monitor configuration resolves the issue.
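
The fix described above can be applied as follows (unsetting src-ip reverts it to the default, so the health check is sourced from the outgoing interface IP):

    config firewall ldb-monitor
        edit "Ubuntu"
            unset src-ip
        next
    end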

 

It is also recommended to use a single GRE tunnel interface and to exclude system.gre-tunnel from HA synchronization, so that each unit keeps its own tunnel settings for seamless failover.
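The HA synchronization exception can be configured as in the following sketch (the entry ID is illustrative; use any unused ID):

    config system vdom-exception
        edit 4
            set object system.gre-tunnel
        next
    end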

FGTVM-Connect-2 # show system gre-tunnel
config system gre-tunnel
    edit "tgwc"
        set interface "port2"
        set remote-gw 1.0.0.69
        set local-gw 10.0.0.86
    next
end

FGTVM-Connect # show system gre-tunnel
config system gre-tunnel
    edit "tgwc"
        set interface "port2"
        set remote-gw 1.0.0.68
        set local-gw 10.0.0.21
    next
end


FGTVM-Connect # show system vdom-exception
config system vdom-exception
    edit 1
        set object system.interface
    next
    edit 2
        set object router.static
    next
    edit 3
        set object router.bgp
    next
    edit 4
        set object system.gre-tunnel
    next
end

 

When src-ip is not specified, the health-check connection is initiated from the IP address of the GRE tunnel interface. In this demonstration, the source IP used for the health check therefore depends on which FortiGate holds the primary role:

 

FGT01: 169.254.120.1:

 

FGTVM-Connect # show system interface tgwc
config system interface
    edit "tgwc"
        set vdom "root"
        set ip 169.254.120.1 255.255.255.255
        set type tunnel
        set remote-ip 169.254.120.2 255.255.255.248
        set snmp-index 8
        set interface "port2"
    next
end

 

FGT02: 169.254.102.1:

 

FGTVM-Connect-2 # show system interface tgwc
config system interface
    edit "tgwc"
        set vdom "root"
        set ip 169.254.102.1 255.255.255.255
        set type tunnel
        set remote-ip 169.254.102.2 255.255.255.248
        set snmp-index 8
        set interface "port2"
    next
end
