Support Forum
The Forums are a place to find answers on a range of Fortinet products from peers and product experts.
Zylan
New Contributor

IPsec hub-and-spoke VPN only working in one direction

Hello,

 

I'm trying to connect 3 FortiGates (1 hub, 2 spokes) together with this topology:

 

                     ┌──────────────────────┐
                     │                      │
                     │  Hub                 │
                     │  OS 6.4.9            │
                     │                      │
                     │  Tunnel              │
┌──────────────┐     │  10.255.254.254      │     ┌──────────────┐
│              │     │                      │     │              │
│ Spoke        │     │  Public static IPv4  │     │ Spoke        │
│ Site A       │     │                      │     │ Site B       │
│ OS 7.0.6     │     └───────────┬──────────┘     │ OS 6.0.14    │
│              │                 │                │              │
│ Tunnel       │                 │                │ Tunnel       │
│ 10.255.254.1 │     ┌───────────┴──────────┐     │ 10.255.254.4 │
│              ├─────┤          WAN         ├─────┤              │
└──────────────┘     └──────────────────────┘     └──────────────┘

 

 

Both spoke FortiGates are behind NAT. The weird problem I'm seeing is that while site B is able to reach every other device in the VPN subnet, neither site A nor the hub can reach site B.

 

SiteB # exec traceroute 10.255.254.254
traceroute to 10.255.254.254 (10.255.254.254), 32 hops max, 3 probe packets per hop, 72 byte packets
1  10.255.254.254  2.789 ms  2.567 ms  2.377 ms

SiteB # exec traceroute 10.255.254.1
traceroute to 10.255.254.1 (10.255.254.1), 32 hops max, 3 probe packets per hop, 72 byte packets
1  10.255.254.254  2.594 ms  2.891 ms  3.148 ms
2  10.255.254.1  4.284 ms  4.165 ms  5.277 ms
Hub # exec ping 10.255.254.1
PING 10.255.254.1 (10.255.254.1): 56 data bytes
64 bytes from 10.255.254.1: icmp_seq=0 ttl=255 time=2.2 ms
--- 10.255.254.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 2.2/2.2/2.2 ms

Hub # exec ping 10.255.254.4
PING 10.255.254.4 (10.255.254.4): 56 data bytes
--- 10.255.254.4 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

 

There is a policy on the hub to allow both spoke-to-spoke and spoke-to-hub traffic, as well as policies on the spoke devices themselves to allow traffic through and keep the tunnel up. All IPsec interfaces are showing "up".

 

Here are the relevant configurations for all devices:

 

Hub

config vpn ipsec phase1-interface
    edit "hub"
        set type dynamic
        set interface "wan1"
        set ike-version 2
        set authmethod signature
        set peertype peergrp
        set net-device disable
        set proposal aes256-sha256 aes256gcm-prfsha384
        set add-route disable
        set dpd on-idle
        set dhgrp 18
        set certificate "s2s-hub-certificate"
        set peergrp "s2s-clients"
        set dpd-retryinterval 5
    next
end
config vpn ipsec phase2-interface
    edit "hub"
        set phase1name "hub"
        set proposal aes256-sha512 aes256-sha256 aes256gcm
        set dhgrp 18
    next
end
config system interface
    edit "hub"
        set ip 10.255.254.254 255.255.255.255
        set allowaccess ping
        set type tunnel
        set remote-ip 10.255.254.253 255.255.255.0
        set mtu-override enable
        set mtu 1260
        set interface "wan1"
    next
end

 

Site A

 

config vpn ipsec phase1-interface
    edit "site-a"
        set interface "wan1"
        set ike-version 2
        set authmethod signature
        set peertype any
        set net-device enable
        set proposal aes256gcm-prfsha384
        set dpd on-idle
        set dhgrp 18
        set remote-gw xxx.xxx.xxx.xxx
        set certificate "s2s-site-a-cert"
        set dpd-retryinterval 5
    next
end
config vpn ipsec phase2-interface
    edit "site-a"
        set phase1name "site-a"
        set proposal aes256gcm
        set dhgrp 18
        set auto-negotiate enable
    next
end
config system interface
    edit "site-a"
        set ip 10.255.254.1 255.255.255.255
        set allowaccess ping
        set type tunnel
        set remote-ip 10.255.254.254 255.255.255.0
        set interface "wan1"
    next
end

 

 

Site B

 

config vpn ipsec phase1-interface
    edit "site-b"
        set interface "wan1"
        set ike-version 2
        set authmethod signature
        set peertype any
        set proposal aes256gcm-prfsha384
        set dpd on-idle
        set dhgrp 18
        set remote-gw xxx.xxx.xxx.xxx
        set certificate "s2s-site-b-cert"
        set dpd-retryinterval 5
    next
end
config vpn ipsec phase2-interface
    edit "site-b"
        set phase1name "site-b"
        set proposal aes256-sha512
        set dhgrp 18
        set auto-negotiate enable
    next
end
config system interface
    edit "site-b"
        set ip 10.255.254.4 255.255.255.255
        set allowaccess ping
        set type tunnel
        set tcp-mss 1260
        set remote-ip 10.255.254.254 255.255.255.0
        set interface "wan1"
    next
end

 

 

What could be causing this problem?

13 REPLIES
sw2090
Honored Contributor

A few possible causes that pop into my mind:

 

- the spoke FGTs have to have policies that allow traffic to the other spokes too

- on the spoke FGTs there has to be static routing to the other spoke subnets that uses the hub as the gateway

- if there is no mode-config in use on the IPsec tunnels, the hub also needs static routing to the spoke subnets.
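For example, a spoke static route for the tunnel range via the hub could look roughly like this (just a sketch; "site-a" is the tunnel interface name from the original post, and interface-based routes need no gateway address):

config router static
    edit 0
        set dst 10.255.254.0 255.255.255.0
        set device "site-a"
    next
end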


-- 

"It is a mistake to think you can solve any major problems just with potatoes." - Douglas Adams

sw2090
Honored Contributor

You could execute some flow trace debugging on the CLI on your spokes (and maybe the hub) to see what happens to your traffic to the other spoke.

 

maybe it gives you a clue finally...
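Something along these lines, filtering on the other spoke's tunnel IP (adjust the address to whichever destination you are testing):

diag debug flow filter addr 10.255.254.1
diag debug flow show function-name enable
diag debug flow trace start 100
diag debug enable

Then start a ping toward the other spoke and watch which route and policy the packets hit.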

 



sagha
Staff

Hi Zylan

 

I would suggest enabling NAT-T if the spoke devices are behind NAT. This should be done on Hub and Spoke.

 

config vpn ipsec phase1-interface
    edit "tunnel-name"
        set nattraversal forced
    next
end

 

In addition to this, you can also try and trace the traffic on the FGTs involved: 

diag sniffer packet any 'host x.x.x.x and icmp' 4 0 a

 

Initiate the traffic by using a ping and see if the traffic is leaving the correct interface and whether it is received on the Spoke or not. 

 

Thanks, 

Shahan

 

sw2090
Honored Contributor

Yeah, I forgot about that, Shahan, thanks for mentioning it. Yes, NAT-T should be enabled if spoke/hub are behind NAT.



akristof
Staff

Hello,

I would also add one thing. I am not sure what FortiOS version you are using or how routing is done (static or dynamic). I recommend configuring one setting under phase1-interface (on the hub and on the spokes):

config vpn ipsec phase1-interface
    edit "tunnel-name"
        set exchange-interface-ip enable
    next
end

I suspect that the hub doesn't know how to reach the next hop for Site B. I might be wrong, because I would need more information, but this is often the problem.

Adrian
Zylan
New Contributor

Hello,

 

I have tried enabling NAT-T; however, there was no observable change. The problem remains the same. I've also made sure to add a policy that allows all spoke-to-spoke traffic on both site A and site B.

 

My eventual goal is to use BGP inside this tunnel so that the sites' subnets will be able to reach each other. If I'm not mistaken, wouldn't everything in the VPN subnet (10.255.254.0/24) already be reachable without any additional routing?

 

Just to be sure, I tried exchange-interface-ip, which resulted in the tunnel breaking (traffic between site A and the hub also stopped).

 

Packet sniffing revealed something interesting. If I do a ping from site B to the hub, I can see the traffic leaving site B using the VPN interface, and it's received by the hub:

 

 

SiteB # diag sniffer packet any 'icmp' 4 0 a
interfaces=[any]
filters=[icmp]
xx:xx:10.995688 site-b out 10.255.254.4 -> 10.255.254.1: icmp: echo request
xx:xx:11.000227 site-b in 10.255.254.1 -> 10.255.254.4: icmp: echo reply
Hub # diag sniffer packet hub 'icmp' 4 0 a
interfaces=[hub]
filters=[icmp]
xx:xx:01.507321 hub -- 10.255.254.4 -> 10.255.254.254: icmp: echo request
xx:xx:01.507346 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo reply

 

 

However, a reverse ping from the hub to site B shows something weird. At site B, no packets are received by a sniffer that captures all ICMP. At the hub itself, there are far too many ICMP echo request packets in a very short time span:

 

 

Hub # diag sniffer packet hub 'icmp' 4 0 a
interfaces=[hub]
filters=[icmp]
xx:xx:57.609818 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.612726 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.612739 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.614702 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.614707 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.616440 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.616443 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.618208 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request

 

 

This behavior is also observed when a ping is initiated from site A to site B. No packets are received at site B, but a dump on site A shows a large number of ICMP echo requests being sent out (more than the ping tool should have generated).

 

At the hub, this is what my routing table looks like for 10.255.254.0/24:

 

 

Hub # get router info routing-table all
Codes: K - kernel, C - connected, S - static, R - RIP, B - BGP
       O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
       * - candidate default

Routing table for VRF=0
C       10.255.254.0/24 is directly connected, hub
C       10.255.254.254/32 is directly connected, hub

 

 

At site A (the currently working site), this is what it looks like:

 

 

SiteA # get router info routing-table all
Codes: K - kernel, C - connected, S - static, R - RIP, B - BGP
       O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
       * - candidate default

Routing table for VRF=0
S       10.255.254.0/24 [5/0] via site-a tunnel xxx.xxx.xxx.xxx, [1/0]
C       10.255.254.1/32 is directly connected, site-a

 

 

Interestingly, at site B (the broken site), the 10.255.254.0/24 route is not shown as going "via site-b tunnel":

 

 

SiteB # get router info routing-table all
Codes: K - kernel, C - connected, S - static, R - RIP, B - BGP
       O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
       * - candidate default

Routing table for VRF=0
C       10.255.254.0/24 is directly connected, site-b
C       10.255.254.4/32 is directly connected, site-b

 

 

Are we looking at some sort of VPN interface configuration or routing issue in this case?

tthrilok

Hi Zylan,

 

Could you share the sniffer output from both the hub and spoke B, captured simultaneously, while you are pinging from the hub to spoke B?

 

Thank you!

Zylan
New Contributor

Hi tthrilok,

 

Sorry, just saw this. I've tried another dump as you suggested, and something strange is happening. When I ping from the hub to site B, this is what the hub sees:

Hub # diag sniffer packet hub 'icmp' 4 0 a
interfaces=[hub]
filters=[icmp]

xx:xx:xx.341560 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.347366 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.347379 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.349627 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.349633 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
...
yy:yy:yy.198998 hub -- 10.255.254.1 -> 10.255.254.254: icmp: time exceeded in-transit
yy:yy:yy.194550 hub -- 10.255.254.1 -> 10.255.254.254: icmp: time exceeded in-transit

 

Site B is not seeing any packets at all:

 

SiteB # diag sniffer packet any 'icmp' 4 0 a
interfaces=[any]
filters=[icmp]

0 packets received by filter
0 packets dropped by kernel

 

However, it seems that site A is getting packets destined for site B. Perhaps this is the problem? I'm not sure how to fix it though:

 

SiteA # diag sniffer packet site-a 'icmp' 4 0 a
interfaces=[site-a]
filters=[icmp]

xx:xx:xx.202653 site-a -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.203629 site-a -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.205826 site-a -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.205979 site-a -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.207825 site-a -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.207883 site-a -- 10.255.254.254 -> 10.255.254.4: icmp: echo request

 

Toshi_Esumi
Esteemed Contributor II

Since you're using a "dynamic" tunnel interface on the HUB and two tunnels are sharing it, I'm thinking it's more likely a "net-device" issue.

https://docs.fortinet.com/document/fortigate/6.4.2/administration-guide/239039/dynamic-tunnel-interf...

Try enabling "net-device" to have a dynamic interface for each tunnel.
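Something like this on the hub (a sketch, using the phase1 name from the original post):

config vpn ipsec phase1-interface
    edit "hub"
        set net-device enable
    next
end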

You seem to be just testing the design at this moment. But I'm not sure about your routing strategy for the subnets behind the tunnel interface IPs. Since it's one interface on the HUB, you need to use either "add-route" from the phase2 selectors or a routing protocol. And since it's a dynamic interface, you can't use static routes.
But because you disabled add-route and have no specific selectors in phase2, if you disable net-device, the HUB FGT can't know which tunnel to route to for the real destinations.

 

Regardless, this is a site-to-site situation, not real dialup client VPN tunnels. I prefer configuring two different phase1 interfaces and using "peerid/localid" to bind a specific peer to each. You can then configure static routes and "remote-ip" on each tunnel interface at the HUB.
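A rough sketch of that idea (the names and IDs here are hypothetical, and the exact peertype/peerid options available with certificate authentication depend on the FortiOS version). On the hub, one dialup phase1 per spoke:

config vpn ipsec phase1-interface
    edit "hub-site-b"
        set type dynamic
        set interface "wan1"
        set ike-version 2
        set authmethod signature
        set peertype one
        set peerid "site-b-id"
    next
end

And spoke B sends the matching ID:

config vpn ipsec phase1-interface
    edit "site-b"
        set localid "site-b-id"
    next
end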

 

Toshi