Zylan
Visitor III
August 25, 2022
Question

IPsec hub-and-spoke VPN only working in one direction


Hello,

 

I'm trying to connect 3 FortiGates (1 hub, 2 spokes) together with this topology:

 

                     ┌──────────────────────┐
                     │                      │
                     │  Hub                 │
                     │  OS 6.4.9            │
                     │                      │
                     │  Tunnel              │
┌──────────────┐     │  10.255.254.254      │     ┌──────────────┐
│              │     │                      │     │              │
│ Spoke        │     │  Public static IPv4  │     │ Spoke        │
│ Site A       │     │                      │     │ Site B       │
│ OS 7.0.6     │     └───────────┬──────────┘     │ OS 6.0.14    │
│              │                 │                │              │
│ Tunnel       │                 │                │ Tunnel       │
│ 10.255.254.1 │     ┌───────────┴──────────┐     │ 10.255.254.4 │
│              ├─────┤          WAN         ├─────┤              │
└──────────────┘     └──────────────────────┘     └──────────────┘

 

 

Both spoke FortiGates are behind NAT. The weird problem I'm seeing is that while site B is able to reach every other device in the VPN subnet, neither site A nor the hub can reach site B.

 

SiteB # exec traceroute 10.255.254.254
traceroute to 10.255.254.254 (10.255.254.254), 32 hops max, 3 probe packets per hop, 72 byte packets
 1  10.255.254.254  2.789 ms  2.567 ms  2.377 ms

SiteB # exec traceroute 10.255.254.1
traceroute to 10.255.254.1 (10.255.254.1), 32 hops max, 3 probe packets per hop, 72 byte packets
 1  10.255.254.254  2.594 ms  2.891 ms  3.148 ms
 2  10.255.254.1  4.284 ms  4.165 ms  5.277 ms

Hub # exec ping 10.255.254.1
PING 10.255.254.1 (10.255.254.1): 56 data bytes
64 bytes from 10.255.254.1: icmp_seq=0 ttl=255 time=2.2 ms

--- 10.255.254.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 2.2/2.2/2.2 ms

Hub # exec ping 10.255.254.4
PING 10.255.254.4 (10.255.254.4): 56 data bytes

--- 10.255.254.4 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

 

There is a policy on the hub to allow both spoke-to-spoke and spoke-to-hub traffic, as well as policies on the spoke devices themselves to allow traffic to go through and the tunnel to be up. All IPsec interfaces are showing "up".

 

Here are the relevant configurations for all devices:

 

Hub

config vpn ipsec phase1-interface
    edit "hub"
        set type dynamic
        set interface "wan1"
        set ike-version 2
        set authmethod signature
        set peertype peergrp
        set net-device disable
        set proposal aes256-sha256 aes256gcm-prfsha384
        set add-route disable
        set dpd on-idle
        set dhgrp 18
        set certificate "s2s-hub-certificate"
        set peergrp "s2s-clients"
        set dpd-retryinterval 5
    next
end
config vpn ipsec phase2-interface
    edit "hub"
        set phase1name "hub"
        set proposal aes256-sha512 aes256-sha256 aes256gcm
        set dhgrp 18
    next
end
config system interface
    edit "hub"
        set ip 10.255.254.254 255.255.255.255
        set allowaccess ping
        set type tunnel
        set remote-ip 10.255.254.253 255.255.255.0
        set mtu-override enable
        set mtu 1260
        set interface "wan1"
    next
end

 

Site A

 

config vpn ipsec phase1-interface
    edit "site-a"
        set interface "wan1"
        set ike-version 2
        set authmethod signature
        set peertype any
        set net-device enable
        set proposal aes256gcm-prfsha384
        set dpd on-idle
        set dhgrp 18
        set remote-gw xxx.xxx.xxx.xxx
        set certificate "s2s-site-a-cert"
        set dpd-retryinterval 5
    next
end
config vpn ipsec phase2-interface
    edit "site-a"
        set phase1name "site-a"
        set proposal aes256gcm
        set dhgrp 18
        set auto-negotiate enable
    next
end
config system interface
    edit "site-a"
        set ip 10.255.254.1 255.255.255.255
        set allowaccess ping
        set type tunnel
        set remote-ip 10.255.254.254 255.255.255.0
        set interface "wan1"
    next
end

 

 

Site B

 

config vpn ipsec phase1-interface
    edit "site-b"
        set interface "wan1"
        set ike-version 2
        set authmethod signature
        set peertype any
        set proposal aes256gcm-prfsha384
        set dpd on-idle
        set dhgrp 18
        set remote-gw xxx.xxx.xxx.xxx
        set certificate "s2s-site-b-cert"
        set dpd-retryinterval 5
    next
end
config vpn ipsec phase2-interface
    edit "site-b"
        set phase1name "site-b"
        set proposal aes256-sha512
        set dhgrp 18
        set auto-negotiate enable
    next
end
config system interface
    edit "site-b"
        set ip 10.255.254.4 255.255.255.255
        set allowaccess ping
        set type tunnel
        set tcp-mss 1260
        set remote-ip 10.255.254.254 255.255.255.0
        set interface "wan1"
    next
end

 

 

What could be causing this problem?

11 replies

sw2090
SuperUser
August 25, 2022

A few possible causes that come to mind:

 

- the spoke FGTs need policies that allow traffic to the other spokes too

- the spoke FGTs need static routes to the other spoke subnets with the hub as gateway

- if mode config is not in use on the IPsec tunnels, the hub also needs static routes to the spoke subnets.

sw2090
SuperUser
August 25, 2022

You could execute some flow trace debugging on the CLI of your spokes (and maybe the hub) to see what happens to your traffic to the other spoke.

 

maybe it gives you a clue finally...
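
A minimal flow trace sketch, assuming the test is a ping toward site B's tunnel IP (10.255.254.4) and that these diagnose commands are available on each of the FortiOS releases involved. The packet count of 10 is an arbitrary choice:

```
# limit the trace to ICMP (protocol 1) to or from 10.255.254.4
diagnose debug reset
diagnose debug flow filter clear
diagnose debug flow filter addr 10.255.254.4
diagnose debug flow filter proto 1
diagnose debug flow show function-name enable
diagnose debug flow trace start 10
diagnose debug enable
```

Start the ping from another CLI session while the trace is running. Once done, the trace can be switched off again with:

```
diagnose debug flow trace stop
diagnose debug disable
diagnose debug reset
```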

 

sagha
Staff
August 25, 2022

Hi @Zylan

 

I would suggest enabling NAT-T if the spoke devices are behind NAT. This should be done on Hub and Spoke.

 

config vpn ipsec phase1-interface
    edit "tunnel-name"
        set nattraversal forced
    next
end

 

In addition to this, you can also try and trace the traffic on the FGTs involved: 

diag sniffer packet any 'host x.x.x.x and icmp' 4 0 a

 

Initiate the traffic by using a ping and see if the traffic is leaving the correct interface and whether it is received on the Spoke or not. 

 

Thanks, 

Shahan

 

sw2090
SuperUser
August 25, 2022

Yeah, I forgot about that, Shahan, thanks for mentioning it. Yes, NAT-T should be enabled if the spokes/hub are behind NAT.

akristof
Staff
August 26, 2022

Hello,

I would also add one thing. I am not sure what FortiOS version you are using and how routing is done (static or dynamic). I recommend configuring one setting under phase1-interface (on the hub and on the spokes):

exchange-interface-ip enable

I suspect that the hub doesn't know how to reach the next hop of site B. I might be wrong, because I would need more information, but this is often the problem.
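
For context, a sketch of where that setting would sit on the hub, assuming the phase1 name "hub" from the posted configuration. The spokes would take the same line under their own phase1 entries ("site-a", "site-b"), provided their FortiOS release offers the option:

```
config vpn ipsec phase1-interface
    edit "hub"
        set exchange-interface-ip enable
    next
end
```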

Zylan
Author
Visitor III
August 28, 2022

Hello,

 

I have tried enabling NAT-T, however, there was no observable change. The problem still remains the same. I've also made sure to add a policy that allows all spoke to spoke traffic on both site A and site B.

 

My eventual goal is to use BGP inside this tunnel so that the sites' subnets will be able to reach each other. If I'm not mistaken, wouldn't everything in the VPN subnet (10.255.254.0/24) already be reachable without any additional routing?

 

Just to be sure, I tried exchange-interface-ip, which resulted in the tunnel breaking (traffic between site A and the hub also stopped).

 

Packet sniffing revealed something interesting. If I do a ping from site B to the hub, I can see the traffic leaving site B using the VPN interface, and it's received by the hub:

 

 

SiteB # diag sniffer packet any 'icmp' 4 0 a
interfaces=[any]
filters=[icmp]
xx:xx:10.995688 site-b out 10.255.254.4 -> 10.255.254.1: icmp: echo request
xx:xx:11.000227 site-b in 10.255.254.1 -> 10.255.254.4: icmp: echo reply

Hub # diag sniffer packet hub 'icmp' 4 0 a
interfaces=[hub]
filters=[icmp]
xx:xx:01.507321 hub -- 10.255.254.4 -> 10.255.254.254: icmp: echo request
xx:xx:01.507346 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo reply

 

 

However, a reverse ping from the hub to site B shows something weird. At site B, no packets are received by a sniffer that captures all ICMP. At the hub itself, there are far too many ICMP echo request packets in a very short time span:

 

 

Hub # diag sniffer packet hub 'icmp' 4 0 a
interfaces=[hub]
filters=[icmp]
xx:xx:57.609818 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.612726 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.612739 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.614702 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.614707 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.616440 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.616443 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:57.618208 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request

 

 

This behavior is also observed when a ping is initiated from site A to site B. No packets are received at site B, but a dump on site A shows a large number of ICMP echo requests being sent out (more than the ping tool should have sent).

 

At the hub, this is what my routing table looks like for 10.255.254.0/24:

 

 

Hub # get router info routing-table all
Codes: K - kernel, C - connected, S - static, R - RIP, B - BGP
       O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
       * - candidate default

Routing table for VRF=0
C       10.255.254.0/24 is directly connected, hub
C       10.255.254.254/32 is directly connected, hub

 

 

At site A (the currently working site), this is what it looks like:

 

 

SiteA # get router info routing-table all
Codes: K - kernel, C - connected, S - static, R - RIP, B - BGP
       O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
       * - candidate default

Routing table for VRF=0
S       10.255.254.0/24 [5/0] via site-a tunnel xxx.xxx.xxx.xxx, [1/0]
C       10.255.254.1/32 is directly connected, site-a

 

 

Interestingly, at site B (the broken site), the 10.255.254.0/24 route is not shown as "via site-b tunnel" but as directly connected:

 

 

SiteB # get router info routing-table all
Codes: K - kernel, C - connected, S - static, R - RIP, B - BGP
       O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
       * - candidate default

Routing table for VRF=0
C       10.255.254.0/24 is directly connected, site-b
C       10.255.254.4/32 is directly connected, site-b

 

 

Are we looking at some sort of VPN interface configuration or routing issue in this case?

tthrilok
Staff
August 29, 2022

Hi Zylan,

 

Could you share the sniffer output from both the hub and spoke B, captured simultaneously while you are pinging from the hub to spoke B?

 

Thank you!

Zylan
Author
Visitor III
September 5, 2022

Hi tthrilok,

 

Sorry, just saw this. I've tried doing another dump as you've mentioned, and something strange is happening. When I'm pinging from the hub to site B, this is what the hub sees:

Hub # diag sniffer packet hub 'icmp' 4 0 a
interfaces=[hub]
filters=[icmp]

xx:xx:xx.341560 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.347366 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.347379 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.349627 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.349633 hub -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
...
yy:yy:yy.198998 hub -- 10.255.254.1 -> 10.255.254.254: icmp: time exceeded in-transit
yy:yy:yy.194550 hub -- 10.255.254.1 -> 10.255.254.254: icmp: time exceeded in-transit

 

Site B is not seeing any packets at all:

 

SiteB # diag sniffer packet any 'icmp' 4 0 a
interfaces=[any]
filters=[icmp]

0 packets received by filter
0 packets dropped by kernel

 

However, it seems that site A is getting packets destined for site B. Perhaps this is the problem? I'm not sure how to fix it though:

 

SiteA # diag sniffer packet site-a 'icmp' 4 0 a
interfaces=[site-a]
filters=[icmp]

xx:xx:xx.202653 site-a -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.203629 site-a -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.205826 site-a -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.205979 site-a -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.207825 site-a -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
xx:xx:xx.207883 site-a -- 10.255.254.254 -> 10.255.254.4: icmp: echo request
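
One way to check which dial-up tunnel instance the hub selects for 10.255.254.4 is to compare the route lookup with the IKE gateway and tunnel lists on the hub. This is only a sketch built from standard diagnostic commands. No output is shown because it depends on the setup:

```
# which route (and interface) the hub uses for site B's tunnel IP
get router info routing-table details 10.255.254.4

# which peers are connected to the dial-up phase1
diagnose vpn ike gateway list

# per-tunnel remote gateway and phase2 selectors
diagnose vpn tunnel list
```

If both spokes appear with identical wildcard selectors and nothing else ties 10.255.254.4 to site B's tunnel, that would be consistent with the echo requests showing up at site A.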

 

Toshi_Esumi
SuperUser
August 28, 2022

Since you're using a "dynamic" tunnel interface on the hub and two tunnels are sharing it, I think it's more likely a "net-device" issue.

https://docs.fortinet.com/document/fortigate/6.4.2/administration-guide/239039/dynamic-tunnel-interface-creation

Try enabling "net-device" to have a dynamic interface for each tunnel.

You seem to be just testing the design at this moment, but I'm not sure about your strategy for routing the subnets behind the tunnel interface IPs. Since it's one interface on the hub, you need to use either "add-route" from phase2 selectors or a routing protocol. And since it's a dynamic interface, you can't use static routes.
But because you disabled add-route and have no specific selectors in phase2, if you disable net-device, the hub FGT can't know which tunnel to route to for the real destinations.
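
A rough sketch of the add-route variant, assuming the phase1/phase2 names from the posted configuration and that each spoke advertises only its own tunnel IP as the phase2 source selector. It has not been tested against the posted setup, and a /32 selector would have to be widened (or replaced by a routing protocol) once the subnets behind the spokes need to be reachable:

```
# Hub: let the dial-up phase1 install routes from the negotiated selectors
config vpn ipsec phase1-interface
    edit "hub"
        set add-route enable
    next
end

# Site B: announce its tunnel IP as the local selector
config vpn ipsec phase2-interface
    edit "site-b"
        set src-subnet 10.255.254.4 255.255.255.255
    next
end
```

Site A would get the equivalent selector with 10.255.254.1.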

 

Regardless, this is a site-to-site situation, not real dialup client VPN tunnels. I prefer configuring two different phase1 interfaces and using "peerid/localid" to bind a specific peer to each. Then you can configure static routes and "remote-ip" on each tunnel interface at the hub.

 

Toshi

 

Zylan
Author
Visitor III
August 28, 2022

Just to confirm, net-device should be set on the hub, is this correct? I've tried it and the result was that I couldn't ping any tunnel interface IP from anywhere (hub, site A, or site B). On the hub, with net-device enabled, the routing table shows:

 

Hub # get router info routing-table all
Codes: K - kernel, C - connected, S - static, R - RIP, B - BGP
       O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, L1 - IS-IS level-1, L2 - IS-IS level-2, ia - IS-IS inter area
       * - candidate default

Routing table for VRF=0
C       10.255.254.253/32 is directly connected, hub_0
                          is directly connected, hub_1
C       10.255.254.254/32 is directly connected, hub_0
                          is directly connected, hub_1

 

(I'm not sure if this is something that needs to be enabled on both sides - both spoke and hub - but if so, while this looks possible with site A (OS 7.0), this doesn't seem to be possible with site B's legacy FGT (OS 6.0))

 

Yes, I'm just testing things out. Eventually, BGP will be used to route subnets behind the tunnel interface IPs, but I would think that IPs within the VPN subnet need to be reachable first for BGP to work. Please correct me if I'm wrong (I did try configuring BGP, but no routes were exchanged, with an error of "malformed AS-PATH").

 

As for multiple phase1 interfaces, I haven't tried this out yet, but would that not create multiple logical interfaces that then each need security policies applied? Either way, I'm trying to avoid having to configure too many things at the hub if possible...

Toshi_Esumi
SuperUser
August 29, 2022

The remote side is "static" IPsec, so the net-device config doesn't do anything there. Only where dynamic/dialup is configured (the hub) does it allow creating those hub_0 and hub_1 dynamic interfaces, as you saw in the routing table. Since you are configuring one remote-ip on the tunnel interface, it shows up on both as a connected route. I don't think a dynamic/dialup IPsec config on the hub side allows you to configure BGP neighbors, because you don't know which one becomes hub_0 and which becomes hub_1.

I think you have to have a separate phase1-interface for each; then you can specify a remote IP like 10.255.254.1 or .4 on each interface and configure BGP neighbors.

I might be wrong and there might be a way with dialup (one phase1-interface on the hub), but somebody from FTNT should be able to tell if that's the case.

 

<edit>

For the policies on the hub, if you have individual phase1s, you could put the two remote interfaces in a zone and then allow "intrazone" traffic if you don't want to restrict access between them, or deny it if no remote-to-remote traffic is allowed. For hub<->remotes, you probably have to have some kind of policies anyway, so you need at least two policies, one for each direction.

</edit>
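
A minimal sketch of such a zone, assuming two per-spoke tunnel interfaces with the hypothetical names "hub-site-a" and "hub-site-b" (neither exists in the posted configuration). The zone name is likewise made up:

```
config system zone
    edit "spokes"
        set intrazone allow
        set interface "hub-site-a" "hub-site-b"
    next
end
```

Setting intrazone to deny instead would block spoke-to-spoke traffic unless a policy allows it explicitly.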

 

Toshi

Toshi_Esumi
SuperUser
August 29, 2022

Looks like your setup is quite similar to ADVPN, which I don't have any experience with. I'll stand down and let somebody else help you. The doc below says you should "disable" net-device at the hub.

https://docs.fortinet.com/document/fortigate/6.2.11/cookbook/820072/advpn-with-bgp-as-the-routing-protocol
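
For orientation, a hub-side BGP sketch in the spirit of that kind of design, not copied from the linked document. It assumes iBGP with the hub as route reflector, the hypothetical AS number 65000 and the hypothetical group name "spokes". Only the 10.255.254.0/24 overlay prefix comes from this thread:

```
config router bgp
    set as 65000
    config neighbor-group
        edit "spokes"
            set remote-as 65000
            set route-reflector-client enable
        next
    end
    # accept any peer in the overlay subnet into the group
    config neighbor-range
        edit 1
            set prefix 10.255.254.0 255.255.255.0
            set neighbor-group "spokes"
        next
    end
end
```

The sessions can only come up once the tunnel IPs themselves are reachable in both directions.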

Toshi