NAT-T and IKEv1 rekey problems

rosdev · ‎04-22-2022

Dear Users,

I can establish a tunnel to a FortiGate device correctly, but FortiGate's behavior on IKEv1 rekey events is strange. To summarize: a NAT'ed initiator establishes the tunnel to FortiGate, then after a configured period the initiator starts rekeying IKE normally. It proceeds through the 3 Main Mode exchanges. After that the new IKE SA is established as shown by this log entry:

2022-03-31 10:31:39.439141 ike 0:my-vpn2_1:646: established IKE SA edd08d7df99a2ac1/046a0210066d38c5

Immediately after that FortiGate deems the new tunnel a duplicate of the previous one, logs the "twin connections detected" message, and deletes both its previous IPsec SA and IKE SA:

2022-03-31 10:31:39.439161 ike 0:my-vpn2_1:646: check peer route: if_addr4_rcvd=0, if_addr6_rcvd=0
2022-03-31 10:31:39.439187 ike 0:my-vpn2_1: twin connections detected
2022-03-31 10:31:39.439194 ike 0:my-vpn2_0: deleting
2022-03-31 10:31:39.439280 ike 0:my-vpn2_0: flushing

2022-03-31 10:31:39.441122 ike 0:my-vpn2_0:645: sent IKE msg (IPsec SA_DELETE-NOTIFY): 10.255.248.7:4500->xx.xx.xx.xx:51290, len=108, id=1d6cec7ed4ec0f25/2ff0d11d92da2d0f:c6267c37

2022-03-31 10:31:39.441252 ike 0:my-vpn2_0:645: sent IKE msg (ISAKMP SA DELETE-NOTIFY): 10.255.248.7:4500->xx.xx.xx.xx:51290, len=124, id=1d6cec7ed4ec0f25/2ff0d11d92da2d0f:21ed22e3
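To spot this pattern in a longer debug capture, a small script can help. This is only a minimal sketch: the line layout (timestamp, "ike", index:tunnel[:sa-id], message) is inferred from the excerpts above and is an assumption, not a documented FortiOS log format, and the function names are made up for illustration.

```python
import re

# Sample lines copied from the debug output above.
LOG = """\
2022-03-31 10:31:39.439141 ike 0:my-vpn2_1:646: established IKE SA edd08d7df99a2ac1/046a0210066d38c5
2022-03-31 10:31:39.439187 ike 0:my-vpn2_1: twin connections detected
2022-03-31 10:31:39.439194 ike 0:my-vpn2_0: deleting
2022-03-31 10:31:39.439280 ike 0:my-vpn2_0: flushing
"""

# Assumed layout: "<date> <time> ike <index>:<tunnel>[:<sa-id>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) ike (?P<idx>\d+):(?P<tunnel>[^:\s]+)(?::(?P<sa>\d+))?: (?P<msg>.*)$"
)

def parse(log_text):
    """Turn matching log lines into dicts; lines that do not match are skipped."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            events.append(m.groupdict())
    return events

def find_twin_teardowns(events):
    """Return (new_tunnel, deleted_tunnel) pairs: a 'twin connections detected'
    line followed by a 'deleting' line for a different tunnel instance."""
    pairs = []
    for i, ev in enumerate(events):
        if ev["msg"] != "twin connections detected":
            continue
        for later in events[i + 1:]:
            if later["msg"] == "deleting" and later["tunnel"] != ev["tunnel"]:
                pairs.append((ev["tunnel"], later["tunnel"]))
                break
    return pairs

events = parse(LOG)
print(find_twin_teardowns(events))  # [('my-vpn2_1', 'my-vpn2_0')]
```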

I don't think this is the right behavior for rekeys. FortiGate should establish the new IKE SA and let the previous one expire gracefully. The old SA may expire on FortiGate's end first or on the initiator's end first, depending on settings. In the latter case the initiator will notify FortiGate about deleting its SA and FortiGate will do the same. Tearing down the IPsec SA is overkill, too (IKEv1 allows an IPsec SA to persist without a corresponding IKE SA).

The adverse effect of this immediate SA removal is that FortiGate informs the initiator about it and the initiator deletes its SAs (the tunnel is effectively torn down as there is no IPsec SA anymore). The initiator also notifies FortiGate that it has deleted its previous IKE SA. FortiGate just picks its newly created SA and tears it down. So there are no SAs for a period of time, and it's the initiator that reinitializes the process from scratch.

Does anyone know how to prevent this from happening? I read this information about NAT traversal and twin connections:

https://community.fortinet.com/t5/FortiGate/Technical-Tip-NAT-traversal-and-twin-connections-in-IPse...

NAT-T is enabled on both ends though and the initiator does switch to ports 4500:4500 during the third IKE (ID) exchange when the tunnel is set up. The parties proceed using ports 4500 afterwards. The initiator switches to port 500 only when it initiates the IKE rekey sequence (i.e. the new Main Mode exchange, again, switching to 4500 at the third exchange).

Thanks,

Oleg

pminarik · ‎04-22-2022

The expectations (based on some internal notes) are as follows:

FortiGate lets the old IKEv1 SA expire (it is deleted once its hard timeout is reached); until it times out, it co-exists with the new IKE SA
The existing IPsec SA is moved under the new IKE SA (not torn down)

This raises the question of why your description of what happens in your environment is different. Can you share with us some details of the configuration and environment? Such as:

FortiGate model and firmware version
phase1 type (static, dynamic, ddns)
PSK or certificate-authentication
XAUTH used?
Who is the remote peer?

[ corrections always welcome ]

rosdev · ‎04-23-2022

Thank you for your interest and support. As for your questions:

The Fortigate model is (VM) AWS on-demand. OS version 6.4.8 (I don't control that system so my info might be inaccurate/incomplete at the moment, but I can learn all the necessary details if needed).
phase1 type = dynamic
PSK is used
XAUTH is used
The remote peer is Libreswan (tried two different versions, this behavior is identical regardless of the peer version).

We took a look at this post as well:

https://community.fortinet.com/t5/FortiGate/Technical-Tip-Allowing-multiple-IPSec-dial-up-connection...

The symptom described there is identical to what we are experiencing and our phase2 setting is currently: "set route-overlap use-new". My guess is this setting might cause FortiGate to scrap existing SA(s) if there is an identical one (with the same routes) being established. The fact that it's a phase2 setting is slightly misleading since this is happening right after installing the IKE SA, prior to any phase2 exchanges.

I'll try the new setting "set route-overlap allow" on Monday. In the meantime, could you confirm or deny that this particular setting will make a difference? Or maybe there is something else to look at? I'd prefer to make an informed decision and not use the first setting that just "happens/appears to work".

Thanks,

Oleg

pminarik · ‎04-24-2022

I don't think it's a good idea to use "allow" there; it might cause further problems. How would the FortiGate know which tunnel to use out of the now-two? I would keep this at use-new (default).

I did some further digging, and I have one question for you: When the libreswan client initiates a rekey, does it send the first packet to FortiGate's port 500 or 4500?

Based on what I've read, we seem to expect the rekey to use port 4500, otherwise if port 500 is targeted, it is assumed to be a new negotiation (which is when the initial phase1&2 would just get torn down due to "twin connection").

Can you please check this and confirm?

[ corrections always welcome ]

rosdev · ‎04-24-2022

Thanks for further guidelines.

I did turn attention to the port Libreswan uses on rekeys. The first two Libreswan rekey-triggered Main Mode exchanges use port 500, the switch to 4500 is made during the 3rd Main Mode exchange. So the port usage is the same as for the initial IKE SA establishment. This is probably what causes the problem.

I even asked Libreswan developers whether Libreswan is supposed to use port 500 or 4500 when it initiates IKE rekey; the (preliminary) answer is that it should use 500.

I read RFC 3947 (NAT-T). Section 4 contains a discussion of port usage. It says the *responder* should use port 4500 while initiating a rekey though there is no similar explicit statement about an initiator-triggered rekey. Hence, probably, the interoperability issues.

If you confirm that FortiGate requires that the rekey initiator use port 4500 right from the first Main Mode exchange for the exchange to be considered a rekey, I will follow up with the Libreswan team to see what they can do about it (or whether my client is misconfigured somehow).

Thanks,

Oleg

pminarik · ‎04-25-2022

"I read RFC 3947 (NAT-T)...."

Looks like we're mirroring each other's steps. :)

I've read that too, and while there is no explicit mention that the client must do rekey on port 4500, it also says:

Once port change has occurred, if a packet is received on port 500 [...]
If the packet is a new Main Mode or Aggressive exchange, then it is processed normally (the other end might have rebooted, and this is starting new exchange).

This seems like a plausible explanation for why FortiOS understands a new negotiation on port 500 to be either a different client or an existing client negotiating from scratch, which is why it then drops the existing tunnel.

"If you confirm that FortiGate requires that the rekey initiator use port 4500 right from the first Main Mode exchange for the exchange to be considered a rekey"

For what it's worth, my current understanding, based on all info I was able to find (external and internal), is: "yes, FortiGate needs the NAT-T-using peer to use UDP/4500 to consider the new IKEv1 negotiation as a rekey attempt".
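To make that rule concrete, here is a toy model of the decision as I understand it from this thread. It is a sketch only, not FortiOS code: the class, the method name, and the way peers are identified are all hypothetical.

```python
from dataclasses import dataclass, field

IKE_PORT = 500    # plain IKE
NATT_PORT = 4500  # IKE after the NAT-T port float

@dataclass
class Responder:
    # Hypothetical bookkeeping: peers that already hold an IKE SA
    # and have floated to UDP/4500.
    natt_peers: set = field(default_factory=set)

    def classify_main_mode_start(self, peer_id: str, dst_port: int) -> str:
        """Classify the first Main Mode packet of a new exchange."""
        if peer_id not in self.natt_peers:
            return "new negotiation"
        if dst_port == NATT_PORT:
            # Same peer, continuing on the floated port: treated as a rekey.
            return "rekey"
        # Known NAT-T peer starting over on UDP/500: treated as a fresh
        # negotiation, so the existing tunnel becomes a "twin".
        return "new negotiation (twin of existing tunnel)"

r = Responder(natt_peers={"client-1"})
print(r.classify_main_mode_start("client-1", NATT_PORT))
print(r.classify_main_mode_start("client-1", IKE_PORT))
```

The second call corresponds to what Libreswan does today (the rekey starts on UDP/500), which in this model lands in the twin-connection branch.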

If you could get the libreswan peer to enable rekey to start on UDP/4500 (if they have such a config option), that should do the trick, as far as I can tell.

Another option would be to switch to IKEv2. It doesn't have any such ambiguity for rekeying, so in that regard using IKEv2 would be an immediate improvement. :)

[ corrections always welcome ]

rosdev · ‎04-25-2022

Thanks for your reply.

IKEv2 isn't an option at this point, unfortunately. IKEv2 would mandate public keys/certificates (at least with Libreswan), which, in turn, requires a PKI. And my other stakeholders aren't ready to embrace PKI yet (sigh...).

I'm studying Libreswan docs/source code at the moment to see if it has an option to influence the port used during a rekey (as initiator). Chances are that it doesn't. I'll investigate further and get in touch with Libreswan developers again.

As a potential workaround, could you tell me what "set route-overlap" controls, exactly? When FortiGate detects twin connections, it logs this entry:

2022-03-31 10:31:39.440739 ike 0:my-vpn2_0:97134: del route 10.10.10.1/255.255.255.255 oif my-vpn2_0 (314) metric 1 priority 0

and shortly afterwards:

2022-03-31 10:31:39.440739 ike 0:my-vpn2_0: mode-cfg release 10.10.10.1/255.255.255.255

10.10.10.1 is the client's mode-cfg-assigned IP address (not an internal IP address behind NAT). The "use-new" setting clearly causes deletion of the previous IKE/IPsec SAs. The "use-old" setting will probably prevent the new IKE SA from being established. What are the possible side effects of "allow"?

During rekey, both IKE SAs will coexist for a while. A lingering previous IKE SA isn't a problem on the FortiGate's end if it is configured to expire at the same time as the client's SA, so the overlap window will be short (up to a few minutes or so). Traffic to the client will still be reaching that particular client (NAT mappings will remain the same so the NAT device will be able to demultiplex incoming traffic). What I'm not certain about is this:

1) whether the new IKE SA will retain the same mode-cfg address as the previous one (need to check);

2) what happens if this client becomes unavailable (reboots?) and another one behind the same NAT connects first?

Could you please describe the "route-overlap" option in more detail?

Thanks,

Oleg

pminarik · ‎04-25-2022

route-overlap is used when "add-route" is enabled in phase1 or phase2 (by default enabled in phase1, and by default phase2 refers to phase1's setting). "add-route" dynamically injects negotiated phase2 selectors from remote spokes/clients into the routing table of the FortiGate when the tunnel is of type=dynamic. Typically you either populate the routes to spokes' networks with add-route=enable, or you set up dynamic routing such as BGP, OSPF, or RIP to handle this (add-route=disable).

"route-overlap" handles what to do when a newly connecting spoke negotiates a selector (~route) which overlaps with an existing one. There are three options:

"use-new"(default) - the older phase1&2 are torn down. The newly negotiated one is used instead. Recommended for hub&spoke with dialup clients behind NAT, since it lets clients reconnect if they drop out for whatever reason.

"use-old" - reverse of the above. New client will not be able to connect if it negotiates the same selector. (potentially a big problem if a spoke/client crashes and wants to re-connect)

"allow" - Both old and new selectors and tunnels are allowed to coexist. FortiGate will use ECMP to spread sessions across these two routes/tunnels. If one of these tunnels is functionally dead on the other side, this may result in some traffic being "sent into the void" temporarily until the dead phase2 either expires, or DPD tears the tunnel down for not responding.
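The three options can be sketched as a small model. This is an illustration under simplifying assumptions, not how FortiOS implements it: selectors are compared by exact match rather than by real prefix overlap, and the function and tunnel names are hypothetical.

```python
def apply_route_overlap(routes, selector, new_tunnel, policy):
    """Model of route-overlap handling.

    routes:   dict mapping a selector string to the list of tunnels routing it
    returns:  (accepted, updated_routes); the input dict is not modified
    """
    updated = {sel: list(tunnels) for sel, tunnels in routes.items()}
    existing = updated.get(selector, [])

    if not existing:
        # No overlap: the new route is simply added.
        updated[selector] = [new_tunnel]
        return True, updated
    if policy == "use-new":
        # Older tunnel(s) are torn down, the new one takes over.
        updated[selector] = [new_tunnel]
        return True, updated
    if policy == "use-old":
        # The new client is refused, the old route stays.
        return False, updated
    if policy == "allow":
        # Both coexist; traffic would be spread across them (ECMP).
        updated[selector] = existing + [new_tunnel]
        return True, updated
    raise ValueError(f"unknown policy: {policy}")

table = {"10.10.10.1/32": ["my-vpn2_0"]}
for policy in ("use-new", "use-old", "allow"):
    print(policy, apply_route_overlap(table, "10.10.10.1/32", "my-vpn2_1", policy))
```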

As for mode-config:

mode-cfg is not done after rekey. XAUTH is not redone during rekey either, unless you manually enable "reauth" in phase1 config.

[ corrections always welcome ]

rosdev · ‎04-25-2022

Thanks for the detailed explanation.

Since route-overlap governs behavior both for phase 1 and 2, it's risky to enable it since I'd prefer to have dynamic routing for both phases while changing the behavior for just phase 1, which I think is impossible.

Libreswan is different, i.e. it combines FortiGate's "use-new" and "allow". It has a notion of "the newest IKE/IPsec SA", uses the newest one and lets the previous one(s) expire as configured.

Glad you mentioned mode-cfg behavior. I think this will be the true deal-breaker for me. As far as I recall, Libreswan will try to reauthenticate XAUTH *and* repeat mode-cfg on every rekey.

Can you confirm that there is no way to enable mode-cfg after an IKE rekey, regardless of the "reauth" or any other setting?

Thanks,

Oleg

pminarik · ‎04-25-2022

1, XAUTH - From what I've found, this is handled by the FortiGate sending a CFG message saying that XAUTH is completed. If libreswan can digest that, you should be fine. (XAUTH is triggered from the dialup server's side)

2, mode-cfg: mode-cfg is AFAIK initiated by the client sending a request for various information it needs (address, mask, routes, etc.). You could try checking the IKE debug to see what happens in your scenario. I couldn't find any further notes on this. (maybe the FortiGate will be nice enough to re-send the same settings if the client unexpectedly asks again? No clue.)

"Libreswan is different, i.e. it combines FortiGate's "use-new" and "allow". It has a notion of "the newest IKE/IPsec SA", uses the newest one and lets the previous one(s) expire as configured."

Perhaps some further clarification is needed here. If it is superfluous, excuse me. "route-overlap" handles what to do with the new phase1&2 only when the selectors are overlapping.

Under normal circumstances (no overlap), rekey is handled the same way as you described: new phase1/phase2 SA is used for outgoing packets, inbound packets are accepted with encryption by either old/new SA, old SA is left to reach its timeout before deletion. Existing phase2 SA is moved "under" the new IKE SA.
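As a rough illustration of that normal-case handling, here is a minimal sketch. The classes and field names are hypothetical and the timing values are arbitrary; it only models "newest SA for outbound, any live SA for inbound, old SA removed at its timeout".

```python
from dataclasses import dataclass

@dataclass
class SA:
    spi: str
    created: float   # time the SA was established
    lifetime: float  # seconds until hard timeout

    def expired(self, now: float) -> bool:
        return now >= self.created + self.lifetime

class SAGroup:
    """All SAs that belong to one tunnel."""

    def __init__(self):
        self.sas = []

    def add(self, sa: SA) -> None:
        self.sas.append(sa)

    def expire(self, now: float) -> None:
        # Old SAs are only removed once their own timeout is reached.
        self.sas = [s for s in self.sas if not s.expired(now)]

    def outbound(self):
        # Outgoing packets always use the newest SA.
        return max(self.sas, key=lambda s: s.created) if self.sas else None

    def accepts_inbound(self, spi: str) -> bool:
        # Inbound packets are accepted on any SA that is still alive.
        return any(s.spi == spi for s in self.sas)

group = SAGroup()
group.add(SA("old", created=0, lifetime=100))
group.add(SA("new", created=90, lifetime=100))  # rekey shortly before expiry

group.expire(now=95)
print(group.outbound().spi, group.accepts_inbound("old"))  # new True

group.expire(now=100)
print(group.outbound().spi, group.accepts_inbound("old"))  # new False
```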

[ corrections always welcome ]
