VPN Phase 2 stops working after Azure FortiGate-VM Live Migration
Trying to understand what happened and how to prevent it in the future:
- Running FortiGate-VM in an Azure VM.
- This FG has a custom site-to-site IPsec tunnel to on-prem, which effectively connects the virtual data centre to the on-premises data centre. The tunnel is initiated from the Azure side.
- Suddenly, the tunnel stops working: Phase 2 will not come up.
- The first sign of trouble is this:
Unavailable : Live Migration (Unplanned)
At Thursday, October 13, 2022 at 7:29:19 PM EDT, the Azure monitoring system received the following information regarding your Virtual machine:
This virtual machine was paused for 0.675000 seconds due to a memory-preserving Live Migration operation. No additional action is required from you at this time.
No action is required
- A couple of minutes after this, alerts start going off that connectivity has been lost.
- After some troubleshooting (pinging, checking routes and connectivity, rebooting, a firmware upgrade, etc.), it is determined that Phase 2 simply won't come up. There are timeouts and retries, but no other obvious cause. The config has not changed anywhere and everything else seems to work just fine; it's just this Phase 2 that won't work.
- I decide to recreate the tunnel on the originating side, on the FG-VM. Same exact parameters as the previous one, I literally copy / paste everything.
- Voila, tunnel immediately works again.
- It was not a config change issue.
- It was not an actual connectivity issue.
It appears as if the live migration of the VM broke something. My best guess is that there's some persisted entropy, encryption key, salt, or something like that, tied to the hardware or the environment. When the live migration occurred, something stopped working because the environment changed. On physical platforms, code that uses, for example, the MAC address as a "salt" isn't a big deal, as it would never change. But on a VM, it's a problem.
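To make the guess above concrete, here is a minimal Python sketch of how a key salted with a host-specific identifier stops matching once that identifier changes. This only illustrates the hypothesis: nothing in this thread confirms that FortiOS derives anything this way, or that any such identifier actually changes during an Azure live migration. The function name `derive_key`, the pre-shared key, and both MAC addresses are made up for the example.

```python
import hashlib


def derive_key(secret: bytes, hardware_id: str) -> bytes:
    # Hypothetical scheme: the salt comes from a host-specific identifier
    # (here a MAC address). This is an assumption, not known FortiOS behavior.
    return hashlib.pbkdf2_hmac("sha256", secret, hardware_id.encode(), 10_000)


psk = b"example-pre-shared-key"  # made-up secret

# Made-up identifiers standing in for "environment before" and "environment after".
key_before = derive_key(psk, "00:0d:3a:aa:bb:cc")
key_after = derive_key(psk, "00:0d:3a:dd:ee:ff")

# Same secret, different salt: the derived keys no longer match, so anything
# stored under the old key could not be recovered with the new one.
keys_match = key_before == key_after
print(keys_match)
```

If something like this were the cause, recreating the tunnel would re-derive the stored material in the new environment, which would be consistent with the tunnel coming back after a copy / paste of the same parameters. That remains speculation until someone confirms it.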
1) Am I right? Or not? Could there be some other explanation as to why a tunnel needs to be re-created? If so what might be the reasons?
2) If I am right, this is a bug, as it should not happen! VMs can move in all sorts of ways, regardless of the hosting platform (Azure, vSphere, etc.). A tunnel can't completely stop working and need to be recreated whenever a basic virtualization operation occurs.
I had looked at that. The word "migration" does not appear anywhere in the document that I could find (CTRL-F for "migration" yields nothing), and searching for the word on the web version of that doc yielded nothing either. I did the same in a few other documents without any luck.
Can you clarify what topics you saw that cover Live VM Migration?