Azure FortiGate LAN interface loses L2 Connectivity
I'm running a pair of VM02 virtual gates in HA (A-P) in Azure on version 6.4.3, and I've got an IPsec tunnel connecting an on-prem gate over ExpressRoute to the LAN port on the virtual gate. Multiple times a day the tunnel goes down and a backup VPN over the public Internet has to come up. When that happens I connect to the gate in Azure, run 'get system arp' in the CLI, and sure enough, there are no entries on the LAN vnet. When things are working there are the normal ARP entries, such as the Microsoft reserved IPs (.1). What on earth could be causing this behavior? It's hard to know if it's Microsoft or Fortinet. Does this sound like a bug? I'm working with Microsoft now, but I was curious whether any other admins had seen similar behavior. Are any of you running FortiGates in Azure? Thanks!
As a follow-up, folks: I'm hearing that other customers with FortiGates in Azure are seeing ARP table entries drop out. I ran a sniffer trace on the gate for ARP and can see the messages going out, but nothing being returned. This is wreaking havoc with our setup, to the point that I can't even manage my active gate because it lost its ARP entries for the management network. IPsec tunnels are collapsing routinely. Hoping to get Microsoft to fix this. Any others suffering too?
No, I can't. Classic "he said, she said" situation. I'm working with MS support and they say it's an NVA problem. NVA stands for Network Virtual Appliance, which is what they call any device like the FortiGate VM. I've redeployed both VMs, which puts them on different back-end Azure hosts, but no luck. The next step is to shut down the VMs, remove NIC1, and then add a new one. At least they'll both have new MAC addresses. We are also going to enable Accelerated Networking, which is a feature in Azure that removes a layer of virtualization. We may also delete the cluster and focus on making this work outside of HA. My last-ditch effort will be to delete both VMs and build a single VM with an earlier version of FortiOS, not 6.4.3. Maybe there is a bug in 6.4.3 that I'm hitting.
Finally some progress. On advice from Microsoft, I enabled Accelerated Networking on port2, which was the interface failing to update its ARP table. Below is an excerpt from the Azure Cookbook that relates to AN. It seems to me that maybe "bypassing the hypervisor" in Azure somehow avoids this flaky L2 problem we've been seeing. You have to use PowerShell or Cloud Shell in Azure to enable this feature on the NIC. The VM must be shut down too. I've been running almost two days now with no loss of connectivity. Keeping my fingers crossed.
"Azure supports SR-IOV, which accelerates networking by allowing VM NICs to bypass the hypervisor and go directly to the PCIe card underneath. FortiOS must understand when it is using SR-IOV and change networking to accommodate SR-IOV. Azure refers to SR-IOV as Accelerated Networking."
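For anyone wanting to try the same change, here is a rough sketch of the steps described in the post above (shut the VM down, enable Accelerated Networking on the NIC, start it again) using the Azure CLI from Cloud Shell. This is not the exact procedure the poster ran. The resource group, VM, and NIC names in the usage note are hypothetical placeholders, and the `run` and `enable_an` helpers are made up for illustration. By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of any Azure tooling.

# Print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# enable_an <resource-group> <vm-name> <nic-name>
enable_an() {
  # The VM has to be stopped (deallocated) before the NIC setting can change.
  run az vm deallocate --resource-group "$1" --name "$2"
  # Turn on Accelerated Networking (SR-IOV) for the NIC behind the failing port.
  run az network nic update --resource-group "$1" --name "$3" --accelerated-networking true
  # Bring the VM back up.
  run az vm start --resource-group "$1" --name "$2"
}
```

Usage, with placeholder names: `enable_an my-rg fgt-a fgt-a-nic2` prints the three commands, and `DRY_RUN=0 enable_an my-rg fgt-a fgt-a-nic2` runs them. Note that Accelerated Networking is only available on VM sizes that support it, so check that first.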