I'm planning to place the slave unit of a FortiGate HA cluster in a remote location. There is a leased line (layer 2) for the HA connection. Can anybody confirm that I can run the HA traffic across a VLAN between the access switches on each side of the line?
I know that HA traffic uses a non-standard ethertype, and I've tested that HA traffic is transferred unchanged over that line. But now there will be VRRP traffic between 2 routers on this line as well, and I'd like to isolate the HA traffic on a VLAN of its own.
There is the option to enable authentication and encryption of the HA traffic, but this will cost performance. It would isolate the traffic, though, I guess.
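For reference, heartbeat authentication and encryption are toggled under "config system ha" (both are disabled by default); a minimal sketch:

```
config system ha
    set authentication enable
    set encryption enable
end
```

Both settings only affect the heartbeat/sync traffic between cluster members, not production traffic, but the encryption does add CPU overhead on the units.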
I appreciate any advice, esp. from someone who has already separated an HA cluster geographically.
"Latency is important" did not fully bring my point across: 15 ms is more than enough, even 100 ms would do. Depending on the setup of the customer and the quality of the leased line, a situation could occur in which some heartbeat packets are not sent out quickly enough, or some are missed by the other node, and an active-active split-brain situation occurs, which causes all traffic to be dropped. This could happen because of:
- congestion on the leased line
- other provider issues or maintenance
- being targeted by a DDoS attack
- a higher amount of incoming/outgoing traffic than expected
- inspecting more traffic than anticipated or the unit can handle, causing high CPU load which might prevent handling of the HB packets
- the number of sessions being synced between the units, and whether session-less sessions (UDP and ICMP) are synced
When one of these points occurs, some traffic will be affected but not all of it; when HB packets are missed and a split-brain situation happens, however, pretty much all traffic is down until the nodes see each other again and the cluster is restored. The chances of this actually happening are very low. The thing to look out for is the System/HA logging: watch for "HB interface lost" messages. Depending on the cause of these issues, different solutions might apply. However, if you want the cluster to be more lenient when some HB packets are missed, the following settings can be fine-tuned in the "config system ha" configuration:
hb-lost-threshold <threshold_integer>, default value = 6 (which allows 5 heartbeat packets to be missed before the HB interface is marked as "lost"; at the 6th missed HB packet the interface is marked as "lost")
hb-interval <interval_integer>, default value = 2 (the unit is 100 ms, which makes it 200 ms)
We can calculate the time after which the FortiGate marks an HB interface as lost by combining these values: 6 x 200 ms = 1.2 seconds. Depending on timing this can be slightly lower or higher. Only change these values after investigating the "HB interface lost" messages and when you are certain this is the right thing to do, as the messages can have other causes (e.g. the patch cable to the switch could be broken).
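As a sketch, relaxing both settings in the CLI might look like this (the values 12 and 4 are arbitrary examples to illustrate the mechanism, not recommendations):

```
config system ha
    set hb-lost-threshold 12
    set hb-interval 4
end
```

With hb-interval 4 (400 ms) and hb-lost-threshold 12, an HB interface would only be marked as lost after roughly 12 x 400 ms = 4.8 seconds without heartbeats. Keep in mind that raising these values also delays legitimate failovers by the same amount.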
I was wildly guessing whether non-standard-ethertype traffic can be tagged in a VLAN without trouble. That said, both cluster members will only ever see untagged traffic, as the switches at the head and tail of the line do the tagging/untagging.
I don't see why not, unless the HA traffic is all on some proprietary Fortinet protocol (now I'm just guessing). One main thing to look at, as alluded to by mike, is the link speed; definitely want to factor that in.
Should be essentially a physical link between 2 fortigates as you would do if they were side by side, just running across a distance on same subnet. Mind letting us know how that goes?
A physical link ("dark copper") would be ideal but this connection is across the center of a big city. At least, it's a dedicated line. From a previous test I know it will transport HA traffic (which does use a proprietary protocol, FCGP, plus some FGSP, FRUP, ...) but that was across a non-VLAN connection.
I've used separated cluster units with HA traffic going over one or multiple VLANs over access/mgmt switches many times in different setups without problems. The mediums used (for long distances) vary from MPLS and VPLS to dark fiber, and all work just fine. I even recently placed a new cluster for a provider in which the units are geographically separated by 120 km; the multiple heartbeat and production interfaces are transported over a VPLS environment.
General best practices apply such as:
- Give each HB link a dedicated VLAN
- No other traffic should go over the same VLAN
- Have enough bandwidth available, especially if production traffic passes over the same dedicated line.
- As mentioned by others, latency is important. The HB hello timers can be decreased if necessary, but only with caution and as last resort.
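As a sketch of what the "dedicated VLAN per HB link" practice looks like on the switch side (hypothetical Cisco IOS syntax; VLAN 100 and the port numbers are arbitrary examples), the FortiGate heartbeat port sits untagged in its own VLAN, which is then carried tagged across the leased line:

```
! Access port towards the FortiGate heartbeat interface
interface GigabitEthernet0/1
 switchport mode access
 switchport access vlan 100
!
! Trunk towards the leased line, carrying only the HB VLAN
interface GigabitEthernet0/24
 switchport mode trunk
 switchport trunk allowed vlan 100
```

Restricting the allowed VLAN list on the trunk is what keeps the VRRP and other traffic out of the heartbeat VLAN, and the FortiGates themselves only ever see untagged frames.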
The specific ethertype frames (0x8890) the FortiGates use for the heartbeat traffic are still standard Ethernet frames that can be tagged by switches, with the only caveat that some switches, such as the Nexus 5k, use the same ethertype for some of their own functions and could drop the heartbeat traffic. The ethertype can be changed to a different value on the FortiGate if this occurs.
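If such a conflict occurs, the heartbeat ethertypes can be changed under "config system ha" (the defaults are 0x8890 for the heartbeat, 0x8891 for telnet/session sync, and 0x8893 for the layer-2 packets; the alternative values below are arbitrary examples and must be unused elsewhere on your network):

```
config system ha
    set ha-eth-type 8895
    set hc-eth-type 8896
    set l2ep-eth-type 8897
end
```

Change these on both cluster members, as units with mismatched ethertypes will not see each other's heartbeats.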