"Latency is important" did not fully bring my point across, 15 ms is more than enough, even 100 ms would do. Depending on the setup of the customer and the quality of the leased line, a situation could occur in which some heartbeat packets are not send out quickly enough or some are missed by the other node and an active-active split brain situation occurs, which causes all traffic to be dropped. This could happen because of:
- congestion on the leased line
- other provider issues or maintenance
- being targeted by a DDoS attack
- a higher amount of incoming/outgoing traffic than expected
- inspecting more traffic than anticipated or the unit can handle, causing high CPU load which might prevent handling of the HB packets
- the number of sessions being synced between the units, and whether connectionless sessions (UDP and ICMP) are synced as well
When one of these points occurs, some traffic will be affected but not all of it. When HB packets are missed and a split brain situation happens, however, pretty much all traffic stops until the nodes see each other again and the cluster is restored. The chance of this actually happening is very low. Keep an eye on the System/HA logging and look for "HB interface lost" messages. Depending on the cause of these issues, different solutions might apply. However, if you want the cluster to be more lenient when missing some HB packets, you can fine-tune the following settings in the "config system ha" configuration:
- hb-lost-threshold <threshold_integer>, default value = 6 (5 HB packets may be missed; at the 6th missed HB packet the interface is marked as "lost")
- hb-interval <interval_integer>, default value = 2 (in units of 100 ms, which makes it 200 ms)
We can calculate the time after which the FortiGate marks a HB interface as lost by multiplying these values: 6 x 200 ms = 1.2 seconds. Depending on timing this can be slightly shorter or longer. Only change these values after investigating the "HB interface lost" messages and when you are certain this is the right thing to do, as the messages can also be caused by other factors (e.g. the patch cable to the switch could be broken).
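As a rough sketch of what a more lenient setup could look like, the two settings would be changed on the CLI as shown here. The values 4 and 12 are only illustrative assumptions on my part, not a recommendation; pick values based on what your HA logs show.

```
# Illustrative values only (assumption), not a recommendation.
# The "#" lines are annotations for this post; leave them out when pasting.
config system ha
    # 4 x 100 ms = one HB packet every 400 ms
    set hb-interval 4
    # interface is marked "lost" at the 12th missed HB packet
    set hb-lost-threshold 12
end
```

With these example values a HB interface would be marked as lost after roughly 12 x 400 ms = 4.8 seconds, compared to 1.2 seconds with the defaults. The trade-off is that a real failure of the other node also takes that much longer to be detected, so failover becomes slower.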
More information at http://kb.fortinet.com/kb/documentLink.do?externalID=10043