I just recently set up HA load-balancing between two FACs and wanted to share some notes regarding the ordeal.
Both systems are VMs, running 4.3.2. The first system (master) has been in production for about a year now, and I finally got the chance to configure an HA slave for it. They are in remote data centers with no L2 interconnect, so the load-balancing method was chosen since it allows for L3 connectivity. The primary use case is remote authentication using FortiToken Mobile, with both remote RADIUS and LDAP users. The FACs act as RADIUS servers for FortiGates.
For both units you will need the following licensing:
[ul]
[li]Base VM license[/li]
[li]Support license[/li]
[li]Additional user licenses, if required[/li]
[/ul]
You need only a single license for these, i.e. these are registered to the master unit only:
[ul]
[li]FortiToken licenses (mobile or hardware)[/li]
[/ul]
Communication Between HA Units
[ul]
[li]Dedicated vNICs (port2) on each unit for HA communication
[ul]
[li]No administration services enabled[/li]
[li]The HA interface lives in the same zone/VLAN as the primary services interface (port1). This zone has a secondary private/internal subnet configured on the firewall, which was used for the HA communication[/li]
[/ul][/li]
[li]Static host route (/32) on each FAC, pointing to its HA partner[/li]
[li]Ports required are UDP/721 and UDP/1194, with bi-directional initiation
[ul]
[li]For the most part this has been the only traffic seen between the two systems[/li]
[li]However, while having some issues during the first try (elaborated a bit below), the primary initiated ICMP (IP protocol 1) communication to the slave, not UDP (protocol 17). Once I got the system working on the second try, this traffic was not seen[/li]
[/ul][/li]
[/ul]
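Since UDP gives no positive handshake, confirming that the HA ports are actually open between the units is fiddly. Here is a minimal Python sketch of one approach (the partner address 10.0.0.2 is a placeholder): a connected UDP socket surfaces any ICMP port-unreachable as `ConnectionRefusedError`, while silence only tells you "open or filtered".

```python
import socket

def udp_probe(host, port, timeout=2.0):
    """Best-effort UDP reachability check.

    Returns "closed" if an ICMP port-unreachable comes back,
    "open" if anything answers, and "open|filtered" on silence
    (UDP gives no positive confirmation without a reply).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))   # connected socket surfaces ICMP errors
        s.send(b"\x00")
        s.recv(1024)              # a reply means something is listening
        return "open"
    except ConnectionRefusedError:
        return "closed"
    except socket.timeout:
        return "open|filtered"
    except OSError:
        return "unreachable"
    finally:
        s.close()

if __name__ == "__main__":
    # The HA load-balancing ports observed between the FACs;
    # replace 10.0.0.2 with your HA partner's address.
    for port in (721, 1194):
        print(port, udp_probe("10.0.0.2", port))
```

A firewall that silently drops will also report "open|filtered", so a packet capture on the far side remains the authoritative check.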
Settings Not Synchronized
The admin guide does state that in an HA load-balancing deployment, not all settings are synchronized:
> Only the following authentication related features can be synchronized:
> Token and seeds
> Local user database
> Remote user database
> Group mappings
> Token and user mappings
> Other features, such as FSSO and certificates, cannot be synchronized between devices.
Since the Remote user DB is synchronized, one would assume that the remote authentication servers these users source from would also be synchronized, but this did not happen on my system.
After establishing the HA cluster the following things did not automatically sync up:
[ul]
[li]Remote LDAP Users[/li]
[li]LDAP Group Membership[/li]
[li]RADIUS Group Membership[/li]
[li]Group RADIUS Attributes[/li]
[/ul]
They all showed as synced with anomalies. The error details for the User Groups were:
Insert operation failed: Foreign key error: Entry for name=DOMAIN.COM not found in ldap_remoteldap on local server.
Insert operation failed: Foreign key error: Entry for name=RADIUS-SERVER not found in nas_radiusserver on local server.
After manually defining my remote RADIUS and LDAP servers all of these errors cleared.
BUT - that was not enough to get the HA system ready for traffic.
These are all of the things I had to manually redefine on the HA slave:
[ul]
[li]Remote authentication servers (RADIUS and LDAP)[/li]
[li]Realms[/li]
[li]RADIUS Clients[/li]
[/ul]
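One way to catch objects that quietly failed to replicate is to diff object names between the two units over the REST API. The sketch below is illustrative only: the `/api/v1/<resource>/` path layout, the basic-auth-with-API-key scheme, the resource name `radiusclients`, and the host names are all assumptions; check the REST API guide for your firmware. The name-diff helper itself is plain Python.

```python
import base64
import json
import urllib.request

def fetch_objects(base_url, resource, admin, api_key):
    """Fetch a JSON object list from a FAC REST endpoint.

    ASSUMPTION: the /api/v1/<resource>/ layout and basic-auth
    credential scheme here are placeholders -- adjust them to match
    the REST API guide for your firmware version.
    """
    req = urllib.request.Request(f"{base_url}/api/v1/{resource}/?format=json")
    token = base64.b64encode(f"{admin}:{api_key}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp).get("objects", [])

def missing_on_slave(master_objs, slave_objs, key="name"):
    """Names present on the master but absent on the slave."""
    slave_names = {o[key] for o in slave_objs}
    return sorted(o[key] for o in master_objs if o[key] not in slave_names)

if __name__ == "__main__":
    # Hypothetical hosts, resource, and keys -- replace with your own.
    master = fetch_objects("https://fac-master.example.com", "radiusclients",
                           "admin", "MASTER_API_KEY")
    slave = fetch_objects("https://fac-slave.example.com", "radiusclients",
                          "admin", "SLAVE_API_KEY")
    print("Not replicated to slave:", missing_on_slave(master, slave))
```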
After replicating those things manually, the HA slave did work for remote authentication.
Issues Seen The First Try
The first time I set up the HA cluster, the primary unit did not work for remote authentication requests. Other than the HA sync errors detailed above, the logs indicated that the remote RADIUS secret was invalid. Even after manually changing it on both sides (Microsoft NPS and FortiAuthenticator), the error persisted.
I ended up restoring the master from backup and rebooting, and it worked again.
Two things are notable here:
[ul]
[li]The secondary unit was on 4.3.2 from the get-go, but the primary unit was on 4.3.1. I had upgraded the primary to 4.3.2 before building the HA cluster, or so I thought: the upgrade didn't stick, and I didn't notice until after I had started.[/li]
[li]Because I couldn't find documentation on which ports are required for HA load-balancing, I opted to only permit PING and watch the traffic. I then added the UDP ports mentioned above.[/li]
[/ul]
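Since the symptom was "invalid RADIUS secret", it's worth understanding why a secret mismatch shows up that way: the shared secret feeds directly into the RFC 2865 User-Password hiding, so a wrong secret on either end decodes to garbage. The sketch below implements just that hiding step (placeholder secrets; not a full RADIUS client), which you can use to sanity-check the mechanics independently of either GUI.

```python
import hashlib
import os

def hide_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """RFC 2865 User-Password obfuscation.

    The password is padded to a 16-byte multiple and XORed, block by
    block, with MD5(secret + previous-block), seeded by the Request
    Authenticator. A mismatched shared secret garbles the decode,
    which is why a secret mismatch logs as an invalid password.
    """
    padded = password + b"\x00" * (-len(password) % 16)
    out, prev = b"", authenticator
    for i in range(0, len(padded), 16):
        mask = hashlib.md5(secret + prev).digest()
        block = bytes(a ^ b for a, b in zip(padded[i:i + 16], mask))
        out += block
        prev = block   # next mask chains on the ciphertext block
    return out

def unhide_password(hidden: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Inverse of hide_password (what the RADIUS server does)."""
    out, prev = b"", authenticator
    for i in range(0, len(hidden), 16):
        mask = hashlib.md5(secret + prev).digest()
        out += bytes(a ^ b for a, b in zip(hidden[i:i + 16], mask))
        prev = hidden[i:i + 16]
    return out.rstrip(b"\x00")

if __name__ == "__main__":
    ra = os.urandom(16)   # Request Authenticator
    hidden = hide_password(b"example-pw", b"example-secret", ra)
    assert unhide_password(hidden, b"example-secret", ra) == b"example-pw"
    print("round-trip OK")
```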
Suggestions for HA Deployment for In-Production Systems
The second time I tried, it worked fine (other than the things that did not automatically sync up). These are the steps I took that worked, and what I would suggest if you are going to turn a standalone FAC system already in production into an HA load-balancing cluster.
Preparation:
[ul]
[li]Unassign and delete any trial tokens
[ul]
[li]These are not compatible with HA[/li]
[li]They must not be assigned to a user when you enable HA[/li]
[li]Once you enable HA, the system will delete them[/li]
[/ul][/li]
[li]Back up your configuration on the FAC[/li]
[li]Take a VMware snapshot (if applicable)[/li]
[li]Jot down remote authentication details just to be safe: remote RADIUS and LDAP IPs and credentials[/li]
[li]Ensure routing and firewall rules are in place
[ul]
[li]Also ensure the secondary FAC can reach FortiGuard[/li]
[/ul][/li]
[li]Ensure both systems are on the exact same firmware version[/li]
[/ul]
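Some of the routing/firewall prep can be smoke-tested ahead of time from any box on the secondary's subnet. A minimal TCP reachability check, with placeholder host names (RADIUS authentication itself is UDP, so this only covers the TCP-based services such as LDAPS and the partner's GUI):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Placeholder inventory -- substitute your real servers.
    checks = [
        ("ldap.example.com", 636),        # LDAPS to the remote user source
        ("fac-master.example.com", 443),  # HA partner's GUI/API
    ]
    for host, port in checks:
        status = "OK" if tcp_reachable(host, port) else "FAIL"
        print(f"{host}:{port} -> {status}")
```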
Deployment:
[ul]
[li]Configure the slave for HA first
[ul]
[li]Set up the secondary vNIC and static route[/li]
[li]Enable HA as a slave, pointing to the master[/li]
[/ul][/li]
[li]Configure the master for HA
[ul]
[li]Set up the secondary vNIC and static route[/li]
[li]Enable HA as a master, pointing to the slave[/li]
[/ul][/li]
[li]Manually redefine the things that don't auto-synchronize[/li]
[li]Confirm HA status on both units[/li]
[li]Test[/li]
[/ul]
The only outstanding issue I can see right now is an error about FTM server credentials. It is only happening on the slave unit, not the master. I will open a ticket with FTAC and report back if I find a solution.
The log message is:
> FTM server credentials: Update Failed
> FortiGuard FTM Push Notification Update
> Logs communication with FortiGuard regarding FTM push notification services
It is happening about every hour. I confirmed the system can reach FortiGuard, and the secondary system is licensed properly, so I am not sure what the issue is.