Support Forum
Robin_Svanberg
Contributor

Behaviour when restoring config on an A-P cluster

Hi,

 

We are going to move all ports on a FortiGate 600 HA A-P cluster over to LACP aggregate interfaces, and to be able to do that we need to modify the configuration and restore it.

 

What's the behaviour when doing this on an A-P cluster? We don't want both units to reboot. :)

 

BR Robin

 

Robin Svanberg Network Consultant @ Ethersec AB in Östersund, Sweden

 

robin.svanberg@ethersec.se

7 REPLIES 7
Robin_Svanberg
Contributor

The firewalls are running version 5.2.2.

 

Robin Svanberg Network Consultant @ Ethersec AB in Östersund, Sweden

 

robin.svanberg@ethersec.se

AndreaSoliva
Contributor III

Hi

 

In an active-passive cluster the config is always written from master to slave. This means that whatever is done on the master overwrites the slave. From this point of view it is interesting to look at how the system decides "who" becomes master and why. For this, have a look at the following picture:

 

[Image: Fortinet-336.jpg - HA master election criteria]

 

This is the normal way a FortiGate elects the master. The tricky thing here is the Age value, which is the uptime of the unit's membership in the active-passive cluster (not the uptime of the device). This means that if the master in an active-passive pair fails and comes back, and in the meantime some config was done on the slave, it is absolutely necessary to understand what is configured and which device overwrites the other. For this reason my suggestion is the following:

 

# config system ha
#     set override enable
# end

 

This command removes the Age value completely from the master-election decision. The unit with the higher priority (provided all monitored interfaces are correctly connected) will then overwrite the device with the lower priority. From this point of view, for an upgrade or a big modification like yours, I would do the following:

 

1. Remove the Slave (leave the Master untouched)

2. While it is disconnected from the network, do your modifications locally (it does not matter if it is on the same release etc.), but use "override enable" and do not configure monitored interfaces yet.

3. At the point you are ready, shut down the master (you will have an outage).

4. Bring up the unit with your new config, on whatever OS release, and fully test the new constellation/configuration.

5. If all is ready on the new master, make a backup of the new master.

6. Set up the old master with the same release, matching exactly the setup of the new master.

7. Do a full restore on the old Master

8. Go to the GUI and change: the host name, the priority (lower, for the slave role) and, if you have one defined, the IP of the management interface on the HA page.

9. Shut down the new slave, bring it to the network and connect all cables correctly.

10. Start the new slave and wait on the console until the following appears (this can take some minutes):

 

login: slave's external files are not in sync with master, sequence:0. (type CERT_LOCAL)
slave's external files are not in sync with master, sequence:1. (type CERT_LOCAL)
slave's external files are not in sync with master, sequence:2. (type CERT_LOCAL)
slave's external files are not in sync with master, sequence:3. (type CERT_LOCAL)
slave's external files are not in sync with master, sequence:4. (type CERT_LOCAL)
slave succeeded to sync external files with master
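To double-check that the cluster really is back in sync before going further, you can also compare the configuration checksums of the units from the CLI (command names as on FortiOS 5.x; verify on your release):

```
# get system ha status
# diagnose sys ha checksum show
```

If the checksums reported for master and slave match, the configurations are synchronized.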

 

Check your interface configuration (speed, duplex etc.) with:

# diagnose netlink device list

 

If there are no errors, activate port monitoring. That's it! For a risky upgrade or a risky config change, this is the only way I would go.
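The override, priority and port-monitoring settings from the steps above can be sketched as a single `config system ha` block. The priority values and interface names "port1"/"port2" are placeholders for your environment (use a higher priority on the intended master, e.g. 200, and a lower one, e.g. 100, on the slave), and the monitor line is added only as the final step, once everything is verified:

```
# config system ha
#     set override enable
#     set priority 200
#     set monitor "port1" "port2"
# end
```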

 

hope this helps

 

have fun

 

Andrea

AndreaSoliva
Contributor III

Hi

 

Another hint: in this scenario, upgrade in any case to 5.2.3, because everything lower is full of bugs!

 

:)

 

Andrea

emnoc
Esteemed Contributor III

No matter what you do, this will be service-impacting. I know this should be obvious, but either way, back up the current config before you make the changes. You will probably need to touch all interface, firewall policy, firewall address and system DHCP settings, so police the config for any dependencies in these areas.

 

Go slow, sit back and have fun ;)

 

PCNSE 

NSE 

StrongSwan  

PCNSE NSE StrongSwan
Sean_Toomey_FTNT

There are ways you can minimize impact but you will take a brief downtime no matter what you do.

 

You can try this: admin-down all ports going to the slave on the switch, then disconnect the HA link between the units. This breaks HA so that you can work with one unit at a time. Configure the LACP ports on both the switch and the slave FGT, but do NOT bring the ports live yet. Ensure HA override is on and the priority on the slave is higher. Once you are satisfied that all the config is correct, you will have to flip over. Do this by admin-downing all ports of the active FGT on the switch, and then bringing up all ports for the slave FGT on the switch. Please note that no sessions will survive this, obviously, but if you have done everything properly you will see less than a minute of downtime plus session re-establish time.
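The LACP side of this on the FGT can be sketched roughly as below. The aggregate name "lacp-lan", the member ports "port1"/"port2" and the addressing are placeholders for your environment; any policies and routes referencing the old ports must be repointed to the aggregate as well:

```
# config system interface
#     edit "lacp-lan"
#         set type aggregate
#         set member "port1" "port2"
#         set ip 192.0.2.1 255.255.255.0
#         set allowaccess ping
#     next
# end
```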

 

At this point you work with the other FGT unit and configure it likewise with LACP.  Ensure HA override is enabled and priority is lower than the active unit.  Connect the HA links and wait for HA to sync up.  Once that has completed, you can bring the switch ports live.  You should now have the cluster back on the network with new config.

 

Doing it this way allows you to switch quickly from the "old" config to the "new" config in case something with LACP doesn't work out, and it also avoids any split-brain scenarios.

 

 

Incidentally, if this is something you think you'll be doing with any regularity, you may wish to use VDOMs: a root VDOM for management and one other VDOM for all your policies. The reason is that you can reload the config for a VDOM (as opposed to the whole system) without rebooting, so it's an "instant" reconfigure.
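For reference, the VDOM split described above would start roughly like this on a 5.2 unit (the VDOM name "policies" is a placeholder; note that enabling VDOM mode logs out the current admin session):

```
# config system global
#     set vdom-admin enable
# end
# config vdom
#     edit "policies"
#     next
# end
```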

-- Sean Toomey, CISSP FCNSP Consulting Security Engineer (CSE) FORTINET— High Performance Network Security
Robin_Svanberg


Thanks for all suggestions.

 

We did the configuration last week as Sean_Toomey_FTNT suggested. 

 

Robin Svanberg Network Consultant @ Ethersec AB in Östersund, Sweden

 

robin.svanberg@ethersec.se

JRdiaz
New Contributor

I have the same setup here (two 600Cs in A-P mode) and I've restored my config a lot of times now. I've done it in two ways. The first was to take the primary unit off the cluster, reset it to factory defaults, restore the configuration, and then rejoin the unit to the cluster, making sure it has the better priority. The new configuration will be downloaded to the secondary after that.
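The reset-and-restore part of this procedure maps to CLI commands roughly as follows; the config filename and the TFTP server IP are placeholders, and a restore can equally be done from the GUI with a local file:

```
# execute factoryreset
# execute restore config tftp fg600-new.conf 192.0.2.10
```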

 

Another way is to do it while the unit is in the cluster; I think this one will cause some downtime, though.
