edit
TL;DR: If you upgrade an HA cluster by inserting a USB stick into one of the members and running 'exec restore image usb', the cluster will upgrade but will then likely crash with a 'different hdisk status' error. The cause seems to be that the USB stick is mounted after reboot and the other member gets sad because it doesn't have one too. Removing the USB and rebooting all units restores the cluster, but that's not much fun if you're hundreds of kilometres away.
Note internal storage was suspected but is not actually relevant to the problem.
Workaround - install identical USB sticks in each cluster member (only tested with a cluster of two).
/edit.
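For reference, the upgrade that triggers the problem was done roughly like this from the console of one member (a sketch assuming FortiOS 5.2 syntax; "image.out" is a placeholder filename):

```
# USB stick inserted in one cluster member, firmware image copied onto it.
# "image.out" is a placeholder - use the actual image filename.
execute restore image usb image.out

# After the reboot, check cluster state from the console:
get system ha status
```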
I've previously posted regarding `system internal storage` issues when loading configurations using exec restore config usb onto a standalone FGT60D. In that case the issue is merely annoying; our current preference is to delete the `config system storage` section from the configuration file completely.
We have two methods we use to build a cluster and both result in a working configuration.
We always prep each new unit by formatting from the boot menu and loading firmware via tftp.
After this step, each unit has a system storage entry.
We always make sure that the hardware revision of the cluster members matches.
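Whether a unit has picked up a storage entry can be checked from the CLI after the TFTP load; a sketch, assuming 5.2 syntax:

```
# Empty output means no system storage entry exists on this unit:
show system storage

# Hidden shell command; lists mounted filesystems, including any USB
# that was present at boot:
fnsysctl df
```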
Method 1 (legacy due to issues with repeatability, included only for completeness):
View the master's default config and copy the `config system storage` section (especially the partition value) into the target configuration file.
Load config into master, set override and priority.
Make slight modifications to the slave unit's config (primarily to convert to interface mode), then configure HA without override and priority.
Connect, allow config to sync to slave.
Check cluster formed OK.
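For anyone following Method 1, the stanza we carry across looks roughly like the below. Field names are from memory of a 5.2 default config and the values are illustrative; the important part is that the partition value must be copied verbatim from the master's own default config:

```
config system storage
    edit "Internal"
        set status enable
        set order 1
        # Placeholder - copy the real partition ID from the master:
        set partition "XXXXXXXXXXXXXXXX"
        set device "/dev/sda1"
        set usage log
    next
end
```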
Method 2:
Load config (with no system storage) into master, set override and priority.
Load config (with no system storage) into slave, unset override and priority.
Connect.
Check cluster formed OK.
Beer.
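The override/priority split in the two steps above looks roughly like this (a sketch; the group name, password and heartbeat port are placeholders for our real values):

```
# Master - wins elections and keeps winning after a failback:
config system ha
    set mode a-p
    set group-name "cluster1"
    set password "placeholder"
    set hbdev "dmz" 50
    set override enable
    set priority 200
end

# Slave - override and priority left at their defaults:
config system ha
    set mode a-p
    set group-name "cluster1"
    set password "placeholder"
    set hbdev "dmz" 50
end
```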
Following method 2, the cluster looks fine and both units report no system storage. There is so far no known way to identify that an upgrade might fail...
Then upgrade the firmware from 5.2.3 to 5.2.5 (the first step on the upgrade path to 5.4).
The result is that when the master's upgrade completes, the slave unit shuts down due to 'different hdisk status'.
Once the slave is disconnected and restarted, it has an entry for system storage, but the master does not.
We do not know why the system storage for the slave has re-appeared.
The failure is repeatable, and we don't know how to restore the cluster other than by using a workaround from TAC: exec ha ignore-hardware-revision enable on both units. That's not very satisfactory, since we can't see any hardware difference beforehand.
If the TAC workaround is applied, the firmware upgrade succeeds. If the workaround is then removed from the upgraded cluster (because there seems to be no need for it), we see the same result: slave shutdown after the restart.
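For anyone wanting the exact workaround commands, this is what we run on each unit before the upgrade (the status subcommand should exist alongside enable/disable, but verify on your build):

```
# Check current state, then enable on BOTH cluster members:
execute ha ignore-hardware-revision status
execute ha ignore-hardware-revision enable
```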
Does anyone know how to avoid this?
I now believe that the problem is the USB stick used for the firmware upgrade; awaiting confirmation from TAC.
The hypothesis is that the FGT mounts any USB detected at startup, so when the cluster tries to form, one member has a mounted USB and the other does not; they have an hdisk mismatch and it's game over.
Presuming this is the case, does anyone know a way to prevent a USB stick from being mounted at startup, whilst still allowing normal USB usage? Is there a command for this? Note that if the USB is inserted whilst the FGT is running, fnsysctl df does not show it as mounted, but it is still usable.
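The boot-time vs hot-plug difference can be observed by comparing mount tables on both members; a sketch of the commands involved (`diagnose hardware deviceinfo disk` may not be available on every model):

```
# Lists mounted filesystems; a USB present at boot shows up here,
# one inserted after boot typically does not:
fnsysctl df

# Lists detected disks/partitions, including USB:
diagnose hardware deviceinfo disk
```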
Backup genius idea - install identical USBs in both cluster members *runs off to try it*
*happy dance* installing identical USB sticks did the trick. See edit to original post.
Still taking offers for a better way to prevent this issue...