Ralph1973
Contributor

Fortigate 240D cluster out of sync every time

Hello,

One of our customers has a Fortigate 240D cluster with one unit in Datacenter A and one in Datacenter B.

The software is FortiOS 5.2.2.

I have checked with the following commands:

diag debug application hasync -1
diag debug application hatalk -1
diag debug enable

On the master I entered
execute ha synchronize start
diag sys ha status
diag sys ha showcsum
diag sys ha showcsum 1
diag sys ha showcsum 2
diag sys ha showcsum 3

The units go out of sync each time a number of (small) changes are made.
Is anyone familiar with this issue? Could it be related to FortiOS 5.2.2?

Kind regards,
Ralph Willemsen
vjoshi_FTNT
Staff

Hello,

 

- Do the checksum values match on both the Master and Slave devices?

- Is the config synced only when you run the command 'execute ha synchronize start', or do the units sync automatically after some time?

 

Verify whether the setting below is enabled:

config system ha

set sync-config enable   --->>> this is enabled by default; verify that it has not been changed by mistake

end

 

If the above is in place, check the CPU usage of the Fortigate when the changes are made. If CPU usage is high at that moment, the sync is given lower priority than the other tasks the CPU is handling.

 

Cheers!

Ralph1973
Contributor

Hello, thank you for your reply. The cluster is out of sync and now stays out of sync, even after executing the command "execute ha synchronize start".

See the attached checksums.

Fortinet suggests:

- Power down the slave unit.
- Connect only the heartbeat cables to the slave.
- Turn on the power of the slave unit.
- Log in to the master web GUI and go to System > Config > HA. If two Fortigate units are shown, HA has formed.
- After a few minutes, run the command below on BOTH units and compare the result of the Master with that of the Slave. If they are the same, the slave is fully synchronized.

#diag sys ha showcs

- If the checksums are the same, you can connect the LAN/WAN cables to the slave.

I would recommend removing the slave unit from the cluster, executing a factory reset and then plugging it back into the cluster.

emnoc
Esteemed Contributor III

How are your NTP settings? And the status?

e.g.

diag sys ntp status

 

PCNSE NSE StrongSwan
vjoshi_FTNT
Staff

Hello Ralph,

Based on the output, there are a few things to verify:

1) The checksum is different
- Yes, resetting the unit to factory defaults and rebuilding the HA may resolve the issue, but that should be the last step.
- A reboot can be tried.
- I see there are no VDOMs enabled, only the default VDOM 'root', so the command below should help:

#diag sys ha showcsum 01 root

This command lists the checksum for all the config and objects (firewall policies, interface settings, etc.). You can get the output from both devices and use any compare tool to find which specific config is not syncing. There is probably something wrong with that specific setting, which can be corrected manually if needed.

2) The device with the lower priority (100) is the master now.
- Did you trigger a manual failover to the slave unit? Check that the physical connectivity (monitored ports) is the same on both Fortigate units.

 

As 'emnoc' mentioned in the previous post, check the system time also to be sure on that front.
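
If no compare tool is at hand, the per-object comparison described above can be sketched in a few lines of Python. This is only a rough sketch: it assumes the saved checksum output consists of "name: checksum" lines, which may not match the exact output format on every FortiOS build, and the function names are made up for illustration.

```python
def parse_checksums(text):
    """Collect 'name: checksum' lines into a dict.

    Assumption: each relevant line of the saved output looks like
    'name: checksum'. Lines without a colon or without a value are skipped.
    """
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            result[name.strip()] = value.strip()
    return result


def differing_entries(master_text, slave_text):
    """Return the sorted names whose checksum differs or exists on one unit only."""
    master = parse_checksums(master_text)
    slave = parse_checksums(slave_text)
    names = sorted(set(master) | set(slave))
    return [name for name in names if master.get(name) != slave.get(name)]
```

Paste the output captured from each unit into two strings (or read them from two text files) and pass both to differing_entries(); the names it returns are the config sections worth inspecting by hand.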

Ralph1973
Contributor

Hello, thank you for your answers/suggestions.

This morning I executed a factory reset of the slave device and rebuilt the cluster.

The slave says after a while that it synchronized successfully with the master, but after 10 or 20 seconds it is out of sync again.

I did the entire procedure once more (factory reset) but the problem stays the same.

The time is synced via FortiGuard NTP services and appears to be correct. Could it be caused by the software, or does anyone have suggestions?

Attached is the comparison of the master and slave HA checksums, which are completely different :(

 

Thanks for your help,

Ralph

vjoshi_FTNT

Checksum confirms that there are several differences in the config.

Now you can compare the config of the Master and Slave units with any diff tool (such as Notepad++) and find the settings that are not in sync. Check whether those settings (for example address objects) are custom and contain anything unusual, such as a special character or a space in the name, or even in the comments section.

 

Also, as emnoc said, just try an addition of a test object and see if that reflects on the slave.
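
As an alternative to a GUI diff tool, the config comparison suggested above can be done with Python's standard difflib module. This is a minimal sketch that assumes the two configs were already saved as text (for example from a terminal capture); the function name is made up for illustration.

```python
import difflib


def config_diff(master_text, slave_text):
    """Return unified-diff lines between two config dumps.

    Trailing whitespace is stripped first so that terminal capture
    artifacts do not show up as differences.
    """
    master_lines = [line.rstrip() for line in master_text.splitlines()]
    slave_lines = [line.rstrip() for line in slave_text.splitlines()]
    return list(
        difflib.unified_diff(
            master_lines,
            slave_lines,
            fromfile="master",
            tofile="slave",
            lineterm="",
        )
    )
```

An empty list means the two dumps are identical line for line. Note that a plain diff is order-sensitive, so entries that merely sit in a different place will still be reported.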

 

 

 

emnoc
Esteemed Contributor III

I personally would not go by that checksum, but would capture the conf file from both units and run a "diff" and/or "md5 checksum" on them. You can tell whether the two are in sync by making a change on the master and ensuring it is propagated to the slave.

 

e.g.

 

config firewall address
    edit test123
        set subnet 192.0.2.1/32
        set comment " this is a test "
end

 

And then jump on the slave and run: show firewall address test123

 

Unless your unit is under heavy CPU/memory load, the changes should already be pushed by the time you run "execute ha manage 1" to log in to the slave.

 

PCNSE NSE StrongSwan
Ralph1973
Contributor

Hello, when extracting the config on master and slave via putty and comparing them, I only notice that entries are in a different place. The config itself looks identical. Also when adding an object on the master, you will see it back on the slave as well.

How strange.

Now Fortinet says to factory reset the other unit as well and rebuild the HA. I doubt this would ever solve the sync issue. I think it may be caused by FortiOS 5.2.2.

Thanks for your support.

Ralph
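
The observation that both configs look identical apart from entry order can be checked roughly with an order-insensitive comparison. The sketch below is a crude first check under stated assumptions: it treats each config as a bag of lines and ignores block structure, so a line that moved from one block to another would go unnoticed, and for parts of the config where order is meaningful it can hide a real difference. The function names are made up for illustration.

```python
import hashlib
from collections import Counter


def normalized_lines(text):
    """Strip surrounding whitespace and drop empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def same_ignoring_order(master_text, slave_text):
    """True if both texts contain the same lines the same number of times."""
    return Counter(normalized_lines(master_text)) == Counter(
        normalized_lines(slave_text)
    )


def order_independent_md5(text):
    """MD5 over the sorted, normalized lines, so entry order does not matter."""
    joined = "\n".join(sorted(normalized_lines(text)))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```

If same_ignoring_order() returns True while a plain diff reports changes, the two dumps contain the same lines and differ only in ordering, which matches what was seen here.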

ede_pfau
SuperUser

Ralph,

 

Did you ever find out the reason for this? I'm about to form a new cluster with the same hardware (but probably v5.2.3) and would appreciate your input.

Ede Kernel panic: Aiee, killing interrupt handler!