AaronChih
New Contributor

HA sync failed in 3 times

Hi all, I apologize for my poor English if anything here is unclear.

I have an HA cluster of 1500D units running version 5.0.14, without an NTP server.

After I adjusted the system time to the real time, my slave FortiGate started logging errors like this:

date=2016-11-07 time=11:16:55 logid=0100035010 type=event subtype=system level=error vd="root" msg="HA slave sync failed in 3 turns"

 

Logs like this appear 2-3 times a day.

Even after I set the time back to what it was before the change, the log still appears.

I've tried rebuilding the HA cluster, but that didn't help.

I really need help. Thanks.

I've displayed the checksums on both FortiGates, and they are the same.
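For anyone who wants to track how often this message shows up, here is a minimal Python sketch that splits the log line quoted above into fields. It assumes the log is plain space-separated key=value pairs with optional double quotes; the function name `parse_fortigate_log` is made up for this example and is not a Fortinet tool.

```python
import shlex

def parse_fortigate_log(line):
    """Split a key=value log line into a dict, honoring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore any token that has no "="
            fields[key] = value
    return fields

# The log line from the post above, copied verbatim.
line = ('date=2016-11-07 time=11:16:55 logid=0100035010 type=event '
        'subtype=system level=error vd="root" '
        'msg="HA slave sync failed in 3 turns"')

entry = parse_fortigate_log(line)
print(entry["level"], "-", entry["msg"])
# prints: error - HA slave sync failed in 3 turns
```

Running this over an exported log file and grouping by the `date` field would show whether the message lines up with the time changes.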

9 REPLIES 9
ede_pfau
Esteemed Contributor III

I see that it's flagged as an 'error', but if the checksums are identical then there's nothing to worry about. Timekeeping is essential in clusters, as some decisions are based on, e.g., the uptime count. So cluster members should be in sync with a time source.

Depending on the firmware version, you could make one member an NTP server and hook the other up to it. Absolute time wouldn't necessarily be correct, but the relative time difference would be minimal.

I'm not sure v5.0 has that feature, but it's easy to spot in the GUI (Status - setting time): there should be a checkbox for 'act as NTP server'. In v5.2 it's included, for sure.


Ede

"Kernel panic: Aiee, killing interrupt handler!"
AaronChih

Hi Ede,

Thanks for the reply.

I know the cluster keeps the system time in sync; I've checked that via the CLI with get system status.

But I still wonder why the sync failed after I changed the system time.

Uptime is calculated from the system time. I changed the time on both FortiGates, yet the sync fails.

This really confuses me. How should I fix it?

By the way, I'm not considering building an NTP server right now.

ede_pfau
Esteemed Contributor III

As I said, I don't see any reason to change anything at the moment. Yes, there is a log message and no, it doesn't mean the cluster operation is in danger.

Do you change the local time often? If you manually adjust the time, is that on the master unit only?


Ede

"Kernel panic: Aiee, killing interrupt handler!"
AaronChih

Hi Ede,

I think I've only changed the time twice in the last three days: the first time to adjust to the real time (forward 1 hour), the second to set it back (backward 1 hour).

I used the web GUI to change the time, so I think that was on the master unit?

But when I log in to the slave via the CLI and check get system status, it shows the same as the master.

Thanks for your kind reply :)

ede_pfau
Esteemed Contributor III

Yep, the GUI is the master unit (unless you use management ports).

Setting up an HA cluster means running all kinds of tests, like changing an option on the master (via CLI) and seeing whether it propagates to the slave, or shutting down the master (failover to slave), restarting the master (failback from slave)...

If all of this is working there's nothing to worry about.


Ede

"Kernel panic: Aiee, killing interrupt handler!"
AaronChih

Hi Ede,

I need help again :(

I received this log today after setting the HA mode to standalone:

date=2016-11-10 time=14:57:18 logid=0100032546 type=event subtype=system level=warning vd="root" action=crash msg="Pid: 02750, application: hasync, Firmware: FortiGate-1500D v5.0.14,build0323b323,160713 (Release), Signal 11 received, Backtrace: [0x007aefb1] [0x006a1183] [0x0069367c] [0x00693e49] [0x006a6e48] [0x0043c2d0] [0x0043bdb3] [0x0043a601] [0x0043ba14] [0x00439913] [0x2a95e46475] [0x00439b11]"

 

Does that mean the "hasync" process crashed? I've never received a log like this after leaving an HA cluster before.
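As a rough aid for reading that entry, the sketch below pulls the application name, signal number and backtrace frames out of the msg field quoted above. It assumes the message layout shown in this one log line and the usual Linux/POSIX signal numbering, where 11 is SIGSEGV (a segmentation fault); the regular expressions are guesses based on this sample, not a documented format.

```python
import re
import signal

# The msg field from the crash log above, copied verbatim.
msg = ("Pid: 02750, application: hasync, Firmware: FortiGate-1500D "
       "v5.0.14,build0323b323,160713 (Release), Signal 11 received, "
       "Backtrace: [0x007aefb1] [0x006a1183] [0x0069367c] [0x00693e49] "
       "[0x006a6e48] [0x0043c2d0] [0x0043bdb3] [0x0043a601] [0x0043ba14] "
       "[0x00439913] [0x2a95e46475] [0x00439b11]")

# Which process reported the crash.
app = re.search(r"application: (\w+)", msg).group(1)
# Numeric signal that terminated it.
signum = int(re.search(r"Signal (\d+) received", msg).group(1))
# Every bracketed hex address in the backtrace.
frames = re.findall(r"\[(0x[0-9a-fA-F]+)\]", msg)

# Signal names come from the machine running this script, which is
# assumed to use the same numbering as the firewall.
print(app, signal.Signals(signum).name, len(frames))
# prints: hasync SIGSEGV 12
```

Read this way, the entry does suggest that hasync itself was terminated by a segmentation fault. The raw addresses are only meaningful to someone with the firmware's symbols, so they are worth including in a support ticket as they are.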

ede_pfau
Esteemed Contributor III

This should not happen at all, I agree.

As the Release Notes for v5.0.14 state, "there are no known issues for this release".

It's up to you to open a support ticket with FTNT. Depends on how often the fault occurs, or how often you switch HA mode.

Before fiddling around with HA mode I always reboot both members, just to be sure. I know that might be difficult in a production environment.


Ede

"Kernel panic: Aiee, killing interrupt handler!"
Toshi_Esumi
Esteemed Contributor III

Aaron, have you compared the entire config ("diff") between them? You might find something if you do. That's how we found an HA issue (not harmful) with 5.2.9.
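If no diff tool is at hand, a comparison like this can be sketched in Python with the standard difflib module. The two config excerpts, the hostnames and the file labels below are hypothetical placeholders; in practice you would load the backed-up config of each member instead.

```python
import difflib

def diff_configs(master_cfg, slave_cfg):
    """Return a unified diff of two config texts as a list of lines."""
    return list(difflib.unified_diff(
        master_cfg.splitlines(), slave_cfg.splitlines(),
        fromfile="master.conf", tofile="slave.conf", lineterm=""))

# Hypothetical excerpts standing in for the two full config backups.
master = """config system global
    set admintimeout 5
    set hostname "FGT-A"
end"""
slave = """config system global
    set admintimeout 5
    set hostname "FGT-B"
end"""

for line in diff_configs(master, slave):
    print(line)
```

Some differences are expected, since a few settings (the hostname, for example) are normally kept per member; it is the unexpected lines that are worth a closer look.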

Aditya1
New Contributor

HA sync failed in 3 times.

Hi all,

Good morning!

I am facing the same issue with v6.4.9 build 1966 (GA). The cluster checksums are the same. We tried rebooting both the primary and the secondary, and also tried recalculating the checksums, but it still shows as not in sync. Kindly advise.

I have also upgraded to 6.4.12, but the issue remains.
