Support Forum
The Forums are a place to find answers on a range of Fortinet products from peers and product experts.
deltasoft
New Contributor

Unexpected HA failover issues

Hello all, I have an issue with two Fortigate 60B units configured in HA active-passive mode.

Heartbeat interfaces:
- WAN1, connected through a switch with dedicated VLAN ports (untagged)
- WAN2, connected directly with a crossover cable

Several times a day, at random, the cluster starts an HA failover with these logs:

Message meets Alert condition
The following critical firewall event was detected: Critical Event.
date=2012-04-13 time=22:49:27 devname=company-fw2 device_id=FGT60B3908650580 log_id=0105037901 type=event subtype=ha pri=critical fwver=040010 vd="root" msg="Heartbeat device(interface) down" ha_role=slave hbdn_reason=neighbor info lost devintfname=wan2

Message meets Alert condition
The following critical firewall event was detected: Critical Event.
date=2012-04-13 time=22:49:27 devname=company-fw2 device_id=FGT60B3908650580 log_id=0105037901 type=event subtype=ha pri=critical fwver=040010 vd="root" msg="Heartbeat device(interface) down" ha_role=slave hbdn_reason=neighbor info lost devintfname=wan1

Message meets Alert condition
The following critical firewall event was detected: Critical Event.
date=2012-04-13 time=22:49:28 devname=company-fw1 device_id=FGT60B3908670675 log_id=0105037901 type=event subtype=ha pri=critical fwver=040010 vd="root" msg="Heartbeat device(interface) down" ha_role=master hbdn_reason=neighbor info lost devintfname=wan2

Message meets Alert condition
The following critical firewall event was detected: Critical Event.
date=2012-04-13 time=22:49:28 devname=company-fw1 device_id=FGT60B3908670675 log_id=0105037901 type=event subtype=ha pri=critical fwver=040010 vd="root" msg="Heartbeat device(interface) down" ha_role=master hbdn_reason=neighbor info lost devintfname=wan1

- No power outage (firewalls and switches are connected to a UPS; switches are always online)
- No switch problems (no evidence of restarts or problems in their logs)

I've tried enabling only one heartbeat interface at a time, first wan1 then wan2, with no success.
When the HA failover occurs, clients inside the LAN lose their internet connection and all VPN tunnels are brought down, causing major connectivity trouble.

Initially there was only one firewall connected, working perfectly. When I added the second firewall in HA mode, the problems started immediately. In the past I've configured several other units in HA mode with no problems, so I cannot explain this malfunction.

I opened a support ticket more than one month ago, only to discover that the technical support is very poor (one answer every 4-5 days) and totally useless, because they don't have any idea how to solve the problem.

Thanks in advance for your support, you're my last chance :)
Bye Gianf
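Editor's note: the alert e-mails above are plain key=value event lines, so they are easy to summarize with a script when many failovers need to be compared. The following is only a minimal sketch in Python, not a Fortinet tool; the function names `parse_event` and `heartbeat_drops` are made up for illustration, and the sample lines are the four events quoted in the post.

```python
import re
from collections import defaultdict

# key=value pairs; a value runs until the next " key=" or the end of the line,
# so multi-word values such as "neighbor info lost" stay intact.
FIELD_RE = re.compile(r'(\w+)=(.*?)(?=\s+\w+=|$)')

def parse_event(line):
    """Split one event log line into a dict (hypothetical helper)."""
    return {key: value.strip('" ') for key, value in FIELD_RE.findall(line.strip())}

def heartbeat_drops(lines):
    """Map each device name to the heartbeat interfaces it reported as down."""
    drops = defaultdict(set)
    for line in lines:
        event = parse_event(line)
        if event.get("subtype") == "ha" and event.get("hbdn_reason"):
            drops[event["devname"]].add(event["devintfname"])
    return {dev: sorted(ifaces) for dev, ifaces in drops.items()}

# The four events from the post.
SAMPLE_LINES = [
    'date=2012-04-13 time=22:49:27 devname=company-fw2 device_id=FGT60B3908650580 log_id=0105037901 type=event subtype=ha pri=critical fwver=040010 vd="root" msg="Heartbeat device(interface) down" ha_role=slave hbdn_reason=neighbor info lost devintfname=wan2',
    'date=2012-04-13 time=22:49:27 devname=company-fw2 device_id=FGT60B3908650580 log_id=0105037901 type=event subtype=ha pri=critical fwver=040010 vd="root" msg="Heartbeat device(interface) down" ha_role=slave hbdn_reason=neighbor info lost devintfname=wan1',
    'date=2012-04-13 time=22:49:28 devname=company-fw1 device_id=FGT60B3908670675 log_id=0105037901 type=event subtype=ha pri=critical fwver=040010 vd="root" msg="Heartbeat device(interface) down" ha_role=master hbdn_reason=neighbor info lost devintfname=wan2',
    'date=2012-04-13 time=22:49:28 devname=company-fw1 device_id=FGT60B3908670675 log_id=0105037901 type=event subtype=ha pri=critical fwver=040010 vd="root" msg="Heartbeat device(interface) down" ha_role=master hbdn_reason=neighbor info lost devintfname=wan1',
]

if __name__ == "__main__":
    for device, interfaces in heartbeat_drops(SAMPLE_LINES).items():
        print(device, "lost heartbeat on", ", ".join(interfaces))
```

Run against these four events, the summary shows both units losing both heartbeat links within the same second, which matches the observation in the post that a physically separate switch path and a direct cable fail together.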
15 REPLIES
deltasoft

OK, thanks all, I'll let you know. What about upgrading to 4.0 MR3? Is it ready for production yet?
Bye Gianf
rwpatterson
Valued Contributor III

Personally, I would stick with MR2. Get the kinks ironed out, then move up if you feel you need the newer features.

Bob - self proclaimed posting junkie!
See my Fortigate related scripts at: http://fortigate.camerabob.com

deltasoft

Thanks Bob, I'll follow your advice :)
Bye Gianf
deltasoft
New Contributor

OK, yesterday I upgraded the firmware of the cluster to 4.0 MR2 Patch 12. I'll wait to see if the problem persists. In the meantime there's a new problem: the units do not synchronize with each other. Here is the console log:

company-fw2 login: slave's external files are not in sync with master, sequence:0. (type CERT_LOCAL)
slave's external files are not in sync with master, sequence:0. (type CERT_LOCAL)
slave's external files are not in sync with master, sequence:1. (type CERT_LOCAL)
slave's external files are not in sync with master, sequence:2. (type CERT_LOCAL)
slave's external files are not in sync with master, sequence:3. (type CERT_LOCAL)
slave's external files are not in sync with master, sequence:4. (type CERT_LOCAL)
slave succeeded to sync external files with master
slave's configuration is not in sync with master's, sequence:0
slave's configuration is not in sync with master's, sequence:1
slave's configuration is not in sync with master's, sequence:2
slave's configuration is not in sync with master's, sequence:3
slave's configuration is not in sync with master's, sequence:4
slave starts to sync with master
logout all admin users
slave failed to sync with master, will try again in a moment
slave's configuration is not in sync with master's, sequence:0
slave's configuration is not in sync with master's, sequence:1
slave's configuration is not in sync with master's, sequence:2
slave's configuration is not in sync with master's, sequence:3
slave's configuration is not in sync with master's, sequence:4
slave starts to sync with master
logout all admin users
slave failed to sync with master, will try again in a moment
slave's configuration is not in sync with master's, sequence:0
slave's configuration is not in sync with master's, sequence:1
slave's configuration is not in sync with master's, sequence:2
slave's configuration is not in sync with master's, sequence:3
slave's configuration is not in sync with master's, sequence:4
slave starts to sync with master
logout all admin users
slave failed to sync with master, will try again in a moment

Here the synchronization stops. To resume it I need to reboot the slave unit, but the problem still persists:

company-fw2 login: admin
Password: *********
Welcome !
company-fw2 # execute reboot
This operation will reboot the system !
Do you want to continue? (y/n)y
The system is going down NOW !!
System is rebooting...
company-fw2 # Please stand by while rebooting the
FGT60B (15:29-09.06.2007)
Ver:04000006
Serial number:FGT60B9999999999
RAM activation
Total RAM: 256MB
Enabling cache...Done.
Scanning PCI bus...Done.
Allocating PCI resources...Done.
Enabling PCI resources...Done.
Zeroing IRQ settings...Done.
Verifying PIRQ tables...Done.
Boot up, boot device capacity: 64MB.
Press any key to display configuration menu...
......
Reading boot image 1817002 bytes.
Initializing firewall...
System is started.

company-fw2 login: slave's external files are not in sync with master, sequence:0. (type CERT_LOCAL)
slave's external files are not in sync with master, sequence:1. (type CERT_LOCAL)
slave's external files are not in sync with master, sequence:2. (type CERT_LOCAL)
slave's external files are not in sync with master, sequence:3. (type CERT_LOCAL)
slave's external files are not in sync with master, sequence:4. (type CERT_LOCAL)
slave succeeded to sync external files with master
slave's configuration is not in sync with master's, sequence:0
slave's configuration is not in sync with master's, sequence:1
slave's configuration is not in sync with master's, sequence:2
slave's configuration is not in sync with master's, sequence:3
slave's configuration is not in sync with master's, sequence:4
slave starts to sync with master
logout all admin users
slave failed to sync with master, will try again in a moment
slave's configuration is not in sync with master's, sequence:0
slave's configuration is not in sync with master's, sequence:1
slave's configuration is not in sync with master's, sequence:2
slave's configuration is not in sync with master's, sequence:3
slave's configuration is not in sync with master's, sequence:4
slave starts to sync with master
logout all admin users
slave failed to sync with master, will try again in a moment
slave's configuration is not in sync with master's, sequence:0
slave's configuration is not in sync with master's, sequence:1
slave's configuration is not in sync with master's, sequence:2
slave's configuration is not in sync with master's, sequence:3
slave's configuration is not in sync with master's, sequence:4
slave starts to sync with master
logout all admin users
slave failed to sync with master, will try again in a moment

and the synchronization stops again.

I've followed this KB article without success: http://kb.fortinet.com/kb/microsites/search.do?cmd=displayKC&docType=kc&externalId=FD31379&sliceId=1&docTypeID=DT_KCARTICLE_1_1&dialogID=34334483&stateId=0 0 34336209

The command "execute ha synchronize config" does not start any synchronization. I haven't restarted the primary unit yet. I'm afraid that if I restart it I will lose access to the cluster, because I don't know how well the slave unit will work on its own.
Bye Gianf
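Editor's note: the console output above repeats the same cycle (five "not in sync" checks, one sync attempt, one failure), and a long capture is easier to read as counts. Below is a small Python sketch that tallies the attempts and failures from a saved console capture. It assumes the capture has one message per line, as laid out above; `summarize_sync` and the sample list are illustrative names, and the matched strings are the ones printed in the post.

```python
def summarize_sync(console_lines):
    """Count sync outcomes in a console capture (hypothetical helper)."""
    summary = {"external_files_ok": 0, "config_attempts": 0, "config_failures": 0}
    for line in console_lines:
        text = line.strip()
        if text.startswith("slave succeeded to sync external files"):
            summary["external_files_ok"] += 1
        elif text.startswith("slave starts to sync with master"):
            summary["config_attempts"] += 1
        elif text.startswith("slave failed to sync with master"):
            summary["config_failures"] += 1
    return summary

# One configuration sync cycle as printed in the post (check lines abridged to two).
CYCLE = [
    "slave's configuration is not in sync with master's, sequence:0",
    "slave's configuration is not in sync with master's, sequence:4",
    "slave starts to sync with master",
    "logout all admin users",
    "slave failed to sync with master, will try again in a moment",
]

# The capture before the reboot: external files sync once, then three failed cycles.
SAMPLE_CAPTURE = ["slave succeeded to sync external files with master"] + CYCLE * 3

if __name__ == "__main__":
    print(summarize_sync(SAMPLE_CAPTURE))
```

If attempts and failures are always equal while the external files sync succeeds, as in this capture, the certificate files are transferring and only the configuration sync is failing.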
rwpatterson
Valued Contributor III

When you log into the slave, check the firmware version. Make sure it upgraded as well.

Bob - self proclaimed posting junkie!
See my Fortigate related scripts at: http://fortigate.camerabob.com

deltasoft

It all seems OK:

company-fw1 # get system status
Version: Fortigate-60B v4.0,build0346,120606 (MR2 Patch 12)
Virus-DB: 15.00748(2012-06-24 15:28)
Extended DB: 15.00734(2012-06-22 07:33)
IPS-DB: 3.00203(2012-06-20 22:19)
FortiClient application signature package: 1.503(2012-06-22 17:58)
Serial-Number: FGT60XXXXXXXXXX
BIOS version: 04000009
Log hard disk: Not available
Internal Switch mode: switch
Hostname: company-fw1
Operation Mode: NAT
Current virtual domain: root
Max number of virtual domains: 10
Virtual domains status: 1 in NAT mode, 0 in TP mode
Virtual domain configuration: disable
FIPS-CC mode: disable
Current HA mode: a-p, master
Distribution: International
Branch point: 346
Release Version Information: MR2 Patch 12
System time: Mon Jun 25 11:36:55 2012

company-fw1 # execute ha manage 1

company-fw2 $ get system status
Version: Fortigate-60B v4.0,build0346,120606 (MR2 Patch 12)
Virus-DB: 15.00748(2012-06-24 15:28)
Extended DB: 15.00734(2012-06-22 07:33)
IPS-DB: 3.00203(2012-06-20 22:19)
FortiClient application signature package: 1.503(2012-06-22 18:05)
Serial-Number: FGT60BYYYYYYYYYYYY
BIOS version: 04000006
Log hard disk: Not available
Internal Switch mode: switch
Hostname: company-fw2
Operation Mode: NAT
Current virtual domain: root
Max number of virtual domains: 10
Virtual domains status: 1 in NAT mode, 0 in TP mode
Virtual domain configuration: disable
FIPS-CC mode: disable
Current HA mode: a-p, backup
Distribution: International
Branch point: 346
Release Version Information: MR2 Patch 12
System time: Mon Jun 25 11:37:20 2012

Only the BIOS version is different. Do you think that's a problem?
Bye Gianf
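Editor's note: comparing two `get system status` dumps by eye is error-prone, so a field-by-field diff can help. The Python sketch below is an illustration only; `parse_status`, `status_diff` and the `PER_UNIT` ignore list are assumptions of this example, and the sample text is an abridged copy of the output posted above.

```python
def parse_status(text):
    """Turn 'Key: value' lines into a dict, splitting on the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip prompts and blank lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields

# Fields that are expected to differ between cluster members (assumed list).
PER_UNIT = {"Hostname", "Serial-Number", "Current HA mode", "System time"}

def status_diff(primary, secondary, ignore=PER_UNIT):
    """Return {field: (primary value, secondary value)} for unexpected differences."""
    a, b = parse_status(primary), parse_status(secondary)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if key not in ignore and a.get(key) != b.get(key)
    }

# Abridged from the output in the post.
FW1 = """Version: Fortigate-60B v4.0,build0346,120606 (MR2 Patch 12)
FortiClient application signature package: 1.503(2012-06-22 17:58)
BIOS version: 04000009
Hostname: company-fw1
Current HA mode: a-p, master
System time: Mon Jun 25 11:36:55 2012"""

FW2 = """Version: Fortigate-60B v4.0,build0346,120606 (MR2 Patch 12)
FortiClient application signature package: 1.503(2012-06-22 18:05)
BIOS version: 04000006
Hostname: company-fw2
Current HA mode: a-p, backup
System time: Mon Jun 25 11:37:20 2012"""

if __name__ == "__main__":
    for field, (one, two) in status_diff(FW1, FW2).items():
        print(f"{field}: {one} != {two}")
```

On the posted output this reports the BIOS version and also the timestamp of the FortiClient application signature package, which differs by a few minutes between the two units.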