Support Forum
deltasoft
New Contributor

Unexpected HA failover issues

Hello all,

I have an issue with two FortiGate 60B units configured in HA active-passive mode. The heartbeat interfaces are:
- WAN1, connected through a switch with dedicated VLAN ports (untagged)
- WAN2, connected directly with a crossover cable

Several times a day, at random, the cluster starts an HA failover with these logs:

Message meets Alert condition
The following critical firewall event was detected: Critical Event.
date=2012-04-13 time=22:49:27 devname=company-fw2 device_id=FGT60B3908650580 log_id=0105037901 type=event subtype=ha pri=critical fwver=040010 vd="root" msg="Heartbeat device(interface) down" ha_role=slave hbdn_reason=neighbor info lost devintfname=wan2

Message meets Alert condition
The following critical firewall event was detected: Critical Event.
date=2012-04-13 time=22:49:27 devname=company-fw2 device_id=FGT60B3908650580 log_id=0105037901 type=event subtype=ha pri=critical fwver=040010 vd="root" msg="Heartbeat device(interface) down" ha_role=slave hbdn_reason=neighbor info lost devintfname=wan1

Message meets Alert condition
The following critical firewall event was detected: Critical Event.
date=2012-04-13 time=22:49:28 devname=company-fw1 device_id=FGT60B3908670675 log_id=0105037901 type=event subtype=ha pri=critical fwver=040010 vd="root" msg="Heartbeat device(interface) down" ha_role=master hbdn_reason=neighbor info lost devintfname=wan2

Message meets Alert condition
The following critical firewall event was detected: Critical Event.
date=2012-04-13 time=22:49:28 devname=company-fw1 device_id=FGT60B3908670675 log_id=0105037901 type=event subtype=ha pri=critical fwver=040010 vd="root" msg="Heartbeat device(interface) down" ha_role=master hbdn_reason=neighbor info lost devintfname=wan1

What I have ruled out so far:
- No power outage (firewalls and switches are connected to a UPS, and the switches are always online)
- No switch problems (no evidence of restarts or problems in their logs)

I've tried enabling only one heartbeat interface at a time, first wan1 then wan2, with no success.

When the HA failover occurs, clients inside the LAN lose their internet connection and all VPN tunnels are brought down, causing big connectivity troubles.

Initially there was only one firewall connected, working perfectly. When I added the second firewall in HA mode, the problems started immediately. In the past I've configured several other units in HA mode with no problems, so I cannot explain this malfunction.

I opened a support ticket more than a month ago, only to discover that technical support is very poor (one answer every 4-5 days) and of no help, because they have no idea how to solve the problem.

Thanks in advance for your support, you're my last chance :)
Bye Gianf
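For anyone who wants to check how the heartbeat-down events line up in time, here is a minimal Python sketch that parses the key=value log lines from the post above and groups them by timestamp. The function names and the shortened sample lines are illustrative assumptions, not part of any Fortinet tool. Unquoted multi-word values such as hbdn_reason are cut to their first word by this simple pattern.

```python
import re
from collections import defaultdict

# Matches key=value pairs; a value is either double-quoted or a single token.
# Assumption: unquoted values with spaces (e.g. hbdn_reason) keep only the first word.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_event(line):
    """Return the key=value fields of one log line as a dict."""
    return {key: value.strip('"').strip() for key, value in PAIR.findall(line)}


def heartbeat_losses(lines):
    """Group 'Heartbeat device(interface) down' events by (date, time)."""
    grouped = defaultdict(list)
    for line in lines:
        event = parse_event(line)
        if event.get("subtype") == "ha" and "Heartbeat" in event.get("msg", ""):
            key = (event["date"], event["time"])
            grouped[key].append((event["devname"], event["devintfname"]))
    return dict(grouped)


# Shortened copies of the four log lines from the post (some fields omitted).
SAMPLE = [
    'date=2012-04-13 time=22:49:27 devname=company-fw2 subtype=ha msg="Heartbeat device(interface) down" ha_role=slave hbdn_reason=neighbor info lost devintfname=wan2',
    'date=2012-04-13 time=22:49:27 devname=company-fw2 subtype=ha msg="Heartbeat device(interface) down" ha_role=slave hbdn_reason=neighbor info lost devintfname=wan1',
    'date=2012-04-13 time=22:49:28 devname=company-fw1 subtype=ha msg="Heartbeat device(interface) down" ha_role=master hbdn_reason=neighbor info lost devintfname=wan2',
    'date=2012-04-13 time=22:49:28 devname=company-fw1 subtype=ha msg="Heartbeat device(interface) down" ha_role=master hbdn_reason=neighbor info lost devintfname=wan1',
]

if __name__ == "__main__":
    for stamp, events in sorted(heartbeat_losses(SAMPLE).items()):
        print(stamp, events)
```

Run against the posted logs, this shows each unit reporting both heartbeat interfaces down within the same second, with the two units one second apart.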
Matthijs
New Contributor II

Please log in to the CLI and type the following:
 config system ha
 show full
 
It might have something to do with the HA timers. Do you monitor the CPU usage of the units? Maybe a problem with one of the units is causing a high CPU load. What software version do you use?
deltasoft

Hi Matthijs, this is the output:
 config system ha
 set group-id 0
 set group-name "IG-HA"
 set mode a-p
 set password ENC xxxxxxxxxxxxxxxxxxxxxxxxxxx
 set hbdev "wan1" 50 "wan2" 100
 set route-ttl 10
 set route-wait 0
 set route-hold 10
 set sync-config enable
 set encryption disable
 set authentication disable
 set hb-interval 4
 set hb-lost-threshold 20
 set helo-holddown 20
 set arps 5
 set arps-interval 8
 set session-pickup disable
 set link-failed-signal disable
 set uninterruptable-upgrade enable
 set vcluster2 disable
 set override disable
 set priority 128
 unset monitor
 unset pingserver-monitor-interface
 set pingserver-failover-threshold 0
 set pingserver-flip-timeout 60
 end

The software version is 4.0 MR1 Patch 10, sorry I missed that.

Following tech support's advice, I've already modified the timers this way:
 config system ha
 set hb-lost-threshold 6 --> to 20
 set hb-interval 2 --> to 4
 end
but with no success.

Monitoring did not show high CPU usage. Session pickup, initially enabled, has been disabled. The Fortinet scheduled update has been set to once a day, outside working hours (3:00 am).

Tnx
Bye Gianf
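To put the timer change in perspective, the sketch below computes how long a unit waits without heartbeats before declaring its peer lost. It assumes, as the FortiOS HA documentation describes, that hb-interval is counted in units of 100 ms and that the detection time is hb-interval multiplied by hb-lost-threshold. Please verify this against the documentation for 4.0 MR1. The function name is made up for illustration.

```python
def heartbeat_failure_time(hb_interval, hb_lost_threshold, unit_ms=100):
    """Seconds without heartbeats before the peer is considered lost.

    Assumption: hb-interval is expressed in units of `unit_ms` milliseconds
    (100 ms according to the FortiOS HA documentation).
    """
    return hb_interval * unit_ms * hb_lost_threshold / 1000.0


# Timer values from the post: before and after the change suggested by support.
before = heartbeat_failure_time(hb_interval=2, hb_lost_threshold=6)
after = heartbeat_failure_time(hb_interval=4, hb_lost_threshold=20)

if __name__ == "__main__":
    print(f"before: {before:.1f} s, after: {after:.1f} s")
```

Under that assumption the change raises the tolerated heartbeat silence from 1.2 seconds to 8 seconds, so failovers that still occur would mean heartbeats are missing for at least that long on both links.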
ede_pfau
SuperUser

Why do you use interface monitoring at all? You can achieve unit failure detection with HA heartbeats alone. If your WAN device fails you can detect that via Gateway detection and re-route. Which FortiOS version?
Ede Kernel panic: Aiee, killing interrupt handler!
deltasoft

Hi ede_pfau, so do you suggest disabling monitoring? Do you think this could solve my problem and prevent further HA failovers? The OS is 4.0 MR1 P10. Tnx
Bye Gianf
deltasoft
New Contributor

About unit usage: small office, ~40 users, mainly web and email traffic only.
- Fortinet web filtering enabled, no antivirus/antispam, default IPS
- 3 IPsec VPN tunnels: 2 LAN-to-LAN and 1 road warrior

I was thinking of upgrading the fw to the latest patch of 4.0 MR2 or MR3 to see if that solves the problem. Last week I asked tech support about this, with no answer :(
Bye Gianf
ede_pfau
SuperUser

OK, so you have linked the FGTs for HA on WAN2 (primarily) and WAN1. You do not monitor any link failure apart from these, so there is no need to change the configuration. To me the configuration looks correct. If even increasing the timeout period doesn't help... I would consider upgrading to v4.00 MR2 patch 12. To be honest, I don't see any striking errors whose correction would solve the problem immediately. So, upgrade as a last resort. Please note that there have been changes from 4.1 to 4.2, so please read the Release Notes carefully! Also, it would help if you are onsite while upgrading. If the cluster breaks during the upgrade, there is a risk of losing the slave.
Ede Kernel panic: Aiee, killing interrupt handler!
deltasoft

Hi ede_pfau
You wrote: "Please note that there have been changes from 4.1 to 4.2, please read the Release Notes carefully! Also, it would help if you are onsite while upgrading. If the cluster breaks during the upgrade there is a risk of losing the slave."

In your opinion, which changes should I pay attention to? I can manage 1-2 hours of internet outage by scheduling it at the end of the working day. I was thinking of upgrading according to these steps:
1) break the cluster and disconnect the 2nd unit
2) save the config
3) reset the 2nd unit to factory defaults
4) upgrade the 2nd unit to 4.0 MR2
5) reload the config to the 2nd unit
6) physically switch the two units
7) reset the 1st unit to factory defaults
8) upgrade the 1st unit to 4.0 MR2
9) connect the 1st unit and rejoin the cluster

Do you think it's a correct upgrade plan?
Bye Gianf
rwpatterson
Valued Contributor III

Why don't you just try to upgrade them as a stack? If it fails, you could then break the stack and do it individually. Those issues may just be that this firmware level has problems with HA. Also, and this is important: if you upgrade the unit in the factory default state, you cannot safely restore the older config. Each backup is only right for the level of code it was made from. You should load the same older version, then restore the config, and upgrade with the config in place. This will give you the best chance of a working system after it's all done.

Bob - self proclaimed posting junkie!
See my Fortigate related scripts at: http://fortigate.camerabob.com
ede_pfau
SuperUser

I can only support what Bob (rwpatterson) has posted. Keep it simple. Regarding the changes, all details are in the Release Notes. Judge for yourself whether they are important to your setup.
Ede Kernel panic: Aiee, killing interrupt handler!