Hello everyone,
we have a problem with our HA configuration. The cluster is set up and synchronized, and the master works fine, but as soon as there is a problem on the master and we fail over to the slave, no traffic passes through the slave and we lose all Internet access until the master is restored.
An LACP configuration has been set up (the master and the slave belong to the same LACP aggregate on the switch side).
Initially, when I plugged in the ports, they were all up, but the slave's ports went down later, I guess after an LACP negotiation.
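To see what LACP actually negotiated on each unit, you can dump the aggregate state from the CLI; it shows the LACP mode and the state of each member port. A minimal check, assuming the interface names from the status output below (run it on both units and compare):

fw1 # diagnose netlink aggregate name LAN_GENES
fw2 # diagnose netlink aggregate name LAN_GENES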
https://docs.fortinet.com/document/fortigate/6.4.15/administration-guide/666376
Hello,
below is the output of the commands:
fw1 # diagnose sys link-monitor status
fw1 # get system ha status
HA Health Status: OK
Model: FortiGate-1101E
Mode: HA A-P
Group: 0
Debug: 0
Cluster Uptime: 0 days 19:29:18
Cluster state change time: 2024-11-19 18:07:55
Primary selected using:
<2024/11/19 18:07:55> FG10E1 is selected as the primary because it has the largest value of override priority.
<2024/11/19 18:03:58> FG10E1 is selected as the primary because it's the only member in the cluster.
ses_pickup: disable
override: disable
Configuration Status:
FG10E1(updated 3 seconds ago): in-sync
FG10E1(updated 2 seconds ago): in-sync
System Usage stats:
FG10E1(updated 3 seconds ago):
sessions=89418, average-cpu-user/nice/system/idle=3%/0%/5%/90%, memory=49%
FG10E1(updated 2 seconds ago):
sessions=0, average-cpu-user/nice/system/idle=1%/0%/0%/98%, memory=32%
HBDEV stats:
FG10E1(updated 3 seconds ago):
ha: physical/1000auto, up, rx-bytes/packets/dropped/errors=241182508/586184/0/0, tx=525872520/1425049/0/0
FG10E1(updated 2 seconds ago):
ha: physical/1000auto, up, rx-bytes/packets/dropped/errors=525154181/1423780/0/0, tx=238300951/548820/0/0
MONDEV stats:
FG10E1(updated 3 seconds ago):
LAN_GENES: aggregate/00, up, rx-bytes/packets/dropped/errors=361372988422/1011492711/0/0, tx=780920810340/1254371795/0/0
TOR-DATACENTER: aggregate/00, up, rx-bytes/packets/dropped/errors=508650687759/1102948589/0/0, tx=455731739669/1055327810/0/0
WAN-RENATER: aggregate/00, up, rx-bytes/packets/dropped/errors=418812907934/387705802/0/0, tx=109625066959/239398427/0/0
FG10E1(updated 2 seconds ago):
LAN_GENES: aggregate/00, up, rx-bytes/packets/dropped/errors=1453997498/7880114/0/0, tx=504064/3938/0/0
TOR-DATACENTER: aggregate/00, up, rx-bytes/packets/dropped/errors=2287558/9356/0/0, tx=256/2/0/0
WAN-RENATER: aggregate/00, up, rx-bytes/packets/dropped/errors=1084836/4676/0/0, tx=0/0/0/0
Primary : fw1 , FG10E1, HA cluster index = 0
Secondary : fw2 , FG10E1, HA cluster index = 1
number of vcluster: 1
vcluster 1: work 169.254.0.1
Primary: FG10E1, HA operating index = 0
Secondary: FG10E1, HA operating index = 1
fw1#
This is only for the primary one - fw1. But HA is in sync, so fw2 should have a mirror-image output.
Then how is "config sys ha" configured? Are you monitoring interfaces? And when fw1 is down, what does this HA status show on fw2?
Most likely your switch's VLANs are misconfigured on the fw2 side, and fw2 itself is operating as primary without any problem.
Can you ping any internal devices from fw2 when fw1 is down?
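For reference, the interface-monitoring part of an A-P setup usually looks something like this (a sketch only; the group name and interface names are taken from your outputs, and the heartbeat priority is a placeholder):

config system ha
    set group-name "HA-cluster"
    set mode a-p
    set hbdev "ha" 50
    set monitor "LAN_GENES" "TOR-DATACENTER" "WAN-RENATER"
end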
Toshi
Oh, now I see you posted fw2's HA status. As expected, it's mirror-imaged when fw1 is up.
Hello,
Yes, we monitor the LAN interfaces as well as the WAN interfaces.
When fw1 is rebooting, we have no access to fw2 by IP address to check its HA status.
Basically, if we restart fw1 we lose all access; we can't even ping the firewall's IP address until fw1 comes back up.
I think that indicates what the problem is. When fw1 goes down, whether rebooting or shut down, fw2 should take over and you should be able to reach fw2 with the same IP you were using to get to fw1 (the active unit). I suspect the path you're using to get to that IP is not connected to fw2.
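One quick check while both units are still up: you can hop to the secondary over the heartbeat link and test reachability from there (the cluster index, username, and target IP here are just examples):

fw1 # execute ha manage 1 admin
fw2 # execute ping 192.0.2.10

If a ping to an internal device fails from fw2, the switch side of fw2's links is the place to look.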
Without either dedicated/out-of-band management interface connections to both FGTs or (remote) console access to both FGTs, it's very difficult to troubleshoot an HA problem like this. Even if you open a ticket with TAC and the TAC engineer tries to solve the problem, he/she would ask you for this.
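A reserved HA management interface gives each unit its own out-of-band IP that stays reachable regardless of failover state; roughly (the interface name and gateway are placeholders):

config system ha
    set ha-mgmt-status enable
    config ha-mgmt-interfaces
        edit 1
            set interface "mgmt"
            set gateway 192.0.2.1
        next
    end
end

The IP address on the reserved interface is then configured per unit, since it is not synchronized.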
Toshi
Hello,
thank you very much for this information. I think the best thing to do is to completely break the HA and reconfigure it from scratch. We have also opened a Fortinet ticket; we will see what comes of it.
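If we go that route, the rough first step on each unit would be to drop it back to standalone before rebuilding the cluster (a sketch; we would back up both configs first):

config system ha
    set mode standalone
end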
Hello,
below is the output of the command diag sys ha dump-by group:
fw1 # diagnose sys ha dump-by group
<hatalk> HA information.
group-id=0, group-name='HA-cluster'
has_no_hmac_password_member=0
has_no_aes128_gcm_sha256_member=0
gmember_nr=2
'FGXXX3': ha_ip_idx=1, hb_packet_version=291, last_hb_jiffies=58351446, linkfails=25, weight/o=0/0, support_hmac_password=1, support_aes128_gcm_sha256=1
hbdev_nr=1: ha(mac=e023..a3, last_hb_jiffies=58351446, hb_lost=0),
'FGXXX1': ha_ip_idx=0, hb_packet_version=3, last_hb_jiffies=0, linkfails=0, weight/o=0/0, support_hmac_password=1, support_aes128_gcm_sha256=1
vcluster_nr=1
vcluster_0: start_time=1732035818(2024-11-19 18:03:38), state/o/chg_time=2(work)/2(work)/1732035838(2024-11-19 18:03:58)
pingsvr_flip_timeout/expire=3600s/0s
mondev: LAN_GENES(prio=50,is_aggr=1,status=1) TOR-DATACENTER(prio=50,is_aggr=1,status=1) WAN-RENATER(prio=50,is_aggr=1,status=1)
'FGXXX3': ha_prio/o=1/1, link_failure=0, pingsvr_failure=0, flag=0x00000000, uptime/reset_cnt=8/0
'FGXXX1': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, uptime/reset_cnt=257/0