pingsvr: state=up(since 1970/01/01 00:00:00), server=, ha_prio=0 - HA issue
Hi
We have a pair of FortiGate 100D firewalls in A-P mode.
We had an issue where the secondary was not joining the cluster; upon checking we found multiple disk errors, so we ran a disk repair. That issue is now fixed.
However, when I run the command "get system ha status" we see the primary is the only member in the cluster and the pingsvr state of the secondary is:
pingsvr: state=up(since 1970/01/01 00:00:00), server=, ha_prio=0
Has anyone ever come across this?
I have rebooted the secondary to no effect.
The firmware is FortiOS 6.0.6 build 0272, FortiGate 100D.
I would check "diag sys ha history read" first on both units: look for the event when you rebooted the slave unit and what both sides saw over hatalk. Since each ended up master of its own cluster, either the heartbeat connection is not working as it should, or they hit an issue after the hatalk exchange that prevented them from syncing as A-P. You can probably see which is the case in the history. But make sure the intended slave doesn't have in/out connections now other than the heartbeat; the should-be-slave is acting as a master now.
Thanks, this command shows the primary is the master on both the primary and secondary:
Pri
<2020-01-30 14:56:23> FG100D3G is elected as the cluster master of 2 members
<2020-01-30 14:56:23> new member 'FG100D5' joins the cluster
<2020-01-30 14:56:22> hbdev ha2 link status changed: 0->1
<2020-01-30 14:56:22> hbdev ha1 link status changed: 0->1
<2020-01-30 14:56:22> port ha2 link status changed: 0->1
<2020-01-30 14:56:22> port ha1 link status changed: 0->1
<2020-01-30 14:56:19> hbdev ha2 link status changed: 1->0
<2020-01-30 14:56:19> port ha2 link status changed: 1->0
<2020-01-30 14:56:17> hbdev ha1 link status changed: 1->0
<2020-01-30 14:56:17> port ha1 link status changed: 1->0
<2020-01-30 14:55:57> hbdev ha2 link status changed: 0->1
<2020-01-30 14:55:57> hbdev ha1 link status changed: 0->1
<2020-01-30 14:55:57> port ha2 link status changed: 0->1
<2020-01-30 14:55:57> port ha1 link status changed: 0->1
<2020-01-30 14:55:24> hbdev ha2 link status changed: 1->0
<2020-01-30 14:55:24> port ha2 link status changed: 1->0
<2020-01-30 14:55:24> hbdev ha1 link status changed: 1->0
<2020-01-30 14:55:24> port ha1 link status changed: 1->0
<2020-01-30 14:55:12> FG100D3 is elected as the cluster master of 1 members
<2020-01-30 14:55:12> heartbeats from FG100D5 are lost on all hbdev
<2020-01-30 14:55:12> member FG100D5 lost heartbeat on hbdev ha2
<2020-01-30 14:55:12> member FG100D5 lost heartbeat on hbdev ha1
<2020-01-30 11:56:54> hbdev ha2 link status changed: 0->1
<2020-01-30 11:56:54> hbdev ha1 link status changed: 0->1
<2020-01-30 11:56:54> port ha2 link status changed: 0->1
<2020-01-30 11:56:54> port ha1 link status changed: 0->1
<2020-01-30 11:56:53> FG100D3 is elected as the cluster master of 2 members
<2020-01-30 11:56:53> new member 'FG100D5' joins the cluster
<2020-01-29 10:51:11> hbdev ha2 link status changed: 1->0
<2020-01-29 10:51:11> hbdev ha1 link status changed: 1->0
<2020-01-29 10:51:11> port ha2 link status changed: 1->0
<2020-01-29 10:51:11> port ha1 link status changed: 1->0
Sec
<2020-01-30 14:56:22> FG100D3 is elected as the cluster master of 2 members
<2020-01-30 14:56:21> new member 'FG100D3' joins the cluster
<2020-01-30 14:56:21> hatalk started
<2020-01-30 14:55:11> hatalk exited
<2020-01-30 12:26:36> port port1 link status changed: 0->1
<2020-01-30 12:26:30> port mgmt link status changed: 0->1
<2020-01-30 11:55:47> hbdev ha2 link status changed: 0->1
<2020-01-30 11:55:47> port ha2 link status changed: 0->1
<2020-01-30 11:55:46> FG100D3G is elected as the cluster master of 2 members
<2020-01-30 11:55:45> new member 'FG100D3' joins the cluster
<2020-01-30 11:55:45> hbdev ha1 link status changed: 0->1
<2020-01-30 11:55:45> port ha1 link status changed: 0->1
The secondary (FG100D5) was rebooted around 14:50.
So both units see the primary as the master, but we still see:
Pri
PINGSVR stats:
FG100D3 (updated 3 seconds ago):
port1: physical/1000full, up, rx-bytes/packets/dropped/errors=29886869701034/37625704239/0/0, tx=30359107323111/37528760397/0/0
pingsvr: state=up(since 1970/01/01 00:00:00), server=, ha_prio=0
FG100D5 (updated 2 seconds ago):
port1: physical/1000full, up, rx-bytes/packets/dropped/errors=23778449/308487/0/0, tx=0/0/0/0
pingsvr: state=N/A(since 2019/11/01 05:09:41), server=, ha_prio=0
Sec
PINGSVR stats:
FG100D5 (updated 3 seconds ago):
port1: physical/1000full, up, rx-bytes/packets/dropped/errors=23787931/308609/0/0, tx=0/0/0/0
pingsvr: state=N/A(since 1970/01/01 00:00:00), server=, ha_prio=0
FG100D3 (updated 4 seconds ago):
port1: physical/1000full, up, rx-bytes/packets/dropped/errors=29887354037236/37626308417/0/0, tx=30359599346632/37529363638/0/0
pingsvr: state=up(since 2020/01/30 14:56:25), server=, ha_prio=0
Note that 2019/11/01 05:09 is when the two units lost contact with each other; they were not rejoined until 2020/01/30.
The secondary has been restarted; the primary has not.
I would look into two different problems. Even on 1/30, the hbdev links (both of them) kept bouncing. I assume they still are now. Fixing that problem is the first step.
Then, you must be using remote link monitoring to trigger failover, because my HA cluster's "get sys ha status" doesn't show the "pingsvr" line. I would assume the "state" shouldn't be "N/A" if the destination is pingable, so I would troubleshoot that next.
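For context, the "pingsvr" lines usually appear when HA remote IP monitoring (pingserver) is enabled under "config system ha". A minimal sketch of what that might look like on this cluster follows; the interface name and thresholds here are illustrative assumptions, not values taken from the thread:

```
config system ha
    # Remote IP monitoring (pingserver) on port1 -
    # this is what produces the "pingsvr" lines in "get system ha status"
    set pingserver-monitor-interface "port1"
    set pingserver-failover-threshold 5
    set pingserver-flip-timeout 60
end
```

If nothing like this is configured, the pingsvr output in the status command is worth comparing against the actual config before assuming a fault.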
Hi
The reason for the hbdev bounce was that I reloaded the secondary; that log entry no longer appears now.
The pingsvr issue is why I raised this post, to see if anyone has come across it before.
What does "diag sys link-monitor status" show, then? No loss and 0 fail times all the time? Then the "N/A" might be expected output. To confirm, you should open a ticket with TAC.
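For reference, in FortiOS 6.0 the pingserver target itself is defined as a link-monitor entry with an HA priority. A hedged sketch of such an entry follows; the entry name, server address, and timers are placeholders, not the actual values from this cluster:

```
config system link-monitor
    edit "pingsvr-port1"
        set srcintf "port1"
        set server "192.0.2.1"   # placeholder upstream gateway to ping
        set protocol ping
        set interval 5
        set failtime 5
        set ha-priority 1        # reflected as ha_prio in the HA status output
    next
end
```

Comparing this entry's srcintf and server against the empty "server=" field in your pingsvr output might narrow down whether the monitor is actually bound correctly.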
Oh, by the way, remote link monitoring runs only on the master unit in A-P, since the same outgoing interface(s) on the slave don't pass traffic.
So you should ignore whatever ping status you see on the slave unit.