Technical Tip: How to verify HA cluster members individual uptime

ESCHAN_FTNT · 04-02-2020

Description

This article describes how to verify the individual uptime of each HA cluster member.

In the output of 'get system ha status' below, FGVM04TM19-----3 is selected as primary because it has the largest uptime value:

Primary (global) # get system ha status
HA Health Status: OK
Model: FortiGate-VM64
Mode: HA A-P
Group: 100
Debug: 0
Cluster Uptime: 0 days 0:39:28
Cluster state change time: 2020-03-30 17:18:09
Primary selected using:
    <2020/03/30 17:18:09> FGVM04TM19-----3 is selected as the primary because it has the largest value of uptime.
    <2020/03/30 17:15:46> FGVM04TM19-----4 is selected as the primary because it has the largest value of uptime.
    <2020/03/30 16:39:47> FGVM04TM19-----3 is selected as the primary because it has the largest value of override priority.
    <2020/03/30 16:39:00> FGVM04TM19-----3 is selected as the primary because it's the only member in the cluster.
ses_pickup: enable, ses_pickup_delay=disable
override: enable
Configuration Status:
    FGVM04TM19-----3(updated 4 seconds ago): in-sync
    FGVM04TM19-----4(updated 2 seconds ago): in-sync
System Usage stats:
    FGVM04TM19-----3(updated 4 seconds ago):
        sessions=25, average-cpu-user/nice/system/idle=0%/0%/0%/99%, memory=63%
    FGVM04TM19-----4(updated 2 seconds ago):
        sessions=16, average-cpu-user/nice/system/idle=0%/0%/0%/99%, memory=61%
HBDEV stats:
    FGVM04TM19-----3(updated 4 seconds ago):
        port10: physical/10000full, up, rx-bytes/packets/dropped/errors=8409460/31503/0/0, tx=35385838/37462/0/0
    FGVM04TM19-----4(updated 2 seconds ago):
        port10: physical/10000full, up, rx-bytes/packets/dropped/errors=33122327/35969/0/0, tx=8409529/31458/0/0
MONDEV stats:
    FGVM04TM19-----3(updated 4 seconds ago):
        port3: physical/10000full, up, rx-bytes/packets/dropped/errors=10471996/40295/0/0, tx=840/14/0/0
    FGVM04TM19-----4(updated 2 seconds ago):
        port3: physical/10000full, up, rx-bytes/packets/dropped/errors=4344523/15513/0/0, tx=360/6/0/0
Primary: Primary          , FGVM04TM19-----3, cluster index = 1
Secondary: Secondary           , FGVM04TM19-----4, cluster index = 0
number of vcluster: 1
vcluster 1: work 169.254.0.2
Primary: FGVM04TM19-----3, operating cluster index = 0
Secondary: FGVM04TM19-----4, operating cluster index = 1

However, 'get system performance status' reports an identical uptime of 44 minutes for both FGVM04TM19-----3 and FGVM04TM19-----4. How, then, does the HA cluster select FGVM04TM19-----3 for having the higher uptime?

Primary (global) # get system performance status | grep Uptime
Uptime: 0 days, 0 hours, 44 minutes
Secondary (global) # get system performance status | grep Uptime
Uptime: 0 days, 0 hours, 44 minutes

Scope

FortiGate.

Solution

Use the command 'diagnose sys ha dump-by group' to verify the individual uptime of each HA member:

Primary (global) # diagnose sys ha dump-by group
            HA information.
group-id=100, group-name='fortigate'

gmember_nr=2
'FGVM04TM19-----3': ha_ip_idx=1, hb_packet_version=6, last_hb_jiffies=0, linkfails=0, weight/o=0/0
'FGVM04TM19-----4': ha_ip_idx=0, hb_packet_version=8, last_hb_jiffies=236404, linkfails=0, weight/o=0/0
        hbdev_nr=1: port10(mac=000c..05, last_hb_jiffies=236404, hb_lost=0),

vcluster_nr=1
vcluster_0: start_time=1585558721(2020-03-30 16:58:41), state/o/chg_time=2(work)/3(standby)/1585559889(2020-03-30 17:18:09)
        pingsvr_flip_timeout/expire=3600s/3572s
        mondev: port3(prio=50,is_aggr=0,status=1)
        'FGVM04TM19-----3': ha_prio/o=0/1, link_failure=0, pingsvr_failure=0, flag=0x00000001, uptime/reset_cnt=1167/1
        'FGVM04TM19-----4': ha_prio/o=1/0, link_failure=0, pingsvr_failure=0, flag=0x00000000, uptime/reset_cnt=0/1

According to 'start_time', the HA uptime of FGVM04TM19-----3 is counted from 2020-03-30 16:58:41. Run the same command on FGVM04TM19-----4:

Secondary (global) # diagnose sys ha dump-by group
            HA information.
group-id=100, group-name='fortigate'

gmember_nr=2
'FGVM04TM19-----3': ha_ip_idx=1, hb_packet_version=6, last_hb_jiffies=236702, linkfails=0, weight/o=0/0
        hbdev_nr=1: port10(mac=000c..4f, last_hb_jiffies=236702, hb_lost=0),
'FGVM04TM19-----4': ha_ip_idx=0, hb_packet_version=8, last_hb_jiffies=0, linkfails=0, weight/o=0/0

vcluster_nr=1
vcluster_0: start_time=1585559888(2020-03-30 17:18:08), state/o/chg_time=3(standby)/2(work)/1585559889(2020-03-30 17:18:09)
        pingsvr_flip_timeout/expire=3600s/3541s
        mondev: port3(prio=50,is_aggr=0,status=1)
        'FGVM04TM19-----3': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, uptime/reset_cnt=1167/1
        'FGVM04TM19-----4': ha_prio/o=1/1, link_failure=0, pingsvr_failure=0, flag=0x00000000, uptime/reset_cnt=0/1
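When this output is saved to a file or collected by a script, the per-member 'uptime/reset_cnt' values can be extracted programmatically. The following is a minimal Python sketch, not a Fortinet tool: the function name parse_member_uptimes and the regular expression are illustrative assumptions based only on the line layout shown above, and the uptime value is treated as seconds, as this article does.

```python
import re

# Two member lines copied from the 'diagnose sys ha dump-by group' output above.
SAMPLE = """\
        'FGVM04TM19-----3': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, uptime/reset_cnt=1167/1
        'FGVM04TM19-----4': ha_prio/o=1/1, link_failure=0, pingsvr_failure=0, flag=0x00000000, uptime/reset_cnt=0/1
"""

# Hypothetical pattern: quoted serial number, then the uptime/reset_cnt pair.
# It assumes the field layout seen in this article's sample output.
MEMBER_RE = re.compile(
    r"'(?P<serial>[^']+)':.*?uptime/reset_cnt=(?P<uptime>\d+)/(?P<resets>\d+)"
)


def parse_member_uptimes(text):
    """Return {serial: (uptime, reset_cnt)} for each vcluster member line."""
    result = {}
    for line in text.splitlines():
        match = MEMBER_RE.search(line)
        if match:
            result[match.group("serial")] = (
                int(match.group("uptime")),
                int(match.group("resets")),
            )
    return result


if __name__ == "__main__":
    for serial, (uptime, resets) in parse_member_uptimes(SAMPLE).items():
        print(f"{serial}: uptime={uptime} reset_cnt={resets}")
```

Lines in the 'gmember' section carry no 'uptime/reset_cnt' field, so the pattern skips them and only the vcluster member lines are returned.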

The HA uptime of FGVM04TM19-----4 is actually counted from 2020-03-30 17:18:08. FGVM04TM19-----3 is therefore selected as HA primary because its uptime is higher.

The difference between the two start times is 1167 seconds, which matches the 'uptime' value shown for FGVM04TM19-----3.
This is higher than the default 'ha-uptime-diff-margin' of 300 seconds, therefore FGVM04TM19-----3 will be selected as primary when override is not enabled.
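The arithmetic behind the 1167-second figure can be checked from the two 'start_time' epoch values in the outputs above. This is a minimal Python sketch under stated assumptions: the helper names are hypothetical, and treating the margin as a strict "greater than" threshold is an assumption, since the article only states that 1167 is higher than 300.

```python
# 'start_time' epoch values copied from the two dump-by group outputs above.
START_TIME_UNIT_3 = 1585558721  # FGVM04TM19-----3, 2020-03-30 16:58:41
START_TIME_UNIT_4 = 1585559888  # FGVM04TM19-----4, 2020-03-30 17:18:08

# Default 'ha-uptime-diff-margin' cited in this article, in seconds.
DEFAULT_MARGIN_S = 300


def start_time_gap(start_a, start_b):
    """Absolute difference between two HA start times, in seconds."""
    return abs(start_a - start_b)


def exceeds_margin(gap_s, margin_s=DEFAULT_MARGIN_S):
    """True when the gap is larger than the margin (strictness is assumed)."""
    return gap_s > margin_s


if __name__ == "__main__":
    gap = start_time_gap(START_TIME_UNIT_3, START_TIME_UNIT_4)
    print(f"gap = {gap} s")                      # gap = 1167 s
    print(f"exceeds margin: {exceeds_margin(gap)}")  # exceeds margin: True
```

The unit with the earlier 'start_time' is the one with the higher HA uptime, so here the 1167-second gap counts in favor of FGVM04TM19-----3.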

Device uptime indicates how long the member has been up.

Cluster uptime indicates how long at least one member of the cluster has been able to handle traffic. This time is not reset even if a failover occurs.

HA uptime is a timer used in the election of the primary device in an A-P cluster. This value indicates how long a device has been primary without an event that would trigger a new election.

A dedicated article covers HA uptime for chassis-based devices (FortiGate 6000 and 7000):

Technical Tip: Understanding the HA uptime for Chassis based device
