m0j0
New Member
November 13, 2018
Solved

60E HA Configuration - Cluster won't form, both units showing as master


I'm setting up a pair of 60Es in HA, but I'm unable to get the cluster to form, and both units think they are master. Both units were factory reset and upgraded to 5.6.6. I issued the following on the "primary":

config system dhcp server
delete 1
end
config firewall policy
delete 1
end
config system session-helper
delete 13
end
config system virtual-switch
delete internal
end

config system global
set hostname "ussv4gvlfw1-1"
end
config system ha
set group-name "ussv4gvlfwcluster1"
set mode a-p
set password DktJSnrGZGTVSF7v
set hbdev "internal6" 200 "internal7" 100
set override disable
set priority 110
set monitor "internal1"
end

And this on the "secondary":

config system dhcp server
delete 1
end
config firewall policy
delete 1
end
config system session-helper
delete 13
end
config system virtual-switch
delete internal
end
config system global
set hostname "ussv4gvlfw1-2"
end
config system ha
set group-name "ussv4gvlfwcluster1"
set mode a-p
set password DktJSnrGZGTVSF7v
set hbdev "internal6" 200 "internal7" 100
set override disable
set priority 100
set monitor "internal1"
end

I've connected the two internal6 interfaces to each other and the two internal7 interfaces to each other, and I have link lights (the CLI shows all four at 1000 Mbps full duplex). However, the HA light on both units is orange, and "diag sys ha status" returns the following:

ussv4gvlfw1-1 # diag sys ha status
HA information
Statistics
        traffic.local = s:0 p:159174 b:23507717
        traffic.total = s:0 p:159174 b:23506842
        activity.fdb = c:0 q:0

Model=60, Mode=2 Group=0 Debug=0
nvcluster=1, ses_pickup=0, delay=0

[Debug_Zone HA information]
HA group member information: is_manage_master=1.
FGT60ETK18052822: Master, serialno_prio=0, usr_priority=110, hostname=ussv4gvlfw1-1

[Kernel HA information]
vcluster 1, state=work, master_ip=169.254.0.1, master_id=0:
FGT60ETK18052822: Master, ha_prio/o_ha_prio=0/0

ussv4gvlfw1-2 # diag sys ha status
HA information
Statistics
        traffic.local = s:0 p:67 b:27520
        traffic.total = s:0 p:67 b:27520
        activity.fdb = c:0 q:0

Model=60, Mode=2 Group=0 Debug=0
nvcluster=1, ses_pickup=0, delay=0

[Debug_Zone HA information]
HA group member information: is_manage_master=1.
FGT60ETK18054479: Master, serialno_prio=0, usr_priority=100, hostname=ussv4gvlfw1-2

[Kernel HA information]
vcluster 1, state=work, master_ip=169.254.0.1, master_id=0:
FGT60ETK18054479: Master, ha_prio/o_ha_prio=0/0

I thought the cluster password might have pasted incorrectly on one (or both) of the units, so I did a complete factory reset of the secondary unit, reapplied the same initial config commands, and manually entered the cluster password on both units to make sure they matched. Still no joy.

I've set essentially the same configuration on two other new 60E units in a different location, and they seem to be working fine. The only difference is that I didn't upgrade the software on those units, so they're still running 5.6.4. The problem is that I'm working from a different country, so I'm relying on remote hands and remote console connections to make changes. I'm not sure how to troubleshoot this further.

    Best answer by Toshi_Esumi
    Toshi_Esumi
    SuperUser
    November 14, 2018

    It must be something to do with the heartbeat connections. Are you sure internal6<->internal6 and internal7<->internal7 are connected (not 6<->7 and 7<->6)? To simplify, disconnect internal7 and use only internal6. Then check "get sys ha status", which shows more information than "diag sys ha status".
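    A minimal sketch of that simplification, assuming it is run on both units with internal7 unplugged; the heartbeat priority of 200 is carried over from the original hbdev line, and the full command names are used in place of the abbreviated "sys":

    ```
    config system ha
        set hbdev "internal6" 200
    end
    get system ha status
    ```

    If the cluster forms over the single heartbeat link, internal7 can be added back to hbdev once its cabling has been confirmed.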

    If something is wrong with those physical connections, it should show an error or warning mentioning "hbdev" (or similar) near the beginning of the output.

    Also, get into both units through the console (you need two PCs, or one PC with two USB-serial adapters) and run "diag debug app hatalk -1" on both sides. You will probably see that each unit is trying to communicate with the other end but getting nothing back.
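    One way that debug session might look on each console. FortiOS only prints debug output after "diagnose debug enable", so that line is included; the reset and disable lines are housekeeping added here rather than part of the original advice. Leave it running while the heartbeat messages scroll past, then run the last line to stop it:

    ```
    diagnose debug reset
    diagnose debug application hatalk -1
    diagnose debug enable
    diagnose debug disable
    ```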

    m0j0 (Author)
    New Member
    November 14, 2018

    Thanks for the response.

    I've always connected HA heartbeat interfaces to the matching interface on the other device, so I've never thought about, or seen, what happens when they are crossed. If my colleague had indeed connected 6 to 7 and 7 to 6, would that likely be the cause of my issues?

    The reason I ask is that I logged into these devices through the remote consoles and disabled internal7 on the primary, and when I checked the secondary, internal6 was down. So it appears that is exactly what they've done.
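    A sketch of that cabling test, assuming the heartbeat ports are internal6 and internal7 as in the configs above. On the primary, administratively shut internal7:

    ```
    config system interface
        edit "internal7"
            set status down
        next
    end
    ```

    Then, on the secondary, list the physical ports and see which one has lost link ("get system interface physical" reports link status and speed per port). Once the cables are swapped so that 6 goes to 6 and 7 goes to 7, repeat the first block on the primary with "set status up" to bring internal7 back.

    ```
    get system interface physical
    ```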

    Regards,

    Mark