Description
This article describes an issue that can occur on SLBC clusters in HA mode and explains how to troubleshoot and solve it.
If the FortiController with chassis ID 1 and the FortiController with chassis ID 2 are swapped (FortiController 2 placed in chassis 1 and FortiController 1 placed in chassis 2), communication between the FortiGate blades and the FortiControllers will not work correctly until the FortiGate blades are rebooted.
Exchanging FortiController blades is not done during normal operation, but it might be done for troubleshooting purposes.
Scope
SLBC cluster in HA mode.
Two chassis with at least one FortiGate blade in each.
Solution
One symptom is that the command 'diag sys ha status', run on the FortiController, reports "worker_failure=1/1" or "worker_failure=2/2". This means that the FortiController cannot communicate correctly with the FortiGate blades.
FTCtrl-1# diag sys ha status
mode: a-p
minimize chassis failover: 1
FTCtrl-1(FT503Cxxxxxxxx22), Slave(priority=1), ip=172.254.128.10, uptime=76.52, chassis=2(1)
slot: 1
sync: conf_sync=1, elbc_sync=0
session: total=0, session_sync=out of sync
state: gateway_die=0, worker_failure=1/1, lag=(total/good/down/bad-score)=2/2/0/0,
intf_state=(port up)=0, force-state(0:none)
hbdevs: local_interface= b1 best=yes
local_interface= b2 best=no
FTCtrl-2(FT503Cxxxxxxxx33), Master(priority=0), ip=172.254.128.9, uptime=407781.23, chassis=1(1)
slot: 1
sync: conf_sync=1, elbc_sync=1, conn=3(connected)
session: total=2034, session_sync=in sync
state: gateway_die=0, worker_failure=0/1, lag=(total/good/down/bad-score)=2/2/0/0,
intf_state=(port up)=0, force-state(0:none)
hbdevs: local_interface= b1 last_hb_time= 188.09 status=alive
local_interface= b2 last_hb_time= 188.09 status=alive
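When several FortiControllers have to be checked, the worker_failure counter can be pulled out of saved 'diag sys ha status' output with a short script. The following is a minimal sketch, not a Fortinet tool: the function name is hypothetical, and it assumes that the first number in "worker_failure=N/M" is the count of failed workers, which matches the sample above but is not confirmed here.

```python
import re

def worker_failures(ha_status: str) -> list[tuple[str, int, int]]:
    """Return (unit_name, failed, total) for each unit in the output."""
    results = []
    unit = None
    for line in ha_status.splitlines():
        # A unit header looks like: FTCtrl-1(FT503C...), Slave(priority=1), ...
        header = re.match(r"\s*([\w-]+)\([^)]*\),\s*(?:Master|Slave)\(", line)
        if header:
            unit = header.group(1)
        counts = re.search(r"worker_failure=(\d+)/(\d+)", line)
        if counts and unit is not None:
            results.append((unit, int(counts.group(1)), int(counts.group(2))))
    return results

sample = """\
FTCtrl-1(FT503Cxxxxxxxx22), Slave(priority=1), ip=172.254.128.10, uptime=76.52, chassis=2(1)
state: gateway_die=0, worker_failure=1/1, lag=(total/good/down/bad-score)=2/2/0/0,
FTCtrl-2(FT503Cxxxxxxxx33), Master(priority=0), ip=172.254.128.9, uptime=407781.23, chassis=1(1)
state: gateway_die=0, worker_failure=0/1, lag=(total/good/down/bad-score)=2/2/0/0,
"""

for unit, failed, total in worker_failures(sample):
    # Assumption: any non-zero first number indicates a worker problem.
    flag = "PROBLEM" if failed > 0 else "ok"
    print(f"{unit}: worker_failure={failed}/{total} {flag}")
```

On the sample, the slave unit FTCtrl-1 is flagged and the master unit FTCtrl-2 is not.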
Another symptom is that the load balance status reports "Waiting for data heartbeat":
FTCtrl-1# get load-balance status
ELBC Master Blade: N/A
Confsync Master Blade: N/A
Blades:
Working: 0 [ 0 Active 0 Standby]
Ready: 0 [ 0 Active 0 Standby]
Dead: 1 [ 1 Active 0 Standby]
Total: 1 [ 1 Active 0 Standby]
Slot 3: Status:Dead Function:Active
Link: Base: Up Fabric: Up
Heartbeat: Management: Good Data: Failed
Status Message:"Waiting for data heartbeat."
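The same kind of check can be applied to saved 'get load-balance status' output. This sketch lists the slots reported as Dead together with their status message; the function name is hypothetical and the parsing is based only on the layout of the sample above.

```python
import re

def dead_slots(lb_status: str) -> dict[int, str]:
    """Map slot number -> status message for every slot reported as Dead."""
    dead = {}
    current = None
    for line in lb_status.splitlines():
        slot = re.match(r"\s*Slot\s+(\d+):\s*Status:(\w+)", line)
        if slot:
            # Track the slot only if its status is Dead.
            current = int(slot.group(1)) if slot.group(2) == "Dead" else None
            if current is not None:
                dead[current] = ""
            continue
        msg = re.match(r'\s*Status Message:\s*"(.*)"', line)
        if msg and current is not None:
            dead[current] = msg.group(1)
    return dead

sample = '''\
Slot 3: Status:Dead Function:Active
Link: Base: Up Fabric: Up
Heartbeat: Management: Good Data: Failed
Status Message:"Waiting for data heartbeat."
'''

print(dead_slots(sample))
```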
For communication between the FortiController and the FortiGate, an internal elbc-base-ctrl IP address is used. This address is assigned by the FortiController:
IP 10.147.xxx.3 is assigned to the FortiGate in slot 3 by the FortiController with chassis ID 1
IP 10.147.xxx.19 is assigned to the FortiGate in slot 3 by the FortiController with chassis ID 2
The last octet corresponds to the slot number reported by the command 'diag test application chlbd 1' (in this example, 'my slot=19' for the FortiGate in slot 3 of chassis 2):
FGT-1 (global) # diag test application chlbd 1
my service group id=1
my chassis=2
active channel=1
best active channel=1
master chassis=no
Other chassis is master=yes
my slot=19
master slot=3
other chassis master slot=3
chassis master slot=19
active slot mask=00080008(1.3,2.3)
chassis active slot mask=00080000(2.3)
update_timer is running
last_rx of update msg is 40 ago
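The relationship between chassis, slot, slot mask and last octet can be illustrated with a few lines of Python. This is a sketch under an assumption inferred from the sample output only (slot 3 in chassis 2 appears as 19, and mask 00080008 is shown as 1.3,2.3): bits 0-15 of the mask are taken to be chassis 1 slots and bits 16-31 chassis 2 slots. Both function names are hypothetical.

```python
def cluster_slot_id(chassis: int, slot: int) -> int:
    """Slot number as reported by chlbd, assuming an offset of 16 per chassis."""
    return (chassis - 1) * 16 + slot

def decode_slot_mask(mask: int) -> list[tuple[int, int]]:
    """Decode a slot mask into (chassis, slot) pairs.

    Assumption: bits 0-15 belong to chassis 1, bits 16-31 to chassis 2.
    """
    pairs = []
    for bit in range(32):
        if mask & (1 << bit):
            chassis, slot = divmod(bit, 16)
            pairs.append((chassis + 1, slot))
    return pairs

print(cluster_slot_id(1, 3), cluster_slot_id(2, 3))  # expected last octets
print(decode_slot_mask(0x00080008))  # active slot mask from the sample
print(decode_slot_mask(0x00080000))  # chassis active slot mask from the sample
```

Under that assumption, the decoded masks agree with the "(1.3,2.3)" and "(2.3)" annotations in the output above.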
To check which address is actually assigned to a FortiGate:
FGT-1 (elbc-mgmt) # diag ip add list | grep 10.147
IP=10.147.187.19->10.147.187.19/255.255.255.0 index=94 devname=elbc-base-ctrl
If the FortiControllers have been exchanged between the two chassis, the FortiGate will have two elbc-base-ctrl IP addresses, one for chassis ID 1 and one for chassis ID 2:
FGT-1 (elbc-mgmt) # diag ip add list | grep 10.147
IP=10.147.187.19->10.147.187.19/255.255.255.0 index=94 devname=elbc-base-ctrl
IP=10.147.187.3->10.147.187.3/255.255.255.0 index=94 devname=elbc-base-ctrl
These duplicate elbc-base-ctrl IP addresses prevent normal SLBC cluster operation.
To clear this condition, reboot the FortiGate blades in both the master and slave chassis.
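To spot the duplicate-address condition in saved 'diag ip add list' output before and after the reboot, a small parser such as the following could be used. It is only a sketch based on the line format shown in this article; the function name is hypothetical.

```python
import re
from collections import defaultdict

def addresses_by_device(ip_list: str) -> dict[str, list[str]]:
    """Group the IP addresses in 'diag ip add list' output by device name."""
    found = defaultdict(list)
    for line in ip_list.splitlines():
        m = re.search(r"IP=([\d.]+)->\S+\s+index=\d+\s+devname=(\S+)", line)
        if m:
            found[m.group(2)].append(m.group(1))
    return dict(found)

sample = """\
IP=10.147.187.19->10.147.187.19/255.255.255.0 index=94 devname=elbc-base-ctrl
IP=10.147.187.3->10.147.187.3/255.255.255.0 index=94 devname=elbc-base-ctrl
"""

addrs = addresses_by_device(sample).get("elbc-base-ctrl", [])
if len(addrs) > 1:
    print("More than one elbc-base-ctrl address:", ", ".join(addrs))
else:
    print("elbc-base-ctrl addresses:", addrs)
```

A single address on elbc-base-ctrl after the reboot indicates that the condition has cleared.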
Summary of commands used
FortiController:
diag sys ha status
get load-balance status
FortiGate:
config vdom
edit elbc-mgmt
diag ip add list | grep 10.147
end
config global
diag test application chlbd 1