Description
This article describes common causes of intermittent High Availability (HA) issues.
At implementation time, the HA design works because it is based on the units, VLANs, cabling and so on that exist at that point.
However, after some time, HA no longer works as expected. Network administrators commonly report this behavior.
Typical changes that occur on the network:
1) A new unit or server is introduced in the network.
- The FortiGate was previously standalone and is now in High Availability.
- The network went from 1 core switch to 2 core switches.
- An intermediate unit is introduced (example: load balancer, proxy, traffic management and so on).
2) Changes to the network design/topology.
- Additional VLANs.
- Routing changes.
- LACP configuration.
- Changes of cabling or ports on the FortiGate or switch.
- STP re-calculation.
When the design changes, traffic may no longer flow as it did at first implementation.
This is expected behavior.
In some cases, there are no changes on the network.
However, failover still does not work as expected.
The typical behavior (or vice versa):
1) Traffic on the primary unit works as expected.
2) After failover to the secondary unit:
- Many services are down.
- Certain segments/interfaces are not working.
- Network intermittence / flapping (example: working for 10 minutes, then down for 10 minutes).
From the FortiGate perspective, the FortiGate only processes the traffic it receives.
A common cause of this issue is STP (Spanning Tree Protocol) at the network level.
It happens frequently when aggregation or LACP is configured.
In short, the issue is caused by the network design itself.
Solution
Troubleshooting.
If intermittence occurs, it can be checked on the FortiGate as follows:
Version 6.0.
Go to Log & Report -> System Events.
Version 6.2 and above.
Go to Log & Report -> Events -> System Events (on the top right corner).
Filter: Log Description: Interface status changed
Look for the interface that has the problem and check its status change entries.
The FortiGate only reports these events. It does not take any action.
When the FortiGate notices that a port goes down/up, it generates a log.
The possible causes are as follows:
1) The cable is disconnected/unplugged on that port.
2) The port is shut down/disabled on the peer device.
The most common scenario for HA issues is number 2: the port is shut down/disabled on the peer unit.
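To spot a flapping port in a large log export, the status change events can be counted per interface. The following is a minimal Python sketch, not a Fortinet tool. It assumes the system event logs were exported as text in key=value form and that the interface name follows the word "Interface" in the msg field; the sample lines, the helper names and the threshold of 3 events are illustrative assumptions, so adjust them to the actual log output.

```python
import re
from collections import Counter

# Matches key=value and key="quoted value" pairs in one log line.
PAIR_RE = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_line(line):
    """Return a dict of the key=value fields found in one log line."""
    fields = {}
    for key, raw, quoted in PAIR_RE.findall(line):
        fields[key] = quoted if raw.startswith('"') else raw
    return fields


def count_status_changes(lines, logdesc="Interface status changed"):
    """Count log entries with the given description, per interface."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields.get("logdesc") != logdesc:
            continue
        # ASSUMPTION: the interface name is the word after "Interface" in msg.
        match = re.search(r"[Ii]nterface (\S+)", fields.get("msg", ""))
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative sample lines only (hypothetical), not real device output.
SAMPLE = [
    'date=2020-01-01 time=10:00:00 logdesc="Interface status changed" msg="Link monitor: Interface port1 was turned down"',
    'date=2020-01-01 time=10:10:00 logdesc="Interface status changed" msg="Link monitor: Interface port1 was turned up"',
    'date=2020-01-01 time=10:20:00 logdesc="Interface status changed" msg="Link monitor: Interface port1 was turned down"',
    'date=2020-01-01 time=10:21:00 logdesc="Interface status changed" msg="Link monitor: Interface port2 was turned up"',
    'date=2020-01-01 time=10:22:00 logdesc="Admin login successful" msg="Administrator admin logged in"',
]

if __name__ == "__main__":
    counts = count_status_changes(SAMPLE)
    # Hypothetical threshold: 3 or more changes is treated as flapping.
    flapping = sorted(name for name, n in counts.items() if n >= 3)
    print(dict(counts), flapping)
```

An interface with many status changes in a short period is a candidate for the peer-side shutdown described above, and the port state on the switch should be checked for the same timestamps.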
STP has a 'hold-down' timer and a 're-calculation' timer to evaluate changes on the network.
When the hold-down timer expires, STP refreshes or recalculates the network path.
As a result, the switch may shut down one port (port1) and bring up another port (port2).
Verify these changes on the switch as well.
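To give a sense of how long such a recalculation can interrupt traffic, the sketch below estimates the outage for classic STP. It assumes the IEEE 802.1D default timers (max age 20 seconds, forward delay 15 seconds), which the switch in a given network may not use; RSTP or tuned timers converge faster, so treat the numbers as an illustration only. The function name is hypothetical.

```python
def stp_outage_seconds(max_age=20, forward_delay=15, direct_failure=False):
    """Estimate how long a port waits before forwarding again (classic STP).

    ASSUMPTION: IEEE 802.1D default timers; check the actual switch settings.
    """
    # A port spends one forward delay in listening and one in learning.
    listening_and_learning = 2 * forward_delay
    if direct_failure:
        # The switch sees its own link go down, so it does not wait for max age.
        return listening_and_learning
    # Indirect failure: stored BPDU information must first age out.
    return max_age + listening_and_learning


if __name__ == "__main__":
    print(stp_outage_seconds())
    print(stp_outage_seconds(direct_failure=True))
```

If the interface status changes in the FortiGate log are spaced by roughly these intervals, that is a hint (not proof) that STP recalculation on the switch is involved.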
Example scenario:
The 1st FortiGate unit acts as master.
The 2nd FortiGate unit acts as slave.
The network switch should pass all traffic to the 1st FortiGate unit. When intermittence occurs, the network is most probably sending the traffic to the 2nd FortiGate (slave), which is not correct.
The slave unit does not process any traffic.
Refer to 'Log & Report' as mentioned previously.
In this kind of scenario, the issue is not in the FortiGate configuration itself.
Instead, it involves the network solution and its integration.
Consult the respective network administrator or our professional services for further assistance with this case.
Conclusion.
An HA deployment requires proper physical and logical design at the network level.
Related Articles
Technical Tip: High availability basic deployment design