HarneferV_JICAZA
Explorer
June 2, 2026
Question

FortiGate HA-to-HA Design: Hardware Switch vs LACP Aggregate Interface


Hello everyone,

I would like to get some feedback from the community regarding a design decision between using a Hardware Switch interface or an LACP Aggregate interface in a FortiGate HA deployment.

Scenario

I have two Active-Passive FortiGate HA clusters interconnected directly, similar to the topology below:

The objective is to maintain connectivity during a failover event on either cluster while keeping the design as simple and stable as possible.

Current Design

We are currently using a Hardware Switch interface across the participating ports. The solution has been operating correctly and failover testing has been successful.

Question

From a Fortinet best-practice perspective:

  1. Would you prefer Hardware Switch or LACP for this topology?

  2. If LACP is preferred, would you configure lacp-ha-secondary disable on both clusters?

  3. Have you experienced any MAC flapping, convergence, or failover issues when using LACP directly between HA clusters?

One of the reasons I am evaluating both options is that Fortinet documentation mentions that when lacp-ha-secondary disable is configured, the secondary unit does not participate in LACP negotiations. As a result, during a failover event the new primary must establish LACP negotiation before it can start forwarding traffic, potentially increasing convergence time.

For those who have implemented similar designs in production, have you observed any noticeable impact during failover events when using LACP compared to Hardware Switch interfaces?

I would appreciate hearing real-world experiences and design recommendations.

Thank you.

3 replies

Toshi_Esumi
SuperUser
June 2, 2026

Placing physical switch(es) in between is always recommended. Otherwise, you are likely to run into problems during HA transitions. I, at least, wouldn't build this without a switch, simply to avoid unnecessary trouble.

Toshi 

HarneferV_JICAZA
Explorer
June 2, 2026

Thanks for the feedback.

I agree that using physical switches in-between would be the preferred architecture and would simplify MAC learning, HA transitions, and LACP behavior.

In this case, however, the direct HA-to-HA interconnection is a design constraint, so I'm interested in understanding whether the community has observed any advantages or disadvantages between using a Hardware Switch interface versus an LACP aggregate in this specific scenario.

Have you seen any particular issues with either approach during failover events?

AEK
SuperUser
June 2, 2026

I’ve never tested it and never considered doing it in a production environment, because as far as I know this design is not officially documented or tested.

That means even if it seems fine at first look, you never know whether some unexpected behavior will show up under certain circumstances and leave you biting your fingers, and Fortinet “probably” won't support you if there is an issue.

If it is due to a design constraint, as you said, then at least open a ticket before doing it to find out whether Fortinet supports this design.

AEK
christian_89_
Visitor III
June 7, 2026

The two SuperUsers are giving you the right answer and the thread is talking itself past it. Toshi's "put a switch in the middle" and AEK's "this isn't a documented/tested topology, TAC may not support you" are not hand-waving. They are the answer. Direct cluster-to-cluster bundling, whether hardware switch or LACP, is not a Fortinet-validated design, and that matters more than which of the two flavours feels cleaner.

Understand why the "constraint" is the actual problem, not a detail to engineer around. Your two clusters fail over independently. Cluster A picks its primary, cluster B picks its primary, and those decisions are uncoordinated. So the path that actually forwards is active-A-unit to active-B-unit, and which physical pair that is changes over time. With no switch between them you have two bad choices: cable only primary-to-primary and secondary-to-secondary, in which case a single-sided failover leaves the live active-to-active path not physically connected; or full-mesh all four units, in which case you have a bridged L2 loop to manage. A switch in the middle exists precisely to terminate all four units and bridge whichever two are currently active. That is the whole reason the back-to-back design isn't validated. You're not choosing between two good options, you're choosing how to paper over a missing device.
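The cabling argument above can be checked by brute force. The sketch below is only an illustration under stated assumptions: the unit names (A1, A2, B1, B2), the two cabling plans, and both helper functions are hypothetical, and every unit is treated as bridging all of its inter-cluster ports, which is the hardware-switch worst case rather than a model of real FGCP behaviour.

```python
from itertools import product

# Hypothetical unit names: two units per cluster, either of which may be active.
UNITS_A = ("A1", "A2")
UNITS_B = ("B1", "B2")

# Hypothetical cabling plans with no switch in between.
PARALLEL = {("A1", "B1"), ("A2", "B2")}   # primary-to-primary, secondary-to-secondary
FULL_MESH = set(product(UNITS_A, UNITS_B))  # every A unit cabled to every B unit


def unreachable_states(cables):
    """List the (active A unit, active B unit) pairs that have no direct cable."""
    return [pair for pair in product(UNITS_A, UNITS_B) if pair not in cables]


def has_cycle(cables):
    """Detect a physical L2 loop with union-find, assuming every unit bridges its ports."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in sorted(cables):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # this cable closes a loop
        parent[root_a] = root_b
    return False


if __name__ == "__main__":
    for name, plan in (("parallel", PARALLEL), ("full mesh", FULL_MESH)):
        print(name, "unreachable:", unreachable_states(plan), "loop:", has_cycle(plan))
```

Under these assumptions the parallel plan is loop-free but leaves both single-sided failover states without a link, while the full mesh covers every state at the cost of a physical loop.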

"Failover testing has been successful" is the trap, not the reassurance. A passing happy-path test on a hardware switch tells you nothing about the cases that bite in production: both clusters transitioning close together, one LAG member down at the moment of failover, or MAC flap while the virtual MACs move from one unit to the other. AEK's "seems fine at first look, then you bite your fingers" is exactly this. FGCP failover moves the same virtual MACs to a different physical port, and a bridging element with no real loop protection in the middle is where that turns into flap or a storm.

On your three questions specifically:

  1. Between the two, LACP, not Hardware Switch, if you're forced to pick without a switch. A hardware switch interface is an internal L2 bridge with no usable loop protection. Spanning it across ports to a second cluster makes the FortiGate bridge two HA domains, which is the classic recipe for L2 loops, broadcast storms and MAC flapping during transitions. LACP is at least a bounded logical link that degrades predictably. So this is "less likely to take down the segment," not "good."

  2. No, leave lacp-ha-secondary at its default (enable). Your reading of the doc is correct, and that's the argument for keeping the default, not disabling it. With it enabled the subordinate keeps its LAG members negotiated-but-standby, so the new primary forwards almost immediately on failover. Setting it to disable forces the new primary to renegotiate LACP from scratch, which adds seconds and throws away the sub-second failover FGCP is supposed to give you. The only reason to disable it is a peer switch that gets confused by the subordinate's LACPDUs, and that is a single-switch-peer concern, not really your scenario. Don't disable it pre-emptively.

  3. Yes, MAC flapping and convergence are the real risks here, and they're materially worse on the hardware-switch option than on LACP. If you go ahead, test the ugly combinations explicitly, not just a clean single failover.
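One way to make "test the ugly combinations" concrete is to enumerate them up front. This is a rough sketch, not a validated test plan: the two dimensions come from the failure cases mentioned earlier in this reply, while the labels and the failover_test_matrix helper are hypothetical names chosen for illustration.

```python
from itertools import product

# Hypothetical labels for which cluster(s) fail over during a test run.
FAILOVER_EVENTS = ("cluster A only", "cluster B only", "A and B close together")

# Hypothetical labels for the LAG state at the moment of failover.
LAG_STATES = ("all members up", "one member down")


def failover_test_matrix():
    """Return every failover/LAG-state combination as one test case."""
    return [
        {"failover": event, "lag": lag}
        for event, lag in product(FAILOVER_EVENTS, LAG_STATES)
    ]


if __name__ == "__main__":
    for number, case in enumerate(failover_test_matrix(), start=1):
        # For each case, watch for MAC flap and measure convergence time.
        print(f"{number}. failover={case['failover']}, lag={case['lag']}")
```

That gives six cases instead of the single clean failover; for each one, watch for MAC flapping and measure how long forwarding takes to recover.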

Bottom line: spend the money on one switch and stop fighting the topology. It is a cheap device weighed against an unsupported, edge-case-fragile core that you'll be debugging at 3am. If the constraint is genuinely immovable, use LACP with lacp-ha-secondary left at default, document that it's an unsupported design, and budget real time for convergence and flap testing before it carries production. And take AEK's advice literally: open a TAC ticket and get the "is this supported" answer in writing before you commit, because the answer will almost certainly be no.