Xav_FTNT
Staff
Article Id 196150

Description


This article describes the configuration that needs to be applied to a FortiGate HA cluster and the BGP settings so that each router (the FortiGate and its peer(s)) will keep the BGP routes in their routing table(s) to avoid traffic interruption during an HA failover.

Scope

 

FortiGate running in NAT and HA mode.

 

Solution


Expectations, Requirements.


An HA cluster with one or more BGP peers fails over without traffic interruption.

Configuration:

On a FortiGate HA cluster, the BGP router daemon runs only on the primary unit. When an HA failover occurs, a new BGP process is launched on the newly elected primary.

Even though the FortiGate has all the routes, if the peer sees the FortiGate as unresponsive, it removes all the routes from its routing table, and traffic is interrupted.

Therefore, avoiding traffic interruption involves three parts, each detailed below:

  •  Check that the remote peer will not delete the routes.
  •  Check that the FortiGate cluster will keep the BGP routes in the routing table.
  •  Fine-tune timers.

 

  1. Check that the remote peer will not delete the routes. This can be achieved with BGP graceful restart, a BGP capability defined in RFC 4724. The capability must be configured on both peers.


On FortiGate, Graceful Restart can be enabled at two levels:

  • Global configuration level.
  • Peer level.

 

Note:

Enabling Graceful Restart is service-impacting. Enabling it at the global configuration level will cause all BGP peers to flap, and enabling it at the peer level will cause the single BGP peer to flap.

 

Configuration snapshot:

config router bgp
    set as 65111
    set graceful-restart enable
    config neighbor
        edit "10.2.3.4"
            set capability-graceful-restart enable
            set bfd enable
            set remote-as 65000
            set weight 20
        next
    end
    config network
        edit 1
            set prefix 172.31.0.0 255.255.0.0
        next
    end
end

Note on 'set bfd enable': BFD helps detect a failed OSPF or BGP connection within milliseconds. Use BFD with caution when the peer devices are not Fortinet devices. Some third-party platforms (such as Cisco IOS devices) may show suboptimal routing performance when graceful-restart and BFD are both configured.

 

  2. Check that the FortiGate cluster will keep the BGP routes in the FIB. When the FortiGate is configured in an HA cluster, all routes are synchronized to the secondary devices. The synchronized routes on the secondary have a limited lifetime and a lower priority. The lifetime of these routes is configured through the 'route-ttl' parameter in the system HA configuration:

 

config system ha
    set route-ttl 30
end

 

After the route times out, the new primary needs to learn the route again through route convergence. The default route-ttl value is 10 seconds.

 

  3. Fine-tune timers. The following are the main timers that can be tuned:
  • holdtime-timer (default 180): Number of seconds to mark the peer as dead.
    This is the number of seconds to wait between keepalive, update, or notification messages before considering the connection to the peer as closed.
  • graceful-restart-time (default 120): Time needed for neighbors to restart (sec).
    This is the number of seconds to wait for the OPEN message before removing the stale routes.
    graceful-restart-time should be less than or equal to the holdtime-timer.
  • graceful-update-delay (default 120): Route advertisement/selection delay after restart (sec). After an HA failover, route advertisement and selection on the new primary are delayed by this amount of time.
  • graceful-stalepath-time (default 360): Time to hold stale paths of restarting neighbor (sec). This is the maximum total time that a stale route is kept before being deleted.

 

CLI Syntax:

 

config router bgp
    set graceful-restart enable
    set graceful-restart-time <integer>       --> 1 to 3600, default 120
    set graceful-stalepath-time <integer>     --> 1 to 3600, default 360
    set holdtime-timer <integer>              --> 3 to 65535, or 0 (special value), default 180
    set graceful-update-delay <integer>       --> 1 to 3600, default 120
end


Tune these timers against the following two criteria:

  • Time to detect a real BGP peer failure.
  • Maximum time allowed for a restart.

 

Note:

When graceful-restart is enabled, detection of a real network or peer failure is delayed, which can result in downtime as long as the graceful-restart-time.

Therefore, it is important to configure these timers to values that suit the network requirements. Also, do not expect the BGP peering to keep the uptime it had on the previous primary device once the failover is finished. The BGP session is re-established on the new primary, so a BGP 'flap' will be visible. This is expected behavior.
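As an illustration of the timer relationships described above, the following sketch lowers the hold time and keeps the restart window within it. All four values are assumptions chosen for the example, not recommendations, and should be sized to the restart time actually measured in the network. The '<-----' annotations are explanatory and are not part of the commands.

```
config router bgp
    set graceful-restart enable
    set holdtime-timer 90               <----- illustrative value: peer declared dead after 90 s of silence
    set graceful-restart-time 90        <----- illustrative value: kept less than or equal to holdtime-timer
    set graceful-update-delay 60        <----- illustrative value: advertisement/selection delay after restart
    set graceful-stalepath-time 180     <----- illustrative value: upper bound for keeping stale routes
end
```

With values like these, stale routes are kept for at most 180 seconds. The restart window is the part that governs how long a real peer failure can go undetected, as explained in the note above.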

Verification.

In the output of the following CLI command, check the graceful restart capabilities:

    FGT # get router info bgp neighbor a.b.c.d

For address family: IPv4 Unicast:


  BGP table version 1, neighbor version 0
  Index 1, Offset 0, Mask 0x2
  AF-dependant capabilities:
    Graceful restart: advertised, received


Check the timers in the output of the same command:

    FGT # get router info bgp neighbor a.b.c.d 

 

Routes in the FIB can be validated by using the following command:

 

diagnose ip route list

 

Or:

 

get router info kernel


Troubleshooting:

A packet capture filtered on the BGP peer IP address is helpful.
Check other BGP-related information with:

    FGT # get router info bgp neighbor
    FGT # get router info bgp summary
    FGT # get router info bgp network
    FGT # get router info routing-table database
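The packet capture mentioned above can be taken with the built-in sniffer. This is a minimal sketch, assuming the peer is 10.2.3.4 (the neighbor from the configuration snapshot) and that the session uses the standard BGP port, TCP 179. Adjust the address to the actual peer.

```
diagnose sniffer packet any 'host 10.2.3.4 and tcp port 179' 4 0 l
```

Here 'any' captures on all interfaces, verbosity 4 prints packet headers with the interface name, 0 means no packet limit (stop with Ctrl+C), and 'l' prints local timestamps. Running the capture during a failover test shows whether the TCP session is reset and when the new OPEN message is exchanged.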

 

If a failover and subsequent failback occur within the time defined by the route-ttl parameter (configured with 'set route-ttl <seconds>' under 'config system ha'), the event is classified as a route flap. In the case of a route flap:

  • The affected routes are flushed from the routing table.
  • Associated sessions are removed after the system processes five packets following the route change.

When testing failover, use 'diagnose sys ha reset-uptime' on the primary, or adjust the HA priority, to simulate the normal HA primary election process. See Technical Tip: Different options to trigger an HA Failover. Do not use 'exec ha failover set'.
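A failover test using the commands already mentioned in this article could be sequenced as in the following sketch. It assumes that HA override is disabled, so that uptime takes part in the primary election, and that the CLI session reconnects to the new primary after the failover. The '<-----' annotations are explanatory and are not part of the commands.

```
get system ha status                  <----- before the test: note which unit is primary
get router info bgp summary           <----- before the test: note the session state and prefix count
diagnose sys ha reset-uptime          <----- on the current primary: triggers the failover
get system ha status                  <----- on the new primary: confirm the role change
get router info bgp summary           <----- session re-established, uptime restarted
get router info routing-table bgp     <----- BGP routes present in the routing table
diagnose ip route list                <----- routes present in the FIB
```

Leave more than route-ttl seconds between a failover and any failback, so that the test itself is not classified as a route flap.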