This document describes how to configure a FortiGate HA cluster and its OSPF settings (using the graceful-restart feature) so that the FortiGate and its peers keep the OSPF routes in their routing tables, avoiding traffic interruption during an HA failover.
- All FortiOS.
- FortiGate running in NAT and HA mode.
An HA cluster with one or more OSPF peers will fail over without traffic interruption.
On a FortiGate HA cluster, the OSPF router daemon process runs only on the primary (master) unit. When an HA failover occurs, a new OSPF process is launched on the newly elected master.
Even though the FortiGate has all the routes, if the peer sees the FortiGate as unresponsive, it will remove all the routes from its routing table and traffic will be interrupted.
Avoiding traffic interruption therefore involves three steps, each detailed below:
1) Check that the remote peer will not delete the routes.
2) Check that the FortiGate cluster will keep the OSPF routes in the kernel ('# get router info kernel' command).
3) Fine-tune the timers.
1) Check that the remote peer will not delete the routes.
This can be achieved with OSPF graceful restart. 'Graceful Restart' is an OSPF capability defined in RFC 3623 (RFC 4724 describes the equivalent mechanism for BGP). This capability needs to be configured on both peers.
# config router ospf
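As a minimal sketch, assuming a FortiOS release in which the restart-mode option is available under 'config router ospf', graceful restart could be enabled on the FortiGate as follows. The peer router needs the equivalent setting in its own configuration syntax.

```
config router ospf
    set restart-mode graceful-restart
end
```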
2) Check that the FortiGate cluster will keep the OSPF routes in the kernel ('# get router info kernel' command).
When the FortiGate is configured in an HA cluster, all routes are synchronized to the slave units.
The synchronized routes on the slave have a limited lifetime and a lower priority.
The lifetime of these routes can be configured through the 'route-ttl', 'route-wait' and 'route-hold' parameters in the system ha configuration.
# config system ha
route-ttl:
Controls how long HA routes are kept in the FIB of a cluster unit after it has been promoted to master.
The route-ttl range is 5 to 3600 seconds. The default route-ttl is 10 seconds.
route-wait:
The time the primary unit waits after receiving a routing table update before sending the update to the subordinate units in the cluster.
The route-wait range is 0 to 3600 seconds. The default route-wait is 0 seconds.
route-hold:
The time the primary unit waits between sending routing table updates to the subordinate units in the cluster.
The route-hold range is 0 to 3600 seconds. The default route-hold is 10 seconds.
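For illustration, the three HA route timers could be set explicitly as in the sketch below. The values shown are simply the defaults stated in this document, not tuning recommendations; option availability may differ between FortiOS releases.

```
config system ha
    set route-ttl 10
    set route-wait 0
    set route-hold 10
end
```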
3) Fine-tune the timers.
Other main timers can also be tuned:
restart-period (default 120):
Time allowed for neighbours to restart, in seconds.
This is the number of seconds to wait for the Hello message before removing the routes.
restart-period should be less than or equal to route-ttl.
Consider tuning these timers against the following two criteria:
- The time within which a real OSPF peer failure should be detected
- The maximum time allowed for a restart
When graceful restart is enabled, detection of a real network or peer failure is delayed, which can result in downtime as long as the route-ttl.
It is therefore important to set these timers to values that suit the network requirements.
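As a worked illustration of the rule that restart-period should not exceed route-ttl, the sketch below sets both to 60 seconds. The value 60 is an arbitrary assumption chosen for the example and should be replaced by whatever the failure-detection and restart-time requirements of the network dictate; it also assumes graceful restart is the configured restart-mode.

```
config system ha
    set route-ttl 60
end
config router ospf
    set restart-mode graceful-restart
    set restart-period 60
end
```

With these values, the new master keeps the synchronized routes for as long as the peer is willing to wait for the restarted OSPF process, at the cost of up to 60 seconds of delay before a real peer failure takes effect.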
Check the graceful restart capabilities in the output of the CLI command:
FGT# get router info ospf status
Routing Process "ospf 0" with ID 18.104.22.168
Check the timers in the output of the CLI command:
FGT# get router info ospf neighbor a.b.c.d
OSPF process 0:
Check other OSPF-related information with:
FGT# get router info ospf neighbor
FGT# get router info ospf status
FGT# get router info ospf route
FGT# get router info routing-table database
For more in-depth debugging:
# diag ip router ospf level info
# diag ip router ospf all enable
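The OSPF debug settings above typically produce console output only once debug output itself is switched on. A possible session, assuming the standard 'diagnose debug' commands are available on the unit, could look like this; the last two commands turn the debugging off again once the failover has been observed.

```
diagnose ip router ospf level info
diagnose ip router ospf all enable
diagnose debug enable

diagnose ip router ospf all disable
diagnose debug disable
```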