Not applicable

FG Session Failover not working

Hello everyone,

I've been working on setting up my FG 1000A Active-Passive cluster for the past 2 weeks. The cluster itself works, in that it fails over to the slave, but Session Pick-Up during failover does not work completely. A ticket was opened with support but they are unable to reproduce the problem, so I'm giving the Forums a shot.

The Setup:
- 2x FG1000A connected to a Dell PowerConnect 6224 switch (Internal and External ports are both connected to it). The switch is configured as a Layer-2 device although it is Layer-3 capable.
- Running 3.00 MR4-Patch2
- FG1 has higher priority (250) so it becomes Master when it is in the cluster
- FG2 has default priority (128) so it becomes Slave when it is in a cluster where a Master is available
- Session pick-up is configured
- Port4 is Heartbeat and connected with a cross-over cable
- Port6 is External network (Internet) and monitored by HA
- Port7 is Internal network and monitored by HA
- Port8 is used for management on a different subnet
- A Web server running IIS, connected to the Internal network, provides a link on a web page to download a big file (600MB)
- A firewall policy is configured with a VIP to permit HTTP access to the Web server

The problem:
1. When we start the download of the file through the Web server and disconnect the cable from Port6 of FG1 (acting as Master), it takes 6 seconds for the failover to occur and the session is not picked up by FG2 (acting as Slave). We lose around 12 PINGs to the cluster, so the download stops. We can reconnect and restart the download, but this is not Session Failover.
2. Now the interesting part. If we restart the download of the file and we reconnect the cable to Port6 of FG1 (now Slave), the cluster reconfigures itself so that FG1 comes back as Master and FG2 as Slave. There's no PING lost and the download continues...

We've done:
- Full factory reset of both units and rebuilt the config from scratch
- Reproduced the problem with 2 different brands of switches
- Changed the ports configured on the FG units (i.e. External from Port1 to Port6, etc.)
- We were at MR5, tried MR4-Patch2 and factory reset the units again
- Confirmed that the MAC address table is updated properly on the switches
- Confirmed that both units have a synced configuration

Has anybody noticed anything similar?

Regards,
Sylvain
2 REPLIES
red_adair
New Contributor III

The failover time seems ok to me. We also experience a "stop" of about 4-6 seconds, but then TCP sessions continue.

I know that "AV scanned sessions" are not replicated to the HA slave. So if you enable a protection profile on that FTP to scan for AV, it likely won't work. Only IPSec-VPN and "regular" TCP traffic is mirrored to the A-P slave.

It also seems to me that they made some changes to HA in MR5. I found out in my lab that the slave doesn't need to restart when forming a cluster. So your findings may also be related to a specific OS version.

-R.

---------quote------------------------
The problem:
1. When we start the download of the file through the Web server and disconnect the cable from Port6 of FG1 (acting as Master), it takes 6 seconds for the failover to occur and the session is not picked up by FG2 (acting as Slave). We lose around 12 PINGs to the cluster. So the download stops. We can reconnect and restart the download but this is not Session Failover.
2. Now the interesting part. If we restart the download of the file and we reconnect the cable from Port6 of FG1 (being Slave), the cluster reconfigures itself to have FG1 come back as Master and FG2 as Slave. There's no PING lost and the download continues...
Not applicable

Hi Red,

I finally have the details and hope they may help someone in the future. Keith Li from Tech Support finally found the cause, and it was not the FortiGate.

In fact, my Windows server has a registry key configured not to retransmit TCP packets after 3 attempts. Since the FortiGate failover takes about 6 seconds when it detects a link failure, the server had already given up on TCP retransmits by the time the slave took over, so the download failed. This registry key is used to prevent Denial of Service on Windows hosts. All our Windows servers have this value set to 3, although the default value is 5 attempts. When we put back the default value, the failover works properly. This value is changed when you use the Microsoft High Security Templates to secure your Windows hosts, which we do.

As for why the fail-back from the Slave to the original Master does not cause the problem: it comes down to a difference in the HA code. Fail-back usually takes less time than failover, because during fail-back the unit currently passing traffic does not abruptly stop doing so. The process is therefore a lot smoother than a failover.

I hope those insights will help someone in the future. Thanks as well to Keith Li from Tech Support for his assistance.
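
For anyone who wants to see why 3 retransmits can be too few for a 6-second failover while 5 is enough, here is a rough Python sketch of the arithmetic. It assumes the retransmission timeout doubles after every retransmit, and it uses a starting timeout of 0.3 seconds, which is purely an illustrative guess (the real value depends on the measured round-trip time and was not reported in this thread). The function names are made up for the example. The thread does not name the registry value; the description matches `TcpMaxDataRetransmissions`, but treat that as an assumption as well.

```python
def retransmit_schedule(initial_rto, max_retransmits):
    """Return (retransmit_times, give_up_time), in seconds after the first send.

    Assumes the timeout doubles after each retransmission (exponential backoff).
    """
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(max_retransmits):
        elapsed += rto          # wait one timeout, then retransmit
        times.append(elapsed)
        rto *= 2                # back off before the next attempt
    give_up = elapsed + rto     # last retransmit also times out, connection is dropped
    return times, give_up


def survives_outage(initial_rto, max_retransmits, outage_seconds):
    """True if at least one retransmission is sent after the outage has ended."""
    times, _ = retransmit_schedule(initial_rto, max_retransmits)
    return bool(times) and times[-1] > outage_seconds


if __name__ == "__main__":
    FAILOVER_SECONDS = 6.0   # failover time reported in this thread
    ASSUMED_RTO = 0.3        # hypothetical starting timeout, not a measured value
    for attempts in (3, 5):
        times, give_up = retransmit_schedule(ASSUMED_RTO, attempts)
        print(attempts, [round(t, 1) for t in times], round(give_up, 1),
              survives_outage(ASSUMED_RTO, attempts, FAILOVER_SECONDS))
```

Under these assumptions, with 3 attempts every retransmit is sent while the cluster is still failing over, whereas with 5 attempts the later retransmits land after the slave has taken over. A larger starting timeout would shift the numbers, so this only illustrates the mechanism rather than reproducing the exact timings on the server.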