HTTP request timeouts when going through Virtual IP (NAT Reflection, NAT Hairpin)
I've got a really strange issue that we've spent a week on without getting anywhere.
Here are the specs:
FortiGate 600C running 5.2.2 in HA Active-Active mode
Connected to Cisco 3560X switches with LACP aggregate interfaces
We recently switched from WatchGuard to FortiGate firewalls in our web environment. In our web stack, we use NAT Reflection (or NAT Hairpin) to simplify DNS management. So, internal servers (CentOS) call out to external VIP addresses that get NAT'd back to servers on the same subnet. I know that this isn't a great thing to do, but it's worked for years and we're working to change this soon.
As soon as we switched over to the FortiGate, we started getting timeouts on requests that follow that path (App Server > VIP > Web Server). From the App Server side, we see it send a SYN but never get a SYN-ACK back. The Web Server never receives the packet either. I've done a trace on the FortiGate and I'm pretty sure that it does receive the packet, but I can't tell from the trace whether it's actually forwarding it correctly. The packet seems to get lost somewhere between the App Server and the Web Server. The issue is random and does not seem to increase or decrease with load. It happens roughly once every hundred requests. When it happens, the App Server eventually retries the request. The retry sometimes hangs again, but the request does eventually go through, sometimes up to 90 seconds later.
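To put a number on "once every hundred requests or so", the handshake can be timed directly from an App Server. The following is only a minimal sketch: the function names, the 0.9 second threshold, and the default attempt count are assumptions for illustration, not anything from our environment. It relies on the fact that Linux retransmits an unanswered SYN only after an initial timeout of a second or more, so a handshake that takes longer than that most likely lost its first SYN.

```python
import socket
import time


def probe(host, port, attempts=100, timeout=5.0, slow_after=0.9):
    """Open `attempts` TCP connections and time each handshake.

    Returns a list of (seconds, outcome) tuples. Outcome is "ok",
    "slow" (the handshake took longer than `slow_after` seconds, which
    suggests the first SYN went unanswered and a retransmission got
    through), or "failed" (refused, unreachable, or timed out).
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection returns once the three-way handshake is done.
            with socket.create_connection((host, port), timeout=timeout):
                elapsed = time.monotonic() - start
                outcome = "slow" if elapsed > slow_after else "ok"
        except OSError:
            # Covers timeouts as well as refused/unreachable errors.
            elapsed = time.monotonic() - start
            outcome = "failed"
        results.append((elapsed, outcome))
    return results


def summarize(results):
    """Count how many probes ended in each outcome."""
    counts = {"ok": 0, "slow": 0, "failed": 0}
    for _, outcome in results:
        counts[outcome] += 1
    return counts
```

Calling summarize(probe(vip_address, 80)) once against a VIP and once against the mapped Web Server address directly would show whether the slow and failed handshakes only appear on the hairpin path.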
Here is another thing that we've noticed. We've started to see Output Drops on the switch interfaces that connect to the FortiGate. As far as we know, this was not happening before, but we weren't monitoring it before, so it's hard to know. We've changed cables to rule out bad cables. We've also swapped to the secondary switch and see the same thing.
One other thing to note is that this does not affect traffic coming into the Web Servers from external. It only affects traffic that takes the NAT hairpin loop.
I suspect it's one of two things:
1) The FortiGate is performing the NAT and then somehow losing the VLAN tag when it puts the packet back out on the network. A packet without a VLAN tag might explain the drops, since the switch would not know what to do with it.
2) It might be some type of MTU issue. Our switches and firewalls are configured with the default MTU of 1500.
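For reference, the arithmetic behind the MTU suspicion can be sketched as follows. This assumes plain IPv4 and TCP; the helper names are made up for illustration. One thing it shows is that a SYN carries no payload, so it can never be larger than 120 bytes, which makes an MTU limit a weaker fit if the packet being lost really is the SYN itself.

```python
# Header sizes in bytes for IPv4 and TCP.
IPV4_HEADER_MIN = 20  # IPv4 header without options
TCP_HEADER_MIN = 20   # TCP header without options
IPV4_HEADER_MAX = 60  # IPv4 header with the maximum amount of options
TCP_HEADER_MAX = 60   # TCP header with the maximum amount of options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one packet of `mtu` bytes."""
    return mtu - IPV4_HEADER_MIN - TCP_HEADER_MIN


def packet_size_for_mss(mss):
    """IP packet size of a full segment when no header options are used."""
    return mss + IPV4_HEADER_MIN + TCP_HEADER_MIN


def max_syn_size():
    """Upper bound on the IP packet size of a SYN, which has no payload."""
    return IPV4_HEADER_MAX + TCP_HEADER_MAX


print(mss_for_mtu(1500))          # 1460
print(packet_size_for_mss(1380))  # 1420
print(max_syn_size())             # 120
```

So clamping the MSS to 1380 keeps full segments at 1420 bytes, well under the 1500 byte MTU, but it does not change the size of the SYN.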
We've tried so many things at this point that we're just about out of ideas. Here is a list of what we've tried. Some of these are based on findings from these forums.
1. Reboot firewalls
2. Shutdown secondary firewall so it's running in standalone
3. Run on secondary firewall only
4. Moved policy to top of list
5. Disable vlanforward on aggregate interface and vlan interface
6. Swapped cables between firewalls and switches
7. Enabled send-deny-packet on specific policy
8. Set tcp-mss-sender and tcp-mss-receiver to 1380 on specific policy
9. Set tcp-mss to 1380 on vlan interface and aggregate interface
I've tried to include every config I can think of below. I would appreciate help if anyone can think of anything.
config system interface
    edit "aggr.webprod.in"
        set vdom "webprod"
        set type aggregate
        set tcp-mss 1380
        set member "port17" "port18"
        set snmp-index 71
    next
config system interface
    edit "vlan.webprod.in"
        set vdom "webprod"
        set ip 172.XXX.XXX.XXX 255.255.0.0
        set allowaccess ping
        set tcp-mss 1380
        set snmp-index 74
        set secondary-IP enable
        set interface "aggr.webprod.in"
        set vlanid 55
        config secondaryip
            edit 1
                set ip 172.XXX.XXX.XXX 255.255.0.0
                set allowaccess ping
            next
            edit 2
                set ip 172.XXX.XXX.XXX 255.255.0.0
                set allowaccess ping
            next
        end
    next
end
config firewall policy
    edit 58
        set srcintf "zone.webint"
        set dstintf "zone.webint"
        set srcaddr "all"
        set dstaddr "vip.http.aaa" "vip.http.bbb" "vip.http.ccc" "vip.https.aaa" "vip.https.bbb" "vip.https.ccc"
        set action accept
        set schedule "always"
        set service "HTTP" "HTTPS" "DNS"
        set logtraffic all
        set match-vip enable
        set tcp-mss-sender 1360
        set tcp-mss-receiver 1360
        set timeout-send-rst enable
        set nat enable
    next
end
config firewall vip
    edit "vip.http.aaa"
        set extip 67.XXX.XXX.21-67.XXX.XXX.23
        set extintf "any"
        set portforward enable
        set mappedip "172.XXX.XXX.1-172.XXX.XXX.3"
        set extport 80
        set mappedport 80
    next
    edit "vip.http.bbb"
        set extip 67.XXX.XXX.25-67.XXX.XXX.27
        set extintf "any"
        set portforward enable
        set mappedip "172.XXX.XXX.1-172.XXX.XXX.3"
        set extport 80
        set mappedport 80
    next
    edit "vip.http.ccc"
        set extip 67.XXX.XXX.28-67.XXX.XXX.30
        set extintf "any"
        set portforward enable
        set mappedip "172.XXX.XXX.1-172.XXX.XXX.3"
        set extport 80
        set mappedport 80
    next
    edit "vip.https.aaa"
        set extip 67.XXX.XXX.21-67.XXX.XXX.23
        set extintf "any"
        set portforward enable
        set mappedip "172.XXX.XXX.1-172.XXX.XXX.3"
        set extport 443
        set mappedport 443
    next
    edit "vip.https.bbb"
        set extip 67.XXX.XXX.25-67.XXX.XXX.27
        set extintf "any"
        set portforward enable
        set mappedip "172.XXX.XXX.1-172.XXX.XXX.3"
        set extport 443
        set mappedport 443
    next
    edit "vip.https.ccc"
        set extip 67.XXX.XXX.25-67.XXX.XXX.27
        set extintf "any"
        set portforward enable
        set mappedip "172.XXX.XXX.1-172.XXX.XXX.3"
        set extport 443
        set mappedport 443
    next
end
#############Cisco Configuration#############
interface Port-channel10
 description ptn-fw101 webprod-int portchannel
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 50,51,55
 switchport mode trunk
end
interface GigabitEthernet0/17
 description fw101 port 17 webprod-int
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 50,51,55
 switchport mode trunk
 logging event bundle-status
 logging event spanning-tree
 spanning-tree portfast trunk
 spanning-tree bpdufilter enable
 channel-group 10 mode active
end
interface GigabitEthernet0/18
 description fw101 port 17 webprod-int
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 50,51,55
 switchport mode trunk
 logging event bundle-status
 logging event spanning-tree
 spanning-tree portfast trunk
 spanning-tree bpdufilter enable
 channel-group 10 mode active
end
