mkintexas
New Contributor

Slow site to site VPN Performance

Folks,

 

Recently my company decided to save money by transitioning away from MPLS and metro Ethernet connectivity to Internet-based site-to-site VPNs. For our stores we are installing Time Warner and Comcast business-class Internet, generally either 100/10 or 100/20, with one location on a Comcast fiber-based Internet circuit that is 30/30.

 

So far our experience has not been all that great. Our data center currently has a 100/100 fiber-based Internet connection (1 Gbps to be installed next week), and the 100 Mbps link is not oversubscribed at this point. Whenever I do a Windows drag-and-drop copy from the data center to the store, I average 1.5-2.5 MB/s, so basically 12-20 Mbps, despite the fact that my store has a 100 Mbps download pipe. If I use FTP over the VPN I get the same speed. However, if I take the same server at the DC, do a 1-to-1 NAT, and then FTP to it from the same store over the Internet rather than through the VPN, I see close to the 100 Mbps that we are subscribed to. Interestingly, whenever I copy from the store to the DC I almost always get the full 20 Mbps upload speed. Finally, at our 30/30 store I get all 30 Mbps in both directions.
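The unit conversion above is easy to get wrong (file copy dialogs show megabytes per second, circuits are sold in megabits per second), so here is a minimal Python sketch of the arithmetic. The function names are illustrative only, and the 100 Mbps figure is simply the subscribed download rate quoted in the post.

```python
def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert a copy rate in megabytes/s to megabits/s (8 bits per byte)."""
    return mbytes_per_s * 8


def link_utilisation(mbits_per_s: float, link_mbits: float) -> float:
    """Fraction of the subscribed link rate actually achieved."""
    return mbits_per_s / link_mbits


if __name__ == "__main__":
    for copy_rate in (1.5, 2.5):  # MB/s seen in the Windows copy dialog
        mbps = mbytes_to_mbits(copy_rate)
        used = link_utilisation(mbps, 100)  # 100 Mbps store download pipe
        print(f"{copy_rate} MB/s = {mbps:.0f} Mbps ({used:.0%} of the link)")
```

In other words, the VPN copy uses only about an eighth to a fifth of the download capacity the store is paying for.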

 

My data center has a 500D and all of my stores have a 140D, so I would think there is enough horsepower to handle the occasional large file copy. We don't generally move a lot of data over our VPNs: mainly web-based applications with some videos. However, when we need it, it would be nice to have a decent file copy speed. I understand there is some overhead on VPNs, but not to this degree. I have already tried various MTU sizes on the WAN interfaces at both the DC and my lab store.
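To put a rough number on "some overhead", the sketch below estimates the largest inner packet that fits in one ESP tunnel-mode packet on a 1500-byte path. It assumes AES-CBC (16-byte IV and block size) with a 12-byte truncated HMAC ICV and an outer IPv4 header without options; the thread does not say which proposal was actually in use, so treat the parameters as placeholders.

```python
def esp_tunnel_max_inner(mtu: int, iv_len: int = 16, block: int = 16,
                         icv_len: int = 12, nat_t: bool = False) -> int:
    """Largest inner IP packet (bytes) that fits in one ESP tunnel-mode packet.

    Assumed layout: outer IPv4 (20) [+ UDP (8) when NAT-T is used]
    + ESP header (8) + IV + encrypted(inner packet + padding + 2 trailer bytes)
    + ICV.
    """
    outer = 20 + (8 if nat_t else 0) + 8 + iv_len + icv_len
    room = mtu - outer                   # bytes left for the encrypted part
    encrypted = (room // block) * block  # must be a whole number of cipher blocks
    return encrypted - 2                 # minus pad-length and next-header bytes


if __name__ == "__main__":
    for nat_t in (False, True):
        inner = esp_tunnel_max_inner(1500, nat_t=nat_t)
        print(f"NAT-T={nat_t}: max inner packet {inner} bytes "
              f"({inner / 1500:.1%} of a 1500-byte packet)")
```

Under these assumptions the encapsulation costs only a few percent of each packet, which supports the point that protocol overhead alone cannot explain an 80% drop in throughput.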

 

At this point I am stumped. Why is my VPN running so slowly? Is it possible that TW and/or Comcast throttle UDP 500/4500 or the ESP protocol? At this point I, along with our CIO, am ready to abort this project and go with fiber in all 90 locations. But I am not quite ready to give up.

 

Any help would be appreciated.

 

Mark

mkintexas

One thing I forgot.  Your Comcast router.  Is your bridge mode enabled or disabled?

mkintexas

Folks,

 

I thought I would drop in with an update, since we found out a significant piece of information last night. It would appear that versions 5.4.0 and 5.4.1 use some sort of round-robin logic to handle data packets when handing off to CP8 chips (which is what my 140D has). This, in some cases, causes IPSEC packets to be sent/received out of sequence GREATLY reducing performance. The workaround is to go into config system global and issue set ipsec-asic-offload disable, then reboot or take down your tunnels and bring them back up. This seems to have made a noticeable difference: one site that was transferring at 8-12 Mbps is now transferring at 90 Mbps over the VPN. This didn't speed up ALL my sites, but a good chunk of them. Oddly, I think it is a combination of this and Comcast cable modems not being able to handle a high stream of out-of-sequence protocol 50 packets. Supposedly this mistake will be fixed in 5.4.2.

nothingel
New Contributor III

Thanks for the update, your discovery is very interesting!

 

I don't think the Comcast modems are at fault (at least not totally), because I can pass protocol 50 faster when the connection is Comcast cable to Comcast cable than when it's Comcast fiber to their cable. At the moment, I am still forcing NAT-T by bouncing all affected IPsec tunnels through a Linux box performing otherwise needless NAT. Even with this added inefficiency, I get nearly flawless performance across the tunnels, far better than when going direct via protocol 50. I'm not sharing anything new here, except that I'm now using Linux instead of the Comcast modems for NAT.

 

I also realized something else: it's important for the Fortinet unit's interface speeds to be matched for ingress and egress traffic. For example, if your internal traffic is on port 1 at 1 Gbps and the port to the Internet router/modem is port 2 at 100 Mbps, egress speeds to the Internet can be quite unexpected (very noticeable when testing via iperf). Changing port 2 to 1 Gbps as well can significantly improve performance. In case anyone is wondering, duplex has always been correct.

 

 

emnoc
Esteemed Contributor III

I have to agree with nothingel. My own Comcast business line handles IPsec to a host or other appliances with zero issues.

 

 

This, in some cases, causes IPSEC packets to be sent/received out of sequence GREATLY reducing performance.

 

I bet what's really happening to put the sequence numbers out of order is that the ESP packets are routed via load-shared or ECMP links, where packets taking different paths arrive out of order. So the seq#s are not 1 2 3 4 5 6 7 ..... but more like 1 3 2 4 5 6 8 7 9 11 10 .....
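As a rough illustration of what "out of order" means here, the Python sketch below counts how many packets in a stream arrive after a higher sequence number has already been seen. The sample list is the example sequence from the post; in practice the numbers would have to come from a packet capture of the ESP sequence field, and the function name is made up for this example.

```python
def count_reordered(seqs):
    """Count packets that arrive after a higher sequence number was seen."""
    highest = 0
    late = 0
    for seq in seqs:
        if seq < highest:
            late += 1       # arrived behind a newer packet
        else:
            highest = seq   # new high-water mark
    return late


if __name__ == "__main__":
    in_order = [1, 2, 3, 4, 5, 6, 7]
    shuffled = [1, 3, 2, 4, 5, 6, 8, 7, 9, 11, 10]
    print(count_reordered(in_order))  # no late arrivals
    print(count_reordered(shuffled))  # late arrivals: 2, 7 and 10
```

Even a few late packets per dozen can hurt a single TCP flow inside the tunnel, because the receiver reports gaps and the sender may back off as if packets had been lost.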

 

 

If you suspect the CP8, then place two FGT140s back-to-back and run an IPsec tunnel between them. Do you see poor performance? If not, then it's not the CP8.

 

You can do the following to monitor things if you make changes:

diag vpn ipsec status (ensure the CP8 is handling the traffic)

change the proposal ciphers to ones better suited to the CP8

diag hardware deviceinfo cp8 brief (to validate enc/dec, queues, and overflows)

ensure you don't use IPS

 

My own testing seems to indicate that 3DES performs better than AES, with dhgrp 5 or lower. YMMV.

 

 

 

 

 

 

 

 

PCNSE NSE StrongSwan
recha
New Contributor III

mkintexas wrote:

Folks,

 

I thought I would drop in with an update, since we found out a significant piece of information last night. It would appear that versions 5.4.0 and 5.4.1 use some sort of round-robin logic to handle data packets when handing off to CP8 chips (which is what my 140D has). This, in some cases, causes IPSEC packets to be sent/received out of sequence GREATLY reducing performance. The workaround is to go into config system global and issue set ipsec-asic-offload disable, then reboot or take down your tunnels and bring them back up. This seems to have made a noticeable difference: one site that was transferring at 8-12 Mbps is now transferring at 90 Mbps over the VPN. This didn't speed up ALL my sites, but a good chunk of them. Oddly, I think it is a combination of this and Comcast cable modems not being able to handle a high stream of out-of-sequence protocol 50 packets. Supposedly this mistake will be fixed in 5.4.2.

Has anyone in this situation tried disabling IPsec ASIC offload? ^^

I have the same issue (poor IPsec performance between fiber 100/100 and fiber 1000/250, 1.5 Mb/s max... out of sequence), but my appliance runs 5.0.10 and I don't have this command available...

I will upgrade this HA pair if I have to ;)

mkintexas

I thought I would drop a note on this thread as a final update. My company decided to discontinue the use of coax-based circuits and went all fiber. However, we occasionally open a new site and need to use coax because of its faster install time. What I found was that if I fronted my two Internet circuits with a pair of Cisco routers, formed a GRE tunnel between the public IPs, and then hid the IPsec traffic inside that GRE tunnel, I was able to get near line-speed VPN on the coax circuit. When I tried IPsec over the coax line without hiding it in GRE, I got bad performance numbers on several of my circuits. I think maybe Comcast has some modems out there that can't handle a large amount of ESP traffic. Encapsulate it in GRE and the modems didn't flake out on me.

ede_pfau
Esteemed Contributor III

Thanks for sharing. Might be an MTU issue as well, don't you think?


Ede

"Kernel panic: Aiee, killing interrupt handler!"
mkintexas

Did quite a bit of experimenting with MTU and tcp_mss to no avail. What pointed me to hiding the traffic in a GRE tunnel was that I noticed my wan1 link, which was connected to the Comcast router, would bounce under load. Once I put a router between the FGT and the modem and set up the GRE tunnel, wan1 no longer bounced under load (nor did the Comcast interface). Once we went fiber across the board all issues went away except for this new 140E issue. That's a head scratcher.
nothingel
New Contributor III

I thought I'd come back to this thread and update at least one detail. From what I can tell, the issue with Comcast cable is with ESP on their modems, the SMC-based ones. Although I could not find a CPU load indicator, I noticed that the web GUI was extremely sluggish when the modem was pushing high-volume ESP. However, the sluggishness disappeared when switching to UDP encapsulation mode. Admittedly it's been a while since I was knee-deep in all of this, but we now have a few newer Comcast business modems (the physically large models with built-in Wi-Fi), so I might have the opportunity to re-test in the future and see if the results are the same.

emnoc
Esteemed Contributor III

I am afraid I don't know exactly how to see if ESP is being wrapped in UDP packets. Is there a diag command or something that will show that?

 

ESP is its own protocol, like TCP, UDP, GRE, ICMP ..., and is not normally encapsulated in another protocol (NAT-T, which wraps ESP in UDP 4500, is the exception). ISAKMP uses UDP.
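For anyone trying to tell from a packet capture whether a tunnel is running native ESP or NAT-T, the distinction comes down to the IP protocol number and the UDP port. The Python sketch below is only a classifier over header fields you would read out of a capture yourself; the function and label names are made up for illustration, and it is not a FortiOS diagnostic.

```python
ESP, AH, UDP = 50, 51, 17  # IANA IP protocol numbers


def classify(ip_proto, src_port=None, dst_port=None, udp_payload=None):
    """Label a packet as native ESP, AH, IKE, or NAT-T traffic."""
    if ip_proto == ESP:
        return "native ESP (protocol 50)"
    if ip_proto == AH:
        return "AH (protocol 51)"
    if ip_proto == UDP:
        ports = {src_port, dst_port}
        if 4500 in ports:
            if udp_payload is None:
                return "NAT-T on UDP 4500 (IKE or UDP-encapsulated ESP)"
            if udp_payload == b"\xff":
                return "NAT-T keepalive"
            # IKE on port 4500 is prefixed with a four-byte all-zero marker.
            if udp_payload[:4] == b"\x00\x00\x00\x00":
                return "IKE over NAT-T"
            return "UDP-encapsulated ESP (NAT-T)"
        if 500 in ports:
            return "IKE (UDP 500)"
    return "other"


if __name__ == "__main__":
    print(classify(50))
    print(classify(17, 4500, 4500, b"\x12\x34\x56\x78" + b"\x00" * 16))
    print(classify(17, 500, 500))
```

If a capture on the WAN side shows only UDP 4500 and no protocol 50 packets between the two peers, the ESP traffic is being UDP-encapsulated.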

 

Wiki or Google ESP and protocol #50, or protocol #51 (AH).

 

PCNSE NSE StrongSwan