Support Forum
The Forums are a place to find answers on a range of Fortinet products from peers and product experts.
Alex_talmage
New Contributor

Packet loss over site-to-site IPsec VPN tunnel causing poor Cisco TelePresence quality

Hi All,

I've got a weird issue that I've been banging my head against a brick wall over for the past few weeks. Bit of background first:

[ul]
  • We have 2 sites, 1 in UK, 1 in US.
  • Each site has a 500Mbps leased line Internet connection.
  • Sites are connected via IPsec VPN using FortiGate 800D A/P clusters running FortiOS 5.4.4.
  • Among everyday file sharing and web app traffic, we run point-to-point Cisco TelePresence video calls over this tunnel.[/ul]

    Recently, the Cisco IX5000 telepresence devices at both ends have been reporting packet loss. The web interface for the IX5000 only reports RX packet loss, and the values are usually as follows:

     

    UK RX packet Loss: 0.05%

    US RX Packet Loss: 1.5%

     

    Cisco's packet loss threshold is 0.05%, so we are seeing pretty poor quality, artifacting and stuttering on the US end, but it seems fine on the UK end.

     

    I've been trying to get to the bottom of this strange packet loss, and why it is worse in one direction. We've replaced all Ethernet cables, and I've checked all interfaces along the route to ensure we don't have a speed/duplex mismatch and that no switch ports or interfaces are reporting errors or collisions - all looks good, so I don't think this is a physical issue.

     

    I've run iperf across the IPSEC tunnel to further troubleshoot and here are my results:

     

    Iperf with UK as client and US as server using UDP (18Mbps bandwidth tested as this is predicted telepresence requirement):

    iperf3.exe -c 172.16.0.10 -u -b 18M

    Connecting to host 172.16.0.10, port 5201

    [  4] local 10.158.6.40 port 64279 connected to 172.16.0.10 port 5201

    [ ID] Interval           Transfer     Bandwidth       Total Datagrams

    [  4]   0.00-1.01   sec  1.95 MBytes  16.2 Mbits/sec  250

    [  4]   1.01-2.01   sec  2.15 MBytes  18.0 Mbits/sec  275

    [  4]   2.01-3.01   sec  2.14 MBytes  18.0 Mbits/sec  274

    [  4]   3.01-4.02   sec  2.15 MBytes  17.8 Mbits/sec  275

    [  4]   4.02-5.01   sec  2.14 MBytes  18.2 Mbits/sec  274

    [  4]   5.01-6.01   sec  2.15 MBytes  18.0 Mbits/sec  275

    [  4]   6.01-7.01   sec  2.15 MBytes  18.0 Mbits/sec  275

    [  4]   7.01-8.01   sec  2.14 MBytes  18.0 Mbits/sec  274

    [  4]   8.01-9.01   sec  2.15 MBytes  18.0 Mbits/sec  275

    [  4]   9.01-10.01  sec  2.15 MBytes  18.0 Mbits/sec  275

    - - - - - - - - - - - - - - - - - - - - - - - - -

    [ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams

    [  4]   0.00-10.01  sec  21.3 MBytes  17.8 Mbits/sec  0.124 ms  2001/2703 (74%)

     

    [  4] Sent 2703 datagrams

     

    Iperf with US as client and UK as server using UDP (18Mbps bandwidth tested as this is predicted telepresence requirement):

    iperf3.exe -c 10.158.6.40 -u -b 18M

    Connecting to host 10.158.6.40, port 5201

    [  4] local 172.16.0.10 port 49868 connected to 10.158.6.40 port 5201

    [ ID] Interval           Transfer     Bandwidth       Total Datagrams

    [  4]   0.00-1.01   sec  1.95 MBytes  16.2 Mbits/sec  250

    [  4]   1.01-2.01   sec  2.15 MBytes  18.0 Mbits/sec  275

    [  4]   2.01-3.00   sec  2.33 MBytes  19.8 Mbits/sec  298

    [  4]   3.00-4.00   sec  1.97 MBytes  16.5 Mbits/sec  252

    [  4]   4.00-5.00   sec  2.14 MBytes  18.0 Mbits/sec  274

    [  4]   5.00-6.00   sec  2.16 MBytes  18.1 Mbits/sec  276

    [  4]   6.00-7.00   sec  2.15 MBytes  18.0 Mbits/sec  275

    [  4]   7.00-8.00   sec  2.14 MBytes  18.0 Mbits/sec  274

    [  4]   8.00-9.00   sec  2.15 MBytes  18.0 Mbits/sec  275

    [  4]   9.00-10.00  sec  2.15 MBytes  18.0 Mbits/sec  275

    - - - - - - - - - - - - - - - - - - - - - - - - -

    [ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams

    [  4]   0.00-10.00  sec  21.3 MBytes  17.8 Mbits/sec  0.102 ms  1070/2711 (39%)

     

    [  4] Sent 2711 datagrams
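    As a quick aside on the per-second datagram counts: ~275/sec is exactly what 18 Mbps works out to if iperf3 is sending 8192-byte UDP payloads (an assumption on my part, though an 8192-byte UDP size does show up in a packet capture later in this thread).

```python
# Sanity check (sketch): expected datagram rate for iperf3 -u -b 18M,
# assuming an 8192-byte UDP payload per datagram.
TARGET_BPS = 18_000_000   # -b 18M
PAYLOAD_BYTES = 8192      # assumed iperf3 UDP payload size

datagrams_per_sec = TARGET_BPS / (PAYLOAD_BYTES * 8)
print(round(datagrams_per_sec, 1))  # 274.7 -- matches the ~275/sec intervals above
```

    One implication: a datagram that size is larger than a 1500-byte Ethernet MTU, so every datagram would have to be fragmented somewhere along the path.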

     

    Iperf results with UK as client and US as server using TCP (unlimited bandwidth):

    iperf3.exe -c 172.16.0.10

    Connecting to host 172.16.0.10, port 5201

    [  4] local 10.158.6.40 port 27775 connected to 172.16.0.10 port 5201

    [ ID] Interval           Transfer     Bandwidth

    [  4]   0.00-1.02   sec   384 KBytes  3.10 Mbits/sec

    [  4]   1.02-2.02   sec   128 KBytes  1.05 Mbits/sec

    [  4]   2.02-3.02   sec   128 KBytes  1.05 Mbits/sec

    [  4]   3.02-4.02   sec   128 KBytes  1.05 Mbits/sec

    [  4]   4.02-5.02   sec   128 KBytes  1.05 Mbits/sec

    [  4]   5.02-6.02   sec   128 KBytes  1.05 Mbits/sec

    [  4]   6.02-7.02   sec   128 KBytes  1.05 Mbits/sec

    [  4]   7.02-8.02   sec   256 KBytes  2.10 Mbits/sec

    [  4]   8.02-9.02   sec  0.00 Bytes  0.00 bits/sec

    [  4]   9.02-10.02  sec   128 KBytes  1.05 Mbits/sec

    - - - - - - - - - - - - - - - - - - - - - - - - -

    [ ID] Interval           Transfer     Bandwidth

    [  4]   0.00-10.02  sec  1.50 MBytes  1.26 Mbits/sec                  sender

    [  4]   0.00-10.02  sec  1.35 MBytes  1.13 Mbits/sec                  receiver

     

    Iperf results with US as client and UK as server using TCP (unlimited bandwidth):

    iperf3.exe -c 10.158.6.40

    Connecting to host 10.158.6.40, port 5201

    [  4] local 172.16.0.10 port 45401 connected to 10.158.6.40 port 5201

    [ ID] Interval           Transfer     Bandwidth

    [  4]   0.00-1.01   sec   640 KBytes  5.21 Mbits/sec

    [  4]   1.01-2.01   sec   896 KBytes  7.34 Mbits/sec

    [  4]   2.01-3.01   sec   768 KBytes  6.29 Mbits/sec

    [  4]   3.01-4.01   sec  1.12 MBytes  9.44 Mbits/sec

    [  4]   4.01-5.01   sec  1.25 MBytes  10.5 Mbits/sec

    [  4]   5.01-6.01   sec  1.62 MBytes  13.6 Mbits/sec

    [  4]   6.01-7.01   sec   640 KBytes  5.24 Mbits/sec

    [  4]   7.01-8.01   sec   640 KBytes  5.24 Mbits/sec

    [  4]   8.01-9.01   sec   896 KBytes  7.32 Mbits/sec

    [  4]   9.01-10.01  sec  1.12 MBytes  9.44 Mbits/sec

    - - - - - - - - - - - - - - - - - - - - - - - - -

    [ ID] Interval           Transfer     Bandwidth

    [  4]   0.00-10.01  sec  9.50 MBytes  7.96 Mbits/sec                  sender

    [  4]   0.00-10.01  sec  9.43 MBytes  7.91 Mbits/sec                  receiver

     

    What I don't get with these results is the big bandwidth difference between the two directions, and the amount of packet loss being reported on the UDP tests.

     

    I've also run speedtests at each site, and these come back fine, between 300 and 400 Mbps up and down.

    The ISP in the UK is Zen Internet; the ISPs in the US are Cogent and Comcast (the issue persists over BOTH ISPs).

     

    The IPsec VPNs are configured as follows:

    config vpn ipsec phase1-interface

        edit "Primary IPSEC to UK"

            set interface "wan1"

            set ike-version 2

            set peertype any

            set mode-cfg enable

            set proposal aes128-sha1

            set dhgrp 14

            set nattraversal disable

            set remote-gw 51.148.10.113

            set psksecret ENC <omitted>

     

    config vpn ipsec phase2-interface

        edit "Primary IPSEC to UK"

            set phase1name "Primary IPSEC to UK"

            set proposal aes128-md5

            set pfs disable

     

    config vpn ipsec phase1-interface

        edit "Primary IPSEC to US"

            set interface "wan2"

            set ike-version 2

            set peertype any

            set mode-cfg enable

            set proposal aes128-sha1

            set dhgrp 14

            set nattraversal disable

            set remote-gw 38.126.144.66

            set psksecret ENC <omitted>

     

    config vpn ipsec phase2-interface

        edit "Primary IPSEC to US"

            set phase1name "Primary IPSEC to US"

            set proposal aes128-md5

            set pfs disable

     

    Other troubleshooting I've tried:

    [ul]
  • Replaced all Ethernet cables
  • Checked for speed/duplex mismatches at layer 2
  • Checked for CRC/errors on switch ports and firewall ports
  • Failed over firewalls to the passive cluster member at both sites
  • Rebooted firewalls
  • Updated FortiOS from 5.4.1 to 5.4.4
  • Speedtests at both sites out to the Internet show good download/upload speeds
  • Network consultants from Nouveau Solutions Ltd checked the firewall configs and confirmed they are OK
  • Zen Internet checked the Cisco Catalyst at the UK site and confirmed it is OK
  • Traffic tested over both ISPs in the US, with the same outcome[/ul]

    Where would you guys suggest I head next with this? Is there any more troubleshooting I can perform on the firewalls, or is this more likely to be an ISP issue?

    5 REPLIES
    MikePruett
    Valued Contributor

    MTU is almost always the culprit for me outside of physical connectivity / transit issues.

     

    When you do a diag sniffer packet on the traffic while it traverses the tunnel, are you seeing fragments at any point?

    Mike Pruett Fortinet GURU | Fortinet Training Videos
    Alex_talmage

    Mike,

     

    Thanks for your response. A diag sniffer packet when doing the iperf does indeed show fragmentation:

     

    18.624443 10.158.6.40.51078 -> 192.168.245.2.5201: udp 8192 (frag 22865:1480@0+)
    18.624445 10.158.6.40 -> 192.168.245.2: ip-proto-17 (frag 22865:1480@1480+)
    18.624446 10.158.6.40 -> 192.168.245.2: ip-proto-17 (frag 22865:1480@2960+)
    18.624447 10.158.6.40 -> 192.168.245.2: ip-proto-17 (frag 22865:1480@4440+)
    18.624448 10.158.6.40 -> 192.168.245.2: ip-proto-17 (frag 22865:1480@5920+)
    18.624449 10.158.6.40 -> 192.168.245.2: ip-proto-17 (frag 22865:800@7400)
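    For reference, the six fragments in the capture do line up (a quick sketch using the lengths and byte offsets printed above):

```python
# (length, byte offset) pairs for IP ID 22865, taken from the sniffer output
fragments = [(1480, 0), (1480, 1480), (1480, 2960),
             (1480, 4440), (1480, 5920), (800, 7400)]

# Check the fragments are contiguous and sum to one whole IP payload
total = 0
for length, offset in fragments:
    assert offset == total, "gap or overlap between fragments"
    total += length
print(total)  # 8200 = 8192-byte UDP payload + 8-byte UDP header
```

    So each 8192-byte iperf3 datagram is being carried as six IP fragments, and losing any one fragment loses the whole datagram.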

     

    What does this mean?

    MikePruett

    It means the packets are too large. You need to change the MTU size on your gear and you should be good. The FortiGate adds overhead for the IPsec tunnel, so you can't push a true 1500 through. On traffic that traverses the tunnel I usually bump it down to 1366 or so.
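    To put rough numbers on that overhead (a sketch assuming ESP in tunnel mode with the aes128-sha1 proposal from the config earlier in the thread; exact sizes vary by implementation):

```python
# Worst-case ESP tunnel-mode per-packet overhead for aes128-sha1 (assumed sizes)
OUTER_IP    = 20  # new outer IPv4 header
ESP_HEADER  = 8   # SPI + sequence number
ESP_IV      = 16  # AES-CBC initialisation vector
ESP_PADDING = 15  # worst-case padding to the 16-byte AES block size
ESP_TRAILER = 2   # pad-length + next-header bytes
ESP_ICV     = 12  # truncated HMAC-SHA1 (SHA1-96)

overhead = OUTER_IP + ESP_HEADER + ESP_IV + ESP_PADDING + ESP_TRAILER + ESP_ICV
print(overhead, 1500 - overhead)  # 73 1427
```

    So anything over roughly 1427 bytes of inner packet risks fragmenting the encrypted outer packet on a 1500-byte path; a figure like 1366 simply leaves extra margin (e.g. for NAT-T's additional UDP header).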

    Mike Pruett Fortinet GURU | Fortinet Training Videos
    Alex_talmage

    Thanks Mike. Where would I set the MTU, on the end-user equipment or on the Fortigate?

    pbarbieri

    Hello Mike, I'd like to draw on your experience. I have a similar problem between two FortiGate 600D firewalls. They talk using GRE tunnels. If I ping from the CLI, GRE endpoint to GRE endpoint, all is perfect. If I ping across the firewall interfaces (client to client, or server to server), pings time out with some packet loss. MTU is set on the interface to 1300. Why do you think fragmentation could impact packet loss? The default ping packet size is small unless we change it. Which function could be the culprit? Bandwidth utilisation is low and 99% of the traffic is multicast with EF QoS DSCP marking, while ping and all other traffic (SSH, HTTP) is C0 best effort.