Troubleshooting Tip: Issues with PMTUD and VXLAN

npaiva · ‎02-08-2023

Description

This article explains PMTUD-related issues users may encounter when using a VXLAN setup, as well as how to fix or work around them.

Scope

FortiGate with a VXLAN setup using PMTUD.

Solution

Path MTU Discovery (PMTUD) is a mechanism that allows a client or server to discover the smallest Maximum Transmission Unit (MTU) along the path to a destination.

This technique relies on ICMP control messages to alert the sender of the traffic when a packet is too large to reach the destination.

For PMTUD to work, the transmitter sends the first packets with the 'Don't fragment' (DF) bit set. In this example, assume that the transmitter sends a 1500-byte packet to a destination, and a router along the path receives this packet on an ingress interface with a 1500-byte MTU. However, the egress interface on this same router has an MTU of 1400 bytes.

Since the packet has the DF bit set, the router cannot fragment the packet to send it out of the egress interface. Consequently, the router drops the packet and sends an ICMP Type 3, Code 4 message ('Destination unreachable', 'Fragmentation needed') containing the MTU of the next hop back to the transmitter.

As soon as the transmitter receives this ICMP message, it can reduce the size of the packets it sends to this specific destination.
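The exchange described above can be sketched as a small simulation. This is an illustrative model only, not FortiOS behavior: the names `IcmpFragNeeded`, `forward`, and `discover_path_mtu` are hypothetical, and the hop MTUs (1500 and 1400) are taken from the example in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IcmpFragNeeded:
    """Simplified ICMP Type 3, Code 4 message (hypothetical model)."""
    next_hop_mtu: int
    icmp_type: int = 3   # Destination unreachable
    code: int = 4        # Fragmentation needed and DF set


def forward(packet_size: int, df_set: bool, egress_mtu: int) -> Optional[IcmpFragNeeded]:
    """Return None if the router forwards the packet, else the ICMP error."""
    if packet_size <= egress_mtu:
        return None                      # fits, forwarded unchanged
    if not df_set:
        return None                      # router is allowed to fragment
    # Too large and DF is set: drop and report the next-hop MTU.
    return IcmpFragNeeded(next_hop_mtu=egress_mtu)


def discover_path_mtu(initial_size: int, hop_mtus: List[int]) -> int:
    """Shrink the packet size until it crosses every hop with DF set."""
    size = initial_size
    while True:
        for mtu in hop_mtus:
            reply = forward(size, df_set=True, egress_mtu=mtu)
            if reply is not None:
                size = reply.next_hop_mtu   # sender adapts to the reported MTU
                break
        else:
            return size                     # no hop complained


print(discover_path_mtu(1500, [1500, 1400]))
```

With the values from the example, the sender settles on 1400 bytes after one ICMP message.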

Below is an example packet capture of PMTUD in action, where the server 195.8.215.136 is sending traffic to the FortiGate 192.168.5.82:

Server to client:

server to client DF.png

FortiGate to server:

frag needed mtu next hop.png

As demonstrated, the router with the lower-MTU interface must have ICMP enabled for PMTUD to work, so that it can send the ICMP message containing the MTU of the next hop. This mechanism works across Layer 3 boundaries.

With a VXLAN topology, there is no Layer 3 boundary for the hosts. The VLAN is extended across the tunnel for both the source and destination host as if they were connected to the same switch.

VXLAN encapsulation adds 50 bytes of overhead (54 if VLAN tagging is used on the inner payload), which can cause issues when accessing a server located across the VXLAN tunnel. The first TCP packets from the client have the DF bit set to 1 and can be as large as 1500 bytes. Once encapsulated, they exceed the MTU and are dropped, and because there is no Layer 3 boundary, no ICMP Type 3, Code 4 message is sent back to the host.

To fix this issue, use firewall policies to set the TCP Maximum Segment Size (MSS) for VXLAN traffic.

See the configuration example below. The key settings are tcp-mss-sender and tcp-mss-receiver in the firewall policy.
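To make the size problem concrete, the following sketch adds the VXLAN overhead to an inner packet and checks whether the result still fits. The overhead figures (50 and 54 bytes) come from the text; the function names and the assumption of a 1500-byte MTU on the link carrying the tunnel are illustrative.

```python
VXLAN_OVERHEAD = 50   # bytes added by VXLAN encapsulation (from the article)
INNER_VLAN_TAG = 4    # extra bytes when the inner payload is VLAN tagged


def encapsulated_size(inner_packet: int, vlan_tagged: bool = False) -> int:
    """Size of the packet after VXLAN encapsulation."""
    extra = INNER_VLAN_TAG if vlan_tagged else 0
    return inner_packet + VXLAN_OVERHEAD + extra


def fits(inner_packet: int, underlay_mtu: int = 1500, vlan_tagged: bool = False) -> bool:
    """True if the encapsulated packet fits the MTU of the tunnel's link."""
    return encapsulated_size(inner_packet, vlan_tagged) <= underlay_mtu


print(encapsulated_size(1500))                    # 1550
print(encapsulated_size(1500, vlan_tagged=True))  # 1554
print(fits(1500))                                 # False
print(fits(1446, vlan_tagged=True))               # True
```

A full-size 1500-byte packet from the host therefore cannot cross the tunnel unfragmented, while anything up to 1446 bytes (with an inner VLAN tag) can.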

config system interface
    edit "port2"
        set vdom "root"
        set ip 10.254.254.1 255.255.255.252
        set allowaccess ping
        set type physical
        set description "To-DC2"
        set snmp-index 2
    next
    edit "vlan100"
        set vdom "root"
        set device-identification enable
        set role lan
        set snmp-index 9
        set interface "port3"
        set description "Local subnet"
        set vlanid 100
    next
end

config system vxlan
    edit "vxlan-int-100"
        set interface "port2"
        set vni 100
        set remote-ip "10.254.254.2"
    next
end

config system switch-interface
    edit "sw100"
        set vdom "root"
        set member "vlan100" "vxlan-int-100"
        set intra-switch-policy explicit
    next
end

config firewall policy
    edit 1
        set srcintf "vlan100"
        set dstintf "vxlan-int-100"
        set action accept
        set srcaddr "all"
        set dstaddr "all"
        set schedule "always"
        set service "ALL"
        set tcp-mss-sender 1406
        set tcp-mss-receiver 1406
    next
end

This MSS value was calculated from the following values:

- The MTU = 1500 bytes.

- VXLAN overhead with VLAN tag = 54 bytes.

- IP Header = 20 bytes.

- TCP Header = 20 bytes.
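The arithmetic behind the 1406-byte value can be written out as a short helper. The function name `vxlan_tcp_mss` is hypothetical; the default values are the ones listed above.

```python
def vxlan_tcp_mss(mtu: int = 1500,
                  vxlan_overhead: int = 54,
                  ip_header: int = 20,
                  tcp_header: int = 20) -> int:
    """MSS = MTU minus VXLAN overhead, IP header, and TCP header."""
    return mtu - vxlan_overhead - ip_header - tcp_header


print(vxlan_tcp_mss())                   # 1500 - 54 - 20 - 20 = 1406
print(vxlan_tcp_mss(vxlan_overhead=50))  # no inner VLAN tag: 1410
```

The `tcp_header` parameter can be raised when TCP options are expected on the connection, which yields a correspondingly lower MSS.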

In some cases, a lower value may be required: the minimum TCP header size is 20 bytes, but additional options can increase it to as much as 60 bytes.
