Description
This article explains PMTUD-related issues users may encounter when using a VXLAN setup, as well as how to fix or work around them.
Scope
FortiGate with a VXLAN setup using PMTUD.
Solution
Path MTU Discovery (PMTUD) is a mechanism that allows a client or server to discover the smallest Maximum Transmission Unit (MTU) along the path to a destination.
This technique relies on ICMP control messages to alert the sender of the traffic when a packet is too large to reach the destination.
For PMTUD to work, the transmitter sends the first packets with the 'Don’t fragment' (DF) bit set. In this example, assume that the transmitter sends a 1500-byte packet to a destination. A router along the path receives this packet on an ingress interface with an MTU of 1500 bytes, but the egress interface on the same router has an MTU of 1400 bytes.
Since the packet has the DF bit set, the router cannot fragment it to send it out of the egress interface. Consequently, the router drops the packet and sends an ICMP Type 3, Code 4 message ('Destination unreachable', 'Fragmentation needed') containing the MTU of the next hop back to the transmitter.
As soon as the transmitter receives this ICMP message, it can reduce the size of the packets it sends to this specific destination.
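The router's decision described above can be sketched in a few lines of Python. This is only an illustrative model, not FortiOS code: the names `Verdict` and `route_packet` are hypothetical, and the sketch ignores everything except packet size, the DF bit, and the egress MTU.

```python
from dataclasses import dataclass
from typing import Optional

ICMP_DEST_UNREACHABLE = 3  # ICMP Type 3: Destination unreachable
ICMP_FRAG_NEEDED = 4       # ICMP Code 4: Fragmentation needed and DF set


@dataclass
class Verdict:
    """Hypothetical result type: what the router does with one packet."""
    action: str                        # "forward", "fragment", or "drop"
    icmp_type: Optional[int] = None    # set only when an ICMP error is returned
    icmp_code: Optional[int] = None
    next_hop_mtu: Optional[int] = None


def route_packet(packet_size: int, df_bit: bool, egress_mtu: int) -> Verdict:
    """Model the forwarding decision for a packet leaving via egress_mtu."""
    if packet_size <= egress_mtu:
        # The packet fits, so it is forwarded unchanged.
        return Verdict("forward")
    if not df_bit:
        # Too large, but fragmentation is allowed.
        return Verdict("fragment")
    # Too large and DF is set: drop and report the next-hop MTU to the sender.
    return Verdict("drop", ICMP_DEST_UNREACHABLE, ICMP_FRAG_NEEDED, egress_mtu)


# The scenario from the text: 1500-byte packet, DF set, egress MTU of 1400.
verdict = route_packet(packet_size=1500, df_bit=True, egress_mtu=1400)
print(verdict.action, verdict.icmp_type, verdict.icmp_code, verdict.next_hop_mtu)
```

For the scenario in the text, the model drops the packet and reports a next-hop MTU of 1400, which is the value the transmitter then uses to size later packets.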
Below is an example packet capture of PMTUD in action, where the server 195.8.215.136 is sending traffic to the FortiGate 192.168.5.82:
Server to client:
FortiGate to server:
As demonstrated, the router whose egress interface has the lower MTU must be allowed to send ICMP messages for PMTUD to work. This lets it return the ICMP message carrying the MTU of the next hop, a mechanism that operates at Layer 3 boundaries.
With a VXLAN topology, there is no Layer 3 boundary for the hosts. The VLAN is extended across the tunnel for both the source and destination host as if they were connected to the same switch.
VXLAN adds 50 bytes of overhead (54 if the inner payload carries a VLAN tag), which can cause problems when accessing a server located across the VXLAN tunnel. The first TCP packets from the client have the DF bit set to 1 and a size of 1500 bytes to probe for the maximum MTU. Once encapsulated, these packets exceed the MTU and are dropped, and because there is no Layer 3 boundary between the hosts, no ICMP Type 3, Code 4 message is sent back to the host.
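The overhead figures above can be checked with a short calculation. The sketch below assumes the usual IPv4 VXLAN encapsulation (outer IPv4 header, UDP header, VXLAN header, and the inner Ethernet header) and an underlay MTU of 1500 bytes; the function names are hypothetical and chosen only for this illustration.

```python
# Header sizes in bytes for VXLAN over IPv4.
OUTER_IPV4 = 20      # outer IPv4 header without options
OUTER_UDP = 8        # outer UDP header
VXLAN_HEADER = 8     # VXLAN header
INNER_ETHERNET = 14  # inner Ethernet header carried inside the tunnel
VLAN_TAG = 4         # 802.1Q tag on the inner frame, if present


def vxlan_overhead(inner_vlan_tag: bool = False) -> int:
    """Bytes added to an inner IP packet by VXLAN encapsulation."""
    overhead = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET
    if inner_vlan_tag:
        overhead += VLAN_TAG
    return overhead


def encapsulated_size(inner_packet: int, inner_vlan_tag: bool = False) -> int:
    """Size of the outer IP packet that carries the given inner IP packet."""
    return inner_packet + vxlan_overhead(inner_vlan_tag)


def fits_underlay(inner_packet: int, underlay_mtu: int = 1500,
                  inner_vlan_tag: bool = False) -> bool:
    """True if the encapsulated packet can cross the underlay unfragmented."""
    return encapsulated_size(inner_packet, inner_vlan_tag) <= underlay_mtu


print(vxlan_overhead(), vxlan_overhead(inner_vlan_tag=True))
print(encapsulated_size(1500, inner_vlan_tag=True))
print(fits_underlay(1500, inner_vlan_tag=True))
print(fits_underlay(1446, inner_vlan_tag=True))
```

Under these assumptions, a full-size 1500-byte packet from a host grows to 1554 bytes after encapsulation with an inner VLAN tag, so it cannot cross a 1500-byte underlay link, while an inner packet of 1446 bytes or less can.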
To fix this issue, set the TCP Maximum Segment Size (MSS) for VXLAN traffic in the firewall policies.
See the configuration example below. The key settings are the tcp-mss-sender and tcp-mss-receiver values in the firewall policy.
config system interface
    edit "port2"
        set vdom "root"
        set ip 10.254.254.1 255.255.255.252
        set allowaccess ping
        set type physical
        set description "To-DC2"
        set snmp-index 2
    next
    edit "vlan100"
        set vdom "root"
        set device-identification enable
        set role lan
        set snmp-index 9
        set interface "port3"
        set description "Local subnet"
        set vlanid 100
    next
end
config system vxlan
    edit "vxlan-int-100"
        set interface "port2"
        set vni 100
        set remote-ip "10.254.254.2"
    next
end
config system switch-interface
    edit "sw100"
        set vdom "root"
        set member "vlan100" "vxlan-int-100"
        set intra-switch-policy explicit
    next
end
config firewall policy
    edit 1
        set srcintf "vlan100"
        set dstintf "vxlan-int-100"
        set action accept
        set srcaddr "all"
        set dstaddr "all"
        set schedule "always"
        set service "ALL"
        set tcp-mss-sender 1406
        set tcp-mss-receiver 1406
    next
end
This MSS value was calculated from the following values:
- The MTU = 1500 bytes.
- VXLAN overhead with VLAN tag = 54 bytes.
- IP Header = 20 bytes.
- TCP Header = 20 bytes.
MSS = 1500 - 54 - 20 - 20 = 1406 bytes.
In some cases, a lower value may be required: the TCP header is at least 20 bytes, but it can grow to 60 bytes when additional options are present.
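The MSS arithmetic above can be reproduced with a small helper, which makes it easy to recompute the value for a different MTU, overhead, or TCP header size. This is a minimal sketch that follows the article's own subtraction; `clamp_mss` is a hypothetical name, not a FortiOS command.

```python
def clamp_mss(mtu: int = 1500, vxlan_overhead: int = 54,
              ip_header: int = 20, tcp_header: int = 20) -> int:
    """Return the TCP MSS left after subtracting all headers from the MTU.

    Defaults match the values listed in the article: 1500-byte MTU,
    54 bytes of VXLAN overhead (inner VLAN tag included), and minimum
    IPv4 and TCP headers.
    """
    if not 20 <= tcp_header <= 60:
        raise ValueError("TCP header size must be between 20 and 60 bytes")
    return mtu - vxlan_overhead - ip_header - tcp_header


# Value used in the example policy (tcp-mss-sender / tcp-mss-receiver).
print(clamp_mss())

# Worst case when the TCP header carries the maximum amount of options.
print(clamp_mss(tcp_header=60))

# Same calculation without a VLAN tag on the inner payload.
print(clamp_mss(vxlan_overhead=50))
```

With the defaults, the helper returns the 1406 bytes used in the policy. Allowing for a 60-byte TCP header gives 1366 bytes, which can serve as a conservative lower bound when tuning the value.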