One of the most powerful and yet subtle capabilities of Delivery monitoring is its ability to discover Maximum Transmission Unit (MTU) conditions and problems on a link. MTU misalignments can result in network idiosyncrasies and degradations that are extremely difficult to diagnose. Delivery monitoring checks for several significant MTU error conditions for each link within a network path. Passing these checks ensures that MTU works properly along the entire path.
What is MTU?
When Internet Protocol is used to transfer data across a path, data is encapsulated into packets before it leaves the physical interface. If the application is sending a small chunk of data, for example 500 bytes, it will usually be sent within a single packet. However, when the application must send a larger chunk of data, the data must be distributed over several packets. The MTU of the medium determines the maximum size of the packets that can be transmitted without fragmentation. Network links that have properly configured MTUs are more efficient. The internet typically operates with an MTU of 1500 bytes, although other values are acceptable. Mixing different MTU values within one network path is also acceptable, provided that all components within the path follow the same rules for handling conflicting MTUs. As a general rule, fragmentation should be avoided at all costs.
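As a rough illustration of how the MTU determines the number of packets needed for a chunk of data, the following sketch divides the data across packets. The function name and the 20-byte IPv4 and TCP header sizes (headers without options) are assumptions made for this example.

```python
import math

IP_HEADER = 20   # bytes; IPv4 header without options (assumption)
TCP_HEADER = 20  # bytes; TCP header without options (assumption)

def packets_needed(data_bytes, mtu=1500):
    """Return how many TCP/IP packets are needed to carry data_bytes."""
    # Each packet can carry at most MTU minus the IP and TCP headers.
    payload_per_packet = mtu - IP_HEADER - TCP_HEADER
    return math.ceil(data_bytes / payload_per_packet)

print(packets_needed(500))                # 1
print(packets_needed(100_000))            # 69
print(packets_needed(100_000, mtu=9000))  # 12
```

With these assumptions, a 500 byte chunk fits in one packet, while a 100,000 byte chunk needs 69 packets at a 1500 byte MTU and 12 packets at a 9000 byte MTU.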
To understand MTU, one must be very aware of the difference between frames and packets. Frames are generated by Layer-2 devices and encapsulate Layer-3 packets. The following diagram shows a 1500-byte TCP/IP packet passing through Ethernet. Notice that although Ethernet supports 1518-byte frames, it is designed to carry packets of at most 1500 bytes; therefore, the MTU of Ethernet is 1500 bytes.
There are several RFCs that define MTU. One significant RFC is RFC 1191, which defines Path MTU Discovery. Other RFCs also apply, which is why the maximum packet size allowed on a network without fragmentation depends on the type of network connections involved.
|Reference (RFC or vendor)||Network type||MTU (bytes)|
|879, 1356||X.25, ISDN||576|
|1055||Serial Line IP (SLIP)||1006|
|1042, 2516||IEEE 802.3/802.2, PPPoE||1492|
|1042||4 Mbit Token Ring||4464|
|1042||802.4 Token Bus||8166|
|IBM||16 Mbit Token Ring||17914|
Why Use Large MTUs?
Increasing frame size gives better performance. To understand why, examine the following results from Advanced Test Engineering and Measurement. They are performance measurements taken across the Abilene and CA*net4 backbone using Delivery monitoring between various universities, at various MTU settings. Notice that, in theory, increasing the MTU should have only a minor effect on a network's maximum performance. However, most high-speed networks do not work at full capacity. For these networks, increasing the MTU has proven to greatly increase overall throughput.
Understanding MTU Conflicts
MTU negotiation errors are very difficult to detect and manifest themselves in subtle but destructive behaviors. When an MTU conflict exists, MTU negotiation fails and packets that are too large to traverse the network are lost. Due to the nature of MTU conflicts, traffic typically fails in one direction but not in the other. What is often misunderstood is that an MTU conflict typically results in slow network connections, not broken connections. Here are some real-life symptoms that were eventually attributed to MTU negotiation conflicts:
- FTP upload of a large file takes a few minutes, but download takes hours, or even fails.
- FTP downloads work only with small files.
- Web pages load quickly, but .GIF files are very slow to load, and occasionally fail.
- Email works, but discussion databases take forever to load.
- Normal server to client traffic is quick, but client backups fail.
To understand why MTU conflicts often result in slow links, consider the following client-server TCP flow. It illustrates a typical environment where a Gigabit server configured with 9000 byte MTU is incorrectly connected via a Layer 2 switch to a 10/100 client employing a 1500 byte MTU. In this example, a black hole condition exists for traffic flowing from the server to the client. The server receives a request from the client, and a 9000 byte packet is generated. The Layer 2 switch accepts the packet, but drops it once it discovers that the packet is too large to send to the client. Since Layer 2 switches have no knowledge of Layer 3 content, they cannot inspect the DF bit in the IP header, nor can they generate a sufficient response to the server to explain why it dropped the packet. As far as the server is concerned, the packet was lost due to congestion.
TCP contains a slow-start congestion avoidance algorithm that shrinks the TCP transmit window to half its size when a packet has timed out. In the case of a black-hole hop, the result is a retransmitted packet that is half its previous size. In this example, 3 packets are lost before the TCP transmit window produces a 1125 byte packet. This packet does survive, which produces a response from the client and in turn keeps the connection alive. Eventually the connection develops a cycle of sending a 2250 byte packet that is lost, waiting a timeout period, and then sending a 1125 byte packet that is received. Overall, the connection does not die. Rather, it runs very slowly.
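The halving sequence described above can be sketched as follows. This models only the simplified behavior in the text (each timeout halves the size of the retransmitted packet); it is not a full TCP congestion-control implementation, and the function name is an assumption made for this example.

```python
def black_hole_retries(initial_size, path_limit):
    """Return the packet sizes attempted, halving after each timeout,
    until a packet is small enough to pass the black-hole hop."""
    sizes = [initial_size]
    while sizes[-1] > path_limit:
        # The packet is silently dropped; after the timeout the sender
        # retries with a packet of half the previous size.
        sizes.append(sizes[-1] // 2)
    return sizes

attempts = black_hole_retries(9000, 1500)
print(attempts)           # [9000, 4500, 2250, 1125]
print(len(attempts) - 1)  # 3 packets lost before one gets through
```

Each lost packet also costs a full timeout period, which is why the connection stays alive but runs very slowly.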
To avoid MTU conflicts, you must ensure that you deploy Layer 3 devices (routers) on MTU boundaries, and that you do not filter out ICMP messages. This will ensure that Path MTU (PMTU) discovery works as described in RFC 1191. The following diagram illustrates how PMTU discovery works.
Most modern operating systems and applications are PMTU enabled, and therefore set the "Don't Fragment" (DF) bit in all IP headers. As a result, when the router receives the 9000 byte packet, it does not attempt fragmentation. Instead, it returns an ICMP "fragmentation required but DF set" message, also known as the "too big" message. When the server receives this ICMP message, it updates its routing table entry for the client with the MTU reported in the message, and from then on sends smaller packets to the client. Note that once the client's MTU has been discovered, the MTU is not renegotiated on subsequent connections.
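The exchange described above can be simulated with a short sketch. It assumes a path represented as a list of per-hop MTUs and routers that always report the correct next-hop MTU in the "too big" message; the function name and the path representation are hypothetical and chosen only for illustration.

```python
def discover_path_mtu(local_mtu, hop_mtus):
    """Simulate PMTU discovery over a list of per-hop MTUs.

    Returns (path_mtu, icmp_messages_received).
    """
    size = local_mtu
    icmp_count = 0
    while True:
        for hop_mtu in hop_mtus:
            if size > hop_mtu:
                # The router cannot forward a DF packet of this size, so it
                # replies with a "too big" message reporting the next-hop MTU.
                size = hop_mtu
                icmp_count += 1
                break  # the sender retries from the start with the new size
        else:
            # The packet crossed every hop without being rejected.
            return size, icmp_count

print(discover_path_mtu(9000, [9000, 1500, 1500]))  # (1500, 1)
print(discover_path_mtu(9000, [9000, 4464, 1500]))  # (1500, 2)
```

In the first call the server learns the 1500 byte limit from a single ICMP message; in the second, two constricting hops produce two messages before the path MTU settles.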
In complex environments it is easy to forget to install Layer 3 routers between MTU boundaries. To avoid problems, we suggest maintaining MTU values on logical diagrams outlining the rules for Layer 3 subnets. Doing so establishes rule sets for portions of networks regardless of how they are physically connected. Here is an example.
Another common cause of black-hole hops is confusion between frame size and MTU. The following diagram illustrates a breakdown of a typical Ethernet frame.
Notice that at Layer 2, a typical Ethernet frame has a maximum size of 1518 bytes. At Layer 3, however, we deal with packets, and in this example the MTU represents the maximum packet size of 1500 bytes. It is important to understand that the difference between packet size and Ethernet frame size is 18 bytes. Therefore, if you want to set up a GigE connection with a 9000 byte MTU, you must set the frame size of the NIC to 9018 bytes.
When a black-hole hop is caused by a reduced MTU at a destination hop, Maximum Segment Size (MSS) negotiation can protect TCP from failure. However, many black-hole hops are caused by incorrectly configured mid-point Layer 2 devices, in which case MSS negotiation is ineffective. Furthermore, in many scenarios MSS negotiation is ignored.
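The 18-byte difference between MTU and Ethernet frame size described above can be checked with a small calculation. The sketch assumes standard untagged Ethernet, where the 18 bytes consist of a 14-byte header and a 4-byte frame check sequence; a VLAN tag would add further bytes and is not modeled here.

```python
ETHERNET_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
ETHERNET_FCS = 4      # frame check sequence (CRC)

def frame_size_for_mtu(mtu):
    """Return the Ethernet frame size needed to carry a packet of size mtu."""
    return mtu + ETHERNET_HEADER + ETHERNET_FCS

print(frame_size_for_mtu(1500))  # 1518
print(frame_size_for_mtu(9000))  # 9018
```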
Avoiding MTU Conflicts
To avoid MTU problems, consider the following:
- Use one common MTU per subnet
- Separate different MTUs with Layer 3 routers. Avoid Layer 2 FDDI/Token Ring/Ethernet bridges.
- Do not filter ICMP packets.
- Use reputable VPN solutions.
- Test network paths from high MTU to low MTU (with the appliance on the large MTU end).
- Maintain logical diagrams to establish rule sets for MTUs, and to ensure that routers separate MTU domains.
- Avoid adjusting MTU of clients to compensate for network issues.
- Ensure server and network personnel have established common MTU policies.
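The rule that different MTUs must be separated by Layer 3 routers can be checked mechanically once the logical diagram is written down as data. The following is a minimal sketch; the data layout (a subnet-to-MTU mapping and a list of links tagged with the layer of the connecting device), the subnet names, and the function name are all hypothetical.

```python
def find_mtu_violations(subnet_mtu, links):
    """Return links that join subnets with different MTUs without a router.

    subnet_mtu: dict mapping subnet name to its MTU in bytes.
    links: iterable of (subnet_a, subnet_b, layer), where layer is 2 for a
           switch or bridge and 3 for a router.
    """
    violations = []
    for a, b, layer in links:
        # Differing MTUs are acceptable only across a Layer 3 device.
        if subnet_mtu[a] != subnet_mtu[b] and layer != 3:
            violations.append((a, b))
    return violations

subnet_mtu = {"servers": 9000, "clients": 1500, "backup": 9000}
links = [
    ("servers", "clients", 2),  # Layer 2 switch across an MTU boundary
    ("servers", "backup", 2),   # same MTU on both sides
    ("clients", "backup", 3),   # router separates the MTU domains
]
print(find_mtu_violations(subnet_mtu, links))  # [('servers', 'clients')]
```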
The Delivery monitoring Approach
MTU negotiation is an important part of the overall health of a network. During a test, Delivery monitoring will report the following MTU conditions detected along the test path:
- The (apparent measurable) PMTU
- Nonstandard MTUs in use
- Standard MTUs in use, other than 1500 bytes (Ethernet)
- "Black-hole" hop, where a router fails to send the "fragmentation needed and DF set" ICMP message. In other words, it is unable to properly participate in MTU negotiation.
- "Gray-hole" hop, where a router returns the wrong MTU value in the "fragmentation needed and DF set" ICMP message. In other words, it is responding with an incorrect MTU value for the constricting hop.
- MTU conflicts, where the network is exhibiting behavior, including packet loss, that corresponds to an MTU conflict with one or more devices on the network path.
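To make the distinction between black-hole and gray-hole hops concrete, the following sketch classifies a single hop from the outcome of one oversized probe. It is not how Delivery monitoring is implemented; the function name, its inputs, and the assumption that the hop's true next-hop MTU has been measured independently are all hypothetical.

```python
def classify_hop(measured_mtu, probe_size, icmp_reported_mtu):
    """Classify a hop from the result of sending one DF probe.

    measured_mtu: the hop's actual next-hop MTU (assumed to be known).
    probe_size: size of the probe packet sent with the DF bit set.
    icmp_reported_mtu: MTU carried in the "fragmentation needed and DF set"
        message, or None if no ICMP message came back.
    """
    if probe_size <= measured_mtu:
        return "ok"          # the probe fits, so no negotiation is needed
    if icmp_reported_mtu is None:
        return "black-hole"  # dropped silently, no ICMP message returned
    if icmp_reported_mtu != measured_mtu:
        return "gray-hole"   # ICMP message returned, but with the wrong MTU
    return "ok"              # correct "too big" message: negotiation works

print(classify_hop(1500, 9000, None))  # black-hole
print(classify_hop(1500, 9000, 4464))  # gray-hole
print(classify_hop(1500, 9000, 1500))  # ok
```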
Other resources
- RFC 791: Internet Protocol
- RFC 792: Internet Control Message Protocol
- RFC 1191: Path MTU Discovery