We all know that T1 lines run at 1.544Mbps, that 10BaseT cards run at 10Mbps, and that GigE routers connect GigE cards at 1Gbps. Why, then, do some devices run more slowly than others? Surprisingly, this is not a simple question to answer. In this section we discuss what constitutes a slow device and how to measure a device's capacity.

The apparent network

Before we define what a slow network component is, we must first define a demarcation point for the network. Most network engineers consider the network to be routers, switches, NICs, and wires. But is this enough? Consider what an application developer believes the network to be: everything from the API at one end of a network path to the API at the other end. This includes NIC drivers, CPU, memory, buses and operating systems. Now consider an end user, who believes that everything is the network, including the application and the computer.

For the purposes of this discussion, we will consider a network to be from the API on one end of the network path to the API on the other end. We like to refer to this type of network as the "Apparent Network".

An Apparent Network reaches further than the traditional view of a network. The traditional "network is this big and this full" approach is often insufficient. We must also consider operating systems, packet drivers, APIs, network protocols, which API is used, and how an application uses the network.

Performance degradation

So why would finding the cause of a slow network be so difficult? Consider the traditional approaches. Whether you use consultants or in-house resources, the approach is typically the same. You use various tools to gather information about what the networks and computers are doing. You analyze utilization charts, packet traces, application response times, and bytes/sec, pages/sec and seeks/sec reports, and then try to find where congestion is occurring.

These approaches are rarely effective in establishing the source of the performance degradation. Unless a glaring problem is found, rules of thumb are used, interpretations of the results are disputed, finger pointing occurs, and upgrades are often performed across the board.

The problem is that all the information you gathered shows only what the equipment is doing. It does not measure what the equipment can do. For example, if a disk drive is performing 160 seeks/second and transferring 15Mbps, how do we know whether it is saturated? Do you need to upgrade?

What is Missing?

When we encounter any slow network component (bottleneck), we often find equipment that is unable to fill the downstream link. For example, look at the following diagram.

[Figure: poorly performing routers (pathview-poorly-performing-routers-1.png, pathview-poorly-performing-routers-2.png)]

A fast router can saturate a downstream link if required to do so. A slow router inserts extra gaps between frames. To complicate matters, some network devices add larger gaps as frame sizes decrease.

The above scenario occurs very frequently. We have measured 100Mbps routers that perform at no more than 37Mbps using 1500-byte packets. Depending on manufacturer and model, we see some 100Mbps NICs max out at between 8 and 75Mbps. We have seen T1 links operating at only 64Kbps, which indicated that only one channel was turned on. And although we see some GigE cards operating at 880Mbps, others operate at only 60Mbps.
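To make the effect of inter-frame gaps concrete, the following is a minimal sketch of the arithmetic, not a description of any measurement tool. It assumes a simplified model in which every frame is followed by one idle gap, and it ignores framing overhead such as preambles and headers. The function names are hypothetical. It works backwards from the 100Mbps router measured at 37Mbps with 1500-byte packets to the average gap that such a device would have to be adding.

```python
def effective_throughput_bps(link_rate_bps, frame_bytes, gap_seconds):
    """Throughput when each frame is followed by an idle gap of gap_seconds."""
    frame_bits = frame_bytes * 8
    serialization_time = frame_bits / link_rate_bps  # time to put one frame on the wire
    return frame_bits / (serialization_time + gap_seconds)


def implied_gap_seconds(link_rate_bps, frame_bytes, measured_bps):
    """Average idle gap per frame that would explain a measured throughput."""
    frame_bits = frame_bytes * 8
    time_per_frame_measured = frame_bits / measured_bps
    time_per_frame_ideal = frame_bits / link_rate_bps
    return time_per_frame_measured - time_per_frame_ideal


# A 100Mbps device measured at 37Mbps with 1500-byte packets (figures from the text).
gap = implied_gap_seconds(100e6, 1500, 37e6)
print(f"implied gap per frame: {gap * 1e6:.1f} microseconds")
print(f"check: {effective_throughput_bps(100e6, 1500, gap) / 1e6:.1f} Mbps")
```

Under this model, a 1500-byte frame takes 120 microseconds to serialize at 100Mbps, so a device delivering 37Mbps is idle for roughly 204 microseconds after each frame, which is longer than the frame itself.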

Traditional approaches to network measurement ignore these limits. Some tools measure one point of the network using packet analyzers, SNMP or RMON. They assume that the equipment feeding the link can saturate it.

Still other tools simulate application response times and throughputs. Response times for applications include network, CPU, memory, bus and disk I/O latencies. Network throughputs are usually based on single-stream application data, which usually does not saturate network links. A network engineer who is presented with this information is expected to analyze it quickly and understand which piece of networking gear is not performing. In reality, this is usually impossible.

Bottleneck vs. Congestion Point

Often we see the terms network bottleneck and network congestion point used interchangeably. We believe that there is a distinction between these terms. A network bottleneck is the single point on a network path that is the slowest, and could be in the physical network or at the endpoint CPUs. Every network path has one bottleneck. Improve the performance at the bottleneck, and the bottleneck moves to another point in the path.

A congestion point, on the other hand, is the point in a network path where production traffic is backing up. Often there is congestion at the bottleneck, but not always. There may be several congestion points on a network.
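The distinction can be sketched in a few lines. This is an illustration under stated assumptions, not a measurement method: the path, the capacity and load figures, the `Hop` type, and the 90% utilization threshold used to call a hop congested are all hypothetical. Each hop's load is taken to include traffic from other paths that share it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hop:
    name: str
    capacity_mbps: float  # what the hop can do
    load_mbps: float      # what the hop is currently carrying


def bottleneck(path):
    """The single slowest hop on the path, judged by capacity alone."""
    return min(path, key=lambda hop: hop.capacity_mbps)


def congestion_points(path, threshold=0.9):
    """Hops whose load is at or above the (assumed) utilization threshold."""
    return [hop for hop in path if hop.load_mbps / hop.capacity_mbps >= threshold]


# Hypothetical path: the router is the slowest hop, but the WAN link is the busy one.
path = [
    Hop("sender NIC", 75, 10),
    Hop("router", 37, 12),
    Hop("wan link", 45, 44),
    Hop("receiver NIC", 75, 10),
]
print(bottleneck(path).name)                       # router
print([hop.name for hop in congestion_points(path)])  # ['wan link']

# Upgrading the router does not remove the bottleneck; it moves it.
upgraded = [replace(hop, capacity_mbps=100) if hop.name == "router" else hop
            for hop in path]
print(bottleneck(upgraded).name)                   # wan link
```

In this example the bottleneck and the congestion point are different hops, and upgrading the bottleneck simply moves it to the next slowest point in the path.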

One of the primary goals of Delivery monitoring is to measure the speed of the bottlenecks, as seen by the application. By using an end-to-end approach, Delivery monitoring passes through all components in the Apparent Network, following the same path that the application does. Using sampling and mathematics, Delivery monitoring subtracts other traffic and latency, allowing you to identify the exact maximum speed of the bottleneck no matter where in the world that bottleneck exists.
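The text does not describe the sampling and mathematics involved, and the sketch below should not be read as how Delivery monitoring works. As an illustration only, it uses a different, well-known idea called packet-pair dispersion: two equal-sized packets sent back to back leave the bottleneck separated by roughly the time the bottleneck needs to forward one packet, so capacity is approximately packet size divided by that spacing. Other traffic distorts individual samples, so the sketch takes the median of many samples, which is a simplification. The function name and sample values are hypothetical.

```python
from statistics import median


def estimate_capacity_bps(packet_bytes, dispersions_s):
    """Estimate bottleneck capacity from receiver-side packet-pair spacings.

    dispersions_s holds the arrival-time gap, in seconds, between the two
    packets of each pair. The median is used to damp samples distorted by
    other traffic (an assumption of this sketch).
    """
    if not dispersions_s:
        raise ValueError("need at least one dispersion sample")
    typical_dispersion = median(dispersions_s)
    return packet_bytes * 8 / typical_dispersion


# Hypothetical samples for 1500-byte packet pairs; two were delayed by cross traffic.
samples = [324e-6, 325e-6, 324e-6, 610e-6, 324e-6, 980e-6, 326e-6]
estimate = estimate_capacity_bps(1500, samples)
print(f"estimated bottleneck capacity: {estimate / 1e6:.1f} Mbps")
```

Because the estimate comes from packets that travel the same path as the application, it reflects the slowest component anywhere in the Apparent Network, whether that is a link, a router, or an endpoint.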

Additionally, Delivery monitoring measures the end-to-end congestion of a network path without prior knowledge of the network. It does not require that you own the equipment that you are testing. By measuring congestion points, you can quickly identify Frame Relay vendors that are oversubscribed, or high-speed backbones that need upgrading.