AppNeta Performance Manager (APM) continuously monitors the metrics that affect the delivery of application data all along the application path: from the source, through the network, to the destination. It does this using monitoring points equipped with a proprietary technology called TruPath. Monitoring points are deployed to various locations within your network as surrogates for actual end-stations. Once deployed, TruPath periodically sends bursts of packets to a predetermined target and collects timing data about the packets after they traverse the network. That timing data reveals several characteristics of the application path, such as latency, jitter, and capacity.

What is capacity?

Any discussion on how TruPath works should start with capacity. Total capacity is the highest transmission rate that you can achieve between sender and receiver when no other traffic is on the line. In theory, capacity can be calculated for each segment in the path, and end-to-end capacity is equal to the lowest segment capacity. In other words, you’re never going to go any faster than your slowest link.
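
As a minimal illustration of the slowest-link rule, the sketch below takes a list of per-segment capacities and returns the end-to-end value. The function name and the example figures are hypothetical and are not taken from TruPath.

```python
def end_to_end_capacity(segment_capacities_mbps):
    """Return the end-to-end capacity of a path.

    The path can go no faster than its slowest segment, so the result
    is simply the minimum of the per-segment capacities.
    """
    if not segment_capacities_mbps:
        raise ValueError("a path needs at least one segment")
    return min(segment_capacities_mbps)


# Hypothetical path: 1 Gbps LAN, 100 Mbps WAN link, 10 Gbps core
path = [1000.0, 100.0, 10000.0]
print(end_to_end_capacity(path))  # 100.0
```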

Total capacity is calculated from measurements taken over numerous test iterations containing various packet patterns. The accuracy of this technique is not affected by variations in latency or cross traffic. For example, a measured total capacity of 10 Mbps already accounts for the other traffic on the network, and the path reflects this capacity regardless of the distance to the target.

Capacity is not bandwidth

It’s important to differentiate capacity from bandwidth. Bandwidth means different things to different people. For our purposes, we’ll say that bandwidth is the potential transmission rate of the physical medium. It’s agnostic of any communications standard so there’s nothing standing in the way of putting bits on the wire except for the type of media and the network cards in use.

Capacity, on the other hand, is a throughput measurement from the perspective of the network layer, where some rules of communication are considered. At the network layer, ‘data’ is IP packets rather than just a constant stream of bits. IP packets are subject to a maximum size in order to accommodate link layer headers and framing overhead, and are also subject to the mechanics of routing, namely queuing delays.

So while capacity and bandwidth are both throughput concepts, they assume different perspectives. Because of this, capacity measurements will generally be lower than any bandwidth value that is marketed by your ISP. Regardless, capacity is the best representation of how your applications are experiencing the network.

How is capacity measured?

TruPath uses packet dispersion analysis to calculate capacity. At its simplest, the technique is as follows: send out two packets of equal size back-to-back, with no other traffic on the line. What we’re interested in is the distance between those packets by the time they reach the target, which we call dispersion. To calculate dispersion, we measure the time between the arrival of the last byte of the first packet and the last byte of the second packet. Divide the packet size (in bits) by the dispersion (in seconds) to calculate the end-to-end capacity of the path in bits per second.
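
The packet pair arithmetic can be sketched as follows. This is only an illustration of the division described above, not TruPath's implementation; the function name, the 1500-byte packet size, and the arrival times are assumptions chosen for the example.

```python
def packet_pair_capacity_bps(packet_size_bytes, first_arrival_s, second_arrival_s):
    """Estimate path capacity from a single back-to-back packet pair.

    Arrival times are taken at the last byte of each packet, in seconds.
    Dispersion is the gap between the two arrivals.
    """
    dispersion_s = second_arrival_s - first_arrival_s
    if dispersion_s <= 0:
        raise ValueError("second packet must arrive after the first")
    return packet_size_bytes * 8 / dispersion_s


# Hypothetical pair: 1500-byte packets arriving 1.2 ms apart
capacity = packet_pair_capacity_bps(1500, 0.0, 0.0012)
print(f"{capacity / 1e6:.1f} Mbps")  # 10.0 Mbps
```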

The method above can only be used when the target is an AppNeta monitoring point, because coordination is required between the path endpoints. TruPath actually offers two different methods for continuous monitoring. Dual-ended monitoring is the method that requires monitoring points as endpoints, and measurements are taken at the target. This method uses UDP packets. Single-ended monitoring, on the other hand, uses ICMP packets and can be used against any desired target. The source sends out two back-to-back pings and the target replies to each. Now, instead of observing the packets as they arrive at the target, the source measures the round-trip time for each ping and response pair. Half the difference between the two round-trip times is the one-way dispersion.
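
For the single-ended case, the halving step might look like the sketch below. The function name and the round-trip times are hypothetical; the only rule taken from the text is that one-way dispersion is half the difference between the two round-trip times.

```python
def one_way_dispersion_s(rtt_first_s, rtt_second_s):
    """Derive one-way dispersion from the RTTs of two back-to-back pings.

    The difference between the round-trip times covers both directions,
    so half of it is attributed to one direction.
    """
    return (rtt_second_s - rtt_first_s) / 2


# Hypothetical measurements: 40.0 ms and 42.4 ms round trips
dispersion = one_way_dispersion_s(0.0400, 0.0424)

# With assumed 1500-byte pings, the capacity estimate follows as before
capacity_bps = 1500 * 8 / dispersion
print(f"{dispersion * 1e3:.1f} ms, {capacity_bps / 1e6:.1f} Mbps")  # 1.2 ms, 10.0 Mbps
```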

Whether we use one-way or round-trip times, real networks present a few obstacles to the packet pair method. For example, sometimes packets get dropped. Losing one packet when you only have a pair is a problem. Packets might not get queued, in which case there’s no dispersion and capacity gets overestimated. Not least, the analysis is premised on packets dispersing with each hop, but if queuing is uneven between the first and second packet, you could get compression instead. You might think that you could just send out 1,000 packet pairs and average the results, but that leads to spurious results. Instead, TruPath overcomes these obstacles by using many back-to-back test packets, or packet trains.

The measurement process is iterative:

  • Each iteration consists of multiple packet trains with a specific packet size and number of packets per train.
  • Multiple trains enable us to tolerate some loss, while using large packets guarantees queuing.
  • Measurements are taken every minute, and there are up to 50 packets per train.
  • The initial packet size is the path PMTU. PMTU is the largest packet size a path can handle without fragmentation, and there is a standard discovery method.
  • Performance Manager considers the best option for total capacity measurement to be the train with the largest possible packet size that still experiences zero loss.
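
The selection rule in the last bullet can be sketched as below. The TrainResult record and its fields are hypothetical stand-ins for whatever TruPath records per train; the only rule taken from the text is "largest packet size with zero loss".

```python
from dataclasses import dataclass


@dataclass
class TrainResult:
    """Hypothetical summary of one packet train."""
    packet_size_bytes: int
    packets_sent: int
    packets_received: int


def best_train(results):
    """Pick the lossless train with the largest packet size, or None."""
    lossless = [r for r in results if r.packets_received == r.packets_sent]
    if not lossless:
        return None
    return max(lossless, key=lambda r: r.packet_size_bytes)


# Hypothetical iteration: the PMTU-sized train lost two packets
results = [
    TrainResult(packet_size_bytes=1500, packets_sent=50, packets_received=48),
    TrainResult(packet_size_bytes=1000, packets_sent=50, packets_received=50),
    TrainResult(packet_size_bytes=500, packets_sent=50, packets_received=50),
]
print(best_train(results).packet_size_bytes)  # 1000
```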

Anything that affects the round-trip time of test packets, including low-bandwidth links, congested links, and operating system effects, is already considered in the capacity measurement. But these are the same effects your applications are experiencing.

Why you need 10G to measure at 10G

It’s true that TruPath is low impact. Continuous monitoring uses on average just 2 Kbps and diagnostics only 10-200 Kbps. But you still need a 10G monitoring point to monitor at 10G. TruPath is interested in dispersion, which is how much the distance between two packets grows while traveling from A to B, and packet dispersion is premised on queuing. To guarantee that packets actually get buffered you need to send them out with very small inter-packet spacing. Some serious hardware is required for that kind of transmission rate and the time stamping fidelity to go along with it, which is why we introduced 10G monitoring points.
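
To get a feel for how small that inter-packet spacing is, the sketch below computes how long it takes to put one packet on the wire at a given link rate. The 1500-byte packet size is an assumption (a common Ethernet MTU), and the function name is illustrative.

```python
def serialization_time_us(packet_size_bytes, link_rate_bps):
    """Time in microseconds to transmit one packet at the given link rate."""
    return packet_size_bytes * 8 / link_rate_bps * 1e6


# Assumed 1500-byte packets at 1 Gbps and 10 Gbps
for rate_bps in (1e9, 10e9):
    t = serialization_time_us(1500, rate_bps)
    print(f"{rate_bps / 1e9:.0f} Gbps: {t:.1f} us per packet")
# 1 Gbps: 12.0 us per packet
# 10 Gbps: 1.2 us per packet
```

At 10 Gbps, back-to-back packets of this size leave the sender about 1.2 microseconds apart, so both the transmit hardware and the timestamps have to resolve differences on that scale.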

The advantages of capacity over bandwidth

There are two major advantages of using capacity over bandwidth as a measure of network performance. The first is that capacity is a more accurate measure of the network resources available to the application. This is because capacity is measured at a higher layer in the network stack, closer to the application layer. It’s the equivalent of moving closer to someone so you can see what they’re pointing at.

The other reason capacity is better is that the method used to measure it, packet dispersion, is extremely lightweight. As a result, the network can be monitored continuously. Bandwidth, on the other hand, is measured in a destructive fashion by tools like PathTest or speedtest.net. These tools essentially generate huge bursts of packets with the goal of entirely saturating the path to the target. This technique makes the network inoperable for the duration of the test. The implication is that bandwidth testing has to be scheduled rather than continuous, and the test only ever returns maximum bandwidth, never utilized bandwidth.

Available and utilized capacity

Available capacity is calculated using the average dispersion of a series of packet trains. Compare that to total capacity, which uses the minimum dispersion over a series of packet trains.

You might ask, ‘why average dispersion?’ Available capacity is a straightforward case of what you see is what you get. If you send out a series of bits, whether test packets or other application traffic, the rate at which those bits reach their target (the throughput that they experience) is what is available to the application. There is no more capacity because other apps are using it, and there is no less because no other apps are asking for it. At AppNeta we use this analogy: imagine a train with a hundred cars going by at some speed, and only a random half of those cars are carrying payload. If you have cargo that you need to get on that train, to you that train is at 50% capacity, regardless of which cars are already full. Once we have available capacity in hand, subtract it from total capacity to calculate utilized capacity.
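
Putting the three quantities together, a rough sketch might look like this. It assumes one per-packet dispersion value per train and a fixed packet size, and it ignores packet loss entirely; the function name and the figures are hypothetical, and the real calculation is not spelled out here.

```python
from statistics import mean


def capacity_summary_bps(packet_size_bytes, train_dispersions_s):
    """Return total, available, and utilized capacity in bits per second.

    Total capacity uses the minimum dispersion across trains, available
    capacity uses the average, and utilized is the difference.
    """
    if not train_dispersions_s or min(train_dispersions_s) <= 0:
        raise ValueError("need at least one positive dispersion value")
    bits = packet_size_bytes * 8
    total = bits / min(train_dispersions_s)
    available = bits / mean(train_dispersions_s)
    return {"total": total, "available": available, "utilized": total - available}


# Hypothetical dispersions from three trains of 1500-byte packets
summary = capacity_summary_bps(1500, [0.0012, 0.0016, 0.0020])
for name, value in summary.items():
    print(f"{name}: {value / 1e6:.1f} Mbps")
# total: 10.0 Mbps
# available: 7.5 Mbps
# utilized: 2.5 Mbps
```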

There’s one caveat. Packet loss and capacity are related: a test packet that returns too late or never returns indicates zero available capacity, or 100% utilization. To account for brief periods of 100% utilization, TruPath includes a coefficient of packets lost in its available capacity calculation.