- How long was a path in violation?
- Charts are black
- Charts are grey
- High capacity
- Low capacity
- APM doesn’t match my ISP
- Recognizing oversubscription
AppNeta Performance Manager (APM) plots performance in real-time; no page refresh is required.
If you change an attribute essential to monitoring (source, target, target type, instrumentation, QoS, or alert profile), a vertical line marks the point at which you reconfigured the path or restarted the sequencer process.
Grey horizontal lines mark path alert thresholds. When a metric crosses a threshold, that is an event. For a violation event, the area beyond the threshold is colored red. Whenever a non-default alert profile is applied, faint vertical lines mark the beginning and end of alert profile time ranges.
How long was a path in violation?
On the events page, use the search to filter down to a single path, and then sort by event time. Pair each violation with its clear; how easy this is depends on how many conditions violated during the filtered time range. Then subtract the violation timestamp from the clear timestamp to calculate the duration of the violation manually.
There are no downloadable reports that can provide this information, and the events page only shows the last 7 days of events. For analysis of a greater time range, you’ll need to go to the path performance page and hover over the event markers, or comb through your email notifications if they were set up.
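The manual pairing described above can be sketched in Python. This is only an illustration of the arithmetic: the (timestamp, condition, kind) tuple layout, the condition names, and the function name are hypothetical, and you would enter the timestamps by hand from the events page.

```python
from datetime import datetime, timedelta

def violation_durations(events):
    """Pair each violation with the next clear for the same condition.

    `events` is a list of (timestamp, condition, kind) tuples, where kind is
    "violation" or "clear". This layout is an assumption for the example.
    """
    open_violations = {}  # condition -> time the violation started
    durations = []
    for timestamp, condition, kind in sorted(events):
        if kind == "violation":
            # Keep the earliest start if a condition violates twice in a row.
            open_violations.setdefault(condition, timestamp)
        elif kind == "clear" and condition in open_violations:
            start = open_violations.pop(condition)
            durations.append((condition, start, timestamp - start))
    return durations

# Hypothetical events copied from the events page for one path.
events = [
    (datetime(2024, 1, 5, 9, 0), "data loss", "violation"),
    (datetime(2024, 1, 5, 9, 10), "latency", "violation"),
    (datetime(2024, 1, 5, 9, 25), "data loss", "clear"),
    (datetime(2024, 1, 5, 9, 40), "latency", "clear"),
]
for condition, start, length in violation_durations(events):
    print(condition, start.isoformat(), length)
```

Tracking open violations per condition keeps overlapping violations on the same path from being paired with the wrong clear.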
Charts are black
A black overlay means that the monitoring point can’t reach the path target. Either the target doesn’t respond to ICMP, or the source monitoring point could not resolve the target hostname. Commonly, users have set monitoring points as end points but haven’t met this prerequisite.
Charts are grey
A grey block means that APM hasn’t heard from the monitoring point. If you see a grey bar and manage monitoring points shows the monitoring point as offline, this is expected. If the monitoring point is online, check status.appneta.com.
A grey block isn’t a problem in and of itself. Monitoring points traverse the public internet in order to reach APM. Glitches happen, and the monitoring point has been designed with this in mind: it can cache up to 2 hours of data locally and backfill the charts when it reconnects. A handful of events like these isn’t a big deal, but if the condition persists it probably represents a major issue.
Rarely, a grey bar persists because the monitoring point failed to connect to APM on its first few attempts. The monitoring point employs a back-off strategy in which it attempts to connect every few seconds, then every minute, and so on until it dampens down to once every thirty minutes. So if some obstruction preventing the monitoring point from connecting to APM is subsequently removed, it could take up to 30 minutes for the grey bar to resolve.
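To see why the wait can reach 30 minutes, here is a minimal sketch of a capped back-off schedule. The 5-second starting delay and the doubling factor are assumptions chosen for illustration; the text only states that retries start a few seconds apart and settle at once every thirty minutes, and the monitoring point's actual schedule may differ.

```python
import itertools

def backoff_delays(initial_s=5.0, factor=2.0, cap_s=30 * 60):
    """Yield successive retry delays in seconds, never exceeding cap_s.

    initial_s and factor are illustrative assumptions, not documented values.
    """
    delay = initial_s
    while True:
        yield min(delay, cap_s)
        delay = min(delay * factor, cap_s)

# First twelve retry delays under these assumptions.
delays = list(itertools.islice(backoff_delays(), 12))
print(delays)
```

Once the delay reaches the cap, an obstruction removed just after a failed attempt is not noticed until the next attempt, a full 30 minutes later.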
High capacity
If total capacity measurements are way beyond what you expect, this is usually because the link to your ISP is physically capable of greater speed, but your ISP has used a traffic engineering technique called ‘rate limiting’ to clamp you to the amount specified in your SLA. Usually transactional data and control data are allowed through at full capacity because they are short bursts of traffic, but sustained data transfers like streaming media will trigger the rate limiter.
Because delivery monitoring is extremely lightweight, it too might not be able to trigger the rate limiter. As a result, you’ll end up seeing the entire capacity of the link, rather than the amount that has been provisioned for you by your ISP. If you have another monitoring point at the target, Customer Care can enable rate-limited monitoring for you. Rate-limited monitoring increases the amount of data sent in the test in order to trigger the rate limiter. See rate-limited capacity.
Low capacity
There are several reasons why total capacity might be lower than expected. The first might actually be your expectations, if you’re used to measuring bandwidth. Remember that capacity is always an end-to-end measurement, whereas the bandwidth provisioned by your ISP is almost certainly with respect to one or a few links. That aside, total capacity for any link or set of links will always be less than bandwidth because it’s a network layer measurement while bandwidth is a physical layer measurement. Link-layer headers and framing overhead reduce rated capacity to a theoretical maximum, which is different for every network technology.
Further reducing capacity is the fact that NICs, routers, and switches are sometimes unable to saturate the network path, and therefore the theoretical maximum can’t be achieved. To ‘saturate’ a path means to transmit packets at line rate without any gaps between packets. All switches can go line rate for the length of time that a packet is being sent. The trick is to be able to send the next packet without rest in between. This capability is referred to as ‘switch capacity’ and is practically impossible to control for even within your own administrative domain, let alone across the public internet.
Considering both of these factors, the latter being variable, APM offers the range for total capacity that you can expect given the physical medium and modern equipment with good switching capacity.
Half-duplex links: Total capacity is based on the assumption that traffic will flow in both directions. Therefore, you can expect the total capacity for half-duplex links to be roughly half of what it would be with full-duplex.
|Standard||Standard link speed||L1 + L2 overhead||Theoretical total capacity||Optimal total capacity|
|DS0 or ISDN||64 Kbps||3.9%||61.5 Kbps||61.5 Kbps|
|ISDN dual channel||128 Kbps||3.9%||123 Kbps||123 Kbps|
|T1 (HDLC+ATM)||1.544 Mbps||11.6%||1.365 Mbps||1.325-1.375 Mbps|
|T1 (HDLC)||1.544 Mbps||3.5%||1.49 Mbps||1.40-1.49 Mbps|
|E1||2.0 Mbps||3.5%||1.93 Mbps||1.86-1.95 Mbps|
|T3||45 Mbps||3.5%||43.425 Mbps||42.50-43.45 Mbps|
|10M Ethernet half-duplex||10 Mbps||2.5%||4.875 Mbps||4.8-4.9 Mbps|
|10M Ethernet full-duplex||10 Mbps||2.5%||9.75 Mbps||9.7-9.8 Mbps|
|100M Ethernet half-duplex||100 Mbps||2.5%||48.75 Mbps||48.5-49.0 Mbps|
|100M Ethernet Full-duplex||100 Mbps||2.5%||97.5 Mbps||90-97.5 Mbps|
|Gigabit Ethernet Full-duplex||1 Gbps||2.5%||975 Mbps||600-900 Mbps|
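The ‘Theoretical total capacity’ column can be reproduced from the other columns: multiply the standard link speed by one minus the L1 + L2 overhead, and halve the result for half-duplex links. The sketch below shows that arithmetic; the function name and parameters are illustrative, not part of APM.

```python
def theoretical_capacity(link_speed, overhead_pct, half_duplex=False):
    """Return link speed less L1 + L2 overhead, in the units of link_speed.

    Half-duplex links are halved, following the note on half-duplex links.
    """
    capacity = link_speed * (1 - overhead_pct / 100)
    return capacity / 2 if half_duplex else capacity

# Rows taken from the table above (speeds in Mbps).
rows = [
    ("T1 (HDLC+ATM)", 1.544, 11.6, False),
    ("T3", 45, 3.5, False),
    ("100M Ethernet half-duplex", 100, 2.5, True),
    ("100M Ethernet full-duplex", 100, 2.5, False),
]
for name, speed, overhead, half in rows:
    print(f"{name}: {theoretical_capacity(speed, overhead, half):.3f} Mbps")
```

The ‘Optimal total capacity’ range is not derived this way; it also reflects how well real equipment can saturate the link, so it cannot be computed from overhead alone.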
Once your expectations are in line with what APM measures, make sure you choose a good target, because some devices are better than others.
Capacity can also be misleading on a single-ended path. When you have a link with different up/down speeds, e.g., your cable connection at home, a single-ended path only shows the slower of the two. For example, if you have 50 Mbps download and 5 Mbps upload on your home connection, a single-ended path only shows a capacity of 5 Mbps. You should always use dual-ended paths for asymmetric links, and if you see measurements that don’t look right, at least set up an additional dual-ended path to verify that asymmetry isn’t the issue.
Next, it’s important to note that low capacity as a persistent rather than transient condition is caused by a bottleneck, not congestion. The bottleneck can be at any point in the path, not just the first or last mile; it could be far away on the public internet. To verify which is the case, make an additional path to an AppNeta WAN target, verifying through path route that a different route is taken. If the capacity measurements are the same, then the bottleneck is likely the link to your ISP. Otherwise, the bottleneck is somewhere else on the path, and the capacity you’re seeing is accurate.
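The comparison between the two paths can be expressed as a small check. This is a sketch under assumptions: two measurements are rarely identical, so the 10% tolerance used to decide that they are ‘the same’ is an arbitrary illustrative value, and the function name is hypothetical.

```python
def locate_bottleneck(original_mbps, wan_target_mbps, tolerance=0.10):
    """Compare capacity on the original path and on a path to a WAN target.

    tolerance is an assumed relative difference below which the two
    measurements are treated as equal.
    """
    larger = max(original_mbps, wan_target_mbps)
    if larger <= 0:
        return "no capacity measured"
    if abs(original_mbps - wan_target_mbps) / larger <= tolerance:
        return "bottleneck is likely the link to your ISP"
    return "bottleneck is elsewhere on the path"

print(locate_bottleneck(9.6, 9.8))   # both paths agree
print(locate_bottleneck(9.6, 94.0))  # paths disagree
```

The check is only meaningful if path route confirms that the two paths actually take different routes beyond your ISP link.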
Are you seeing corresponding packet loss? Every minute, capacity is measured by sending multiple bursts of back-to-back packets, as described in how TruPath works. To measure total capacity, at least one burst must come back with zero packet loss. If none does, capacity is skipped for that interval. Intermittent packet loss therefore leads to a choppy graph, and with sustained packet loss you’ll see capacity bottom out.
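The zero-loss rule can be illustrated with a short sketch. It is not TruPath’s implementation: the (packets_sent, packets_received, estimated_mbps) burst layout is hypothetical, and taking the highest estimate among the clean bursts is an assumption made only to return a single number.

```python
def interval_capacity(bursts):
    """Return a capacity value for one interval, or None if it is skipped.

    `bursts` is a list of (packets_sent, packets_received, estimated_mbps)
    tuples; this layout is an assumption for the example.
    """
    # Only bursts that came back with zero packet loss are usable.
    clean = [mbps for sent, received, mbps in bursts if received == sent]
    if not clean:
        return None  # every burst lost packets: no capacity point is plotted
    return max(clean)  # assumption: report the best clean burst

print(interval_capacity([(10, 10, 94.0), (10, 9, 80.0)]))
print(interval_capacity([(10, 8, 70.0), (10, 9, 75.0)]))
```

Intervals that return None are the gaps that make the capacity chart look choppy during intermittent loss.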
If all of the above checks out, the next thing you want to do is run PathTest, to corroborate the low capacity measurements. Remember that this is a destructive test which measures bandwidth, not capacity.
- If PathTest supports the capacity measurements, then it is possible that you’re not getting the proper provisioning from your ISP.
- If the PathTest result is incongruent with your capacity readings, you should open a support ticket so we can help you further investigate.
APM doesn’t match my ISP
Sometimes your expectations of capacity are based on a speed test provided by your ISP. Different tests yielding different results is not uncommon, but there are reasons why the comparison might not be legitimate. To make sure you’re comparing apples to apples:
- Run a dual-ended path. Speed tests usually return an up value and a down value. A single-ended path won’t do because it returns only one number, the lower of inbound and outbound.
- Make sure APM is targeting the same geographic region as the speed test. APM measures the capacity of the bottleneck between two locations. A path from A-to-B could have a different bottleneck than a path from A-to-C, so their test results could be irreconcilable.
- If there is still no match, run PathTest. PathTest is your source of truth because it uses flooding instead of packet dispersion. Use UDP for the protocol because network equipment treats it like actual data traffic, whereas ICMP is control traffic, which gets treated differently. Make sure you’re testing in both directions, and when it comes to choosing a bandwidth, set it to ‘max’.
- If PathTest doesn’t match delivery monitoring, rate limiting or some other traffic engineering might be at play. In this case, open a support ticket so that Customer Care can turn on rate-limited monitoring.
Admittedly, what’s outlined above is going to be a chore. It’s not often that our customers have a spare monitoring point lying around at precisely the geographic location needed. You’re better off working with your ISP to re-run the test with their tool but with a source and target that are congruent to your path.
Recognizing oversubscription
Oversubscription is a technique your ISP uses in order to sell the full bandwidth of a link to multiple customers. It’s a common practice and usually not problematic, but if it is impacting performance, you’ll see it first in your utilized capacity measurements.
The first thing you want to do is corroborate capacity measurements with RTT, loss, and jitter. If there are no corresponding anomalies, then whatever triggered the high utilization isn’t really impacting performance. If there are, you’ll then use usage monitoring to check for an increase in network utilization.
High utilized capacity coupled with no increase in flow data is a classic sign of oversubscription, and it’s time to follow up with your ISP.
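The checks above amount to a short decision sequence, sketched here for the case where utilized capacity is already high. The function and parameter names are hypothetical, and the middle outcome (your own traffic explains the utilization) is an inference from the text rather than something it states outright.

```python
def triage_high_utilization(has_perf_anomalies, flow_data_increased):
    """Classify a period of high utilized capacity.

    has_perf_anomalies: RTT, loss, or jitter anomalies coincide with it.
    flow_data_increased: usage monitoring shows more of your own traffic.
    """
    if not has_perf_anomalies:
        return "high utilization is not impacting performance"
    if flow_data_increased:
        # Inferred case: the extra load is your own traffic.
        return "utilization is explained by your own traffic"
    return "possible oversubscription: follow up with your ISP"

print(triage_high_utilization(True, False))
```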