- How long was a path in violation?
- Charts are black
- Charts are grey
- High capacity
- Low capacity
- APM and ISP capacity numbers differ
- Recognizing oversubscription
AppNeta Performance Manager (APM) plots performance in real-time; no page refresh is required.
If you change an attribute essential to monitoring (source, target, target type, instrumentation, QoS, or alert profile), a vertical line marks the point at which you reconfigured the path or restarted the sequencer process.
Grey horizontal lines mark path alert thresholds. When a metric crosses a threshold, that is an event. For a violation event, the area beyond the threshold is colored red. Whenever a non-default alert profile is applied, faint vertical lines mark the beginning and end of alert profile time ranges.
How long was a path in violation?
On the events page, use the search to filter down to a single path, then sort by event time. Pair each violation with its matching clear (how easy this is depends on how many conditions violated during the filtered time range), then use the timestamps to calculate the duration of each violation manually.
There are no downloadable reports that can provide this information, and the events page only shows the last 7 days of events. For analysis of a greater time range, you’ll need to go to the path performance page and hover over the event markers, or comb through your email notifications if they were set up.
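If you copy the filtered events into a script, the pairing described above can be automated. The following is a minimal sketch only: the `(timestamp, condition, kind)` tuple layout and the condition names are hypothetical, not an APM export format.

```python
from datetime import datetime, timedelta

def violation_durations(events):
    """Pair each violation with the next clear for the same condition.

    `events` is a list of (timestamp, condition, kind) tuples, where kind is
    "violation" or "clear". This layout is an assumption for illustration.
    """
    open_violations = {}  # condition -> time the violation started
    durations = []
    for timestamp, condition, kind in sorted(events):
        if kind == "violation":
            # Keep the earliest start if the same violation is listed twice.
            open_violations.setdefault(condition, timestamp)
        elif kind == "clear" and condition in open_violations:
            start = open_violations.pop(condition)
            durations.append((condition, start, timestamp - start))
    return durations

# Hypothetical events for a single path, as read off the events page.
events = [
    (datetime(2024, 1, 5, 9, 0), "data loss", "violation"),
    (datetime(2024, 1, 5, 9, 10), "latency", "violation"),
    (datetime(2024, 1, 5, 9, 25), "data loss", "clear"),
    (datetime(2024, 1, 5, 10, 40), "latency", "clear"),
]

for condition, start, duration in violation_durations(events):
    print(f"{condition}: violated at {start:%H:%M}, lasted {duration}")
```

Violations that have not cleared by the end of the filtered range are left unpaired and do not appear in the result.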
Charts are black
A black overlay means that the monitoring point can’t reach the path target. Either the target doesn’t respond to ICMP, or the source monitoring point could not resolve the target hostname. Commonly, users have set monitoring points as end points but haven’t met this prerequisite.
Charts are grey
A grey block means that APM hasn’t heard from the monitoring point. If you see a grey bar and Manage Monitoring Points says that the monitoring point is offline, this is expected. If the monitoring point is online, then you should check status.appneta.com.
A grey block isn’t a problem in and of itself. Monitoring points traverse the public Internet in order to reach APM. In some cases, glitches happen, and in fact the monitoring point has been designed with this in mind. Monitoring points can cache up to 2 hours of data locally, and back fill the charts when they reconnect. So a handful of events like these aren’t a big deal, but if the condition persists it probably represents a major issue.
Rarely, though worth mentioning, a grey bar persists because the monitoring point failed to connect to APM on its first few attempts. The monitoring point employs a back-off strategy where it attempts to connect every few seconds, then every minute, and so on until it dampens down to once every thirty minutes. So if some obstruction preventing the monitoring point from connecting to APM is subsequently removed, it could take up to 30 minutes for the grey bar to resolve.
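To see why recovery can lag by up to 30 minutes, here is a sketch of a capped back-off schedule. Only the 30-minute ceiling comes from the description above; the 5-second starting delay and the doubling factor are assumptions chosen for illustration, not the monitoring point's actual values.

```python
from itertools import islice

def retry_delays(first_delay_s=5, factor=2, cap_s=30 * 60):
    """Yield successive wait times in seconds, growing until they hit the cap.

    first_delay_s and factor are illustrative assumptions; cap_s reflects the
    thirty-minute ceiling described in the text.
    """
    delay = first_delay_s
    while True:
        yield min(delay, cap_s)
        delay = min(delay * factor, cap_s)

# First eleven waits between connection attempts under these assumptions.
schedule = list(islice(retry_delays(), 11))
print(schedule)
```

Once the schedule reaches the cap, every later attempt is 30 minutes apart, so an obstruction removed just after a failed attempt is not noticed until the next one.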
High capacity
If total capacity measurements are well beyond what you expect, this is usually because the link to your ISP is physically capable of greater speed, but your ISP has used a traffic engineering technique called ‘rate limiting’ to clamp you to the amount specified in your SLA. Usually transactional data and control data are allowed through at full capacity because they are short bursts of traffic, but sustained data transfers like streaming media will trigger the rate limiter.
Because Delivery monitoring is extremely lightweight, it too might not be able to trigger the rate limiter. As a result, you’ll end up seeing the entire capacity of the link, rather than the amount that has been provisioned for you by your ISP. If you have another monitoring point at the target, Customer Care can enable rate-limited monitoring for you. Rate-limited monitoring increases the amount of data sent in the test in order to trigger the rate limiter. See rate-limited capacity.
Low capacity
There are several reasons why total capacity might be lower than expected. The first might actually be your expectations, if you’re used to measuring bandwidth. Remember that capacity is always an end-to-end measurement, whereas the bandwidth provisioned by your ISP is almost certainly with respect to one or a few links. That aside, total capacity for any link or set of links will always be less than bandwidth because it’s a network layer measurement while bandwidth is a physical layer measurement. Link-layer headers and framing overhead reduce rated capacity to a theoretical maximum, which is different for every network technology.
Further reducing capacity is the fact that NICs, routers, and switches are sometimes unable to saturate the network path, and therefore the theoretical maximum can’t be achieved. To ‘saturate’ a path means to transmit packets at line rate without any gaps between packets. All switches can run at line rate for the length of time that a packet is being sent. The trick is to be able to send the next packet without rest in between. This capability is referred to as ‘switch capacity’ and is practically impossible to control for even within your own administrative domain, let alone across the public Internet.
Considering both of these factors, the latter being variable, APM offers the range of total capacity that you can expect given the physical medium and modern equipment with good switching capacity.
Half-duplex links: Total capacity is based on the assumption that traffic will flow in both directions. Therefore, you can expect the total capacity for half-duplex links to be roughly half of what it would be with full-duplex.
| Standard | Standard link speed | L1 + L2 overhead | Theoretical total capacity | Optimal total capacity |
|---|---|---|---|---|
| DS0 or ISDN | 64 Kbps | 3.9% | 61.5 Kbps | 61.5 Kbps |
| ISDN dual channel | 128 Kbps | 3.9% | 123 Kbps | 123 Kbps |
| T1 (HDLC+ATM) | 1.544 Mbps | 11.6% | 1.365 Mbps | 1.325-1.375 Mbps |
| T1 (HDLC) | 1.544 Mbps | 3.5% | 1.49 Mbps | 1.40-1.49 Mbps |
| E1 | 2.0 Mbps | 3.5% | 1.93 Mbps | 1.86-1.95 Mbps |
| T3 | 45 Mbps | 3.5% | 43.425 Mbps | 42.50-43.45 Mbps |
| 10M Ethernet half-duplex | 10 Mbps | 2.5% | 4.875 Mbps | 4.8-4.9 Mbps |
| 10M Ethernet full-duplex | 10 Mbps | 2.5% | 9.75 Mbps | 9.7-9.8 Mbps |
| 100M Ethernet half-duplex | 100 Mbps | 2.5% | 48.75 Mbps | 48.5-49.0 Mbps |
| 100M Ethernet full-duplex | 100 Mbps | 2.5% | 97.5 Mbps | 90-97.5 Mbps |
| Gigabit Ethernet full-duplex | 1 Gbps | 2.5% | 975 Mbps | 600-900 Mbps |
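The theoretical column follows from the link speed and overhead columns: multiply the link speed by one minus the overhead, and halve the result for half-duplex links. A small sketch that reproduces a few rows (the function name is ours, for illustration only):

```python
def theoretical_capacity(link_speed, overhead_pct, half_duplex=False):
    """Link speed minus L1 + L2 overhead, halved for half-duplex links.

    The result is in the same unit as link_speed (Kbps, Mbps, ...).
    """
    capacity = link_speed * (1 - overhead_pct / 100)
    return capacity / 2 if half_duplex else capacity

print(round(theoretical_capacity(45, 3.5), 3))                    # T3, Mbps
print(round(theoretical_capacity(1.544, 11.6), 3))                # T1 (HDLC+ATM), Mbps
print(round(theoretical_capacity(10, 2.5, half_duplex=True), 3))  # 10M Ethernet half-duplex, Mbps
```

The optimal column cannot be derived this way, because it also depends on how well the equipment along the path can saturate the link.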
Once your expectations are in line with what APM measures, make sure you choose a good target, because some devices are better than others.
Capacity can also be misleading on a single-ended path. When you have a link with different up/down speeds, e.g., your home Internet connection, a single-ended path only shows the slower of the two. For example, if you have 50 Mbps download and 5 Mbps upload at home, a single-ended path only shows a capacity of 5 Mbps. You should always use dual-ended paths for asymmetric links, and if you see measurements that don’t look right, at least set up an additional dual-ended path to verify that asymmetry isn’t the issue.
Next, it’s important to note that when low capacity is a persistent rather than transient condition, it is caused by a bottleneck, not congestion. The bottleneck can be at any point in the path, not just the first/last mile. It could instead be far away on the public Internet. To verify which is the case, make an additional path to an AppNeta WAN target, verifying through path route that a different route is taken. If the capacity measurements are the same, then the bottleneck is likely the link to your ISP. Otherwise, the bottleneck is somewhere else on the path, and the capacity you’re seeing is accurate.
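The comparison in that last step can be written down as a simple check. This is a sketch only: the text says the measurements should be "the same", so the 10% tolerance used here to decide that is our assumption, as are the function and parameter names.

```python
def locate_bottleneck(original_mbps, wan_target_mbps, tolerance=0.10):
    """Compare capacity on the original path with a path to a WAN target
    that takes a different route.

    tolerance is an assumed fraction within which two readings count as
    "the same"; it is not an APM setting.
    """
    larger = max(original_mbps, wan_target_mbps)
    if larger == 0:
        return "no capacity measured on either path"
    if abs(original_mbps - wan_target_mbps) / larger <= tolerance:
        return "bottleneck is likely the link to your ISP"
    return "bottleneck is likely elsewhere on the original path"

print(locate_bottleneck(48.0, 50.0))
print(locate_bottleneck(12.0, 95.0))
```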
Are you seeing corresponding packet loss? Every minute, capacity is measured by sending multiple bursts of back-to-back packets as described in how TruPath works. To measure total capacity, at least one burst must come back with zero packet loss. If that is not the case, then capacity is skipped for that interval. In the case of intermittent packet loss, this leads to a choppy graph, and in the case of sustained packet loss, you’ll see capacity bottom out.
If all of the above checks out, run PathTest next to corroborate the low capacity measurements. Remember that this is a destructive test which measures bandwidth, not capacity.
- If PathTest supports the capacity measurements, then it is possible that you’re not getting the proper provisioning from your ISP.
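The skip rule can be illustrated with a short sketch. The burst tuple layout is hypothetical, and taking the highest reading among loss-free bursts is an assumption made for the example; the text above only states that at least one loss-free burst is required.

```python
def capacity_sample(bursts):
    """Return a capacity reading for one interval, or None if it is skipped.

    `bursts` is a list of (packets_sent, packets_received, measured_mbps)
    tuples; this layout is illustrative, not TruPath's internal format.
    """
    clean = [mbps for sent, received, mbps in bursts if received == sent]
    # Assumption: use the best loss-free burst. No loss-free burst means
    # no capacity value is recorded for this interval.
    return max(clean) if clean else None

intervals = [
    [(50, 50, 94.1), (50, 48, 80.0)],  # one clean burst: sample recorded
    [(50, 49, 90.0), (50, 47, 85.0)],  # loss on every burst: skipped
    [(50, 50, 93.8), (50, 50, 94.0)],  # both clean
]
print([capacity_sample(bursts) for bursts in intervals])
```

Skipped intervals appear as gaps in the series, which is what makes the graph look choppy under intermittent loss.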
- If the PathTest result is incongruent with your capacity readings, you should open a support ticket so we can help you further investigate.
APM and ISP capacity numbers differ
There are times when the network capacity numbers returned by APM do not match those from a speed test provided by your ISP. If this is the case, try the following:
- Confirm that the speed test run by the ISP is effectively using the same source and target as your test.
- Use dual-ended monitoring (testing a path between two AppNeta monitoring points). Dual-ended monitoring measures network capacity in both directions (source to target and target to source), similar to speed tests. Testing each direction independently allows you to account for asymmetry in the network path. For example, upload and download rates may be different and may take different routes. Single-ended monitoring can only determine the capacity in the direction with the lowest capacity.
- Run PathTest. Carriers use a variety of techniques for shaping and policing network traffic, some of which are only clearly evident under load. PathTest does not use lightweight packet dispersion, but rather generates bursts of packets which may trigger carrier shaping technologies. For this test, set up PathTest as follows:
  - In APM, navigate to Delivery > Path Plus.
  - In the PathTest Settings pane:
    - Set Protocol to UDP. UDP and ICMP packets are treated differently by network equipment: UDP packets are treated as data traffic, whereas ICMP packets are treated as control traffic.
    - Set Direction to Both (Sequential).
    - Set Bandwidth to Max.
  - Click Run Test.
- For cases where you need to measure capacity over time, you can use rate-limited monitoring. Rate-limited monitoring is similar to PathTest in that it loads the network while testing, but instead of a single measurement, it makes measurements at regular intervals over time. Open a support ticket with Customer Care to enable rate-limited monitoring.
Recognizing oversubscription
Oversubscription is a technique your ISP uses in order to sell the full bandwidth of a link to multiple customers. It’s a common practice and usually not problematic, but if it is impacting performance, you’ll see it first in your utilized capacity measurements.
The first thing you want to do is corroborate capacity measurements with RTT, loss, and jitter. If there are no corresponding anomalies, then whatever triggered the high utilization isn’t really impacting performance. If there are, you’ll then use Usage monitoring to check for an increase in network utilization.
High utilized capacity coupled with no increase in flow data is a classic sign of oversubscription, and it’s time to follow up with your ISP.
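The checks in this section reduce to a short decision sequence, sketched here. The function, its boolean inputs, and the returned wording are illustrative; they summarize observations you make in the charts and in Usage monitoring rather than values APM reports directly.

```python
def triage_utilization(utilization_high, performance_degraded, flow_increase):
    """Walk through the oversubscription checks in order.

    utilization_high:     utilized capacity is elevated
    performance_degraded: RTT, loss, or jitter show corresponding anomalies
    flow_increase:        Usage monitoring shows more traffic from your network
    """
    if not utilization_high:
        return "nothing to investigate"
    if not performance_degraded:
        return "high utilization is not impacting performance"
    if flow_increase:
        return "utilization is explained by your own traffic"
    return "possible oversubscription: follow up with your ISP"

print(triage_utilization(True, True, False))
```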