- Triggering diagnostics
- Diagnostic results
AppNeta Performance Manager (APM) is designed to continuously monitor the various path metrics that impact application delivery. In addition, APM has diagnostic capabilities that help you identify the source of performance issues and resolve them.
Diagnostics help you pinpoint problematic hops by applying the packet dispersion techniques used in continuous monitoring to each hop in the path. Diagnostics for single-ended paths use the same icmp packet dispersion analysis as single-ended path monitoring. They are available only in the outbound direction, because the target of a single-ended path is not required to be an AppNeta monitoring point.
Dual-ended paths, on the other hand, must target a monitoring point, so their diagnostics differ: icmp is used mid-path only, while udp is used against the last hop and for end-to-end analysis. In addition, diagnostics are available in both the inbound and outbound directions.
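The protocol choice described above can be summarized as a small lookup. This is a minimal sketch, not APM code: the `PathType` enum, the `probe_protocol` function, and the target labels "mid-path", "last-hop", and "end-to-end" are hypothetical names used only for illustration.

```python
from enum import Enum


class PathType(Enum):
    SINGLE_ENDED = "single-ended"
    DUAL_ENDED = "dual-ended"


def probe_protocol(path_type: PathType, target: str) -> str:
    """Protocol used to probe one diagnostic target (hypothetical labels)."""
    if target not in ("mid-path", "last-hop", "end-to-end"):
        raise ValueError(f"unknown target: {target}")
    if path_type is PathType.SINGLE_ENDED:
        # Single-ended diagnostics use icmp packet dispersion everywhere.
        return "icmp"
    # Dual-ended: icmp mid-path only; udp for the last hop and end-to-end analysis.
    return "icmp" if target == "mid-path" else "udp"


for t in ("mid-path", "last-hop", "end-to-end"):
    print(t, probe_protocol(PathType.SINGLE_ENDED, t), probe_protocol(PathType.DUAL_ENDED, t))
```

Only the dual-ended case distinguishes between targets, because only there is a cooperating monitoring point available to answer udp.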
Dual-ended diagnostics are subject to some restrictions:
- Diagnostics are only available for dual-ended paths when the source and target monitoring points are in the same organizational hierarchy. If they are not, single-ended diagnostics are run in the outbound direction, and inbound diagnostics are not available.
There are times when inbound (target-to-source) diagnostics can’t be run:
- When the source connects to APM via our relay server.
- When the target and one or more other monitoring points are behind a firewall.
For the last hop on inbound diagnostics, APM falls back to icmp if the firewall in front of the source is blocking udp.
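The restrictions above can be read as a decision procedure. The sketch below assumes hypothetical boolean flags in a `PathContext` class (none of these are APM settings) and returns, for each direction in which diagnostics can run, the protocol used against the last hop.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PathContext:
    # Hypothetical flags mirroring the conditions listed above.
    dual_ended: bool
    same_org_hierarchy: bool
    source_via_relay: bool
    target_and_other_mps_behind_firewall: bool
    source_firewall_blocks_udp: bool


def diagnostic_plan(ctx: PathContext) -> Dict[str, str]:
    """Map each available direction to the protocol used for its last hop."""
    if not ctx.dual_ended or not ctx.same_org_hierarchy:
        # Single-ended diagnostics: outbound only, icmp throughout.
        return {"outbound": "icmp"}
    plan = {"outbound": "udp"}
    inbound_blocked = ctx.source_via_relay or ctx.target_and_other_mps_behind_firewall
    if not inbound_blocked:
        # Inbound last hop falls back to icmp when the source's firewall blocks udp.
        plan["inbound"] = "icmp" if ctx.source_firewall_blocks_udp else "udp"
    return plan


print(diagnostic_plan(PathContext(True, True, False, False, True)))
```

A dual-ended path whose monitoring points sit in different organizational hierarchies is treated as single-ended here, which matches the first restriction.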
Diagnostic tests are triggered:
- manually via Delivery > Path List > Path Details panel > Diagnose;
- automatically after path creation;
- automatically upon violation of an alert condition.
Within a parent or child organization, the number of violation-triggered diagnostic tests that can run concurrently is limited. Once the concurrency limit is reached, a limited number of additional tests can be queued. Tests triggered after the queue is full are skipped, and a skipped event is marked on the capacity chart. This applies only to tests triggered by a violation event; diagnostics triggered manually or as part of path creation are never skipped. Finally, the path status changes to testing only when a test starts, not when it is queued.
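The admission rules can be sketched as a small scheduler. The actual limits are not given here, so `max_concurrent` and `max_queued` are hypothetical parameters; the class also assumes that manual and path-creation diagnostics do not count against the violation limit.

```python
from collections import deque
from typing import Deque, Dict, List, Set


class DiagnosticScheduler:
    """Hypothetical sketch of violation-triggered diagnostic admission."""

    def __init__(self, max_concurrent: int, max_queued: int) -> None:
        self.max_concurrent = max_concurrent
        self.max_queued = max_queued
        self.running: Set[str] = set()     # violation-triggered tests in progress
        self.queue: Deque[str] = deque()   # violation-triggered tests waiting to start
        self.status: Dict[str, str] = {}   # path id -> "testing"
        self.skipped: List[str] = []       # would be marked on the capacity chart

    def submit(self, path_id: str, trigger: str) -> str:
        if trigger != "violation":
            # Manual and path-creation diagnostics are never skipped.
            self.status[path_id] = "testing"
            return "started"
        if len(self.running) < self.max_concurrent:
            self._start(path_id)
            return "started"
        if len(self.queue) < self.max_queued:
            # A queued test does not change the path status yet.
            self.queue.append(path_id)
            return "queued"
        self.skipped.append(path_id)
        return "skipped"

    def finish(self, path_id: str) -> None:
        self.status.pop(path_id, None)
        if path_id in self.running:
            self.running.remove(path_id)
            if self.queue:
                self._start(self.queue.popleft())

    def _start(self, path_id: str) -> None:
        self.running.add(path_id)
        self.status[path_id] = "testing"


scheduler = DiagnosticScheduler(max_concurrent=2, max_queued=1)
outcomes = [scheduler.submit(p, "violation") for p in ("a", "b", "c", "d")]
print(outcomes)
print(scheduler.submit("e", "manual"))
scheduler.finish("a")
print(scheduler.status.get("c"))
```

With a limit of two and a queue of one, the third violation test waits (its path is not yet "testing"), the fourth is skipped, and the queued test starts as soon as a running one finishes.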
The data details and voice details tabs tabulate the results and add two kinds of messages, observations and diagnostics:
- Observations are network conditions that were witnessed during the test. They’re rated by frequency, expressed as the percentage of packets sent or the percentage of times the condition was witnessed. For example, if ‘packet reordering detected’ has a 12% frequency, then 12% of the packets in the train sent to the hop arrived out of their original sequence. Some observations don’t have a frequency because the relevant portion of the diagnostic was performed only once.
- Diagnostics are inferences: they appear when a network condition can be inferred from measurements, and they’re rated by certainty from 0 to 100%.
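To make the frequency arithmetic concrete, here is a sketch that computes a reordering percentage from the order in which a packet train arrived. It assumes one simple definition of "out of sequence" (a packet arriving after one with a higher sequence number) and a hypothetical 25-packet train; APM's exact definition and train length are not stated here.

```python
from typing import Sequence


def reordering_frequency(arrival_order: Sequence[int]) -> float:
    """Percentage of packets that arrived out of their original sequence.

    Assumed definition: a packet is reordered if a higher sequence number
    has already arrived before it.
    """
    if not arrival_order:
        return 0.0
    highest_seen = -1
    reordered = 0
    for seq in arrival_order:
        if seq < highest_seen:
            reordered += 1
        else:
            highest_seen = seq
    return 100.0 * reordered / len(arrival_order)


# Hypothetical 25-packet train in which three adjacent pairs were swapped in transit.
train = list(range(25))
for i in (2, 10, 20):
    train[i], train[i + 1] = train[i + 1], train[i]
print(reordering_frequency(train))
```

Three late packets out of 25 gives the 12% frequency used in the example above.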
APM assigns an icon to each message to indicate its severity. When diagnostics are hidden, the severity column shows the highest severity among all messages for the hop. The severity icons indicate:
- Nothing to report.
- Information about a hop.
- The hop is impacting performance.
- The hop couldn’t be measured.
- The hop is severely impacting path performance.
Why are some columns blank? Mid-path analysis is not performed during diagnostics or basic voice assessments for voice handsets and network responder target types.
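The per-hop rollup used when diagnostics are hidden is a maximum over an ordered scale. This sketch assumes an ordering from "nothing to report" up to "severely impacting"; where "couldn't be measured" ranks is not stated, so it is deliberately left out, and the names are hypothetical.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    # Assumed order, least to most severe. "Couldn't be measured" is omitted
    # because its rank relative to the others is not stated.
    NOTHING_TO_REPORT = 0
    INFORMATION = 1
    IMPACTING = 2
    SEVERELY_IMPACTING = 3


def hop_severity(messages: Iterable[Severity]) -> Severity:
    """Severity shown for a hop when diagnostics are hidden: the highest among its messages."""
    return max(messages, default=Severity.NOTHING_TO_REPORT)


print(hop_severity([Severity.INFORMATION, Severity.IMPACTING, Severity.INFORMATION]).name)
```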
Related network devices
Related network devices lists each hop in a path and lets you save a hop’s details so that APM can poll it via snmp. If a hop is under your administrative control, click ‘add device’ and complete the dialog that follows; an initial snmpwalk begins immediately. Then click a table row to see the polling results on the network device details page.
Advanced diagnostics reveals data with a certainty of less than 30% and diagnostics marked ‘inconsistent with end-to-end network performance’. Contact Customer Care to enable this feature. Once an org admin has access to advanced diagnostics, that admin can assign this privilege to any other user.
For any hop, a diagnostic might return measurements that have low certainty or are ‘inconsistent with end-to-end network performance’. In this case, APM presents a simplified report, in which the hop is marked as ‘indeterminate’ and the unqualified data are suppressed. Advanced diagnostics reveals this suppressed data. If you are prepared to tolerate some uncertainty or inconsistency, advanced diagnostics can further help you determine the cause of network issues, especially packet loss.
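The difference between the two reports amounts to a filter on certainty and consistency. This is a sketch under stated assumptions: the 30% threshold comes from the text, but the `Diagnostic` class, the `hop_report` function, the rule that one unqualified diagnostic makes the hop indeterminate, and the marker strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

CERTAINTY_THRESHOLD = 30.0  # percent, per the text above


@dataclass
class Diagnostic:
    text: str
    certainty: float     # 0 to 100
    inconsistent: bool   # flagged 'inconsistent with end-to-end network performance'


def is_qualified(d: Diagnostic) -> bool:
    return d.certainty >= CERTAINTY_THRESHOLD and not d.inconsistent


def hop_report(diagnostics: List[Diagnostic], advanced: bool) -> Tuple[str, List[Diagnostic]]:
    """Return (hop marker, diagnostics shown) for one hop; marker names are illustrative."""
    has_unqualified = any(not is_qualified(d) for d in diagnostics)
    if advanced:
        # Advanced report: show everything; the indeterminate icon becomes dimmed.
        return ("dimmed" if has_unqualified else "normal", list(diagnostics))
    # Simplified report: suppress unqualified data and mark the hop indeterminate.
    shown = [d for d in diagnostics if is_qualified(d)]
    return ("indeterminate" if has_unqualified else "normal", shown)


diags = [
    Diagnostic("hypothetical low-certainty diagnostic", certainty=25.0, inconsistent=False),
    Diagnostic("hypothetical well-supported diagnostic", certainty=80.0, inconsistent=False),
]
print(hop_report(diags, advanced=False)[0])
print(hop_report(diags, advanced=True)[0])
```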
What’s different in advanced diagnostics?
The simplified report (default) suppresses data and marks the hop as ‘indeterminate’.
The advanced report reveals data that are suppressed in the simplified report, and the indeterminate icon changes to a dimmed icon. Toggle ‘show diagnostics’ to see per-hop diagnostic details.
Included in the details section are a certainty qualification and messages about dubious measurements.
Diagnostic results are dense with data, but APM interprets them for you and produces a summary of issues. Each of those issues, or messages, has a corresponding article that explains what it means. You can look up a diagnostic message on this site either by title or by message id via the left-side navigation. To discover a message id, hover over the message link on the summary tab of the test details page and note the url displayed at the bottom of the browser window.
Whenever a bottleneck is encountered along a path in the direction of a test, total capacity decreases, regardless of the physical capacity of the path beyond that point. This is consistent with an application’s experience of a network path. From time to time, however, measurements to intermediate hops might seem inconsistent with the end-to-end path to a target. You might observe messages for these intermediate hops such as ‘CPU-limited response - Total Capacity depressed’, ‘Inconsistent behaviors observed at this hop …’, and ‘High utilization detected’, yet the end-to-end path appears normal. For example, total capacity might appear lower than expected in the middle of the network: an intermediate hop may respond at only 19 Mbps while the target hop responds at 78 Mbps. How could a device that can handle only 19 Mbps pass 78 Mbps to a downstream device? Answering this question requires an understanding of how modern routers, switches, and firewalls are designed.
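The bottleneck rule and the apparent paradox can be checked with a little arithmetic. In this sketch the link capacities and the hop names are hypothetical; only the 19 Mbps and 78 Mbps figures come from the example above. A hop whose measured capacity is below the target's cannot be the true forwarding bottleneck, so its measurement must reflect something else.

```python
from typing import Dict, List


def bottleneck_capacity(link_capacities_mbps: List[float]) -> float:
    """End-to-end capacity is capped by the slowest link in the direction of the test."""
    return min(link_capacities_mbps)


def inconsistent_hops(hop_capacity_mbps: Dict[str, float], target_capacity_mbps: float) -> List[str]:
    """Intermediate hops whose measured capacity is below the target's.

    If such a hop really forwarded traffic at its measured rate, the target could
    not be reached any faster, so the measurement must reflect something other
    than the forwarding path.
    """
    return [hop for hop, mbps in hop_capacity_mbps.items() if mbps < target_capacity_mbps]


print(bottleneck_capacity([1000.0, 100.0, 78.0]))
print(inconsistent_hops({"hop-1": 940.0, "hop-2": 19.0}, 78.0))
```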
A single device typically combines several CPUs and ASICs. When APM directs test packets to the target, they pass through the router’s ASICs; often one ASIC is associated with each router port. ASICs are designed to route Layer 3 traffic very quickly, and most can do some basic onboard queuing and filtering, but rarely will you find ASICs capable of handling all routing functions. Specialized functions are passed to a router management CPU, which is shared by all ASICs. When APM tests directly to a router hop, the ASIC typically redirects the test packets to the router management CPU, and in some cases this path is slower or more congested than the main network path through the router. This resolves the apparent paradox: a router can forward transit traffic through its ASICs much faster than its management CPU can answer test packets addressed to the router itself. For instance, a router may be capable of passing 157 Mbps through its ASICs while testing to its management CPU achieves only 38.8 Mbps.
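The two-path architecture can be modelled by giving each hop separate forwarding and response rates. This is a simplified sketch, not how APM computes capacity: the `Hop` class and `capacity_to_hop` function are hypothetical, the router figures (157 Mbps and 38.8 Mbps) are taken from the text, and the target's rates are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    name: str
    forwarding_mbps: float  # rate for transit traffic through the ASICs
    response_mbps: float    # rate when the hop itself is the test target (management CPU)


def capacity_to_hop(path: List[Hop], index: int) -> float:
    """Capacity measured when testing directly to path[index].

    Test packets transit the ASICs of every earlier hop, but the targeted hop
    answers from its management CPU, so its response rate caps the result.
    """
    transit = [hop.forwarding_mbps for hop in path[:index]]
    return min(transit + [path[index].response_mbps])


path = [
    Hop("router", forwarding_mbps=157.0, response_mbps=38.8),
    Hop("target", forwarding_mbps=1000.0, response_mbps=1000.0),  # hypothetical values
]
print(capacity_to_hop(path, 0))
print(capacity_to_hop(path, 1))
```

Testing to the router reports 38.8 Mbps, yet the target beyond it is reached at 157 Mbps, because transit traffic never touches the router's management CPU.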
With this basic router architecture in mind, here are a few points to consider when examining inter-hop responses:
If you are measuring lower-speed links, you can use routers and switches as targets. For example, if you are testing a 10 Mbps link you may target a router, provided that the router is capable of responding faster than the speed of the link.
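The rule of thumb for targeting a router follows from taking the minimum of the link speed and the router's response rate. In this sketch the function names are hypothetical and the 38.8 Mbps response rate is borrowed from the earlier example; the 10 Mbps link comes from the text and the 100 Mbps link is made up for contrast.

```python
def measured_capacity_to_router(link_mbps: float, router_response_mbps: float) -> float:
    """Capacity observed when a router is the test target.

    Test traffic crosses the link and is then answered by the router's
    management CPU, so the slower of the two limits the measurement.
    """
    return min(link_mbps, router_response_mbps)


def router_is_suitable_target(link_mbps: float, router_response_mbps: float) -> bool:
    """True when the router's response path is faster than the link under test."""
    return router_response_mbps > link_mbps


print(measured_capacity_to_router(10.0, 38.8), router_is_suitable_target(10.0, 38.8))
print(measured_capacity_to_router(100.0, 38.8), router_is_suitable_target(100.0, 38.8))
```

On the 10 Mbps link the measurement reflects the link; on the 100 Mbps link it reflects the router's management CPU instead.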
Watch for routers that report high utilization even though end-to-end utilization is low. Typically this indicates that ACLs are redirecting large amounts of network traffic from the ASICs to the router’s management CPU, and the show cpu command will usually confirm that the router’s CPU is busy. If you find a router in this condition, we recommend reordering or removing ACLs, or replacing the router with an appropriate firewall or traffic shaper. Also see ‘High utilization detected’.
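The warning sign described above can be expressed as a simple comparison. This is only an illustrative heuristic: the function name and both thresholds (`high_pct`, `low_pct`) are hypothetical values chosen for the sketch, not APM settings.

```python
def suggests_cpu_redirect(hop_utilization_pct: float,
                          end_to_end_utilization_pct: float,
                          high_pct: float = 70.0,
                          low_pct: float = 20.0) -> bool:
    """Flag a router that reports high utilization while the end-to-end path is lightly used.

    high_pct and low_pct are hypothetical thresholds chosen for illustration.
    """
    return hop_utilization_pct >= high_pct and end_to_end_utilization_pct <= low_pct


print(suggests_cpu_redirect(85.0, 5.0))
print(suggests_cpu_redirect(85.0, 60.0))
```

A flagged router is a candidate for checking CPU load and ACL placement, not proof that ACLs are the cause.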
In some cases the bottleneck between the ASICs and the router management CPU becomes significant to applications and should therefore be taken into account. Although most traffic is forwarded between ASICs, the management CPU may still handle some types of traffic because of ACLs, broadcasts, multicast, and fragmentation. If the traffic handled by the router management CPU is important to your application, you’ll need to devise a workaround.