AppNeta Performance Manager (APM) is designed to continuously monitor the path metrics that impact application delivery. In addition, APM has diagnostic capabilities that help you identify the source of performance issues and resolve them.

Diagnostics help you pinpoint problematic hops by using the same packet dispersion techniques as continuous monitoring against each hop in the path. Diagnostics for single-ended paths use the same ICMP packet dispersion analysis as single-ended path monitoring. They are available only in the outbound direction, as there is no requirement that the target be an AppNeta monitoring point.
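
For background, the packet dispersion idea itself can be sketched in a few lines of Python (illustrative only, not APM's implementation): packets sent back to back are spread out by the narrowest link, so the gap between consecutive arrivals approximates that link's per-packet serialization time.

    def capacity_from_dispersion(packet_size_bytes, arrival_times):
        # Gaps between consecutive packet arrivals at the receiver.
        gaps = [t2 - t1 for t1, t2 in zip(arrival_times, arrival_times[1:])]
        # The median gap is more robust to cross-traffic than the mean.
        median_gap = sorted(gaps)[len(gaps) // 2]
        return packet_size_bytes * 8 / median_gap  # bits per second

    # 1500-byte packets arriving 120 microseconds apart imply a ~100 Mbps bottleneck.
    print(capacity_from_dispersion(1500, [0.0, 120e-6, 240e-6, 360e-6]))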

Dual-ended paths, on the other hand, must target a monitoring point, so their diagnostics differ: ICMP is used mid-path only, while UDP is used against the last hop and for end-to-end analysis. In addition, diagnostics are available in both the inbound and outbound directions. The sketch below summarizes the protocol choice.
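
A minimal sketch of that per-hop protocol selection, in Python (the function and its signature are illustrative, not APM's API):

    def probe_protocol(hop, last_hop, dual_ended):
        # Single-ended diagnostics probe every hop with ICMP; dual-ended
        # diagnostics use ICMP mid-path and UDP against the last hop and
        # for end-to-end analysis.
        if not dual_ended:
            return "ICMP"
        return "UDP" if hop == last_hop else "ICMP"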

Dual-ended diagnostics are subject to some restrictions:

  • Diagnostics are only available for dual-ended paths when the source and target monitoring points are in the same organizational hierarchy. If they are not, single-ended diagnostics are run in the outbound direction, and inbound diagnostics are not available.
  • There are times when inbound diagnostics, i.e., target-to-source, can’t be run:

    • When the source connects to APM via our relay server, inbound diagnostics are unavailable.
    • When the target and one or more other monitoring points are behind a firewall, inbound diagnostics are unavailable.
    • For the last hop on inbound diagnostics, APM falls back to ICMP if the firewall in front of the source is blocking UDP.

Triggering diagnostics

Diagnostic tests are triggered:

  • manually via Delivery > Path List > Path Details panel > Diagnose;
  • automatically after path creation;
  • automatically upon violation of an alert condition.

Within a parent or child organization, there is a limit on the number of violation-triggered diagnostic tests that can run concurrently. Once the concurrency limit is reached, a limited number of additional tests are queued. Tests triggered after the queue is full are skipped, and a skipped event is marked on the capacity chart. This applies only to tests triggered by a violation event; diagnostics triggered manually or as part of path creation are never skipped. Finally, the path status changes to testing only when a test starts, not when it is queued.
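
The following Python sketch captures that scheduling behavior (the limits and names are illustrative; they are not APM's actual values or API):

    from collections import deque

    class DiagnosticScheduler:
        def __init__(self, max_running=5, max_queued=10):  # illustrative limits
            self.max_running = max_running
            self.max_queued = max_queued
            self.running = set()
            self.queued = deque()
            self.skipped = []

        def trigger(self, test, via_violation):
            if len(self.running) < self.max_running:
                self.running.add(test)     # path status changes to "testing" here
            elif not via_violation or len(self.queued) < self.max_queued:
                self.queued.append(test)   # queued; path status is unchanged
            else:
                self.skipped.append(test)  # violation-triggered only: skipped
                                           # event marked on the capacity chart

        def on_complete(self, test):
            self.running.discard(test)
            if self.queued:
                self.running.add(self.queued.popleft())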

Diagnostic results

The data details and voice details tabs tabulate results and add some observations and inferences:

  • Observations are network conditions that were witnessed during the test; they’re rated by frequency, either as a percentage of packets sent or as a percentage of times the condition was witnessed. For example, if ‘packet reordering detected’ has a 12% frequency, then 12% of the packets in the train sent to the hop arrived out of their original sequence (the sketch after this list works through this arithmetic). Some observations don’t have a frequency because the relevant portion of the diagnostic was only performed once.
  • Diagnostics appear when a network condition can be inferred from measurements; they’re rated by certainty from 0 to 100%.
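
To make the frequency arithmetic concrete, here is the 12% example above as a trivial Python computation (the counts are assumed for illustration):

    packets_sent = 100  # packets in the train sent to the hop (assumed)
    reordered = 12      # packets that arrived out of their original sequence
    frequency = 100.0 * reordered / packets_sent
    print(f"packet reordering detected: {frequency:.0f}% frequency")  # -> 12%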

APM assigns an icon to each message to indicate severity. When diagnostics are hidden, the severity column shows the highest severity among all messages for the hop.

Why are some columns blank? Mid-path analysis is not performed during diagnostics or basic voice assessments for the voice handset and network responder target types.

OK: Nothing to report.
Information: Information about a hop.
Warning: The hop is impacting performance.
Indeterminate: The hop couldn’t be measured.
Error: The hop is severely impacting path performance.

Related network devices lists each hop in a path and gives you the opportunity to save the hop’s details so that APM can poll it via SNMP. If a hop is under your administrative control, click ‘add device’. Complete the dialog that follows, and an initial snmpwalk begins immediately. Then click a table row to see polling results on the network device details page. For reference, the sketch below reproduces an equivalent walk outside of APM.
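
A minimal Python sketch using the net-snmp command-line tools (assuming snmpwalk is installed and the device allows SNMP v2c reads; the address and community string are placeholders):

    import subprocess

    def snmp_walk(host, community="public", oid="system"):
        # Walk the device's 'system' subtree via net-snmp's snmpwalk (SNMP v2c).
        result = subprocess.run(
            ["snmpwalk", "-v", "2c", "-c", community, host, oid],
            capture_output=True, text=True, timeout=30, check=True,
        )
        return result.stdout.splitlines()

    for line in snmp_walk("192.0.2.1"):  # placeholder address
        print(line)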

Advanced diagnostics

Advanced Diagnostics reveals data with certainty of less than 30% and diagnostics ‘inconsistent with end-to-end network performance’. Contact Customer Care to enable this feature. Once one org admin has access to advanced diagnostics, that admin can assign this privilege to any other user.

For any hop, a diagnostic might return measurements that have low certainty or are ‘inconsistent with end-to-end network performance’. In this case, APM presents a simplified report in which the hop is marked as ‘indeterminate’ and the unqualified data are suppressed. Advanced diagnostics reveals these suppressed data. If you are prepared to tolerate some uncertainty and/or inconsistency, advanced diagnostics can help you determine the cause of network issues, especially in the case of packet loss. The sketch below summarizes this filtering logic.
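
In pseudocode terms, the report filtering works roughly like this (a Python sketch; the 30% threshold comes from the text above, while the field names are illustrative):

    CERTAINTY_THRESHOLD = 30  # percent

    def visible_messages(messages, advanced=False):
        # Default report: suppress low-certainty or inconsistent measurements;
        # the affected hop is then marked 'indeterminate'.
        if advanced:
            return messages  # advanced diagnostics reveals everything
        return [m for m in messages
                if m["certainty"] >= CERTAINTY_THRESHOLD and m["consistent"]]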

[Screenshot: advancedDiagnostics_manageUsers]

What’s different in advanced diagnostics?

The simplified report (default) suppresses data and marks the hop as ‘indeterminate’.

[Screenshot: advancedDiagnostics_dataDetails1]

The advanced report reveals data that are suppressed in the simplified report. The indeterminate icon changes to a dimmed icon. Toggle ‘show diagnostics’ for per-hop diagnostic details.

[Screenshot: advancedDiagnostics_dataDetails2]

Included in the details section are a certainty qualification and messages about dubious measurements.

[Screenshot: advancedDiagnostics_dataDetails3]

Diagnostic messages

Diagnostics are dense with data, but luckily APM does the interpretation for you and produces a summary of issues. Each of those issues, or messages, has a corresponding article that explains what it means. You can look up a diagnostic message on this site either by title or by message ID via the left-side navigation. To discover the message ID, hover over the message link on the summary tab of the test details page and note the URL displayed at the bottom of the page.

[Screenshot: pathview-diag-msg-number.png]

Interhop analysis

Whenever a bottleneck is encountered along a path in the direction of a test, total capacity will decrease, regardless of the physical capacity of the path beyond that point. This is consistent with an application’s experience of a network path. From time to time, however, intermediate hops might seem inconsistent with the end-to-end path to a target. You might also observe messages for these intermediate hops such as ‘CPU-limited response - Total Capacity depressed’, ‘Inconsistent behaviors observed at this hop …’, and ‘High utilization detected’; yet the end-to-end path appears normal. For example, the total capacity might appear to be lower than expected in the middle of the network. An intermediate hop may respond with only 19 Mbps while the target hop is responding at 78 Mbps. How could a device that can only handle 19 Mbps pass 78 Mbps to a downstream device? The answer to this question requires an understanding of how modern routers, switches, and firewalls are designed.

Modern routers combine several CPUs and ASICs within a single device. When APM directs test packets to the target, they pass through the router’s ASICs; often one ASIC is associated with each router port. ASICs are designed to be very fast at Layer 3 routing, and most can do some basic onboard queuing and filtering, but rarely will you find ASICs that are capable of handling all routing functions. Specialized functions are passed to a router management CPU that is shared by all ASICs. When APM tests directly to a router hop, the ASIC typically redirects the test packets to the router management CPU. In some cases this path is slower or more congested than the main network path through the router. Going back to our original example, we can better understand what APM is reporting: the router is capable of passing 78 Mbps through its ASICs, but we can only achieve 19 Mbps when testing to the router’s management CPU.
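
A toy model of this slow-path effect in Python (the per-hop numbers are the illustrative figures from the example above, not measurements):

    def end_to_end_capacity(forwarding_mbps):
        # End-to-end capacity is set by the narrowest forwarding-plane (ASIC) link.
        return min(forwarding_mbps)

    def hop_test_capacity(forwarding_mbps, mgmt_cpu_mbps, hop):
        # Testing directly to a hop exercises its management CPU (the slow path),
        # so the reading can sit well below what the hop's ASICs actually forward.
        return min(min(forwarding_mbps[:hop + 1]), mgmt_cpu_mbps[hop])

    forwarding = [100, 78, 100]  # ASIC forwarding capacity per hop, in Mbps
    mgmt_cpu = [50, 19, 90]      # management-CPU response capacity per hop, in Mbps

    print(end_to_end_capacity(forwarding))             # 78 -> end to end looks normal
    print(hop_test_capacity(forwarding, mgmt_cpu, 1))  # 19 -> mid-hop looks constrained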

With this basic router architecture in mind, here are a few points to consider when examining inter-hop responses:

  • If you are measuring lower-speed links, you can use routers and switches as targets. For example, if you are testing a 10 Mbps link you may target a router, provided that the router is capable of responding faster than the speed of the link.

  • Watch for routers that are reporting high utilization even though end-to-end utilization is low. Typically this indicates that ACLs are redirecting large amounts of network traffic from the ASICs to the router’s management CPU; inevitably, the ‘show cpu’ command will reveal that the router’s CPU is busy. If you find a router in this condition, we recommend reordering or removing ACLs, or replacing the router with an appropriate firewall or traffic shaper. Also see ‘High utilization detected’.

  • In some cases the bottleneck between ASICs and the router management CPU becomes significant to applications, and therefore should be taken into consideration. Although traffic is typically concentrated between ASICs, you may find that the management CPU will handle some types of traffic due to ACLs, broadcasts, multicast, and fragmentation. If the traffic that is handled by the router management CPU is important to your application, you’ll need to devise a workaround.

All diagnostic messages

Message ID | Message
Excessive latency measured | Advanced Analysis “Excessive latency measured”
Excessive packet rtt | Advanced Analysis “Excessive packet round-trip time (RTT) detected”
High utilization detected | Advanced Analysis “High utilization detected”
Rate-limiting detected | Advanced Analysis “Rate-limiting behavior detected”
Maximum transmission unit | Maximum Transmission Unit
Message 1 | ICMP connectivity error messages received
Message 10 | Responder agent in use
Message 1001 | Capacity constriction <capacity measure> detected between hop <number> and hop <number>
Message 1002 | Black hole detected between hop and target
Message 1003 | Congestion detected between hop <number> and target
Message 1004 | Duplex conflict detected between hop <number> and target
Message 1005 | Excessive latency detected at target
Message 1006 | Gray-hole detected between hop <number> and target
Message 1007 | Excessively long media used for half duplex causing <percent> loss detected between hop <number> and hop <number>
Message 1008 | High utilization <percent> detected at target
Message 1009 | Intermittent connectivity causing <percent> loss detected between hop <number> and target
Message 1010 | IP QoS value alteration started at hop <number>
Message 1011 | Limited performance due to use of half duplex or mixture of half- and full-duplex modes detected between hop <number> and target
Message 1012 | Limited performance due to misconfigured NIC cards or drivers detected between hop <number> and target
Message 1013 | Limited performance may be due to a mid-path device or a slow-path response at target
Message 1014 | Limiting mechanism detected between hop <number> and target
Message 1015 | Media errors causing <percent> loss detected between hop <number> and target
Message 1016 | MTU constriction <measured MTU> detected at hop <number>
Message 1017 | Other possible diagnoses present - see <view> details
Message 1018 | Packet loss <percent> originating at hop <number>
Message 1019 | Path MTU discovery is not supported at target
Message 1020 | MOS value of <MOS value> measured to target <address>, due to <percent> loss
Message 1021 | Poorly performing link due to half-duplex constriction detected between hop <number> and target
Message 1022 | QoS mechanism may be compromised on network path
Message 1023 | Routing loop causing <percent> loss detected between hop <number> and target
Message 1024 | Small packet congestion detected between hop <number> and target
Message 1025 | Specified VoIP call load is excessive for network path
Message 1026 | Largest latency in path detected between hop <number> and hop <number>
Message 1027 | White hole detected between hop and target
Message 1028 | MTU is too large to reliably detect network problems
Message 1029 | Limited performance may be due to a slow-path response at target
Message 1030 | Packet reordering <percent> detected at target <address>
Message 1031 | Limited performance due to VLAN tagging (802.1p QoS) mismatch detected between hop <number> and target
Message 1033 | IP QoS value alteration detected at target <address>
Message 1034 | Source appliance appears to be behind a firewall
Message 11 | Excessive latency measured
Message 12 | Excessive packet round-trip time (RTT) detected
Message 13 | Device does not respond suitably to test packets
Message 14 | Traceroute failed to complete
Message 154 | Some measurements are unavailable at this hop
Message 16 | Inconsistent network path detected
Message 17 | Network path does not properly support PMTU discovery
Message 18 | Packet loss detected
Message 19 | Insufficient statistics to generate all measurements
Message 2 | ICMP error response messages received
Message 21 | ICMP TTL Expired message received
Message 22 | ICMP Fragmentation Needed And DF Set message received
Message 23 | Total capacity does not correspond to any known standard
Message 24 | CPU-limited response - total capacity depressed
Message 25 | Total capacity corresponds to an Ethernet standard
Message 26 | Total capacity corresponds to a non-Ethernet standard
Message 27 | Inconsistent handling of voice packets detected
Message 28 | High utilization detected
Message 3 | ICMP router messages received
Message 30 | Measured MTU is too large for reliable packet error detection
Message 31 | Sub-optimal MOS indicates presence of user-detectable voice degradation
Message 32 | IP QoS value was altered in packets
Message 33 | Congestion causing packet loss detected
Message 34 | Rate-limiting behavior detected
Message 35 | Half/Full-duplex conflict detected
Message 36 | Full/Half-duplex conflict detected
Message 37 | MTU conflict detected
Message 39 | Media errors detected
Message 4 | ICMP ECHO REPLY packets received that were not solicited
Message 40 | Black-hole hop detected - MTU conflicts possible
Message 41 | Gray-hole hop detected - MTU conflicts possible
Message 42 | Path MTU constriction point detected
Message 43 | Path congestion point detected
Message 44 | Root cause of target host condition detected
Message 45 | Longest link on network path detected
Message 46 | Path capacity constriction point detected
Message 47 | Inconsistent behaviors observed at this hop - response does not reflect end-to-end performance
Message 5 | Packet reordering detected
Message 51 | Routing loop in network path detected
Message 52 | ICMP limiting by a device in the network path detected
Message 54 | White-hole hop detected - performance degradation due to fragmentation possible
Message 55 | Half-duplex collision domain violation detected
Message 56 | Sub-optimal performance detected
Message 59 | Specified voice call load is excessive for this path
Message 6 | Detected MTU is a known standard
Message 60 | QoS mechanism may be compromised
Message 61 | Poorly performing link impacting time sensitive traffic detected
Message 62 | Small packet congestion detected
Message 63 | Limiting detected at target
Message 64 | Intermittent connectivity loss detected during testing
Message 7 | Detected MTU is nonstandard
Message 8 | Detected MTU is a common jumbo frame size
Message 9 | IP address for this hop appears more than once in network path
Mixing tcp and legacy | Mixing TCP and Legacy Protocols
Poorly performing hops | Poorly Performing Routers/Switches/NICs
The apparent network | The apparent network
Packet reordering | Understanding reordered packets
Why duplex conflicts occur | Why duplex conflicts occur