Set Up Monitoring

This article describes the recommended approach to configuring alerts and notifications in AppNeta Performance Manager (APM) so that the appropriate teams or individuals are notified when significant network or application events are detected.

Within APM, alerts are events that indicate something is not as it should be, for example, unacceptable data loss on a network path. Alerts can be generated from a variety of network path, web path, and Usage metrics (Usage metrics are not covered in this article). The thresholds for the metrics that are important to you are specified in alert profiles, and these profiles are applied to the paths of interest. Once an alert profile is applied, alerts are generated whenever the monitored metrics fall outside the specified thresholds.

Alerts can be viewed within APM (for example, in Events and in the Violation Breakdown Report), but you can also be notified of an alert in one of three ways:

  • Email notifications - Emails with alert details are sent directly from APM to specified email addresses.
  • Event integration - Messages containing the alerts are sent to your event management server for processing.
  • SNMP notifications - SNMP notifications containing the alerts are sent to an SNMP network management system for processing (not covered in this article).

Because it is easy to generate too many alert notifications, this article describes an approach that helps you identify what is important to you and limit notifications, in a controlled manner, to those that need to be acted on.

We recommend starting small, with a limited scope for your alerts and notifications, and then gradually increasing the scope over time. To this end, we recommend the following approach:

  1. Identify - Identify the various segments of your infrastructure to be monitored.
  2. Configure - For a given segment, deploy Monitoring Points and set up monitoring.
  3. Baseline - Monitor the segment to determine what constitutes a violation and update the alert profile as necessary.
  4. Notify - When a violation occurs, notify the appropriate individuals or teams according to their standard operating procedure.
  5. Repeat - Repeat for the next segment of your infrastructure.

Step 1: Identify

Identify the various segments of your infrastructure (logical groupings of network or web paths) to be monitored and the individuals or teams responsible for each of them. The goal is to create a configuration plan that you can use to configure alerts and notifications in a controlled, step-by-step manner, one segment at a time.

Considerations:

  • Scope - Each segment needs to include either network paths or web paths.
  • What to alert on - The metrics to be alerted on need to be the same across the segment and are typically related to a Key Performance Indicator (KPI). For example, if you are alerting on latency, paths spanning continents will have higher latency than local paths to the same target, so you would separate the paths into different segments with different acceptable latency thresholds.
  • Priority - Some segments are business critical, and when a violation occurs someone must be notified immediately. For segments that are not business critical, a longer delay before notification is acceptable.
  • Time of day - Violations that occur during business hours are typically more critical than those that occur outside of business hours.
  • Team responsible - Typically, only one team should be responsible for a segment: the team that is notified when there is a problem and that owns its resolution.
  • How to notify - How does the responsible team usually receive notifications? AppNeta notifications need to fit with your standard operating procedures.

Use a planning table similar to the one below to record your plan:

  • Segment name - A name to identify the infrastructure segment you are monitoring.
  • Scope - The network paths or web paths in the segment.
  • Metrics - The metrics to monitor and what constitutes a violation and a clear (that is, the alert profile configuration). Leave this column blank to begin with and fill it in during Step 3: Baseline.
  • When to notify - The condition under which someone is notified once a violation occurs or clears (for example, if the violation persists for 2 minutes).
  • Who to notify - The team or individual that receives the notification.

Example planning table:

  • Segment name - Data Center
    • Scope - All network paths from branch offices to the Data Center
    • Metrics:
      • “Connectivity” - Violates immediately when connectivity is lost. Clears immediately when connectivity is restored.
      • “Data loss” - Violates when >= 2% for 5 mins. Clears when < 2% for 5 mins.
    • When to notify - Notify if violation persists for 2 minutes
    • Who to notify - Core infrastructure team
  • Segment name - Office 365
    • Scope - All web paths from branch offices to Office 365
    • Metrics:
      • “Connectivity” - Violates immediately when connectivity is lost. Clears immediately when connectivity is restored.
      • “Page load time” - Violates when >= 9000 ms for 2 consecutive tests. Clears when < 9000 ms for 2 consecutive tests.
    • When to notify - Notify if violation persists for 2 minutes
    • Who to notify - Application team
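The violate and clear rules in the planning table are time-based thresholds. As an illustration only, the following Python sketch models how a rule such as “Violates when >= 2% for 5 mins. Clears when < 2% for 5 mins.” could be evaluated over a series of samples. It is not APM's implementation: the one-sample-per-minute interval and the names ThresholdRule and evaluate are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    """Hypothetical model of one alert profile condition (not an APM API)."""
    limit: float          # e.g. 2.0 for 2% data loss
    violate_minutes: int  # samples must be >= limit this many minutes to violate
    clear_minutes: int    # samples must be < limit this many minutes to clear


def evaluate(rule: ThresholdRule, samples: list[float]) -> list[tuple[int, str]]:
    """Return (minute, "violate" | "clear") events, assuming one sample per minute."""
    events: list[tuple[int, str]] = []
    violating = False
    run = 0  # consecutive samples that point toward the opposite state
    for minute, value in enumerate(samples):
        if not violating:
            run = run + 1 if value >= rule.limit else 0
            if run >= rule.violate_minutes:
                violating, run = True, 0
                events.append((minute, "violate"))
        else:
            run = run + 1 if value < rule.limit else 0
            if run >= rule.clear_minutes:
                violating, run = False, 0
                events.append((minute, "clear"))
    return events


# Hypothetical data loss samples (%) with a one-minute dip below the limit.
flapping = [3.0] * 5 + [0.0] + [3.0] * 5
print(evaluate(ThresholdRule(2.0, 5, 1), flapping))  # [(4, 'violate'), (5, 'clear'), (10, 'violate')]
print(evaluate(ThresholdRule(2.0, 5, 5), flapping))  # [(4, 'violate')]
```

In this model, a rule that violates immediately corresponds to violate_minutes=1, and a longer clear time keeps a brief dip from clearing and re-raising the violation.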

Step 2: Configure

After you deploy Monitoring Points for a segment, configure monitoring so that the paths in the segment are properly identified and can be referenced in Step 4: Notify.

Network paths and web paths should be tagged and assigned a group based on the service being monitored.

Network Paths:

  1. If you have not already configured network paths, consider creating a path template group for the segment (e.g. “Data Center”).
  2. Specify a tag (e.g. “Service: Data Center”).
  3. Specify a group (e.g. “Data Center”).
  4. Leave the default network path alert profile.

Web paths:

  1. If you have not already configured web paths, create a web app group for the segment (e.g. “Office 365”).
  2. Add a tag to the associated web paths (e.g. “App: Office 365”).
  3. Add the associated single-ended network paths to a group (e.g. “Office 365”).
  4. Leave the default web path alert profile.
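Because Step 4: Notify refers back to these tags and groups, it can help to derive both from the segment name in one place so that they stay consistent. The sketch below is a hypothetical helper for a planning script, not part of APM; the “Service:” and “App:” prefixes simply follow the examples above and are not something APM requires.

```python
from typing import Literal

PathKind = Literal["network", "web"]


def segment_labels(segment: str, kind: PathKind) -> dict[str, str]:
    """Derive the tag and group names for a segment (hypothetical convention).

    Network path segments are tagged "Service: <segment>" and web path
    segments "App: <segment>". The group name is the segment name itself,
    which is also what the Saved List group filter refers to.
    """
    prefix = {"network": "Service", "web": "App"}[kind]
    return {"tag": f"{prefix}: {segment}", "group": segment}


print(segment_labels("Data Center", "network"))
# {'tag': 'Service: Data Center', 'group': 'Data Center'}
```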

Step 3: Baseline

Once monitoring is configured, tune alerts so that you receive neither too many nor too few. Baselining is the process of monitoring a segment over several days to learn what its typical metrics look like and, from that, what constitutes an anomaly that should be alerted on.

  1. Use the Violation Breakdown Report (for network paths) or the Web App Performance dashboard (for web paths) to see how many alerts are generated.
  2. If the default alert profile is generating too many or too few alerts, create a custom alert profile for the segment (for network paths or web paths) and apply it to the set of paths. Considerations include:
    • Data Loss - The Data Loss threshold in the default alert profile (Network Path Default - Data) may be too tolerant for low-loss links.
    • Voice Jitter - The Voice Jitter threshold in the default alert profile (Network Path Default - Voice) might be too tolerant for voice links.
    • Latency - The default alert profiles for network paths do not alert on latency. If a reliable latency limit applies across all locations in the segment, you may want to set one. This is especially true when using the auto-target GMT (gmt.pm.appneta.com), which should reliably get you to a low-latency target. Also, latency to your DNS server should always be <10 msec.
    • Connectivity - The Connectivity threshold in the default alert profiles (Network Path Default - Data or Network Path Default - Voice), which alerts on every connectivity loss event, may not be appropriate for paths involving laptops or workstations, since they are expected to go offline regularly. For such paths, we recommend creating a custom alert profile that does not contain a Connectivity threshold (for example: Data Loss: Violate >2% for 5 min; Clear <2% for 5 min).
    • Violation clear time - Consider configuring a longer time to clear events (for example, 5 mins) rather than increasing a threshold. This reduces the noise when performance is ‘flapping’.
  3. Continue to monitor (review the Violation Breakdown Report over the course of a standard business week) and adjust the alert profile until non-actionable alerts are minimized.
  4. Update the planning table with the final alert profile.
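During baselining it can help to check how often a candidate threshold would have been exceeded by the values you observed before you edit an alert profile. The sketch below assumes you already have a week of metric samples for the segment in a Python list (how you export them from APM is outside its scope; the values shown are made up). It reports the median and high percentiles and the share of samples at or above each candidate threshold. It ignores violate and clear durations, so it overstates how many alerts a profile would actually raise.

```python
import statistics


def summarize_baseline(samples: list[float], candidates: list[float]) -> dict:
    """Summarize baseline samples and how often candidate thresholds are met or exceeded.

    `samples` are metric values collected while baselining (e.g. latency in msec).
    The "inclusive" quantile method keeps percentiles within the observed range.
    """
    cuts = statistics.quantiles(samples, n=100, method="inclusive")  # cuts[k-1] is the k-th percentile
    at_or_above = {
        limit: sum(1 for v in samples if v >= limit) / len(samples)
        for limit in candidates
    }
    return {
        "median": statistics.median(samples),
        "p95": cuts[94],
        "p99": cuts[98],
        "fraction_at_or_above": at_or_above,
    }


# Made-up latency samples (msec), for illustration only.
report = summarize_baseline([20, 22, 21, 25, 60, 23, 24, 22, 21, 80], candidates=[50, 100])
print(report["median"], report["fraction_at_or_above"])  # 22.5 {50: 0.2, 100: 0.0}
```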

Step 4: Notify

Once alerts are being generated correctly, set up notifications so that the appropriate individuals or teams are informed when a violation occurs. Notifications are typically sent either by email directly from APM or through an event management system. If you have an event management system, use it; otherwise, use email notifications from APM.

Email notifications

To receive email notifications directly from APM:

  1. Create a Saved List.
    1. Name it after the segment (e.g. “Data Center” or “Office 365”).
    2. Include a Group filter using the Group specified in Step 2: Configure (e.g. “Data Center” or “Office 365”).
  2. Create a Notification Profile.
    1. Name it after the segment (e.g. “Data Center” or “Office 365”).
    2. Specify the organization.
    3. Edit Email Addresses and specify those of the individuals or teams to notify. See “Who to notify” in the planning table above.
    4. If the segment contains network paths (Delivery monitoring), select Network path alert profile Violate/Clear events. If it contains web paths (Experience monitoring), select Web path alert profile Violate/Clear events.
    5. Remove the default list.
    6. Add a list using the segment's Saved List created above (e.g. “Data Center” or “Office 365”) with the Violation persists for notification type. Use a short time period (e.g. 2 mins) for business critical segments and a longer time period (e.g. 10 mins) for medium-high priority segments. Other notification types (Immediate and Digest Summary) are also available and can be used where Violation persists for is not ideal.
    7. If you also want to be notified when Monitoring Points lose connectivity to APM, select Monitoring Point Availability event and specify the notification type (typically, Unavailable persists for).
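The difference between the Immediate and Violation persists for notification types can be illustrated with a small model. The following sketch is not APM code: it assumes each violation is represented as a (start, end) pair in minutes, with end set to None while the violation is still open, and it shows which violations would produce a notification for a given persistence period.

```python
from typing import Optional


def notifications(
    violations: list[tuple[float, Optional[float]]],
    persists_for: float,
    now: float,
) -> list[float]:
    """Return the times (in minutes) at which notifications would be sent.

    persists_for=0 models "Immediate": every violation notifies at its start.
    Otherwise a violation notifies only if it is still open persists_for
    minutes after it started and that moment has already been reached.
    Simplified model for illustration; not APM's actual logic.
    """
    sent = []
    for start, end in violations:
        due = start + persists_for
        still_open_at_due = end is None or end > due
        if persists_for == 0 or (still_open_at_due and due <= now):
            sent.append(due)
    return sent


# Hypothetical history: a 1-minute blip, a 5-minute violation, and an open one.
history = [(0, 1), (10, 15), (20, None)]
print(notifications(history, persists_for=0, now=30))  # [0, 10, 20]
print(notifications(history, persists_for=2, now=30))  # [12, 22]
```

With a 2-minute persistence period, the 1-minute blip never reaches the team, which is why Violation persists for is a good default for reducing noise.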

Event management system

To receive notifications from an event management system:

  1. Set up Event Integration to send the following alert messages to your event management system:
    • Sequencer Events for monitoring Monitoring Point connectivity
    • Service Quality Events for network path monitoring (Delivery)
    • Web Application Events for web path monitoring (Experience)
  2. Configure your event management system for notifications.
    1. Aggregate alerts for a given segment based on tags configured in Step 2: Configure (e.g. “Service: Data Center” or “App: Office 365”).
    2. Notify individuals or teams when a certain number or percentage of paths in a segment are violating. See “Who to notify” in the planning table above.
    3. Escalate notifications according to the severity of the issue, for example, based on the number or percentage of violating paths or the length of time in violation.
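How you aggregate and escalate depends entirely on your event management system. As a system-neutral illustration, the following Python sketch assumes each incoming event has already been reduced to a path name, its tags, and whether the path is currently violating, and it grades each segment tag by the percentage of violating paths. The PathState shape, the Severity levels, and the 25% and 50% cut-offs are all hypothetical; they are not part of APM's Event Integration payload. Escalation based on time in violation is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    OK = 0
    NOTIFY = 1    # notify the responsible team
    ESCALATE = 2  # escalate, e.g. page an on-call engineer


@dataclass
class PathState:
    path: str
    tags: list[str]
    violating: bool


def segment_severity(
    states: list[PathState],
    notify_pct: float = 25.0,
    escalate_pct: float = 50.0,
) -> dict[str, Severity]:
    """Group path states by tag and grade each tag by its % of violating paths."""
    totals: dict[str, int] = defaultdict(int)
    violating: dict[str, int] = defaultdict(int)
    for state in states:
        for tag in state.tags:
            totals[tag] += 1
            violating[tag] += int(state.violating)
    result = {}
    for tag, total in totals.items():
        pct = 100.0 * violating[tag] / total
        if pct >= escalate_pct:
            result[tag] = Severity.ESCALATE
        elif pct >= notify_pct:
            result[tag] = Severity.NOTIFY
        else:
            result[tag] = Severity.OK
    return result


# Hypothetical path states for two segments.
states = [
    PathState("nyc-dc", ["Service: Data Center - NA"], True),
    PathState("chi-dc", ["Service: Data Center - NA"], False),
    PathState("tor-dc", ["Service: Data Center - NA"], False),
    PathState("sfo-dc", ["Service: Data Center - NA"], False),
    PathState("lon-dc", ["Service: Data Center - EMEA"], True),
    PathState("par-dc", ["Service: Data Center - EMEA"], True),
]
for tag, severity in segment_severity(states).items():
    print(tag, severity.name)
# Service: Data Center - NA NOTIFY
# Service: Data Center - EMEA ESCALATE
```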

Step 5: Repeat

Repeat from Step 2: Configure for the next segment of your infrastructure (as defined in the planning table created in Step 1: Identify).

Examples

Example 1: Single Service - Data Center

This is an example of how to configure alerts and notifications for a segment containing network paths from branch offices to a Data Center.

Identify

  • Segment name - Data Center
    • Scope - All network paths from branch offices to the Data Center
    • Metrics:
      • “Connectivity” - Violates immediately when connectivity is lost. Clears immediately when connectivity is restored.
      • “Data loss” - Violates when >= 2% for 5 mins. Clears when < 2% for 5 mins.
    • When to notify - Notify if violation persists for 2 minutes
    • Who to notify - Core infrastructure team

Configure

  • Monitoring Point deployed to the Data Center
  • Monitoring Points deployed to branch offices
  • Path Template Group - “Data Center”
    • Tag - “Service: Data Center”
    • Group - “Data Center”

Notify (Email)

  • Saved List - “Data Center”
    • Group - “Data Center”
  • Notification Profile - “Data Center”
    • Emails to “Core infrastructure team”
    • Network path alert profile Violate/Clear events
    • Saved List - “Data Center”
    • Violation persists for notification (2 mins)

Notify (Event management system)

  • Send Service Quality Events
  • Aggregate by tag - “Service: Data Center”
  • Notify “Core infrastructure team”

Example 2: Single Web App - Office 365

This is an example of how to configure alerts and notifications for a segment containing web paths from branch offices to Office 365.

Identify

  • Segment name - Office 365
    • Scope - All web paths from branch offices to Office 365
    • Metrics:
      • “Connectivity” - Violates immediately when connectivity is lost. Clears immediately when connectivity is restored.
      • “Page load time” - Violates when >= 9000 ms for 2 consecutive tests. Clears when < 9000 ms for 2 consecutive tests.
    • When to notify - Notify if violation persists for 2 minutes
    • Who to notify - Application team

Configure

  • Monitoring Points deployed to branch offices
  • Web App Group - “Office 365”
    • Monitor to - Office 365 URL (https://login.microsoftonline.com/)
  • Group for single-ended network paths - “Office 365”
  • Tag for web paths - “App: Office 365”

Notify (Email)

  • Saved List - “Office 365”
    • Group - “Office 365”
  • Notification Profile - “Office 365”
    • Emails to “Application team”
    • Web path alert profile Violate/Clear events
    • Saved List - “Office 365”
    • Violation persists for notification (2 mins)

Notify (Event management system)

  • Send Web Application Events
  • Aggregate by tag - “App: Office 365”
  • Notify “Application team”

Example 3: Data Center - Two Regions

This is an example of how to configure alerts and notifications for network paths from branch offices in two regions (North America (NA) and Europe, Middle East, and Africa (EMEA)) to a Data Center in California. In addition to alerting on Connectivity and Data Loss, we want to be alerted when latency is too high. Because offices in North America are much closer to the Data Center than offices in EMEA, their latency should be much lower, so we create a separate segment for each region with its own latency threshold. That said, if you were only concerned about very poor latency (say, >200 msec), you could combine the two regions into a single segment, which would simplify the configuration.

Identify

  • Segment name - Data Center - North America
    • Scope - All North American network paths from branch offices to the Data Center
    • Metrics:
      • “Connectivity” - Violates immediately when connectivity is lost. Clears immediately when connectivity is restored.
      • “Data loss” - Violates when >= 2% for 5 mins. Clears when < 2% for 5 mins.
      • “Latency” - Violates when >= 50 msec for 2 mins. Clears when < 50 msec for 5 mins.
    • When to notify - Notify if violation persists for 2 minutes
    • Who to notify - Core infrastructure team - North America
  • Segment name - Data Center - EMEA
    • Scope - All EMEA network paths from branch offices to the Data Center
    • Metrics:
      • “Connectivity” - Violates immediately when connectivity is lost. Clears immediately when connectivity is restored.
      • “Data loss” - Violates when >= 2% for 5 mins. Clears when < 2% for 5 mins.
      • “Latency” - Violates when >= 150 msec for 2 mins. Clears when < 150 msec for 5 mins.
    • When to notify - Notify if violation persists for 2 minutes
    • Who to notify - Core infrastructure team - EMEA

Configure

  • Monitoring Point deployed to the Data Center

  • Monitoring Points deployed to North American branch offices
  • Path Template Group - “Data Center - NA”
    • Tag - “Service: Data Center - NA”
    • Group - “Data Center - NA”
  • Monitoring Points deployed to EMEA branch offices
  • Path Template Group - “Data Center - EMEA”
    • Tag - “Service: Data Center - EMEA”
    • Group - “Data Center - EMEA”

Notify (Email)

  • Saved List - “Data Center - NA”
    • Group - “Data Center - NA”
  • Notification Profile - “Data Center - NA”
    • Emails to “Core infrastructure team - North America”
    • Network path alert profile Violate/Clear events
    • Saved List - “Data Center - NA”
    • Violation persists for notification (2 mins)
  • Saved List - “Data Center - EMEA”
    • Group - “Data Center - EMEA”
  • Notification Profile - “Data Center - EMEA”
    • Emails to “Core infrastructure team - EMEA”
    • Network path alert profile Violate/Clear events
    • Saved List - “Data Center - EMEA”
    • Violation persists for notification (2 mins)

Notify (Event management system)

  • Send Service Quality Events
  • Aggregate by tag - “Service: Data Center - NA”
  • Notify “Core infrastructure team - North America”
  • Aggregate by tag - “Service: Data Center - EMEA”
  • Notify “Core infrastructure team - EMEA”