
Alarms Handling and AIOps

BACKGROUNDER

What is Machine Learning?

In the type of machine learning employed by Optima, a model is built to achieve two goals:

  1. to help reduce uncertainty, and
  2. to make predictions based on prior events.

This type of machine learning is called supervised machine learning. It requires a small amount of user guidance from time to time, which is deliberate: it keeps the model relevant. Generally, the algorithm is trained on a set of input data streams, which are matched against known good responses from human operators. Based on the result, the model starts generating reasonable predictions for new alarm incidents. As a rule, the more data the model trains on, the better the predictions.
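To make the idea of learning from operator responses concrete, here is a deliberately tiny sketch in Python. It is not Optima's model: it simply counts which response operators gave for each kind of alarm and predicts the most frequent one. The feature tuple (alarm type, severity), the response labels, and the function names `train` and `predict` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count operator responses per alarm feature tuple.

    examples: iterable of (features, response) pairs, where features is a
    hashable tuple such as (alarm_type, severity). Hypothetical schema.
    """
    counts = defaultdict(Counter)
    for features, response in examples:
        counts[features][response] += 1
    return counts


def predict(model, features, default="escalate"):
    """Return the most frequent past response, or a default for unseen alarms."""
    if features not in model:
        return default  # no training data: hand the alarm to a human
    return model[features].most_common(1)[0][0]


# Labelled history: what operators actually did for past alarms.
history = [
    (("link_down", "critical"), "dispatch"),
    (("link_down", "critical"), "dispatch"),
    (("link_down", "critical"), "ignore"),
    (("temp_high", "minor"), "ignore"),
]
model = train(history)
print(predict(model, ("link_down", "critical")))
print(predict(model, ("fan_fail", "major")))
```

A real supervised model generalizes to alarms it has not seen before; this lookup does not, which is why unseen alarms fall back to a default that keeps the operator in the loop.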

What is Artificial Intelligence?

AI is a broad subject. It generally denotes the concept of machines carrying out tasks in a manner that makes them appear ‘smart’. The better the AI, the better it can cope in meaningful ways, especially when confronted with new, complex situations it was not trained for. AI systems also tend to learn from the results of their own actions. The aim is to fine-tune and adapt over time, ultimately leading to an improved accumulation and application of their own ‘artificial intelligence’.

What is AIOps?

AIOps is a multi-tiered approach to automating and enhancing certain IT operations. It employs analytics and machine learning to sift through large data sets, with the aim of detecting and reacting to patterns or structures. These may be deeply embedded in the data stream, so they might not be immediately clear to a casual observer, or even to trained human operators monitoring the stream in real time. The ultimate goal of AIOps is to react to issues in real time. AIOps still relies on human oversight, but it relieves operators of error-prone and time-consuming data filtering tasks.

The State Of AIOps Research At Optima

Understanding the origin and category of alarms received by a network management system (NMS) is key. Only then can network operators stop merely coping with the incoming flood of alarms and start managing the real causes of those alarm notifications.

Optima’s AIOps (Artificial Intelligence in IT Operations) helps human operators maintain their focus. Operators once again feel empowered, free to concentrate on formulating timely and proper alarm responses.

As a first step, Optima’s AIOps attempts to classify alarms depending on their type, make-up and occurrence pattern:

Quick Alarm Bursts

Repeated short-term oscillations: the alarm source rapidly alternates between the normal and alarm states, creating a flurry of alarm messages in a relatively short amount of time. These are usually caused by:

  • poor alarm generation control,
  • system noise, or
  • threshold/alarm configuration issues.
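One simple way to spot this kind of burst is to count state changes from a single source inside a sliding time window. The sketch below assumes events arrive as time-sorted `(timestamp_seconds, state)` pairs; the function name `is_chattering` and the default window and threshold are illustrative choices, not values taken from Optima's product.

```python
def is_chattering(events, window_s=60.0, min_transitions=6):
    """Return True if one alarm source flips state too often.

    events: time-sorted list of (timestamp_seconds, state) for ONE source.
    window_s and min_transitions are hypothetical tuning parameters.
    """
    # Keep only the timestamps at which the state actually changed.
    changes = [t for (t, s), (_, prev) in zip(events[1:], events) if s != prev]

    start = 0
    for end, t in enumerate(changes):
        # Shrink the window from the left until it spans at most window_s.
        while t - changes[start] > window_s:
            start += 1
        if end - start + 1 >= min_transitions:
            return True
    return False


# A source that alternates every 5 seconds versus one that changes rarely.
flapping = [(i * 5.0, "alarm" if i % 2 == 0 else "normal") for i in range(10)]
steady = [(0.0, "normal"), (300.0, "alarm"), (900.0, "normal")]
print(is_chattering(flapping), is_chattering(steady))
```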

Slow, Recurring Alarms

Longer-term alarms that keep repeating with a fair degree of predictability. These types of alarms can result from:

  • inherent problems arising out of depletion of resources over time,
  • systems running for long times without proper maintenance,
  • systems always ending up in the same kind of unstable states, or
  • hidden bugs that only surface after systems have been operational for some time.
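Predictability can be approximated by checking how regular the gaps between successive alarm onsets are. The following sketch uses the coefficient of variation (standard deviation divided by mean) of those gaps; the function name `recurrence_period` and the thresholds `max_cv` and `min_occurrences` are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def recurrence_period(onsets, max_cv=0.2, min_occurrences=4):
    """Return the mean interval in seconds if onsets recur regularly, else None.

    onsets: time-sorted alarm start times (seconds) for one source.
    max_cv: largest coefficient of variation still treated as 'regular'.
    """
    if len(onsets) < min_occurrences:
        return None  # too few repeats to call it a pattern
    gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    avg = mean(gaps)
    if avg <= 0:
        return None
    cv = pstdev(gaps) / avg
    return avg if cv <= max_cv else None


# Roughly daily alarm with a little jitter, versus an irregular one.
daily = [0, 86400, 172700, 259300, 345600]
irregular = [0, 100, 5000, 5200, 90000]
print(recurrence_period(daily), recurrence_period(irregular))
```

A source flagged this way is a candidate for scheduled maintenance or a closer look at resource depletion, rather than repeated manual clearing.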

Almost Permanent Alarms

Annoying alarms that are active for extended periods of time, or that return to an active alarm state right after being manually cleared. These are usually a sign of:

  • alarm reporting incompatibilities,
  • missed maintenance, or
  • deeper configuration issues.
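Both symptoms described above, long activity and quick return after a clear, can be tested directly on the raise/clear history of a source. This is a minimal sketch under assumed inputs: `intervals` is a time-sorted list of `(raised_at, cleared_at)` pairs with `None` for an alarm that is still active, and the limits `max_active_s` and `rearm_s` are hypothetical.

```python
def is_standing(intervals, now, max_active_s=86400.0, rearm_s=300.0):
    """Return True if a source shows 'almost permanent' alarm behaviour.

    intervals: time-sorted (raised_at, cleared_at) pairs for one source;
               cleared_at is None while the alarm is still active.
    now:       current time in the same unit (seconds).
    """
    for i, (raised, cleared) in enumerate(intervals):
        end = now if cleared is None else cleared
        if end - raised >= max_active_s:
            return True  # active for an extended period
        if i > 0:
            prev_cleared = intervals[i - 1][1]
            if prev_cleared is not None and raised - prev_cleared <= rearm_s:
                return True  # came back right after being cleared
    return False


print(is_standing([(0, None)], now=200000))               # long-running
print(is_standing([(0, 1000), (1060, 2000)], now=3000))   # quick return
print(is_standing([(0, 1000), (50000, 51000)], now=60000))
```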

Alarm Storms

Alarm storms, also referred to as alarm floods, are the most disruptive type of alarm event. They may:

  • drown out valuable root cause indicators,
  • prevent other functional alarms from getting through, or
  • cause network surveillance outages for extended periods of time.

Most often they are caused by:

  • faulty software,
  • faulty equipment,
  • missing configuration settings,
  • race conditions or circular dependencies (A depending on B depending on A),
  • incompatible hardware, or
  • unrecognized operational states.
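Whatever the cause, a storm shows up as an abnormally high arrival rate across the whole network. A common, simple detector counts alarms in a trailing time window; the sketch below assumes a time-sorted list of arrival timestamps, and the name `first_storm_time` and its default window and threshold are illustrative only.

```python
from collections import deque


def first_storm_time(timestamps, window_s=10.0, threshold=100):
    """Return the first time the trailing-window alarm count hits threshold.

    timestamps: time-sorted alarm arrival times (seconds), all sources mixed.
    Returns None if the rate never reaches the threshold.
    """
    recent = deque()
    for t in timestamps:
        recent.append(t)
        # Drop arrivals that have fallen out of the trailing window.
        while t - recent[0] > window_s:
            recent.popleft()
        if len(recent) >= threshold:
            return t
    return None


arrivals = [0, 1, 2, 3, 50, 51, 51, 51, 52, 52]
print(first_storm_time(arrivals, window_s=5, threshold=5))
```

Once a storm is flagged, the system can switch to summarizing or throttling notifications so that root cause indicators are not drowned out.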

Summary

Overall, Optima’s AIOps can be described as going through a series of processing stages:

  1. Classification: categorize and classify alarms according to a specific schema.
  2. Denoising: identify and filter out unnecessary alarms so that the alarm volume stays manageable.
  3. Correlation: find matching notifications and identify the appropriate start and end for each individual alarm source.
  4. Grouping: learn to cluster alarms to improve handling, one of the critical machine learning steps.
  5. Sequencing: extract the proper order of alarms and apply the gained knowledge to build cause-and-effect sequences.
  6. Pattern recognition: look for familiar patterns and identify anomalies, which can be used in future predictions.
  7. Prediction: apply what the model has learned to future events, ultimately leading to a more efficient and effective system.
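As an illustration of the denoising and correlation stages, the sketch below pairs each raise notification with the next clear from the same source and silently drops duplicate raises. The notification tuple layout `(timestamp, source, kind)`, the source names, and the function name `correlate` are assumptions for the example and do not describe Optima's actual data model.

```python
def correlate(notifications):
    """Pair each 'raise' with the next 'clear' from the same source.

    notifications: time-sorted (timestamp, source, kind) tuples,
                   with kind being "raise" or "clear".
    Returns (episodes, still_open):
      episodes   - list of (source, start, end) for completed alarms
      still_open - dict of source -> start for alarms not yet cleared
    """
    open_since = {}
    episodes = []
    for ts, source, kind in notifications:
        if kind == "raise":
            # A repeated raise while already open is noise: keep the first start.
            open_since.setdefault(source, ts)
        elif kind == "clear" and source in open_since:
            episodes.append((source, open_since.pop(source), ts))
        # A clear without a matching raise is ignored.
    return episodes, open_since


stream = [
    (0, "rtu-1", "raise"),
    (5, "rtu-2", "raise"),
    (7, "rtu-1", "raise"),   # duplicate raise
    (20, "rtu-1", "clear"),
    (25, "rtu-3", "clear"),  # orphan clear
]
episodes, still_open = correlate(stream)
print(episodes, still_open)
```

The resulting episodes, each with a clear start and end, are the kind of input that later grouping and sequencing stages can work from.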

Decades Of Experience And Expertise

As a company, Optima not only manufactures the RTUs that get deployed at the far edges of the network. We also make the opposite end of the spectrum: the central NMS. These server-based network management systems are the hot spot where all the daily battles are waged. They need to cope with the constant onslaught of alarm notifications flooding in from every corner of the network, remorselessly, 24/7.

As a company, we have learned a lot from being in this unique situation. We recognized very early on that we are not only responsible for generating the alarm notifications; we also need to know how to take care of them at the receiving end. Years of research and improvements to our portfolio have resulted in a rich set of tools and a deep understanding of what it takes to make battle-hardened products.

Our Commitment

We pledge to strive for the best, to employ the latest hardware and software technologies, and to continuously improve our entire product lineup, so that our customers can benefit from:

  • reduced workloads,
  • faster response times, and
  • a deeper knowledge and understanding of issues affecting their networks.

Optima’s AIOps for Alarms Handling: the right approach for managing network-wide alarm monitoring solutions.

Ralf Doewich

Optima Tele.com, Inc.
