How to use Machine Learning for Anomaly Detection and Condition Monitoring

  • The main goal of anomaly detection is to identify observations that do not adhere to the general patterns considered normal behavior.
  • Anomaly detection can help in understanding data problems.
  • Anomaly detection methods are quite effective in domains such as intrusion detection and credit card fraud detection.
  • Modern ML tools include Isolation Forests and other similar methods, but you need to understand the basic concept to implement them successfully.
  • The Isolation Forests method is an unsupervised outlier detection method with interpretable results.

Introduction

Before starting any data analysis, you usually need to find the outliers in the dataset. These outliers are known as anomalies.

What is Anomaly Detection? Practical use cases.

The main goal of anomaly detection is to identify observations that do not adhere to the general patterns considered normal behavior. For instance, Fig. 1 shows anomalies in classification and regression problems: some values clearly deviate from most examples.

Anomalies can come from several sources:

  • data errors (measurement inaccuracies, rounding, incorrect recording, etc.);
  • noisy data points;
  • hidden patterns in the dataset (fraud or attack requests).

Anomaly detection methods fall into two groups:

  • supervised methods;
  • unsupervised methods.

Unsupervised Anomaly Detection using Isolation Forests

The Isolation Forests method builds an ensemble of randomized Decision Trees and combines their results. Each Decision Tree is grown until the training dataset is exhausted. To build each new branch, a random feature and a random split value are selected. Anomalies are few and different from the rest of the data, so they tend to be isolated after only a few splits and end up in shallow leaves. The algorithm therefore separates normal points from outliers by the mean depth of the leaves a point reaches across the trees. This method is implemented in the scikit-learn library (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html).
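As a minimal sketch of how this looks with scikit-learn's IsolationForest, the snippet below fits the model on synthetic two-dimensional data. The data, the number of trees and the random seed are assumptions made purely for illustration; they are not taken from Andrey's project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): a tight cluster of
# "normal" points plus three far-away points acting as anomalies.
normal = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
injected = np.array([[6.0, 6.0], [-6.0, 5.0], [5.0, -6.0]])
X = np.vstack([normal, injected])

# Unsupervised: no labels are passed to fit().
model = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
model.fit(X)

labels = model.predict(X)        # +1 for inliers, -1 for outliers
scores = model.score_samples(X)  # the lower the score, the more anomalous

print("points flagged as outliers:", int((labels == -1).sum()))
print("scores of the injected points:", scores[-3:])
print("median score of the cluster:", float(np.median(scores[:200])))
```

The injected points should receive noticeably lower scores than the bulk of the cluster, because they are isolated after fewer random splits. On real data the contamination parameter (the expected share of outliers) usually needs tuning, since it sets the threshold used by predict.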

Conclusion

In a nutshell, anomaly detection methods can be used in supporting tasks, e.g., cleaning a dataset of noisy data points and observation mistakes. They can also be helpful in business applications such as Intrusion Detection or Credit Card Fraud Detection Systems. Andrey demonstrates in his project, Machine Learning Model: Python Sklearn & Keras on Education Ecosystem, that the Isolation Forests method is one of the simplest and most effective methods for unsupervised anomaly detection. In addition, this method is implemented in the widely used library scikit-learn.

Education Ecosystem (LEDU)

Education Ecosystem (LEDU) is a decentralized project-based learning platform that teaches people how to build tech products, https://www.educationecosystem.com