Importing Libraries and Dataset

Python libraries make it easy to handle data and to perform both routine and complex tasks in a few lines of code.

Pandas – This library loads data into a two-dimensional DataFrame and provides many functions for analysis.
Numpy – Numpy arrays are fast and can carry out large numerical computations efficiently.
Matplotlib/Seaborn – These libraries are used to draw visualizations.
Sklearn – This package contains multiple modules with pre-implemented functions for tasks ranging from data preprocessing to model development and evaluation.
TensorFlow – This is an open-source library for machine learning and artificial intelligence that provides high-level functions for building and training complex models in a few lines of code.

Python3

import pandas as pd 
import tensorflow as tf 
from keras.layers import Input, Dense 
from keras.models import Model 
from sklearn.metrics import precision_recall_fscore_support 
import matplotlib.pyplot as plt 

In this step, we import the libraries required to implement anomaly detection with an autoencoder. We import pandas for reading and manipulating the dataset, TensorFlow and Keras for building the autoencoder model, scikit-learn for calculating precision, recall, and F1 score, and Matplotlib for plotting.

Python3

data = pd.read_csv( 
    'https://raw.githubusercontent.com/numenta'
    '/NAB/master/data/realKnownCause/ambient'
    '_temperature_system_failure.csv') 
  
# Exclude datetime column 
data_values = data.drop('timestamp', 
                        axis=1).values 
  
# Convert data to float type 
data_values = data_values.astype('float32') 
  
# Create new dataframe with converted values 
data_converted = pd.DataFrame(data_values, 
                              columns=data.columns[1:]) 
  
# Add back datetime column 
data_converted.insert(0, 'timestamp', 
                      data['timestamp']) 

We load the file “ambient_temperature_system_failure.csv” from the Numenta Anomaly Benchmark (NAB), which contains time-series data of ambient temperature readings from a system that experienced a failure.

The pandas library reads the CSV file from a remote location on GitHub and stores it in a variable called “data”.

Next, the code temporarily sets aside the “timestamp” column so that the remaining numeric columns can be converted. Their values are stored as an array in a variable called “data_values”.
Then, “data_values” is converted to the “float32” data type to reduce memory usage, and a new pandas DataFrame called “data_converted” is created from the converted data. The columns of “data_converted” are labeled with the original column names from “data”, except for the “timestamp” column that was set aside.
Finally, the code adds the “timestamp” column back to “data_converted” as the first column using the “insert()” method. The resulting DataFrame “data_converted” has the same columns as “data”, with the timestamps unchanged and the numeric values stored as float32, in a format that can be used for analysis and visualization.
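
The drop, convert, and re-insert steps described above can also be written as a single cast of the non-timestamp columns. The following is a minimal sketch on a small in-memory frame, so it runs without network access. The column names “timestamp” and “value” are assumed to mirror the NAB file, and the sample values and the names “sample”, “numeric_cols”, and “sample_converted” are made up for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the NAB CSV (values are made up)
sample = pd.DataFrame({
    'timestamp': ['2013-07-04 00:00:00',
                  '2013-07-04 01:00:00',
                  '2013-07-04 02:00:00'],
    'value': [69.88, 71.22, 70.87],
})

# Select every column except 'timestamp'
numeric_cols = sample.columns.drop('timestamp')

# Cast only those columns to float32; 'timestamp' is left untouched
sample_converted = sample.astype({col: 'float32' for col in numeric_cols})

print(sample_converted.dtypes)
```

Passing a dictionary to “astype()” keeps the column order and the timestamp column intact, so no separate re-insert step is needed in this variant.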

Python3

data_converted = data_converted.dropna()

We drop any rows that contain missing (NaN) values from the dataset.
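
To see what “dropna()” does, here is a small sketch on a made-up frame with one missing reading. The frame “toy” and its values are hypothetical and are not taken from the NAB data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing temperature reading
toy = pd.DataFrame({
    'timestamp': ['t0', 't1', 't2', 't3'],
    'value': [70.1, np.nan, 69.8, 71.3],
})

# dropna() removes every row that contains at least one NaN
cleaned = toy.dropna()

print(len(toy), '->', len(cleaned))
print(cleaned.index.tolist())
```

Note that the surviving rows keep their original index labels, so the index has a gap where the row was removed. If a contiguous index is needed later, “reset_index(drop=True)” can be applied to the result.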

Anomaly Detection in Time Series Data

Anomaly detection is the process of identifying data points or patterns in a dataset that deviate significantly from the norm. A time series is a sequence of data points recorded in time order, usually at regular intervals. Anomaly detection in time series data is useful in many industries, including manufacturing, healthcare, and finance. It can be carried out with unsupervised learning approaches such as clustering, PCA (Principal Component Analysis), and autoencoders.
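
As an illustration of one of the unsupervised approaches mentioned above, the following sketch uses PCA reconstruction error to flag an unusual day in a synthetic temperature series. It is not the autoencoder method of this article, only a simpler relative of it: both compress the data and treat a large reconstruction error as a sign of an anomaly. All data here is made up, and the window length, the number of components, and the mean-plus-three-standard-deviations threshold are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic hourly temperatures for 20 days (made-up data, not the NAB file)
days, hours = 20, 24
hour = np.arange(hours)
level = 70 + rng.normal(0, 0.5, size=(days, 1))   # day-to-day baseline shift
swing = rng.uniform(1.5, 2.5, size=(days, 1))     # day-to-day cycle amplitude
windows = level + swing * np.sin(2 * np.pi * hour / hours)
windows += rng.normal(0, 0.1, size=(days, hours))  # sensor noise
windows[12, 12] += 8.0                             # inject one spike on day 12

# Fit PCA (via SVD) on the first 10 days, which are assumed to be normal
train = windows[:10]
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:2]                                # keep the top 2 components

# Reconstruct every day from the 2 components and measure the error
centered = windows - mean
reconstructed = centered @ components.T @ components + mean
errors = np.mean((windows - reconstructed) ** 2, axis=1)

# Flag days whose error is far above the average error
threshold = errors.mean() + 3 * errors.std()
anomalous_days = np.where(errors > threshold)[0]
print(anomalous_days)
```

The two components can represent a shifted baseline and a stronger or weaker daily cycle, but not an isolated spike, so the day with the injected spike has by far the largest reconstruction error. In practice the threshold would be tuned on held-out data instead of being fixed in advance.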
