Difference between Global and Contextual outlier

Global Outlier

A data point that deviates significantly from the majority of the data.
Global outliers are often detected with statistical methods such as the z-score or the interquartile range (IQR), which are computed from the distribution of the entire dataset.
They can significantly skew summary statistics such as the mean and standard deviation, so they tend to have a pronounced impact on the overall analysis.

Let’s take an example to understand the difference better.

Suppose you have a dataset of quiz scores for a class, and most students score between 70 and 75. If one student scores 95 on the quiz, that score is a global outlier because it is unusually high compared to every other score in the dataset.

Python3

import matplotlib.pyplot as plt

# Sample dataset
data = {'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
        'Math_Score': [70, 75, 72, 70, 95, 73, 71]
        }

# Position of the global outlier (Eve's score of 95)
global_outlier_index = 4

# Visualize the data and highlight the outlier in red
plt.plot(data['Student'], data['Math_Score'], marker="o")
plt.plot(data['Student'][global_outlier_index], data['Math_Score'][global_outlier_index],
         color='red', marker='*', label="Outlier")
 
plt.xlabel('Name of Students')
plt.ylabel('Marks scored in Mathematics')
plt.title('Class Score in Quiz 6')
plt.legend()
plt.grid()
plt.show()

Output:

Scores of the class in Quiz 6, where Eve's score is a global outlier
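In the plot above the outlier's position is hard-coded. As a minimal sketch of how the IQR method mentioned earlier could find it automatically, the snippet below applies the common 1.5 × IQR rule to the same quiz scores. The helper name `iqr_outliers` and the multiplier `k=1.5` are assumptions made for illustration, not part of the original example.

```python
import pandas as pd

# Same quiz scores as in the example above, indexed by student name
scores = pd.Series(
    [70, 75, 72, 70, 95, 73, 71],
    index=['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
)


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    Hypothetical helper; k=1.5 is a conventional choice, not a fixed rule.
    """
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]


# The bounds are computed from the whole dataset, which is what makes
# any point found this way a *global* outlier
global_outliers = iqr_outliers(scores)
print(global_outliers)
```

With only seven scores, a z-score threshold of 3 would not flag Eve's score, because the outlier itself inflates the standard deviation. The IQR rule is less sensitive to that effect on small samples.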

Contextual outlier

A data point that deviates from the norm only when a specific condition or context is taken into account.
Contextual outliers are detected by first defining the relevant context (for example, a season, a location, or a time of day). Outlier detection techniques are then applied within that context to identify data points that deviate significantly from the norm of that subset.
They provide valuable insight when the data is analyzed under specific conditions, because the same value can be normal in one context and anomalous in another.

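Example: in a dataset of daily January temperatures, readings above 30 are treated as contextual outliers, because such values are unusual for January even though they could be normal at another time of year.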

Python3

import pandas as pd
import random
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
 
# Generate random data for the dataset
random_data = {
    'Date': [datetime(2023, 1, 1) + timedelta(days=i) for i in range(30)],
    'Temperature': np.sort([round(random.uniform(-10.0, 40.0), 1) for _ in range(30)]
                           )
}
 
# Create the DataFrame
data = pd.DataFrame(random_data)
 
# Mark temperature values greater than 30 as contextual outliers
# (the context is the month: such readings are not expected in January)
contextual_outlier_index = data['Temperature'] > 30
 
# Format date to show only date and month
data['Date'] = data['Date'].dt.strftime('%d-%b')
 
# Visualize the data
plt.plot(data['Date'], data['Temperature'], marker="o")
plt.plot(data['Date'][contextual_outlier_index],
         data['Temperature'][contextual_outlier_index],
         'ro-', label="Contextual Outliers\nIn January temperature is always < 30")
 
plt.xlabel('Date')
plt.ylabel('Temperature')
plt.title('Contextual Outliers')
plt.legend()
# Rotate x-axis labels for better visibility
plt.xticks(rotation=90)  
plt.show()

Output:

Contextual Outliers

Detecting contextual outliers typically involves defining the relevant conditions or contexts and then applying outlier detection techniques within those specific subsets of the data. The choice of contextual factors and the methods for identifying and handling contextual outliers will depend on the nature of the data and the goals of the analysis.
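The two steps described above (define the context, then detect outliers inside it) can be sketched with a pandas `groupby`. The temperatures below are made-up illustrative values, the month is assumed to be the context, and the 1.5 × IQR rule stands in for whichever detection technique suits the data.

```python
import pandas as pd

# Made-up daily temperatures for two months (illustrative values only)
df = pd.DataFrame({
    'Month': ['Jan'] * 8 + ['Jul'] * 8,
    'Temperature': [2, 4, 3, 5, 1, 28, 4, 3,
                    29, 31, 30, 28, 32, 30, 29, 31],
})

K = 1.5  # conventional IQR multiplier, an assumption rather than a fixed rule

# Global check: bounds come from the entire dataset
q1, q3 = df['Temperature'].quantile(0.25), df['Temperature'].quantile(0.75)
iqr = q3 - q1
df['Global_Outlier'] = (
    (df['Temperature'] < q1 - K * iqr) | (df['Temperature'] > q3 + K * iqr)
)

# Contextual check: bounds come from each month separately.
# Per-month quartiles are mapped back onto the rows through the Month column.
by_month = df.groupby('Month')['Temperature']
q1_ctx = df['Month'].map(by_month.quantile(0.25))
q3_ctx = df['Month'].map(by_month.quantile(0.75))
iqr_ctx = q3_ctx - q1_ctx
df['Contextual_Outlier'] = (
    (df['Temperature'] < q1_ctx - K * iqr_ctx)
    | (df['Temperature'] > q3_ctx + K * iqr_ctx)
)

print(df[df['Contextual_Outlier']])
```

The January reading of 28 is flagged only by the contextual check. The same value also appears in July, where it is not flagged, and the global check flags nothing because 28 sits well inside the overall range.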

Contextual Outliers

Understanding contextual outliers is essential across various fields, including statistics, finance, and anomaly detection, as they offer valuable insights into unique events or conditions that impact the data. By identifying and analyzing these outliers, we gain a deeper understanding of the nuances within our datasets, enabling us to make more informed decisions and draw meaningful conclusions within specific contexts.

This article explains what contextual outliers are and how they differ from global outliers. It illustrates the concept with examples that show how contextual outliers emerge when certain conditions or events come into play.
