Outlier Detection Techniques in Logistic Regression

Detecting and appropriately managing outliers is crucial for ensuring the accuracy and reliability of logistic regression analyses. Two common approaches for detecting outliers in logistic regression are:

Single-case deletion approach

The single-case deletion approach detects outliers by removing individual observations from the dataset one at a time and checking how much the fitted model changes after each removal. However, it suffers from two limitations in the presence of multiple outliers:

Masking: Masking occurs when an outlier's influence is hidden by other outliers or extreme values in the dataset, so that it becomes evident only after one or more other observations are removed. This can happen in methods that identify and remove outliers sequentially, such as the single-case deletion approach.
Swamping: Swamping occurs when observations that are not outliers are incorrectly identified as outliers because of the effect other unusual observations have on the model. It can arise when the detection method is overly sensitive to extreme values, or when the removal of genuine outliers leads to other data points being misclassified.
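
Single-case deletion and the masking effect can be illustrated with a small sketch. The code below is not taken from any particular library: `fit_logistic` and `deletion_influence` are hypothetical helper names, the model has a single predictor, the toy data are made up, and "influence" is assumed to mean the size of the shift in the fitted coefficients when one case is left out.

```python
import math

def fit_logistic(xs, ys, iters=50, tol=1e-10):
    """Fit P(y=1|x) = sigmoid(a + b*x) by Newton-Raphson; returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p              # gradient w.r.t. intercept
            g1 += (y - p) * x        # gradient w.r.t. slope
            h00 += w                 # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

def deletion_influence(xs, ys):
    """For each case i, refit without it and return the coefficient shift."""
    a, b = fit_logistic(xs, ys)
    shifts = []
    for i in range(len(xs)):
        a_i, b_i = fit_logistic(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        shifts.append(math.hypot(a - a_i, b - b_i))
    return shifts

# Made-up data: y tends to be 1 for larger x, with some overlap near 0.
xs = [-3.0, -2.5, -2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

# One planted outlier at index 12: large x but y = 0.
one = deletion_influence(xs + [6.0], ys + [0])
most_influential = max(range(len(one)), key=one.__getitem__)

# A second, similar outlier at index 13 partly hides the first (masking).
two = deletion_influence(xs + [6.0, 6.5], ys + [0, 0])

print("most influential case:", most_influential)
print("influence of case 12 alone:", round(one[12], 3))
print("influence of case 12 when masked:", round(two[12], 3))
```

With a single planted outlier, leaving it out moves the coefficients far more than leaving out any other case. Once a second outlier sits next to it, deleting either one alone changes the fit much less, which is the masking effect described above.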

Multiple-case deletion approach

Detecting outliers one by one, as the single-case deletion approach does, can fall into the trap of masking and swamping. A multiple-case deletion approach can be used instead to overcome this issue: it aims to identify multiple influential observations in the dataset together, even in the presence of masking effects.

There are two stages involved in this deletion approach:

A clean subset of data: We first obtain an approximately clean subset of the data, one that is presumed to be free of influential observations. This is done by deleting a group of suspect observations at once, rather than removing outliers one by one as in the single-case deletion technique.
Enhancing efficiency: We then refine the detection rule to improve the efficiency of the outlier detection, which helps in accurately identifying the influential observations.
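
The two stages can be sketched roughly as follows. This is one plausible reading under stated assumptions, not a specific published procedure: the stage-one screening rule (set aside the `k` cases farthest from the median of `x`), the use of Pearson residuals in stage two, the cutoff of 3, and the helper names `fit_logistic`, `pearson_residuals`, and `two_stage_flags` are all illustrative choices, and the data are made up.

```python
import math
from statistics import median

def fit_logistic(xs, ys, iters=50, tol=1e-10):
    """Fit P(y=1|x) = sigmoid(a + b*x) by Newton-Raphson; returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

def pearson_residuals(xs, ys, a, b):
    """Pearson residual (y - p) / sqrt(p * (1 - p)) for every case."""
    out = []
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        out.append((y - p) / math.sqrt(p * (1.0 - p)))
    return out

def two_stage_flags(xs, ys, k, cutoff=3.0):
    # Stage 1 (assumed screening rule): set aside the k cases farthest from
    # the median of x, all at once, and fit on the remaining "clean" subset.
    centre = median(xs)
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - centre))
    suspects = set(order[-k:])
    clean_x = [x for i, x in enumerate(xs) if i not in suspects]
    clean_y = [y for i, y in enumerate(ys) if i not in suspects]
    a, b = fit_logistic(clean_x, clean_y)
    # Stage 2: score every case, suspects included, against the clean fit,
    # so a wrongly suspected case is not flagged just for being set aside.
    res = pearson_residuals(xs, ys, a, b)
    return [i for i, r in enumerate(res) if abs(r) > cutoff]

# Made-up data with two planted outliers at indices 12 and 13.
xs = [-3.0, -2.5, -2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 6.0, 6.5]
ys = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0]

# Residuals from the fit on all data: the two outliers mask each other.
a_full, b_full = fit_logistic(xs, ys)
naive = [i for i, r in enumerate(pearson_residuals(xs, ys, a_full, b_full))
         if abs(r) > 3.0]

# k=3 deliberately sets aside one ordinary case (index 0) along with the outliers.
flagged = two_stage_flags(xs, ys, k=3)
print("flagged from full fit:", naive)
print("flagged from clean-subset fit:", flagged)
```

In this toy example the suspect set is deliberately one case too large. Because stage two scores the suspects against the clean fit as well, the ordinary case that was set aside is not reported as an outlier, while the two planted outliers stand out clearly.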

When a dataset contains several outliers, the multiple-case deletion approach generally identifies them more accurately than the single-case approach.

Outlier Detection in Logistic Regression

Outliers, data points that deviate markedly from the rest, can significantly impact the performance of logistic regression models. In this article we will explore various techniques for detecting and handling outliers in logistic regression.
