Step-by-Step implementation of One-Class Support Vector Machines in Python

Importing required modules

First, we will import all the required Python libraries: Pandas, NumPy, Matplotlib, and scikit-learn.

Python3




# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler


Dataset loading and preprocessing

Now we will load the well-known credit card fraud dataset. For faster execution we will use only the first 50,000 rows. We then apply StandardScaler to every feature column (all columns except the target column Class) and separate out the target variable for later use.

Python3




# Dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud
# Load only the first 50,000 rows for faster execution
credit_data = pd.read_csv('creditcard.csv', nrows=50000)

# Standardize every feature column (all columns except the target 'Class')
features = credit_data.loc[:, credit_data.columns != 'Class']
data_50k_df = pd.DataFrame(StandardScaler().fit_transform(features))

# Separate the target variable (1 = fraud, 0 = normal)
y = credit_data['Class']
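
This dataset is known to be highly imbalanced, with fraudulent transactions forming only a tiny fraction of all rows, which is exactly the setting One-Class SVM is designed for. As a quick optional check (not part of the main pipeline), we can inspect the class balance of the rows we loaded:

Python3

# Optional sanity check: the fraud class (1) should be very rare
print(y.value_counts())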


Model training

Now we will train the One-Class SVM. Its key hyperparameters are discussed below:

  • kernel: The choice of kernel determines the transformation applied to the input data in a higher-dimensional space. Here we keep the default “rbf” (Radial Basis Function, commonly known as the Gaussian kernel), which is well suited to capturing complex, non-linear relationships in the data.
  • degree: We keep the default value of 3. It defines the degree of the polynomial kernel function and applies only when the kernel is set to “poly”; it is ignored by all other kernels, including the “rbf” kernel used here.
  • gamma: This is a crucial parameter that influences the shape of the decision boundary. A smaller gamma value results in a broader, smoother decision boundary, making the model less sensitive to individual data points. Conversely, a larger gamma leads to a more complex decision boundary that can capture intricate patterns in the data. Fine-tuning gamma is essential for optimal model performance.
  • nu: This represents an upper bound on the fraction of margin errors (training points treated as outliers) and a lower bound on the fraction of support vectors, so it directly controls how many points the model flags as anomalous. A larger nu permits a higher fraction of margin errors, which can be useful when anomalies are expected to be plentiful; a small value such as the 0.01 used here encodes the assumption that anomalies are rare. The sketch after this list illustrates this effect.
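
To build intuition for nu, here is a minimal illustrative sketch (not part of the main walkthrough, and somewhat slow since it refits the model three times) that reports the fraction of points flagged as outliers for a few nu values; it assumes data_50k_df from the preprocessing step above.

Python3

for nu in (0.001, 0.01, 0.05):
    # The fraction of flagged points tracks nu, since nu upper-bounds
    # the fraction of margin errors on the training set
    labels = OneClassSVM(kernel="rbf", gamma=0.1, nu=nu).fit_predict(data_50k_df)
    print("nu={}: flagged {:.3%} of points as outliers".format(nu, (labels == -1).mean()))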

Python3




# Train the One-Class SVM on the standardized features
clf_svm = OneClassSVM(kernel="rbf", degree=3, gamma=0.1, nu=0.01)
y_predict = clf_svm.fit_predict(data_50k_df)
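
fit_predict labels inliers as +1 and outliers as -1. As a quick optional check (assuming y_predict from the cell above), we can count how many of the 50,000 rows were flagged; with nu=0.01 we expect roughly 1%.

Python3

# Optional check: how many points did the model flag as outliers?
n_outliers = (y_predict == -1).sum()
print("Flagged outliers:", n_outliers,
      "({:.2%} of all rows)".format(n_outliers / len(y_predict)))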


Model evaluation

Model evaluation here is a little different from that of traditional ML models. We still compute accuracy, but it measures how well the unsupervised outlier separation agrees with the ground-truth fraud labels rather than the performance of a supervised classifier. Because One-Class SVM labels outliers as -1 and inliers as 1, we first map its output to the dataset's convention (1 = fraud/outlier, 0 = normal).

Python3




# Map One-Class SVM output to the dataset's convention:
# -1 (outlier) -> 1 (fraud), +1 (inlier) -> 0 (normal)
svm_predict = pd.Series(y_predict).replace([-1, 1], [1, 0])
svm_anomalies = data_50k_df[svm_predict == 1]
# Calculate accuracy against the ground-truth labels
accuracy = accuracy_score(y, svm_predict)
print("Accuracy in separating Outlier:", accuracy)


Output:

Accuracy in separating Outlier: 0.9641
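
On highly imbalanced data, accuracy alone can be misleading, since a model that flags almost nothing still scores well. As an optional extra check (a standard scikit-learn utility, not part of the original walkthrough), a per-class report shows precision and recall for the rare outlier class:

Python3

from sklearn.metrics import classification_report

# Precision and recall per class are more informative than accuracy here
print(classification_report(y, svm_predict, target_names=["normal", "outlier"]))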

Visualizing detected outliers (anomalies)

Now we will plot the inliers and detected outliers for any pair of adjacent features. To do this we define a small helper function (plot_OCSVM) that plots feature i against feature i+1 with the detected anomalies overlaid; by changing the integer argument in the function call we can visualize different feature pairs.

Python3




def plot_OCSVM(i):
    # All points in red, with the detected anomalies overlaid in green
    plt.scatter(data_50k_df.iloc[:, i], data_50k_df.iloc[:, i + 1],
                c='red', s=40, edgecolor="k", label="Normal")
    plt.scatter(svm_anomalies.iloc[:, i], svm_anomalies.iloc[:, i + 1],
                c='green', s=40, edgecolor="k", label="Anomaly")
    plt.title("OC-SVM Outlier detection between Feature Pair: V{} and V{}".format(i, i + 1))
    plt.xlabel("V{}".format(i))
    plt.ylabel("V{}".format(i + 1))
    plt.legend()
    plt.show()

# Change the integer value to visualize different pairs of features
# plot_OCSVM(1)
plot_OCSVM(2)
# plot_OCSVM(3)


Output:

[Plot: detected outliers (green) overlaid on normal points (red) for the selected feature pair]

From the above plot we can clearly see that the One-Class SVM has sharply separated the normal occurrences from the anomalies (potential outliers) across both features. Other feature pairs can be visualized by calling the function with different integer values.


