Step-by-step implementation of One-Class Support Vector Machines in Python
Importing required modules
First, we will import all the required Python libraries, such as Pandas, NumPy, Matplotlib, and scikit-learn.
Python3
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
Dataset loading and preprocessing
Now we will load the well-known credit card fraud dataset. For faster execution we will use only the first 50k rows. We will then use StandardScaler to scale the feature columns (every column except the target 'Class'), and finally separate the features and target variable for later use.
Python3
# Load the first 50k rows of the Kaggle credit card fraud dataset
# https://www.kaggle.com/mlg-ulb/creditcardfraud
credit_data = pd.read_csv('creditcard.csv', nrows=50000)

# Standardize every feature column (all columns except 'Class')
standardized_data_without_class = StandardScaler().fit_transform(
    credit_data.loc[:, credit_data.columns != 'Class'])
data_50k_new = standardized_data_without_class[0:50000]
data_50k_df = pd.DataFrame(data=data_50k_new)

# Separate features and target variable
X = credit_data.drop(columns=['Class'])
y = credit_data['Class']
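The credit card fraud dataset is highly imbalanced, which is precisely why an anomaly-detection approach like One-Class SVM is a natural fit. As a quick sanity check (a small sketch using the y defined above, not part of the original tutorial), we can inspect the class distribution:
Python3
# Class distribution of the target: 0 = normal transaction, 1 = fraud
print(y.value_counts())
print("Fraction of fraudulent transactions:", y.mean())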
Model training
Now we will train the One-Class SVM. Its main hyperparameters are discussed below:
kernel: The choice of kernel determines the transformation applied to the input data in a higher-dimensional space. Here we keep the default "rbf", which stands for Radial Basis Function and is commonly known as the Gaussian kernel. This kernel is suitable for capturing complex, non-linear relationships in the data.
degree: We keep the default value 3. It defines the degree of the polynomial function and applies only when the kernel is set to "poly"; it is ignored by all other kernels, including the "rbf" kernel used here.
gamma: A crucial parameter that influences the shape of the decision boundary. A smaller gamma value results in a broader, smoother decision boundary, making the model less sensitive to individual data points. Conversely, a larger gamma value leads to a more complex decision boundary that can capture intricate patterns in the data but may overfit. Fine-tuning gamma is essential for achieving optimal model performance.
nu: An upper bound on the fraction of margin errors (training points treated as outliers) and a lower bound on the fraction of support vectors. It allows users to control the trade-off between precision and recall. A smaller nu value makes the algorithm stricter, flagging fewer points as outliers, while a larger nu value is more lenient, permitting a higher fraction of margin errors and support vectors, which can be useful in scenarios with a considerable number of anomalies (see the sketch after this list).
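To build intuition for nu, here is a minimal, self-contained sketch on synthetic Gaussian data (the data, values, and variable names are illustrative and not part of the tutorial's dataset). The fraction of training points flagged as outliers stays close to, and is roughly bounded by, nu:
Python3
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical 2-D Gaussian data, for illustration only
rng = np.random.RandomState(42)
X_demo = rng.randn(1000, 2)

# The fraction of points predicted as outliers (-1) tracks nu
for nu in [0.01, 0.05, 0.2]:
    preds = OneClassSVM(kernel="rbf", gamma=0.1, nu=nu).fit_predict(X_demo)
    print("nu={}: fraction flagged as outliers = {:.3f}".format(
        nu, (preds == -1).mean()))
With the hyperparameters in hand, we now fit the model on the scaled credit card features: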
Python3
# Train a One-Class SVM on the scaled features and get its -1/1 predictions
clf_svm = OneClassSVM(kernel="rbf", degree=3, gamma=0.1, nu=0.01)
y_predict = clf_svm.fit_predict(data_50k_df)
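Besides the hard -1/1 labels from fit_predict, a fitted OneClassSVM also exposes a continuous anomaly score through its decision_function method (the signed distance to the learned boundary, negative for points outside it). A short sketch reusing the clf_svm fitted above:
Python3
# Signed distance to the separating boundary: negative scores fall
# outside the boundary and are therefore potential anomalies
scores = clf_svm.decision_function(data_50k_df)
print("Most anomalous score:", scores.min())
print("Points with negative score:", (scores < 0).sum())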
Model evaluation
Model evaluation here is a little different from that of traditional ML models. We can still compute accuracy, but it measures how well the model separates outliers (anomalies) from normal points rather than ordinary classification performance.
Python3
# OneClassSVM returns -1 for outliers and 1 for inliers; map these to
# 1 (anomaly) and 0 (normal) to match the dataset's 'Class' convention
svm_predict = pd.Series(y_predict).replace([-1, 1], [1, 0])
svm_anomalies = data_50k_df[svm_predict == 1]

# Calculate accuracy against the true labels
accuracy = accuracy_score(y, svm_predict)
print("Accuracy in separating Outlier:", accuracy)
Output:
Accuracy in separating Outlier: 0.9641
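Fraud cases are only a tiny fraction of this dataset, so a high accuracy can be achieved even by a weak detector. As a complementary check (a sketch using scikit-learn's standard metrics, not part of the original evaluation), precision and recall on the anomaly class are more informative:
Python3
from sklearn.metrics import classification_report

# Compare the mapped predictions (1 = anomaly) with the true labels
print(classification_report(y, svm_predict,
                            target_names=["normal", "fraud"]))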
Visualizing detected outliers (anomalies)
Now we will plot inliers and outliers for a pair of features. To do this we define a small helper function (plot_OCSVM) that plots feature i against feature i+1, highlighting the detected outliers. By changing the integer value passed to the function we can visualize different feature pairs.
Python3
def plot_OCSVM(i):
    # All points in red, with the detected anomalies overlaid in green
    plt.scatter(data_50k_df.iloc[:, i], data_50k_df.iloc[:, i + 1],
                c='red', s=40, edgecolor="k")
    plt.scatter(svm_anomalies.iloc[:, i], svm_anomalies.iloc[:, i + 1],
                c='green', s=40, edgecolor="k")
    plt.title("OC-SVM Outlier detection between Feature Pair: V{} and V{}".format(i, i + 1))
    plt.xlabel("V{}".format(i))
    plt.ylabel("V{}".format(i + 1))
    plt.show()

# plot_OCSVM(1)  # change the integer value to visualize different pairs of features
plot_OCSVM(2)
# plot_OCSVM(3)
Output:
From the above plot we can clearly see that the One-Class SVM has sharply separated the normal observations from the anomalies (potential outliers) for this pair of features. We can visualize other feature pairs by calling the function with different values.
Understanding One-Class Support Vector Machines
The Support Vector Machine is a popular supervised machine learning algorithm used for both classification and regression. In this article, we discussed the One-Class Support Vector Machine, an unsupervised variant that learns the boundary of the normal data and flags observations outside it as anomalies.