Complement Naive Bayes (CNB) Algorithm

Complement Naive Bayes (CNB) is an adaptation of the standard Multinomial Naive Bayes algorithm that is particularly suited to imbalanced datasets, where the usual Naive Bayes variants tend to favor the majority class.
How CNB works:
Instead of estimating the probability that an instance belongs to each class, CNB estimates the probability that it belongs to the complement of each class (i.e., to all the other classes taken together):
  • For each class, calculate the probability that the given instance does NOT belong to it.
  • After calculating this value for every class, compare all the calculated values and select the smallest one.
  • The smallest value (lowest probability) is selected because it is the lowest probability that the instance is NOT that particular class, which means the instance has the highest probability of actually belonging to that class. So this class is selected.
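The steps above can be sketched directly. The following is a minimal illustration on made-up count data (the class labels, counts, and the `cnb_predict` helper are all hypothetical, chosen only to show the complement-and-argmin logic; Laplace smoothing is applied as in the standard CNB formulation):

```python
import numpy as np

def cnb_predict(X_train, y_train, x, alpha=1.0):
    """Classify x by picking the class whose complement probability is smallest."""
    classes = np.unique(y_train)
    scores = {}
    for c in classes:
        # Step 1: pool the feature counts of every sample NOT in class c
        comp_counts = X_train[y_train != c].sum(axis=0) + alpha  # smoothing
        comp_probs = comp_counts / comp_counts.sum()
        # Log-probability of x under the complement of class c
        scores[c] = (x * np.log(comp_probs)).sum()
    # Steps 2-3: select the class with the SMALLEST complement probability
    return min(scores, key=scores.get)

# Hypothetical count data: two features, classes "a" and "b"
X = np.array([[0, 3], [4, 0], [0, 2]])
y = np.array(["a", "b", "a"])
print(cnb_predict(X, y, np.array([0, 1])))  # -> a
```

The new instance has a high count for the second feature, which the complement of class "a" rarely contains, so "a" wins.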
Example: suppose we have the following word counts for sentences describing Apples and Bananas:

Sentence Number | Round | Red | Long | Yellow | Soft | Class
1               | 2     | 1   | 1    | 0      | 0    | Apples
2               | 1     | 1   | 0    | 9      | 5    | Bananas
3               | 2     | 1   | 0    | 0      | 1    | Apples
Bayes' Theorem:

    p(C_i | x_1, ..., x_n) = p(x_1, ..., x_n | C_i) * p(C_i) / p(x_1, ..., x_n)

where C_i denotes the i-th class and x_1, ..., x_n the feature values of the instance.
Now suppose we want to classify a new instance:

Round | Red | Long | Yellow | Soft | Class
1     | 1   | 0    | 0      | 1    | ?

Computing, for each class, the probability that the instance belongs to its complement and selecting the class with the smallest such value classifies the instance as Apples.
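As a sanity check, the same toy table can be fed to scikit-learn's ComplementNB (the numeric arrays below are just the table above encoded as word counts):

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Word counts from the table: Round, Red, Long, Yellow, Soft
X = np.array([
    [2, 1, 1, 0, 0],  # sentence 1 -> Apples
    [1, 1, 0, 9, 5],  # sentence 2 -> Bananas
    [2, 1, 0, 0, 1],  # sentence 3 -> Apples
])
y = np.array(["Apples", "Bananas", "Apples"])

clf = ComplementNB().fit(X, y)

# New instance: Round=1, Red=1, Long=0, Yellow=0, Soft=1
print(clf.predict([[1, 1, 0, 0, 1]]))  # -> ['Apples']
```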
When to use CNB?
  • When the dataset is imbalanced: Multinomial and Gaussian Naive Bayes can give low accuracy when the classes are imbalanced, because each class's parameters are estimated only from that class's (possibly scarce) samples. Complement Naive Bayes estimates its parameters from each class's complement, which uses more data per estimate and typically gives higher accuracy in this setting.
  • For text classification tasks: Complement Naive Bayes often outperforms both Gaussian Naive Bayes and Multinomial Naive Bayes on text classification tasks.
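One way to see the difference in practice is to compare MultinomialNB and ComplementNB on a synthetic imbalanced count dataset. The data below is entirely made up for illustration (two multinomial "word count" distributions with a 9:1 class ratio), so the exact accuracies will vary with the draw:

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB, MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Imbalanced synthetic count data: 900 samples of class 0, 100 of class 1,
# drawn from two different multinomial "word" distributions.
n0, n1, n_words = 900, 100, 20
p0 = rng.dirichlet(np.ones(n_words))
p1 = rng.dirichlet(np.ones(n_words))
X = np.vstack([rng.multinomial(30, p0, size=n0),
               rng.multinomial(30, p1, size=n1)])
y = np.array([0] * n0 + [1] * n1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

for model in (MultinomialNB(), ComplementNB()):
    acc = accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
    print(f"{type(model).__name__}: {acc:.3f}")
```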
Implementation of CNB in Python:
For this implementation we will use the wine dataset that ships with scikit-learn, which is slightly imbalanced (its three classes have 59, 71, and 48 samples respectively).
Code:
# Import required modules
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.naive_bayes import ComplementNB
  
# Loading the dataset 
dataset = load_wine()
X = dataset.data
y = dataset.target
  
# Splitting the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
  
# Creating and training the Complement Naive Bayes Classifier
classifier = ComplementNB()
classifier.fit(X_train, y_train)
  
# Evaluating the classifier
prediction = classifier.predict(X_test)
prediction_train = classifier.predict(X_train)
  
print(f"Training Set Accuracy : {accuracy_score(y_train, prediction_train) * 100} %\n")
print(f"Test Set Accuracy : {accuracy_score(y_test, prediction) * 100} % \n\n")
print(f"Classifier Report : \n\n {classification_report(y_test, prediction)}")

OUTPUT
Training Set Accuracy : 65.56291390728477 %

Test Set Accuracy : 66.66666666666666 % 


Classifier Report : 

               precision    recall  f1-score   support

           0       0.64      1.00      0.78         9
           1       0.67      0.73      0.70        11
           2       1.00      0.14      0.25         7

    accuracy                           0.67        27
   macro avg       0.77      0.62      0.58        27
weighted avg       0.75      0.67      0.61        27
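The modest accuracy partly reflects that the wine features are continuous measurements rather than the word counts CNB was designed for. The class distribution, though, does show the slight imbalance that motivates trying CNB here:

```python
import numpy as np
from sklearn.datasets import load_wine

# Samples per class (classes 0, 1, 2) in the wine dataset
print(np.bincount(load_wine().target))  # -> [59 71 48]
```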

Conclusion:
Complement Naive Bayes is a simple but effective variant of Naive Bayes for imbalanced datasets, especially in text classification, where it often achieves higher accuracy than Multinomial or Gaussian Naive Bayes.
References:
  • scikit-learn documentation: sklearn.naive_bayes.ComplementNB.