Complement Naive Bayes (CNB) Algorithm
Complement Naive Bayes is a variant of the standard Multinomial Naive Bayes algorithm that is adapted to handle imbalanced datasets, i.e., datasets where one class has far more training examples than the others.
How CNB works:
- For each class, calculate the probability of the given instance NOT belonging to it — that is, the probability of it belonging to the complement of that class.
- Once this value has been calculated for every class, select the smallest one.
- The class with the smallest value is chosen because a low probability of NOT belonging to a class implies a high probability of actually belonging to it. So this class is selected.
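The steps above can be sketched in a few lines of code. This is a hypothetical toy implementation for illustration (the feature counts, Laplace smoothing with alpha, and the log-probability scoring are my assumptions, not the exact scikit-learn code):

```python
import numpy as np

# Toy training data: rows are instances, columns are feature counts.
X = np.array([[3, 0, 1],
              [0, 4, 2],
              [2, 1, 0]])
y = np.array([0, 1, 0])  # class labels

def cnb_predict(X, y, x_new, alpha=1.0):
    classes = np.unique(y)
    scores = []
    for c in classes:
        # Step 1: pool counts from every class EXCEPT c (the complement),
        # with Laplace smoothing.
        comp = X[y != c].sum(axis=0) + alpha
        theta = comp / comp.sum()
        # Log-probability that x_new was generated by the complement of c.
        scores.append(x_new @ np.log(theta))
    # Steps 2-3: the smallest complement probability means the instance
    # most likely belongs to that class.
    return classes[np.argmin(scores)]

print(cnb_predict(X, y, np.array([2, 0, 1])))  # → 0
```

Note that because we pick the class whose complement fits the instance worst, the final step is an argmin rather than the argmax used by ordinary Naive Bayes.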
Example: Apples and Bananas
Consider the following toy training data, where each feature column holds a count:
| Sentence Number | Round | Red | Long | Yellow | Soft | Class |
|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | 0 | 0 | Apples |
| 2 | 1 | 1 | 0 | 9 | 5 | Bananas |
| 3 | 2 | 1 | 0 | 0 | 1 | Apples |
Now suppose we need to classify the following instance:

| Round | Red | Long | Yellow | Soft | Class |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 1 | ? |

An ordinary Naive Bayes classifier applies Bayes' theorem directly, picking the class y that maximizes P(y) · Π P(x_i | y). Complement Naive Bayes instead estimates, for each class, the probability of the instance belonging to the complement of that class (using the pooled feature counts of all the other classes), and selects the class for which this complement value is smallest. For the training data above, the instance is classified as Apples.
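The fruit example can be checked directly with scikit-learn's ComplementNB (the array encoding of the table is mine):

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Training data from the table above: Round, Red, Long, Yellow, Soft.
X = np.array([[2, 1, 1, 0, 0],
              [1, 1, 0, 9, 5],
              [2, 1, 0, 0, 1]])
y = np.array(["Apples", "Bananas", "Apples"])

clf = ComplementNB()  # default alpha=1.0 smoothing
clf.fit(X, y)

# The query instance: Round=1, Red=1, Long=0, Yellow=0, Soft=1.
print(clf.predict([[1, 1, 0, 0, 1]])[0])  # → Apples
```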
When to use CNB?
- When the dataset is imbalanced: Multinomial and Gaussian Naive Bayes can give low accuracy on imbalanced data, because their per-class parameter estimates are dominated by the majority class. Complement Naive Bayes, which estimates its parameters from each class's complement, tends to perform noticeably better in this setting.
- For text classification tasks: Complement Naive Bayes often outperforms both Gaussian Naive Bayes and Multinomial Naive Bayes on text classification tasks.
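As a rough illustration of the imbalanced case (synthetic data of my own construction, not a benchmark), one can compare MultinomialNB and ComplementNB on skewed count data:

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB, MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)

# Synthetic imbalanced count data: 900 majority vs 100 minority samples,
# each class drawn from its own Poisson rates over 20 features.
n_maj, n_min, n_feat = 900, 100, 20
rates_maj = rng.uniform(1, 5, n_feat)
rates_min = rng.uniform(1, 5, n_feat)
X = np.vstack([rng.poisson(rates_maj, (n_maj, n_feat)),
               rng.poisson(rates_min, (n_min, n_feat))])
y = np.array([0] * n_maj + [1] * n_min)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

for model in (MultinomialNB(), ComplementNB()):
    model.fit(X_tr, y_tr)
    score = balanced_accuracy_score(y_te, model.predict(X_te))
    print(f"{model.__class__.__name__}: balanced accuracy = {score:.3f}")
```

Balanced accuracy is used here rather than plain accuracy, since on a 9:1 split a classifier that ignores the minority class entirely would still score 90% plain accuracy.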
Implementation of CNB in Python:
The example below uses the wine dataset, which ships with scikit-learn.

Code:
```python
# Import required modules
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.naive_bayes import ComplementNB

# Loading the dataset
dataset = load_wine()
X = dataset.data
y = dataset.target

# Splitting the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42)

# Creating and training the Complement Naive Bayes Classifier
classifier = ComplementNB()
classifier.fit(X_train, y_train)

# Evaluating the classifier
prediction = classifier.predict(X_test)
prediction_train = classifier.predict(X_train)

print(f"Training Set Accuracy : {accuracy_score(y_train, prediction_train) * 100} %\n")
print(f"Test Set Accuracy : {accuracy_score(y_test, prediction) * 100} %\n\n")
print(f"Classifier Report : \n\n{classification_report(y_test, prediction)}")
```
OUTPUT

```
Training Set Accuracy : 65.56291390728477 %
Test Set Accuracy : 66.66666666666666 %

Classifier Report :

              precision    recall  f1-score   support

           0       0.64      1.00      0.78         9
           1       0.67      0.73      0.70        11
           2       1.00      0.14      0.25         7

    accuracy                           0.67        27
   macro avg       0.77      0.62      0.58        27
weighted avg       0.75      0.67      0.61        27
```
Conclusion:
Complement Naive Bayes is a simple variant of Multinomial Naive Bayes that handles imbalanced datasets well and is a strong baseline for text classification. It is available out of the box in scikit-learn as ComplementNB.
References:
- scikit-learn documentation: sklearn.naive_bayes.ComplementNB.