Confusion Matrix in Machine Learning
Classification is the task of assigning each data point to one of a set of categories. To measure how well a classification model performs, we use the confusion matrix. This tutorial explains what a confusion matrix is, why it matters, and how to compute it.
What is a Confusion Matrix?
A confusion matrix is a matrix that summarizes the performance of a machine learning model on a set of test data. It is a means of displaying the number of accurate and inaccurate instances based on the model’s predictions. It is often used to measure the performance of classification models, which aim to predict a categorical label for each input instance.
The matrix reports how many test instances fall into each of four outcomes:
- True positives (TP): occur when the model accurately predicts a positive data point.
- True negatives (TN): occur when the model accurately predicts a negative data point.
- False positives (FP): occur when the model predicts positive for a data point that is actually negative.
- False negatives (FN): occur when the model predicts negative for a data point that is actually positive.
Why do we need a Confusion Matrix?
A confusion matrix is essential when assessing a classification model. It breaks predictions down into true positives, true negatives, false positives, and false negatives, which makes it possible to compute recall, accuracy, and precision and to see which classes the model confuses. It is especially helpful when the class distribution is uneven, where accuracy alone can be misleading.
Let’s understand the confusion matrix with examples.
Confusion Matrix For binary classification
A 2×2 confusion matrix for an image-recognition task that labels each image as Dog or Not Dog is shown below. Here the columns are the actual classes and the rows are the predicted classes.
Predicted \ Actual | Dog | Not Dog |
---|---|---|
Dog | True Positive | False Positive |
Not Dog | False Negative | True Negative |
- True Positive (TP): the number of instances where both the predicted and actual values are Dog.
- True Negative (TN): the number of instances where both the predicted and actual values are Not Dog.
- False Positive (FP): the number of instances predicted as Dog that are actually Not Dog.
- False Negative (FN): the number of instances predicted as Not Dog that are actually Dog.
Example for binary classification problems
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Actual | Dog | Dog | Dog | Not Dog | Dog | Not Dog | Dog | Dog | Not Dog | Not Dog |
Predicted | Dog | Not Dog | Dog | Not Dog | Dog | Dog | Dog | Dog | Not Dog | Not Dog |
Result | TP | FN | TP | TN | TP | FP | TP | TP | TN | TN |
- Actual Dog Counts = 6
- Actual Not Dog Counts = 4
- True Positive Counts = 5
- False Positive Counts = 1
- True Negative Counts = 3
- False Negative Counts = 1
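Arranged with actual classes as rows and predicted classes as columns (the layout scikit-learn uses), these counts give:
Actual \ Predicted | Dog | Not Dog |
---|---|---|
Dog | True Positive (5) | False Negative (1) |
Not Dog | False Positive (1) | True Negative (3) |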
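The counts above can be reproduced with a few lines of plain Python. This is a minimal sketch: the helper name `count_outcomes` and its `positive` parameter are illustrative choices, not part of any library.

```python
# Labels copied from the example table above.
actual = ["Dog", "Dog", "Dog", "Not Dog", "Dog",
          "Not Dog", "Dog", "Dog", "Not Dog", "Not Dog"]
predicted = ["Dog", "Not Dog", "Dog", "Not Dog", "Dog",
             "Dog", "Dog", "Dog", "Not Dog", "Not Dog"]


def count_outcomes(actual, predicted, positive="Dog"):
    """Return (TP, FP, TN, FN), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif a == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, tn, fn


tp, fp, tn, fn = count_outcomes(actual, predicted)
print(tp, fp, tn, fn)  # 5 1 3 1
```

Passing `positive="Not Dog"` swaps the roles of the two classes, which is how a per-class report treats each label in turn.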
Metrics based on Confusion Matrix Data
1. Accuracy
Accuracy measures how often the model is correct overall. It is the ratio of correctly classified instances to the total number of instances.
[Tex]\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN} [/Tex]
For the above case:
Accuracy = (5+3)/(5+3+1+1) = 8/10 = 0.8
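As a rough sketch, the same calculation in Python, followed by a case that shows why accuracy alone can mislead on imbalanced data. The second set of counts is hypothetical and chosen only for illustration; the function name `accuracy` is likewise an assumption.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Counts from the Dog / Not Dog example.
example_accuracy = accuracy(tp=5, tn=3, fp=1, fn=1)
print(example_accuracy)  # 0.8

# Hypothetical imbalanced dataset (not from this tutorial): 95 negatives,
# 5 positives, and a model that predicts "negative" for every instance.
imbalanced_accuracy = accuracy(tp=0, tn=95, fp=0, fn=5)
print(imbalanced_accuracy)  # 0.95, although no positive instance was found
```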
2. Precision
Precision is a measure of how accurate a model’s positive predictions are. It is defined as the ratio of true positive predictions to the total number of positive predictions made by the model.
[Tex]\text{Precision} = \frac{TP}{TP+FP} [/Tex]
For the above case:
Precision = 5/(5+1) = 5/6 = 0.8333
3. Recall
Recall measures the effectiveness of a classification model in identifying all relevant instances from a dataset. It is the ratio of the number of true positive (TP) instances to the sum of true positive and false negative (FN) instances.
[Tex]\text{Recall} = \frac{TP}{TP+FN} [/Tex]
For the above case:
Recall = 5/(5+1) = 5/6 = 0.8333
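A minimal sketch of both formulas in Python. Returning 0.0 when the denominator is zero is a convention assumed here for convenience; other conventions are possible.

```python
def precision(tp, fp):
    """Share of positive predictions that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Counts from the Dog / Not Dog example: TP=5, FP=1, FN=1.
p = precision(tp=5, fp=1)
r = recall(tp=5, fn=1)
print(round(p, 4), round(r, 4))  # 0.8333 0.8333
```

The two values coincide here only because the example happens to have the same number of false positives and false negatives.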
Note: We use precision when we want to minimize false positives, crucial in scenarios like spam email detection where misclassifying a non-spam message as spam is costly. And we use recall when minimizing false negatives is essential, as in medical diagnoses, where identifying all actual positive cases is critical, even if it results in some false positives.
4. F1-Score
The F1-score summarizes precision and recall in a single number. It is the harmonic mean of the two:
[Tex]\text{F1-Score} = \frac {2 \cdot Precision \cdot Recall}{Precision + Recall} [/Tex]
For the above case:
F1-Score = (2 * 0.8333 * 0.8333)/(0.8333 + 0.8333) = 0.8333
We balance precision and recall with the F1-score when a trade-off between minimizing false positives and false negatives is necessary, such as in information retrieval systems.
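The sketch below computes the F1-score two ways: from precision and recall, and directly from the counts using the equivalent form 2TP / (2TP + FP + FN). The function names are illustrative, and the last call uses hypothetical scores to show that the harmonic mean stays close to the weaker of the two values.

```python
def harmonic_f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp, fp, fn):
    """The same quantity written directly in terms of the counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Dog / Not Dog example: precision = recall = 5/6.
print(round(harmonic_f1(5 / 6, 5 / 6), 4))   # 0.8333
print(round(f1_from_counts(5, 1, 1), 4))     # 0.8333

# Hypothetical lopsided model: perfect precision, very low recall.
print(round(harmonic_f1(1.0, 0.1), 4))       # 0.1818
```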
5. Specificity
Specificity is another important metric in the evaluation of classification models, particularly in binary classification. It measures the ability of a model to correctly identify negative instances. Specificity is also known as the True Negative Rate.
[Tex]\text{Specificity} = \frac{TN}{TN+FP} [/Tex]
For the above case:
Specificity = 3/(3+1) = 3/4 = 0.75
6. Type 1 and Type 2 error
Type 1 error
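A Type 1 error occurs when the model predicts positive for an instance that is actually negative, that is, a false positive. False positives lower precision, which is the ratio of true positives to the sum of true positives and false positives. The Type 1 error rate, also called the false positive rate, is the share of actual negatives that are predicted as positive, and it equals 1 − Specificity: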
[Tex]\text{Type 1 Error} = \frac{FP}{TN+FP} [/Tex]
For example, in a courtroom scenario, a Type 1 Error, often referred to as a false positive, occurs when the court mistakenly convicts an individual as guilty when, in truth, they are innocent of the alleged crime. This grave error can have profound consequences, leading to the wrongful punishment of an innocent person who did not commit the offense in question. Preventing Type 1 Errors in legal proceedings is paramount to ensuring that justice is accurately served and innocent individuals are protected from unwarranted harm and punishment.
Type 2 error
A Type 2 error occurs when the model predicts negative for an instance that is actually positive, that is, a false negative. False negatives lower recall, which is the ratio of true positives to the sum of true positives and false negatives. The Type 2 error rate, also called the false negative rate, is the share of actual positives that the model misses, and it equals 1 − Recall:
[Tex]\text{Type 2 Error} = \frac{FN}{TP+FN} [/Tex]
In the context of medical testing, a Type 2 Error, often known as a false negative, occurs when a diagnostic test fails to detect the presence of a disease in a patient who genuinely has it. The consequences of such an error are significant, as it may result in a delayed diagnosis and subsequent treatment.
Precision emphasizes minimizing false positives, while recall focuses on minimizing false negatives.
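To make the relationship between these rates concrete, here is a small sketch using the counts from the Dog / Not Dog example. The function name `rates` is illustrative, and the code assumes that both classes occur at least once so that no denominator is zero.

```python
def rates(tp, fp, tn, fn):
    """Return specificity, recall, Type 1 error rate and Type 2 error rate."""
    specificity = tn / (tn + fp)   # true negative rate
    recall = tp / (tp + fn)        # true positive rate
    type_1 = fp / (tn + fp)        # false positive rate = 1 - specificity
    type_2 = fn / (tp + fn)        # false negative rate = 1 - recall
    return specificity, recall, type_1, type_2


specificity, recall, type_1, type_2 = rates(tp=5, fp=1, tn=3, fn=1)
print(specificity, type_1)                  # 0.75 0.25
print(round(recall, 4), round(type_2, 4))   # 0.8333 0.1667
```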
Implementation of Confusion Matrix for Binary classification using Python
Step 1: Import the necessary libraries
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt
Step 2: Create the NumPy arrays for actual and predicted labels
actual = np.array(
    ['Dog', 'Dog', 'Dog', 'Not Dog', 'Dog', 'Not Dog', 'Dog', 'Dog', 'Not Dog', 'Not Dog'])
predicted = np.array(
    ['Dog', 'Not Dog', 'Dog', 'Not Dog', 'Dog', 'Dog', 'Dog', 'Dog', 'Not Dog', 'Not Dog'])
Step 3: Compute the confusion matrix
# Rows are actual classes and columns are predicted classes.
# With no labels argument, classes appear in sorted order (Dog, Not Dog).
cm = confusion_matrix(actual, predicted)
Step 4: Plot the confusion matrix with the help of the seaborn heatmap
# Tick labels must follow the same sorted class order as the matrix.
sns.heatmap(cm,
            annot=True,
            fmt='g',
            xticklabels=['Dog', 'Not Dog'],
            yticklabels=['Dog', 'Not Dog'])
plt.xlabel('Prediction', fontsize=13)
plt.ylabel('Actual', fontsize=13)
plt.title('Confusion Matrix', fontsize=17)
plt.show()
Output: a heatmap of the 2×2 confusion matrix with rows [5, 1] (actual Dog) and [1, 3] (actual Not Dog).
Step 5: Classification report based on confusion matrix metrics
print(classification_report(actual, predicted))
Output:
Class | precision | recall | f1-score | support |
---|---|---|---|---|
Dog | 0.83 | 0.83 | 0.83 | 6 |
Not Dog | 0.75 | 0.75 | 0.75 | 4 |
accuracy | | | 0.80 | 10 |
macro avg | 0.79 | 0.79 | 0.79 | 10 |
weighted avg | 0.80 | 0.80 | 0.80 | 10 |
Confusion Matrix For Multi-class Classification
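Now consider a problem with three classes: Cat, Dog, and Horse. The table below lists ten instances with their actual and predicted labels and marks each prediction as a true positive, false positive, or false negative. A misclassified instance counts both as a false negative for its actual class and as a false positive for its predicted class.
Here, TP = True Positive, FP = False Positive, FN = False Negative.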
Index | Actual | Predicted | TP | FP | FN |
---|---|---|---|---|---|
1 | Cat | Cat | 1 | 0 | 0 |
2 | Dog | Dog | 1 | 0 | 0 |
3 | Horse | Dog | 0 | 1 | 1 |
4 | Cat | Cat | 1 | 0 | 0 |
5 | Dog | Dog | 1 | 0 | 0 |
6 | Cat | Cat | 1 | 0 | 0 |
7 | Dog | Dog | 1 | 0 | 0 |
8 | Horse | Horse | 1 | 0 | 0 |
9 | Horse | Horse | 1 | 0 | 0 |
10 | Cat | Dog | 0 | 1 | 1 |
- True Positives (TP): 8 (1+1+0+1+1+1+1+1+1+0)
- False Positives (FP): 2 (0+0+1+0+0+0+0+0+0+1)
- False Negatives (FN): 2 (0+0+1+0+0+0+0+0+0+1)
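In a 3×3 confusion matrix, each cell is read relative to a class: the diagonal cell of a class holds its true positives, the other cells in its row are its false negatives, and the other cells in its column are its false positives. With actual classes as rows and predicted classes as columns:
Actual \ Predicted | Cat | Dog | Horse |
---|---|---|---|
Cat | TP (Cat) | FN (Cat), FP (Dog) | FN (Cat), FP (Horse) |
Dog | FN (Dog), FP (Cat) | TP (Dog) | FN (Dog), FP (Horse) |
Horse | FN (Horse), FP (Cat) | FN (Horse), FP (Dog) | TP (Horse) |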
Class-wise Summary:
- For Cat: [TP=3,FP=0,FN=1]
- Index 1: True Positive (Cat actual, Cat predicted)
- Index 4: True Positive (Cat actual, Cat predicted)
- Index 6: True Positive (Cat actual, Cat predicted)
- Index 10: False Negative (Cat actual, Dog predicted)
- For Dog: [TP=3,FP=2,FN=0]
- Index 2: True Positive (Dog actual, Dog predicted)
- Index 5: True Positive (Dog actual, Dog predicted)
- Index 7: True Positive (Dog actual, Dog predicted)
- Index 10: False Positive (Cat actual, Dog predicted)
- Index 3: False Positive (Horse actual, Dog predicted)
- For Horse: [TP=2,FP=0,FN=1]
- Index 8: True Positive (Horse actual, Horse predicted)
- Index 9: True Positive (Horse actual, Horse predicted)
- Index 3: False Negative (Horse actual, Dog predicted)
Then, with actual classes as rows and predicted classes as columns, the confusion matrix will be:
Actual \ Predicted | Cat | Dog | Horse |
---|---|---|---|
Cat | 3 | 1 | 0 |
Dog | 0 | 3 | 0 |
Horse | 0 | 1 | 2 |
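The matrix and the class-wise counts can be derived directly from the ten labelled instances. The sketch below uses plain Python lists, with rows as actual classes and columns as predicted classes; the variable names are illustrative.

```python
labels = ["Cat", "Dog", "Horse"]
actual = ["Cat", "Dog", "Horse", "Cat", "Dog",
          "Cat", "Dog", "Horse", "Horse", "Cat"]
predicted = ["Cat", "Dog", "Dog", "Cat", "Dog",
             "Cat", "Dog", "Horse", "Horse", "Dog"]

# Build the matrix: matrix[i][j] counts instances of actual class i
# that were predicted as class j.
index = {label: i for i, label in enumerate(labels)}
matrix = [[0] * len(labels) for _ in labels]
for a, p in zip(actual, predicted):
    matrix[index[a]][index[p]] += 1

# Read TP, FP and FN for each class off the matrix.
per_class = {}
for i, label in enumerate(labels):
    tp = matrix[i][i]                            # diagonal cell
    fn = sum(matrix[i]) - tp                     # rest of the row
    fp = sum(row[i] for row in matrix) - tp      # rest of the column
    per_class[label] = {"TP": tp, "FP": fp, "FN": fn}

print(matrix)  # [[3, 1, 0], [0, 3, 0], [0, 1, 2]]
for label, counts in per_class.items():
    print(label, counts)
```

The cell values sum to 10, one per instance, which is a quick sanity check for any hand-built confusion matrix.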
Implementation of Confusion Matrix for Multi-class classification using Python
Step 1: Import the necessary libraries
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt
Step 2: Create the NumPy arrays for actual and predicted labels
actual = np.array(
    ['Cat', 'Dog', 'Horse', 'Cat', 'Dog', 'Cat', 'Dog', 'Horse', 'Horse', 'Cat'])
predicted = np.array(
    ['Cat', 'Dog', 'Dog', 'Cat', 'Dog', 'Cat', 'Dog', 'Horse', 'Horse', 'Dog'])
Step 3: Compute the confusion matrix
# Rows are actual classes and columns are predicted classes.
# With no labels argument, classes appear in sorted order (Cat, Dog, Horse).
cm = confusion_matrix(actual, predicted)
Step 4: Plot the confusion matrix with the help of the seaborn heatmap
# Tick labels must follow the same sorted class order as the matrix.
sns.heatmap(cm,
            annot=True,
            fmt='g',
            xticklabels=['Cat', 'Dog', 'Horse'],
            yticklabels=['Cat', 'Dog', 'Horse'])
plt.xlabel('Prediction', fontsize=13)
plt.ylabel('Actual', fontsize=13)
plt.title('Confusion Matrix', fontsize=17)
plt.show()
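Output: a heatmap of the 3×3 confusion matrix with rows [3, 1, 0] (actual Cat), [0, 3, 0] (actual Dog), and [0, 1, 2] (actual Horse).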
Step 5: Classification report based on confusion matrix metrics
print(classification_report(actual, predicted))
Output:
Class | precision | recall | f1-score | support |
---|---|---|---|---|
Cat | 1.00 | 0.75 | 0.86 | 4 |
Dog | 0.60 | 1.00 | 0.75 | 3 |
Horse | 1.00 | 0.67 | 0.80 | 3 |
accuracy | | | 0.80 | 10 |
macro avg | 0.87 | 0.81 | 0.80 | 10 |
weighted avg | 0.88 | 0.80 | 0.81 | 10 |
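The macro avg and weighted avg rows can be checked by hand. The sketch below does so for precision, starting from the per-class TP, FP, and FN counts listed in the class-wise summary; the variable names are illustrative.

```python
# Per-class counts from the class-wise summary: (TP, FP, FN).
counts = {"Cat": (3, 0, 1), "Dog": (3, 2, 0), "Horse": (2, 0, 1)}

precision = {}
support = {}
for label, (tp, fp, fn) in counts.items():
    precision[label] = tp / (tp + fp)
    support[label] = tp + fn   # number of actual instances of the class

total = sum(support.values())

# Macro average: every class counts equally.
macro = sum(precision.values()) / len(precision)

# Weighted average: every class counts in proportion to its support.
weighted = sum(precision[c] * support[c] for c in counts) / total

print(round(macro, 2), round(weighted, 2))  # 0.87 0.88
```

The two averages differ because Dog, the class with the lowest precision, has less support than Cat and therefore pulls the weighted average down slightly less.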
Conclusion
To sum up, the confusion matrix is an essential instrument for evaluating the effectiveness of classification models. Its breakdown of true positive, true negative, false positive, and false negative predictions gives insight into a model’s accuracy, precision, recall, and overall ability to separate classes. The article illustrated how each metric is computed, discussed when each one matters, and demonstrated how confusion matrices can be implemented in Python for binary and multi-class classification. By understanding and applying these metrics, practitioners can make well-informed decisions about model performance, particularly when dealing with imbalanced class distributions.
FAQs on Confusion Matrix
Q. How to interpret a confusion matrix?
A confusion matrix summarizes a classification model’s performance, with entries representing true positive, true negative, false positive, and false negative instances, providing insights into model accuracy and errors.
Q. What are the advantages of using Confusion matrix?
The confusion matrix provides a comprehensive evaluation of a classification model’s performance, offering insights into true positives, true negatives, false positives, and false negatives, aiding nuanced analysis beyond basic accuracy.
Q. What are some examples of confusion matrix applications?
Confusion matrices find applications in various fields, including medical diagnosis (identifying true/false positives/negatives for diseases), fraud detection, sentiment analysis, and image recognition accuracy assessment.
Q. What is the confusion matrix diagram?
A confusion matrix diagram visually represents the performance of a classification model. It displays true positive, true negative, false positive, and false negative values in a structured matrix format.
Q. What are the four values of the confusion matrix?
A binary confusion matrix has four values: true positive (correctly predicted positive instances), true negative (correctly predicted negative instances), false positive (negative instances incorrectly predicted as positive), and false negative (positive instances incorrectly predicted as negative).