Interpreting Random Forest Classifier Results

Model Performance Metrics for Random Forest Classification

To illustrate how to interpret Random Forest classification results, let's work through a practical example using the Iris dataset, which contains 150 flower samples described by four measurements and labeled with one of three species.

Step 1: Import Libraries and Load Data

Python

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import label_binarize


# Load the Iris dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)
feature_names = iris.feature_names
target_names = iris.target_names

Step 2: Train the Random Forest Classifier

Split the dataset into training and test sets using train_test_split.
Initialize and train the RandomForestClassifier with 100 trees.

Python

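# Split data into training and test sets (25% held out, i.e. 38 of the 150 samples,
# which matches the 38 test samples reported in the outputs)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)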

# Train the Random Forest model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
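Before turning to the test set, a quick sanity check of the split and of the model's out-of-bag (OOB) accuracy can be useful. The sketch below is self-contained and assumes a 25% hold-out, which corresponds to the 38 test samples reported in the outputs of this example. The names `X_tr`, `X_te`, and `rf_oob` are illustrative and not part of the original code.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_all, y_all = load_iris(return_X_y=True)

# Assumption: a 25% hold-out, giving 112 training and 38 test samples
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.25, random_state=42
)
print("train/test sizes:", X_tr.shape[0], X_te.shape[0])

# oob_score=True scores each training sample using only the trees
# whose bootstrap sample did not contain that sample
rf_oob = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
rf_oob.fit(X_tr, y_tr)
print(f"OOB accuracy estimate: {rf_oob.oob_score_:.3f}")
```

The OOB estimate comes from the training data alone, so it offers a second opinion on accuracy without touching the test set.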

Step 3: Evaluate the Model

1. Using the Confusion Matrix

Python

# Predict on the test set
y_pred = rf.predict(X_test)

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

Output:

Confusion Matrix:
[[15  0  0]
 [ 0 11  0]
 [ 0  0 12]]

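In this matrix, rows are the true classes and columns are the predicted classes. All 38 test samples lie on the diagonal, so the model classified every one of them correctly.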
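Because the Iris matrix contains no errors, it does not show how per-class metrics follow from the counts. The sketch below uses a made-up matrix with a few misclassifications to illustrate the arithmetic. The counts are hypothetical and do not come from the model trained above.

```python
import numpy as np

# Hypothetical confusion matrix (rows = true class, columns = predicted class).
# These counts are invented for illustration only.
cm = np.array([
    [15, 0, 0],
    [0, 9, 2],
    [0, 1, 11],
])
labels = ["setosa", "versicolor", "virginica"]

true_positives = np.diag(cm)
recall = true_positives / cm.sum(axis=1)     # share of each true class that was found
precision = true_positives / cm.sum(axis=0)  # share of each predicted class that was right
accuracy = true_positives.sum() / cm.sum()

for label, p, r in zip(labels, precision, recall):
    print(f"{label:<12} precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Row sums give the number of true samples per class (the denominator of recall), and column sums give the number of predictions per class (the denominator of precision).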

2. Using the Classification Report

Python

# Classification Report
class_report = classification_report(y_test, y_pred, target_names=target_names)
print("Classification Report:")
print(class_report)

Output:

Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      1.00      1.00        11
   virginica       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38
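When the metrics need to be used in code rather than read from the printed text, `classification_report` can return them as a dictionary via `output_dict=True`. The following self-contained sketch assumes a 25% hold-out (38 test samples). The names `model` and `report` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

iris = load_iris()

# Assumption: a 25% hold-out, which yields 38 test samples
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# output_dict=True returns nested dictionaries instead of a formatted string
report = classification_report(
    y_te,
    model.predict(X_te),
    target_names=iris.target_names,
    output_dict=True,
)

for name in iris.target_names:
    row = report[name]
    print(f"{name}: precision={row['precision']:.2f}, recall={row['recall']:.2f}")
print(f"macro F1: {report['macro avg']['f1-score']:.2f}")
```

The dictionary form is convenient for logging metrics or comparing several models in a table.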

3. ROC Curve

Python

# Binarize the output
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])
y_pred_prob = rf.predict_proba(X_test)

# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(len(target_names)):
    fpr[i], tpr[i], _ = roc_curve(y_test_bin[:, i], y_pred_prob[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Plot ROC curve
plt.figure()
for i in range(len(target_names)):
    plt.plot(fpr[i], tpr[i], lw=2, label=f'ROC curve of class {target_names[i]} (area = {roc_auc[i]:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic for Multi-class')
plt.legend(loc="lower right")
plt.show()

Output:

[Figure: one-vs-rest ROC curve for each of the three Iris classes]
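The per-class curves can also be summarized as a single number. One option, sketched below, is the macro-averaged one-vs-rest AUC from `roc_auc_score`. The example is self-contained and assumes a 25% hold-out. The names `clf`, `proba`, and `macro_auc` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X_all, y_all = load_iris(return_X_y=True)

# Assumption: a 25% hold-out, as in the reported 38-sample test set
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.25, random_state=42
)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Multiclass AUC needs the full probability matrix (one column per class)
proba = clf.predict_proba(X_te)
macro_auc = roc_auc_score(y_te, proba, multi_class="ovr", average="macro")
print(f"Macro-averaged one-vs-rest AUC: {macro_auc:.3f}")
```

A value close to 1.0 means the predicted probabilities rank the true class above the others for nearly every sample, while 0.5 corresponds to random ranking.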

4. Visualizing Feature Importance

Extract feature importances from the trained model.
Plot a bar chart showing the importance of each feature.

Python

# Feature Importance
importances = rf.feature_importances_
indices = np.argsort(importances)[::-1]

plt.figure()
plt.title("Feature Importances")
plt.bar(range(X.shape[1]), importances[indices], color="r", align="center")
plt.xticks(range(X.shape[1]), [feature_names[i] for i in indices], rotation=90)
plt.xlim([-1, X.shape[1]])
plt.show()

Output:

[Figure: bar chart of the feature importances, sorted in descending order]
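The values in `feature_importances_` are impurity-based and computed from the training data. As a complementary check, permutation importance measures how much the test accuracy drops when a single feature is shuffled. This is a different technique from the one plotted above, shown here as a minimal sketch that assumes a 25% hold-out. The names `clf` and `result` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

iris = load_iris()

# Assumption: a 25% hold-out, as in the reported 38-sample test set
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42
)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Shuffle each feature 10 times on the test set and record the accuracy drop
result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=42)

# Print both importance measures side by side, most important first
order = result.importances_mean.argsort()[::-1]
for i in order:
    print(
        f"{iris.feature_names[i]:<20} "
        f"impurity={clf.feature_importances_[i]:.3f} "
        f"permutation={result.importances_mean[i]:.3f} "
        f"+/- {result.importances_std[i]:.3f}"
    )
```

If both measures rank the same features highest, the ranking is less likely to be an artifact of how impurity-based importance is calculated.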

Interpreting Random Forest Classification Results

Random Forest is an ensemble learning method used for both classification and regression. It builds many decision trees during training and combines their outputs: for classification it predicts the class favored by the trees (scikit-learn averages the trees' predicted class probabilities rather than counting hard votes), and for regression it predicts the mean of the trees' outputs. Random Forests are robust and often accurate, but because each prediction is spread across many trees, the model is harder to interpret than a single decision tree.

This article will guide you through the process of interpreting Random Forest classification results, focusing on feature importance, individual predictions, and overall model performance.
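To make the ensemble idea concrete, the sketch below rebuilds a forest's prediction from its individual trees. It assumes scikit-learn's `RandomForestClassifier`, whose fitted trees are exposed in `estimators_`. The names `forest`, `X_demo`, and `y_demo` are illustrative, and the data are passed as plain NumPy arrays so that the individual trees can be queried directly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X_demo, y_demo = load_iris(return_X_y=True)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_demo, y_demo)

# Collect each tree's class probabilities: shape (n_trees, n_samples, n_classes)
per_tree_proba = np.stack(
    [tree.predict_proba(X_demo) for tree in forest.estimators_]
)

# Averaging over the tree axis reproduces the forest's own probabilities
averaged_proba = per_tree_proba.mean(axis=0)
forest_proba = forest.predict_proba(X_demo)
print("matches forest.predict_proba:", np.allclose(averaged_proba, forest_proba))

# The predicted class is the one with the highest averaged probability
manual_pred = forest.classes_[averaged_proba.argmax(axis=1)]
print("agreement with forest.predict:", (manual_pred == forest.predict(X_demo)).mean())
```

Looking at `per_tree_proba` for a single sample shows how much the trees disagree, which is a rough indication of how confident the forest is in that prediction.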

Table of Contents

Interpreting Random Forest Classification: Feature Importance
Interpreting Individual Predictions
Model Performance Metrics for Random Forest Classification
Interpreting Random Forest Classifier Results

1. Using the Confusion Matrix
2. Using the Classification Report
3. ROC Curve
4. Visualizing Feature Importance
