Implementation: Random Forest for Image Classification Using OpenCV

How are we going to apply random forest for image classification?

The task involves using machine learning techniques, specifically Random Forest, to identify Parkinson’s disease through spiral and wave drawings.
Traditional diagnostic methods struggle with the complexity of these drawings, which vary in style, scale, and quality.
The goal is to develop a reliable classification system that distinguishes between drawings with and without Parkinson’s disease, contributing to early detection and intervention, ultimately improving patient outcomes and quality of life.

Importing the necessary libraries

Python

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import os
import matplotlib.pyplot as plt
from skimage.feature import hog
import random
import cv2

Reading the Images

You can download the dataset from here. Use the following command to unzip the file:

!unzip /content/drawings.zip -d drawing

Python

def display_images(directory, num_images=5):
    fig, axes = plt.subplots(2, num_images, figsize=(15, 5))
    fig.suptitle(f"Images from {directory.split('/')[-1]}", fontsize=16)
     
    for i, label in enumerate(os.listdir(directory)):
        label_dir = os.path.join(directory, label)
        image_files = os.listdir(label_dir)
        random.shuffle(image_files)
        for j in range(num_images):
            image_path = os.path.join(label_dir, image_files[j])
            img = cv2.imread(image_path)
            axes[i, j].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
            axes[i, j].set_title(f"{label} Image {j+1}")
            axes[i, j].axis('off')
    plt.tight_layout()
    plt.show()
 
# Display training images
display_images('/content/drawing/drawings/spiral/training')
display_images('/content/drawing/drawings/wave/training')
 
# Display testing images
display_images('/content/drawing/drawings/spiral/testing')
display_images('/content/drawing/drawings/wave/testing')

Output:

[Figure: sample wave drawings]

[Figure: sample spiral drawings]

This is the dataset we will use to classify, from a drawing, whether or not the person has Parkinson’s disease.

HOG (Histogram of Oriented Gradients) features summarize the distribution of gradient orientations in local regions of an image, which makes them useful for object detection and recognition. They give a compact representation of an image’s structure, which lowers computational cost and, by reducing dimensionality relative to raw pixels, can help limit overfitting. Because they are built from gradients and normalized over blocks, HOG features are relatively robust to moderate lighting and color changes, which can improve a model’s ability to generalize. They work with machine learning algorithms such as Support Vector Machines (SVMs) and Random Forests, since each image is represented as a fixed-length feature vector for training and prediction.

Let’s understand the code in detail:

hog – the skimage.feature function that calculates HOG features. It takes the following parameters:

image – the input image for which we need the HOG features
orientations – the number of orientation bins in each cell’s gradient histogram
pixels_per_cell – the size (in pixels) of a cell over which a gradient histogram is computed
cells_per_block – the number of cells in each normalization block
visualize – whether to also return an image of the HOG descriptors

By capturing the local gradient information, HOG features can describe the shape and structure of objects in an image, making them useful for tasks like object detection and recognition. The parameters orientations, pixels_per_cell, and cells_per_block control the level of granularity and detail in the computed HOG features.

Python

def extract_hog_features(image):
    # Calculate HOG features
    hog_features = hog(image, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2), visualize=False)
    return hog_features
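
As a rough check on what these parameters produce, the length of the HOG vector can be worked out by hand. The sketch below is not part of the original tutorial code: hog_feature_length is a hypothetical helper, and it assumes that blocks slide one cell at a time, which is how skimage.feature.hog arranges them.

```python
def hog_feature_length(image_shape, orientations=9,
                       pixels_per_cell=(8, 8), cells_per_block=(2, 2)):
    """Return the length of the flattened HOG vector for a 2-D image.

    Hypothetical helper for illustration; assumes a block stride of one cell.
    """
    rows, cols = image_shape
    # Number of whole cells that fit along each axis
    cells_r = rows // pixels_per_cell[0]
    cells_c = cols // pixels_per_cell[1]
    # Number of block positions along each axis (stride of one cell)
    blocks_r = cells_r - cells_per_block[0] + 1
    blocks_c = cells_c - cells_per_block[1] + 1
    # Each block holds one histogram per cell it contains
    per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks_r * blocks_c * per_block


# A 128 x 128 grayscale image gives 16 x 16 cells and 15 x 15 blocks
print(hog_feature_length((128, 128)))  # 8100
```

Larger cells or fewer orientation bins shorten the vector, which is the trade-off between detail and compactness described above.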

The load_and_extract_features function loads images from a given directory, resizes them, converts them to grayscale, and then extracts HOG features from each one.

OpenCV does most of the work here: imread() reads an image from a file, resize() resizes it to a fixed shape, and cvtColor() with cv2.COLOR_BGR2GRAY converts it to grayscale. Resizing and grayscale conversion reduce the number of pixel values per image, which cuts computation time. Resizing also ensures that every image yields a HOG feature vector of the same length, which the classifier requires.

Python

def load_and_extract_features(directory):
    X = []
    y = []
    # Sort directory listings so the loading order is deterministic
    for label in sorted(os.listdir(directory)):
        label_dir = os.path.join(directory, label)
        for filename in sorted(os.listdir(label_dir)):
            image_path = os.path.join(label_dir, filename)
            # Load image using OpenCV (returns None if the file cannot be read)
            img = cv2.imread(image_path)
            if img is None:
                continue
            # Resize image to (128, 128)
            img_resized = cv2.resize(img, (128, 128))
            # Convert image to grayscale
            img_gray = cv2.cvtColor(img_resized, cv2.COLOR_BGR2GRAY)
            # Calculate HOG features
            hog_features = extract_hog_features(img_gray)
            X.append(hog_features)
            y.append(label)
    return X, y
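
To make the saving from resizing and grayscale conversion concrete, here is a small arithmetic sketch. It is illustrative only: value_count and bgr_to_gray are hypothetical helpers, the 256 x 256 original size is an assumed example rather than a measured property of the dataset, and the weights are the ones OpenCV documents for its RGB-to-gray conversion (OpenCV’s own rounding may differ slightly from Python’s round).

```python
def value_count(height, width, channels):
    """Number of pixel values stored for an image of the given shape."""
    return height * width * channels


def bgr_to_gray(b, g, r):
    """Weighted grayscale value of one BGR pixel: 0.299 R + 0.587 G + 0.114 B."""
    return round(0.114 * b + 0.587 * g + 0.299 * r)


# Assumed example: a 256 x 256 color image with 3 channels
original = value_count(256, 256, 3)
# After resizing to 128 x 128 and converting to grayscale (1 channel)
reduced = value_count(128, 128, 1)
print(original, reduced, original / reduced)  # 196608 16384 12.0

# A pure blue pixel in BGR order maps to a dark gray value
print(bgr_to_gray(255, 0, 0))  # 29
```

Under these assumptions each image shrinks by a factor of 12 before HOG is computed.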

Define Random Forest Classifier

The train_random_forest function trains a RandomForestClassifier on the training data passed in as arguments.

Inside the function, a Random Forest classifier object (rf_classifier) is initialized using the RandomForestClassifier class from scikit-learn. The classifier is configured with the following parameters:

n_estimators: The number of trees in the forest. Here, it’s set to 1000.
criterion: The function to measure the quality of a split. ‘gini’ is used here, which refers to the Gini impurity.
max_depth: The maximum depth of the trees. Here, it’s set to 5.

The fit method then trains the classifier and finally the trained model is returned.

Python

# Define a function to train a Random Forest classifier
def train_random_forest(X_train, y_train):
    rf_classifier = RandomForestClassifier(n_estimators=1000, criterion='gini', max_depth=5)
    rf_classifier.fit(X_train, y_train)
    return rf_classifier
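
The criterion='gini' setting refers to Gini impurity, which for a set of labels is 1 minus the sum of the squared class proportions. The following is a minimal sketch of that formula, not scikit-learn’s internal implementation; gini_impurity is a hypothetical helper and the label names are placeholders for the two class folders.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_k ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Placeholder label names for the two classes
mixed = ["healthy", "healthy", "parkinson", "parkinson"]
pure = ["parkinson", "parkinson", "parkinson"]
print(gini_impurity(mixed))  # 0.5
print(gini_impurity(pure))   # 0.0
```

A split is considered good when the child nodes it produces have lower impurity than the parent, so each tree favors splits that separate the two classes more cleanly.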

With the training function defined, we load the images (the dataset is already split into training and testing folders), extract their features, and train two separate models: one for spiral drawings and one for wave drawings.

Model Training

Python

# Load and extract features from training data
spiral_train_X, spiral_train_y = load_and_extract_features('/content/drawing/drawings/spiral/training')
wave_train_X, wave_train_y = load_and_extract_features('/content/drawing/drawings/wave/training')
 
# Train Random Forest classifiers
spiral_rf_classifier = train_random_forest(spiral_train_X, spiral_train_y)
wave_rf_classifier = train_random_forest(wave_train_X, wave_train_y)
 
# Load and extract features from testing data
spiral_test_X, spiral_test_y = load_and_extract_features('/content/drawing/drawings/spiral/testing')
wave_test_X, wave_test_y = load_and_extract_features('/content/drawing/drawings/wave/testing')

Model Evaluation

Python

spiral_predictions = spiral_rf_classifier.predict(spiral_test_X)
wave_predictions = wave_rf_classifier.predict(wave_test_X)
 
spiral_accuracy = accuracy_score(spiral_test_y, spiral_predictions)
wave_accuracy = accuracy_score(wave_test_y, wave_predictions)
 
print("Spiral Classification Accuracy:", spiral_accuracy)
print("Wave Classification Accuracy:", wave_accuracy)

Output:

Spiral Classification Accuracy: 0.7666666666666667
Wave Classification Accuracy: 0.7333333333333333
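
Accuracy alone does not show which kind of mistake a model makes, and in a screening setting a missed case and a false alarm have different costs. The sketch below counts both kinds of error for a binary problem. It is illustrative only: confusion_counts is a hypothetical helper, the label names are placeholders, and the sample labels are made up rather than taken from the models above. In practice, sklearn.metrics.confusion_matrix provides the same counts.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # correctly flagged
            else:
                fp += 1  # false alarm
        else:
            if truth == positive:
                fn += 1  # missed case
            else:
                tn += 1  # correctly cleared
    return tp, fp, tn, fn


# Made-up labels for illustration; not real model output
y_true = ["parkinson", "parkinson", "parkinson", "healthy", "healthy", "healthy"]
y_pred = ["parkinson", "parkinson", "healthy", "healthy", "healthy", "parkinson"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive="parkinson")
print(tp, fp, tn, fn)  # 2 1 2 1
print("Sensitivity:", tp / (tp + fn))
print("Specificity:", tn / (tn + fp))
```

Sensitivity is the share of Parkinson’s drawings that were caught, and specificity is the share of healthy drawings that were correctly cleared.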

Random Forest for Image Classification Using OpenCV

Random Forest is a machine learning algorithm that combines the predictions of many decision trees to produce more accurate and stable results in classification and regression tasks than a single tree typically would. OpenCV, an open-source library for computer vision and machine learning tasks, is used to read and preprocess visual data. The goal here is to classify images, specifically to detect Parkinson’s disease from spiral and wave drawings, using Random Forest together with OpenCV.
