Handwritten Digit Recognition with OpenCV

Handwritten digit recognition is the task of teaching a computer to automatically identify digits written by hand. This article walks through building a handwritten digit recognizer using OpenCV and Keras.

Implementation of a Handwritten Digit Recognition System

To implement handwritten digit recognition, we will train a Convolutional Neural Network (CNN) on the MNIST dataset using Keras, and use OpenCV to load and preprocess the digit image we want to classify.

We will install OpenCV and Keras using the following commands:

pip install opencv-python
pip install keras
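Keras needs a backend such as TensorFlow to run, so if TensorFlow is not already installed in your environment, install it as well:

pip install tensorflow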

Step 1: Import Necessary Libraries

We will import OpenCV, NumPy and the required Keras modules. Keras is used to define and train the neural network model for handwritten digit recognition, while OpenCV is used later to load and preprocess the digit image.

import cv2
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical

Step 2: Loading MNIST Dataset

Next, we load the MNIST dataset, which ships with Keras and is split into 60,000 training images and 10,000 test images of 28x28 grayscale digits.

# Load the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
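To confirm what was loaded, we can inspect the shapes and value range of the returned arrays (the sizes in the comments are the standard MNIST splits):

# Quick sanity check of the raw data
print(train_images.shape, train_labels.shape)   # (60000, 28, 28) (60000,)
print(test_images.shape, test_labels.shape)     # (10000, 28, 28) (10000,)
print(train_images.min(), train_images.max())   # pixel values range from 0 to 255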

Step 3: Preprocessing the Dataset

After loading the dataset, we preprocess the images by normalizing their pixel values to the range 0 to 1 and reshaping them to include a channel dimension, which convolutional neural networks expect. Then the labels are one-hot encoded to convert them into categorical format.

# Preprocess the images
train_images = train_images.astype('float32') / 255
test_images = test_images.astype('float32') / 255

# Reshape the images and add a channel dimension
train_images = np.expand_dims(train_images, axis=-1)
test_images = np.expand_dims(test_images, axis=-1)

# One-hot encode the labels
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
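As a quick check, the images now have a trailing channel dimension and each label is a 10-element one-hot vector:

# Verify the preprocessed shapes and look at one encoded label
print(train_images.shape)    # (60000, 28, 28, 1)
print(train_labels.shape)    # (60000, 10)
print(train_labels[0])       # first label as a one-hot vector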

Step 4: Build the Model

Now we define the CNN model using the Sequential API. The model has two convolution layers and two max pooling layers. A Flatten layer converts the output of the convolutional layers into a 1D array, and two dense layers are added for classification.

# Build the CNN model
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])
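To inspect the architecture, layer output shapes and parameter counts, we can print a model summary:

# Print a layer-by-layer summary of the CNN
model.summary()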

Step 5: Compile the Model

We now compile the model with the Adam optimizer, categorical cross-entropy loss (which matches the one-hot encoded labels) and accuracy as the evaluation metric.

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Step 6: Model Training

The compiled model is trained on the training dataset for 5 epochs, using a batch size of 64, while validating its performance on the test dataset after each epoch.

# Train the model
model.fit(train_images, train_labels, epochs=5, batch_size=64, validation_data=(test_images, test_labels))
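After training, the model can be evaluated on the test set as a whole and, optionally, saved so it can be reused later without retraining (the filename below is only an example):

# Evaluate the trained model on the test set
test_loss, test_acc = model.evaluate(test_images, test_labels)
print("Test accuracy:", test_acc)

# Optionally save the trained model for later reuse
model.save('digit_recognizer.keras')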

Step 7: Loading Image of a Digit and Preprocessing the Image

Once the model is trained, we take an image of a digit and check whether the predicted output is correct. For the prediction, we read the image in grayscale, resize it to 28x28, invert the colors (MNIST digits are white on a black background), normalize the pixel values and reshape the array to match the input shape expected by the network.

image = cv2.imread('digit.png', cv2.IMREAD_GRAYSCALE)
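# cv2.imread returns None if the file cannot be read, so fail early with a clear error
if image is None:
    raise FileNotFoundError("Could not read 'digit.png' - check the file path")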

# Resize the image to 28x28
image = cv2.resize(image, (28, 28))

# Invert the colors
image = cv2.bitwise_not(image)

# Normalize the image
image = image.astype('float32') / 255

# Reshape the image
image = np.expand_dims(image, axis=0)
image = np.expand_dims(image, axis=-1)

Step 8: Prediction

By applying np.argmax to the output of model.predict(image), we obtain the predicted class label (a digit from 0 to 9) for the input image.

# Predict the digit
prediction = np.argmax(model.predict(image))

print("Predicted Digit:", prediction)

Complete Code to Recognize Handwritten Digits

import cv2
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.utils import to_categorical

# Load the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Preprocess the images
train_images = train_images.astype('float32') / 255
test_images = test_images.astype('float32') / 255

# Reshape the images
train_images = np.expand_dims(train_images, axis=-1)
test_images = np.expand_dims(test_images, axis=-1)

# One-hot encode the labels
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Build the model
model = Sequential([
    Flatten(input_shape=(28, 28, 1)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(train_images, train_labels, epochs=5, batch_size=64, validation_data=(test_images, test_labels))

Output:

Epoch 1/5
938/938 [==============================] - 8s 7ms/step - loss: 0.2969 - accuracy: 0.9172 - val_loss: 0.1569 - val_accuracy: 0.9547
Epoch 2/5
938/938 [==============================] - 10s 11ms/step - loss: 0.1352 - accuracy: 0.9602 - val_loss: 0.1116 - val_accuracy: 0.9673
Epoch 3/5
938/938 [==============================] - 10s 10ms/step - loss: 0.0935 - accuracy: 0.9725 - val_loss: 0.1068 - val_accuracy: 0.9653
Epoch 4/5
938/938 [==============================] - 6s 6ms/step - loss: 0.0712 - accuracy: 0.9788 - val_loss: 0.0809 - val_accuracy: 0.9755
Epoch 5/5
938/938 [==============================] - 4s 4ms/step - loss: 0.0557 - accuracy: 0.9835 - val_loss: 0.0793 - val_accuracy: 0.9750
image = cv2.imread('digit.png', cv2.IMREAD_GRAYSCALE)

# Resize the image to 28x28
image = cv2.resize(image, (28, 28))

# Invert the colors
image = cv2.bitwise_not(image)

# Normalize the image
image = image.astype('float32') / 255

# Reshape the image
image = np.expand_dims(image, axis=0)
image = np.expand_dims(image, axis=-1)

# Predict the digit
prediction = np.argmax(model.predict(image))

print("Predicted Digit:", prediction)

Output:

Predicted Digit: 5

Conclusion