Step-by-Step Implementation of Image Segmentation with Mask R-CNN and GrabCut

Prerequisites

Cloning the Mask R-CNN Repository

!git clone https://github.com/akTwelve/Mask_RCNN.git

I am using this image in the implementation (Image credits: Hu Chen)

# Sample image
!wget -O image.jpg https://github.com/Anant-mishra1729/Deep-learning/blob/main/Mask_RCNN/image.jpg?raw=true

Step 1: Importing required libraries and setting up the development environment

Python

import os
import sys
import skimage.io
import matplotlib.pyplot as plt
import cv2
import time
import numpy as np
%matplotlib inline
import tensorflow as tf
 
# Root directory of the project
ROOT_DIR = "Mask_RCNN"
 
# Import maskrcnn (mrcnn folder) as module
sys.path.append(ROOT_DIR)  
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
 
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
    
# Sample image path
IMAGE_PATH = "image.jpg"

Loading Model Configuration

We will use a Mask R-CNN model pre-trained on the MS-COCO dataset.

After downloading the pre-trained weights, we define a class InferenceConfig that inherits from coco.CocoConfig to build the model's configuration object. With GPU_COUNT = 1 and IMAGES_PER_GPU = 1, the batch size is BATCH_SIZE = GPU_COUNT * IMAGES_PER_GPU = 1, i.e., we pass one image at a time to the GPU for inference.

Python

# For importing coco folder as module
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))
import coco
 
# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
 
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)
 
# Loading the model configuration
class InferenceConfig(coco.CocoConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
 
config = InferenceConfig()
config.display()

Output:

Downloading pretrained model to Mask_RCNN/mask_rcnn_coco.h5 ...
... done downloading pretrained model!
Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 1
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                93
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           coco
NUM_CLASSES                    81
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                1000
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001
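The batch-size rule above follows a simple pattern: the subclass only overrides class attributes, and the base configuration derives BATCH_SIZE from them when the configuration object is created (the displayed BATCH_SIZE of 1 reflects this). The sketch below mimics that pattern with hypothetical, simplified classes (MiniConfig, MiniCocoConfig, MiniInferenceConfig); they are illustrative stand-ins, not the classes defined in the repository.

```python
# Hypothetical, simplified stand-ins that mirror the Config -> CocoConfig ->
# InferenceConfig inheritance chain; these are NOT the classes from mrcnn.
class MiniConfig:
    NAME = None
    NUM_CLASSES = 1        # background only; overridden by subclasses
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2

    def __init__(self):
        # Derived when the object is created, from the (possibly overridden) attributes
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT


class MiniCocoConfig(MiniConfig):
    NAME = "coco"
    NUM_CLASSES = 1 + 80   # background + 80 COCO object classes


class MiniInferenceConfig(MiniCocoConfig):
    # One image per step: BATCH_SIZE = GPU_COUNT * IMAGES_PER_GPU = 1 * 1
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1


# Named mini_config so it does not overwrite the real `config` in a notebook
mini_config = MiniInferenceConfig()
print(mini_config.NAME, mini_config.NUM_CLASSES, mini_config.BATCH_SIZE)  # coco 81 1
```

Because BATCH_SIZE is computed in __init__, overriding only the class attributes in a subclass is enough for the derived value to follow.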

Creating a model for inference

model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)
- The mode parameter is set to "inference", indicating that the model will be used for inference (i.e., making predictions).
- The model_dir parameter specifies the directory where the model weights and logs will be saved.
- The config parameter is an instance of a configuration class (config) that determines the model’s architecture and behavior.
tf.keras.Model.load_weights(model.keras_model, COCO_MODEL_PATH, by_name=True)
- model.keras_model accesses the underlying Keras model instance within the Mask R-CNN model.
- by_name=True argument indicates that the weights should be loaded by matching layer names, allowing for partial weight loading and transfer learning.

Python

# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)
 
tf.keras.Model.load_weights(model.keras_model, COCO_MODEL_PATH, by_name=True)
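The by_name=True behaviour can be pictured with a toy model in which each layer is a plain dictionary entry. This is a hypothetical illustration (load_weights_by_name is not a Keras function): weights are copied only into layers whose names appear in the saved file, and layers with no match keep their current values. Keras also raises an error on weight-shape mismatches unless skip_mismatch=True is passed; the sketch imitates that check by comparing list lengths.

```python
def load_weights_by_name(model_layers, saved_weights):
    """Toy name-based weight loading (hypothetical helper, not a Keras API).

    model_layers: dict mapping layer name -> current weights (a list).
    saved_weights: dict mapping layer name -> weights read from a file.
    Returns (loaded, skipped) lists of layer names.
    """
    loaded, skipped = [], []
    for name, current in model_layers.items():
        saved = saved_weights.get(name)
        if saved is None:
            # No layer with this name in the file: keep the layer's weights as-is
            skipped.append(name)
            continue
        if len(saved) != len(current):
            raise ValueError(f"shape mismatch for layer {name!r}")
        model_layers[name] = list(saved)
        loaded.append(name)
    return loaded, skipped


model_layers = {"conv1": [0.0, 0.0], "mrcnn_class_logits": [0.0, 0.0, 0.0]}
saved = {"conv1": [0.5, -0.2], "unused_layer": [1.0]}
print(load_weights_by_name(model_layers, saved))  # (['conv1'], ['mrcnn_class_logits'])
```

This is why by_name=True is useful for transfer learning: layers whose names match are initialised from the file, while renamed layers (for example, a new classification head) keep their fresh weights.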

Specifying class names

These class names are used when visualizing the segmented objects. The model classifies each detected object and returns an integer class ID for it, and these IDs are simply indices into this list.
For example, if the model detects and segments a train, it returns the class ID 7 rather than the string 'train'; looking up index 7 in this list gives 'train'.

Python

class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
               'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
               'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
               'teddy bear', 'hair drier', 'toothbrush']
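The ID-to-name lookup can be sketched as a small helper. The function name describe_detections and the detection values below are hypothetical, and only the first eight entries of the list above are copied (under a separate name, so the full class_names is not overwritten) to keep the example self-contained.

```python
# First eight COCO class names, copied from the class_names list above
first_class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle',
                     'airplane', 'bus', 'train']


def describe_detections(class_ids, scores, names):
    """Turn integer class IDs and confidence scores into readable labels."""
    return [f"{names[cid]} ({score:.2f})" for cid, score in zip(class_ids, scores)]


# Hypothetical detection output: a train and a person
print(describe_detections([7, 1], [0.98, 0.91], first_class_names))
# ['train (0.98)', 'person (0.91)']
```

With real model output, the same helper could be called as describe_detections(r['class_ids'], r['scores'], class_names).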

Inference using Mask R-CNN

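Using skimage.io.imread(), we load image.jpg (the path stored in IMAGE_PATH).

model.detect([image], verbose=1) runs inference and returns a list with one result dictionary per input image. Since we pass a single image, results[0] holds its detections under the following keys:

rois : bounding boxes, one (ymin, xmin, ymax, xmax) row per detected instance
masks : binary instance masks of shape [height, width, N], one channel per detected instance
class_ids : integer class ID of each instance (an index into class_names)
scores : confidence score of each instance's predicted class

visualize.display_instances() then draws the boxes, masks, class names, and scores on the image.

Python

image = skimage.io.imread(IMAGE_PATH)
results = model.detect([image], verbose=1)
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], class_names, r['scores'])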
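Each entry of r['rois'] is ordered (ymin, xmin, ymax, xmax), whereas OpenCV's GrabCut expects its initial rectangle as (x, y, width, height). A minimal conversion helper, assuming the box corners are exclusive at ymax/xmax (so width = xmax - xmin); the name roi_to_grabcut_rect and the optional pad argument are hypothetical choices for this sketch.

```python
def roi_to_grabcut_rect(roi, image_height, image_width, pad=0):
    """Convert a Mask R-CNN box (ymin, xmin, ymax, xmax) to an (x, y, w, h) rect.

    `pad` grows the box by a margin on every side so GrabCut sees some
    background around the object; the result is clipped to the image bounds.
    """
    y1, x1, y2, x2 = (int(v) for v in roi)
    x1, y1 = max(0, x1 - pad), max(0, y1 - pad)
    x2, y2 = min(image_width, x2 + pad), min(image_height, y2 + pad)
    return (x1, y1, x2 - x1, y2 - y1)


# Hypothetical box inside a 240x320 (height x width) image
print(roi_to_grabcut_rect((50, 30, 200, 180), 240, 320, pad=10))  # (20, 40, 170, 170)
```

The returned tuple fits the rect argument of cv2.grabCut(img, mask, rect, bgdModel, fgdModel, iterCount, mode=cv2.GC_INIT_WITH_RECT), e.g. with roi_to_grabcut_rect(r['rois'][0], image.shape[0], image.shape[1], pad=10). Padding is a design choice: in rectangle mode GrabCut treats every pixel outside the rectangle as definite background, so a box that is slightly too tight would cut off parts of the object.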

Image Segmentation with Mask R-CNN, GrabCut, and OpenCV

Image segmentation plays a crucial role in computer vision tasks, enabling machines to understand and analyze visual content at a pixel level. It involves dividing an image into distinct regions or objects, facilitating object recognition, tracking, and scene understanding. In this article, we explore two popular image segmentation techniques, Mask R-CNN and GrabCut, together with the OpenCV library.

Let's first understand what Mask R-CNN and GrabCut are.

What is R-CNN?

R-CNN stands for Region-based Convolutional Neural Network. It is an influential object detection approach that combines region proposals with CNN features to localize and recognize objects in an image.


R-CNN can be summarized in the following steps:

Region Proposal: Initially, a region proposal algorithm (such as selective search) generates a set of potential bounding box regions in an image that are likely to contain objects of interest. These regions serve as candidate object locations.
Feature Extraction: Each region proposal is then individually cropped and resized to a fixed size and passed through a pre-trained CNN (such as AlexNet or VGGNet). The CNN extracts high-level features from the region, transforming it into a fixed-length feature vector.
Classification and Localization: The feature vector obtained from the CNN is passed to class-specific classifiers (linear SVMs in the original R-CNN) that score how likely each object class is to be present in the region, while a bounding-box regressor refines the box coordinates to improve localization accuracy.
Non-Maximum Suppression (NMS): To eliminate redundant detections, non-maximum suppression is applied. It removes overlapping bounding boxes, keeping only the most confident detection for each object instance.
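The greedy NMS step can be sketched in plain Python. This is an illustrative implementation, not the one used inside the Mask_RCNN repository. Boxes use the same (ymin, xmin, ymax, xmax) order as the Mask R-CNN rois, and the default threshold of 0.3 simply mirrors DETECTION_NMS_THRESHOLD in the configuration printed earlier. Detectors usually run NMS separately for each class; for brevity this sketch treats all boxes as one class.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (ymin, xmin, ymax, xmax)."""
    y1 = max(box_a[0], box_b[0])
    x1 = max(box_a[1], box_b[1])
    y2 = min(box_a[2], box_b[2])
    x2 = min(box_a[3], box_b[3])
    inter = max(0, y2 - y1) * max(0, x2 - x1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, threshold=0.3):
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    # Visit boxes from most to least confident
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes that overlap the kept box more than the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The first two boxes overlap with an IoU of 81/119 ≈ 0.68, above the 0.3 threshold, so only the more confident one survives; the third box does not overlap either and is kept.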
