Segment Image

We load an image, run detection on it with the model, and display the detected instances with visualize.display_instances.

Python




#IMAGE_DIR = 'Mask_RCNN/images/'
# Load a random image from the images folder
#file_names = next(os.walk(IMAGE_DIR))[2]
#image = skimage.io.imread(os.path.join(IMAGE_DIR, np.random.choice(file_names)))
image = skimage.io.imread("/content/image.jpg")
plt.imshow(image)
plt.title('Original')
plt.axis('off')
plt.show()
 
# Run detection
results = model.detect([image], verbose=1)
 
# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            class_names, r['scores'])


Output:

Processing 1 images
image shape: (1282, 1920, 3) min: 0.00000 max: 255.00000 uint8
molded_images shape: (1, 1024, 1024, 3) min: -123.70000 max: 134.10000 float64
image_metas shape: (1, 93) min: 0.00000 max: 1920.00000 float64
anchors shape: (1, 261888, 4) min: -0.35390 max: 1.29134 float32
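
The shapes in this log can be reproduced with a little arithmetic. The sketch below assumes the defaults of the Matterport Mask R-CNN implementation (square resizing with a minimum side of 800 and a maximum side of 1024, backbone strides 4 to 64, three anchor ratios, 81 COCO classes). None of these settings are shown in this section, so treat them as assumptions; the helper names are made up for illustration.

```python
# Assumed Matterport Mask R-CNN defaults (not shown in this section)
IMAGE_MIN_DIM, IMAGE_MAX_DIM = 800, 1024
BACKBONE_STRIDES = [4, 8, 16, 32, 64]
ANCHORS_PER_LOCATION = 3   # one anchor per aspect ratio (0.5, 1, 2)
NUM_CLASSES = 81           # 80 COCO classes + background


def square_resize_shape(height, width):
    """Return (resized_h, resized_w, pad_top, pad_bottom) for square resizing."""
    scale = max(1.0, IMAGE_MIN_DIM / min(height, width))
    # Shrink again if the longer side would exceed the maximum
    if round(max(height, width) * scale) > IMAGE_MAX_DIM:
        scale = IMAGE_MAX_DIM / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    pad_top = (IMAGE_MAX_DIM - new_h) // 2
    pad_bottom = IMAGE_MAX_DIM - new_h - pad_top
    return new_h, new_w, pad_top, pad_bottom


def anchor_count(side):
    """Number of anchors for a square input with the given side length."""
    return sum((side // s) ** 2 for s in BACKBONE_STRIDES) * ANCHORS_PER_LOCATION


def image_meta_length(num_classes):
    # image id (1) + original shape (3) + molded shape (3) + window (4) + scale (1) + class flags
    return 1 + 3 + 3 + 4 + 1 + num_classes


print(square_resize_shape(1282, 1920))   # (684, 1024, 170, 170)
print(anchor_count(IMAGE_MAX_DIM))       # 261888
print(image_meta_length(NUM_CLASSES))    # 93
```

Under these assumptions the 1282 x 1920 image is scaled to 684 x 1024 and zero-padded to 1024 x 1024, which matches the molded_images, anchors, and image_metas shapes printed above.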

Mask R-CNN image segmentation

Python




# For converting roi coordinates generated using Mask R-CNN
# to (x1, y1, width, height)
def convertCoords(coords):
  # Input : (y1, x1, y2, x2)
  y1, x1, y2, x2 = coords
  width,height = x2 - x1, y2 - y1
  return (x1, y1, width, height)
 
# Extracting masks and rois from the result
def separateEntities(r):
  masks = [r['masks'][:,:,i].astype("uint8") for i in range(len(r['class_ids']))]
  rects = [(convertCoords(roi)) for roi in r['rois']]
  return masks,rects
 
# Generate masks
masks, rects = separateEntities(r)
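
To make the coordinate change concrete, here is a small standalone check of the conversion. The helper names `roi_to_rect` and `rect_to_roi` and the sample box are hypothetical; only the (y1, x1, y2, x2) input order and the (x, y, width, height) output order come from the code above.

```python
def roi_to_rect(roi):
    """Convert a (y1, x1, y2, x2) box into an (x, y, width, height) rectangle."""
    y1, x1, y2, x2 = roi
    return (x1, y1, x2 - x1, y2 - y1)


def rect_to_roi(rect):
    """Inverse conversion, handy for checking that no axis was swapped."""
    x, y, width, height = rect
    return (y, x, y + height, x + width)


sample_roi = (100, 50, 400, 250)   # hypothetical box: rows 100-400, columns 50-250
rect = roi_to_rect(sample_roi)
print(rect)                        # (50, 100, 200, 300)

# Converting back must give the original box
assert rect_to_roi(rect) == sample_roi
```

The round trip is a cheap way to catch a swapped x/y pair, which is the usual mistake when moving between row-major boxes and OpenCV rectangles.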


Visualizing the masks generated by Mask R-CNN

Python3




# Create subplots with 1 row and 2 columns
fig, (ax1, ax2) = plt.subplots(1, 2, figsize= (10,5))
 
# Display the first mask on the left subplot
ax1.imshow(masks[0])
ax1.axis('off')
 
# Display the second mask on the right subplot
ax2.imshow(masks[1])
ax2.axis('off')
 
# Show the subplots
plt.show()


Output:

Generated Mask

Visualizing Segmented Image using Mask R-CNN

Python3




# Create subplots with 1 row and 2 columns
fig, (ax1, ax2) = plt.subplots(1, 2, figsize= (10,5))
 
# Display the image region selected by the first mask on the left subplot
ax1.imshow(cv2.bitwise_and(image, image, mask=r['masks'][:, :, 0].astype("uint8")))
ax1.axis('off')
 
# Display the image region selected by the second mask on the right subplot
ax2.imshow(cv2.bitwise_and(image, image, mask=r['masks'][:, :, 1].astype("uint8")))
ax2.axis('off')
ax2.axis('off')
 
# Show the subplots
plt.show()


Output:

Segmented Image

The masks generated by Mask R-CNN are not precise: some background detail is still visible around the objects. We will use GrabCut to refine the masks and remove the undesired background.

Applying GrabCut

GrabCut is available in OpenCV as cv2.grabCut(img, mask, rect, bgdModel, fgdModel, iterCount, mode)

Let’s explore the arguments:

  1. img: The input image on which GrabCut is run.
  2. mask: The mask image marks regions of the image as background, foreground, probable background, or probable foreground. This is done by assigning specific flags to different areas of the mask. The flags are
    1. cv2.GC_BGD (background),
    2. cv2.GC_FGD (foreground),
    3. cv2.GC_PR_BGD (probable background),
    4. cv2.GC_PR_FGD (probable foreground), or you can directly pass 0, 1, 2, or 3 to represent these regions in the image.
  3. rect: The coordinates of a rectangle that encloses the foreground object, in the format (x, y, w, h). The rectangle is used to initialize GrabCut and provides an initial estimate of the foreground and background regions.
  4. bgdModel, fgdModel: NumPy arrays that GrabCut uses internally. Create two zero arrays of type np.float64 and shape (1, 65) to store the model parameters.
  5. iterCount: The number of iterations GrabCut should run. More iterations can lead to better segmentation results, but at the cost of increased computation time.
  6. mode: Determines whether GrabCut is initialized from the rectangle (cv2.GC_INIT_WITH_RECT), from the mask (cv2.GC_INIT_WITH_MASK), or from a combination of both. In other words, it selects between drawing a rectangle around the object of interest and supplying touch-up labels in the mask. With cv2.GC_INIT_WITH_RECT the function initializes the mask itself from rect, so whatever was passed in mask is overwritten.
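
As an illustration of the mask flags described above, the following sketch builds a label mask from a binary instance mask and a bounding box using NumPy only. The function name `seed_mask_from_detection` and the toy 6 x 6 input are hypothetical, and the labeling policy (detected pixels are probable foreground, the rest of the box is probable background) is just one reasonable choice, not the method used in this article.

```python
import numpy as np

# Label values accepted in a GrabCut mask (background, foreground,
# probable background, probable foreground)
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3


def seed_mask_from_detection(binary_mask, rect):
    """Build a GrabCut label mask from a 0/1 instance mask and an (x, y, w, h) box."""
    x, y, w, h = rect
    seed = np.full(binary_mask.shape, GC_BGD, dtype=np.uint8)  # outside the box: background
    seed[y:y + h, x:x + w] = GC_PR_BGD                         # inside the box: probably background
    seed[binary_mask > 0] = GC_PR_FGD                          # detected pixels: probably foreground
    return seed


# Toy example: a 6x6 image with a 2x2 detected object inside a 4x4 box
binary_mask = np.zeros((6, 6), dtype=np.uint8)
binary_mask[2:4, 2:4] = 1
seed = seed_mask_from_detection(binary_mask, rect=(1, 1, 4, 4))
print(seed)
```

A mask built this way could be passed to cv2.grabCut with mode=cv2.GC_INIT_WITH_MASK, so that the detection itself, rather than only its bounding box, seeds the refinement.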

Python




def applyGrabCut(image, mask, rect, iters):
  # Arrays in which GrabCut stores its background/foreground model parameters
  bgModel = np.zeros((1, 65), dtype="float")
  fgModel = np.zeros((1, 65), dtype="float")

  # Apply GrabCut using the bounding box initialization. In this mode
  # OpenCV overwrites the mask, so pass a copy to keep the original intact.
  (mask_grab, bgModel, fgModel) = cv2.grabCut(image, mask.copy(), rect, bgModel,
    fgModel, iterCount=iters, mode=cv2.GC_INIT_WITH_RECT)

  values = (
    ("Definite Background", cv2.GC_BGD),
    ("Probable Background", cv2.GC_PR_BGD),
    ("Definite Foreground", cv2.GC_FGD),
    ("Probable Foreground", cv2.GC_PR_FGD),
  )
  valueMasks = {}
  for name, value in values:
    # Binary (0/255) mask of the pixels that received this label
    valueMasks[name] = (mask_grab == value).astype("uint8") * 255

  return valueMasks


Generating the masks and segmented images

Python




# Create subplots with 2 rows and 2 columns
fig, axs = plt.subplots(2, 2, figsize=(11, 7))

# Process the first mask. With rectangle initialization GrabCut only assigns
# "probable" labels inside the box, so the object is the probable foreground.
vm1 = applyGrabCut(image, masks[0], rects[0], 10)
mother = cv2.bitwise_and(image, image, mask=vm1['Probable Foreground'])
axs[0, 0].imshow(vm1['Probable Foreground'])
axs[0, 0].set_title('Mother (mask)')
axs[0, 0].axis('off')

# Display the image segmented with the first refined mask
axs[0, 1].imshow(mother)
axs[0, 1].set_title('Mother (segmented)')
axs[0, 1].axis('off')

# Process the second mask in the same way
vm2 = applyGrabCut(image, masks[1], rects[1], 10)
child = cv2.bitwise_and(image, image, mask=vm2['Probable Foreground'])
axs[1, 0].imshow(vm2['Probable Foreground'])
axs[1, 0].set_title('Child (mask)')
axs[1, 0].axis('off')

# Display the image segmented with the second refined mask
axs[1, 1].imshow(child)
axs[1, 1].set_title('Child (segmented)')
axs[1, 1].axis('off')

# Show the subplots
plt.show()


Output:

Masks & Segmented image

Combining both masks

Python




# Create subplots with 2 rows and 1 column
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 8))

# Display the combined mask
combined_mask = vm1['Probable Foreground'] | vm2['Probable Foreground']
ax1.imshow(combined_mask)
ax1.axis('off')
ax1.set_title('Combined Mask')

# Display the combined segmented image
result = mother | child
ax2.imshow(result)
ax2.axis('off')
ax2.set_title('Segmented Image')

# Show the subplots
plt.show()


Output:

You can run this implementation in a Google Colab notebook.
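
The combining step relies on two facts: the union of two 0/255 masks is their bitwise OR, and OR-ing two cut-outs of the same image gives the same picture as masking the image once with the union. The NumPy-only sketch below checks this on a toy image; the helper `apply_mask` and the sample arrays are hypothetical stand-ins for cv2.bitwise_and and the real masks.

```python
import numpy as np


def apply_mask(image, mask):
    """Keep pixels where mask is non-zero and set everything else to black."""
    return np.where(mask[..., None] > 0, image, 0).astype(image.dtype)


# Toy 2x3 RGB image and two 0/255 masks selecting different pixels
image = np.arange(18, dtype=np.uint8).reshape(2, 3, 3) + 1
mask_a = np.array([[255, 0, 0], [0, 0, 0]], dtype=np.uint8)
mask_b = np.array([[0, 0, 0], [0, 255, 255]], dtype=np.uint8)

combined_mask = mask_a | mask_b
segmented = apply_mask(image, combined_mask)

# OR-ing the per-object cut-outs matches masking once with the union
assert np.array_equal(segmented, apply_mask(image, mask_a) | apply_mask(image, mask_b))
print(combined_mask)
```

The equality holds even where the two masks overlap, because both cut-outs carry the same source pixel there and OR-ing a value with itself leaves it unchanged.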

Image Segmentation with Mask R-CNN, GrabCut, and OpenCV

Image segmentation plays a crucial role in computer vision tasks, enabling machines to understand and analyze visual content at the pixel level. It involves dividing an image into distinct regions or objects, which facilitates object recognition, tracking, and scene understanding. In this article, we explore image segmentation with Mask R-CNN and GrabCut, using OpenCV.

Let’s first look at what R-CNN, Mask R-CNN, and GrabCut are.

What is R-CNN?

R-CNN stands for Region-based Convolutional Neural Network. It is a ground-breaking object detection system that combines region proposals with CNN features to localize and recognize objects.

R-CNN

R-CNN can be summarised as follows.

  • Region Proposal: Initially, a region proposal algorithm (such as selective search) generates a set of potential bounding box regions in an image that are likely to contain objects of interest. These regions serve as candidate object locations.
  • Feature Extraction: Each region proposal is then individually cropped and resized to a fixed size and passed through a pre-trained CNN (such as AlexNet or VGGNet). The CNN extracts high-level features from the region, transforming it into a fixed-length feature vector.
  • Classification and Localization: The feature vector obtained from the CNN is fed into separate fully connected layers. The classification layer predicts the probability of different object classes present in the region, while the regression layer refines the coordinates of the bounding box, improving localization accuracy.
  • Non-Maximum Suppression (NMS): To eliminate redundant detections, non-maximum suppression is applied. It removes overlapping bounding boxes, keeping only the most confident detection for each object instance.
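
The last step above can be sketched in a few lines of plain Python. This is a minimal greedy version for illustration, assuming (x1, y1, x2, y2) boxes and an IoU threshold of 0.5; the function names, sample boxes, and threshold are made up and not taken from any particular R-CNN implementation.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, most confident first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))   # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is dropped in favour of the higher-scoring detection, while the third box does not overlap anything and is kept.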

Mask R-CNN

Mask R-CNN (Mask Region-based Convolutional Neural Network) extends the Faster R-CNN object detection framework with the ability to perform instance segmentation. It was proposed by Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick in their work “Mask R-CNN” published in 2017....

GrabCut

GrabCut is a classical foreground extraction algorithm that needs only minimal user interaction. It takes an input image and a user-defined bounding box that encloses the foreground object (here, the dog is the foreground object). It then generates a refined segmentation mask that separates the foreground object from the background....
