What is Batch Normalization in CNN?

Batch Normalization is a technique used to improve the training and performance of neural networks, particularly CNNs. This article provides an overview of batch normalization in CNNs along with its implementation in TensorFlow and PyTorch.

Table of Contents

  • Overview of Batch Normalization
  • Need for Batch Normalization in CNN model
  • How Does Batch Normalization Work in CNN?
    • 1. Normalization within Mini-Batch
    • 2. Scaling and Shifting
    • 3. Learnable Parameters
    • 4. Applying Batch Normalization
    • 5. Training and Inference
  • Applying Batch Normalization in CNN model using TensorFlow
  • Applying Batch Normalization in CNN model using PyTorch
  • Advantages of Batch Normalization in CNN

Overview of Batch Normalization

Batch normalization is a technique to improve the training of deep neural networks by stabilizing and accelerating the learning process. Introduced by Sergey Ioffe and Christian Szegedy in 2015, it addresses the issue known as “internal covariate shift” where the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change.

Need for Batch Normalization in CNN model

Batch Normalization in a CNN addresses several challenges encountered during training. The following reasons highlight the need for batch normalization in a CNN:

  1. Addressing Internal Covariate Shift: Internal covariate shift occurs when the distribution of network activations changes as parameters are updated during training. Batch normalization addresses this by normalizing the activations in each layer, maintaining consistent mean and variance across inputs throughout training. This stabilizes training and speeds up convergence.
  2. Improving Gradient Flow: Batch normalization contributes to stabilizing the gradient flow during backpropagation by reducing the reliance of gradients on parameter scales. As a result, training becomes faster and more stable, enabling effective training of deeper networks without facing issues like vanishing or exploding gradients.
  3. Regularization Effect: During training, batch normalization introduces noise to the network activations, serving as a regularization technique. This noise aids in averting overfitting by injecting randomness and decreasing the network’s sensitivity to minor fluctuations in the input data.

How Does Batch Normalization Work in CNN?

Batch normalization works in convolutional neural networks (CNNs) by normalizing the activations of each layer across each mini-batch during training. The steps are discussed below.

1. Normalization within Mini-Batch

In a CNN, each layer receives inputs from multiple channels (feature maps) and processes them through convolutional filters. Batch Normalization operates on each feature map separately, normalizing the activations across the mini-batch.

During training, batch normalization (BN) standardizes the activations of each layer by subtracting the mean and dividing by the standard deviation of each mini-batch.

  • Mean Calculation: for a mini-batch B = {x₁, …, x_m}, compute the per-feature mean μ_B = (1/m) Σᵢ xᵢ.
  • Variance Calculation: compute the per-feature variance σ²_B = (1/m) Σᵢ (xᵢ − μ_B)².
  • Normalization: standardize each activation as x̂ᵢ = (xᵢ − μ_B) / √(σ²_B + ε), where ε is a small constant added for numerical stability.
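
The following minimal NumPy sketch illustrates these three steps for a single batch normalization layer, assuming channels-last activations of shape (batch, height, width, channels); the shapes and values are purely illustrative.

import numpy as np

# Toy mini-batch of activations: (batch, height, width, channels)
x = np.random.randn(4, 5, 5, 3)
eps = 1e-5  # small constant for numerical stability

# Per-channel mean and variance over the batch and spatial dimensions
mu = x.mean(axis=(0, 1, 2), keepdims=True)
var = x.var(axis=(0, 1, 2), keepdims=True)

# Normalized activations: approximately zero mean and unit variance per channel
x_hat = (x - mu) / np.sqrt(var + eps)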

2. Scaling and Shifting

After normalization, BN adjusts the normalized activations using learned scaling and shifting parameters. These parameters enable the network to adaptively scale and shift the activations, thereby maintaining the network’s ability to represent complex patterns in the data.

  • Scaling: the normalized activation is multiplied by a learnable scale parameter γ.
  • Shifting: a learnable shift parameter β is then added, giving the final output yᵢ = γ·x̂ᵢ + β.
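
Continuing the sketch above, scaling and shifting amount to one learnable pair (γ, β) per channel; the self-contained snippet below uses the standard initialization of γ = 1 and β = 0.

import numpy as np

# Normalized activations from the previous step (illustrative stand-in)
x_hat = np.random.randn(4, 5, 5, 3)

# One gamma (scale) and beta (shift) per channel, learned during training
gamma = np.ones((1, 1, 1, 3))
beta = np.zeros((1, 1, 1, 3))

# Output of the batch normalization layer
y = gamma * x_hat + beta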

3. Learnable Parameters

The parameters γ (scale) and β (shift) are learned during training through backpropagation, jointly with the network weights. This allows the network to adaptively adjust the normalization and ensure that the activations stay in an appropriate range for learning.
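
In PyTorch (used later in this article), for example, these parameters are exposed as the 'weight' (γ) and 'bias' (β) attributes of a batch normalization layer, with one value per channel:

import torch.nn as nn

bn = nn.BatchNorm2d(16)
print(bn.weight.shape)  # torch.Size([16]), gamma, initialized to ones
print(bn.bias.shape)    # torch.Size([16]), beta, initialized to zeros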

4. Applying Batch Normalization

Batch Normalization is typically applied after a convolutional (or fully connected) layer and before the activation function, before the outputs are passed to the next layer. Some architectures instead place it after the activation function; both orderings are used in practice.
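
Both placements are easy to express; below is a small PyTorch sketch of the two orderings (the layer sizes are purely illustrative).

import torch.nn as nn

# Conv -> BatchNorm -> ReLU (the most common placement)
block_pre_activation = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# Conv -> ReLU -> BatchNorm (used in some architectures)
block_post_activation = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.BatchNorm2d(16),
)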

5. Training and Inference

During training, Batch Normalization calculates the mean and variance of each mini-batch. During inference (testing), it instead uses running estimates of the mean and variance accumulated during training to normalize the activations. This ensures consistent normalization between training and inference.
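
In PyTorch, this switch is controlled by the training mode of the module: in train() mode the layer normalizes with mini-batch statistics and updates its running estimates, while in eval() mode it uses the stored running mean and variance.

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(16)

# Training mode: use mini-batch statistics and update running estimates
bn.train()
_ = bn(torch.randn(8, 16, 28, 28))
print(bn.running_mean.shape, bn.running_var.shape)  # torch.Size([16]) torch.Size([16])

# Inference mode: use the stored running mean and variance
bn.eval()
out = bn(torch.randn(1, 16, 28, 28))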

Applying Batch Normalization in CNN model using TensorFlow

In this section, we provide a code example that illustrates how to apply batch normalization in a CNN model using TensorFlow. To place the batch normalization layers after the convolutional layers and before the activation functions, we use ‘tf.keras.layers.BatchNormalization()’.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, BatchNormalization, Activation

# Build the CNN model: Conv -> BatchNorm -> ReLU -> Pool
model = Sequential([
    Conv2D(32, (3, 3), input_shape=(32, 32, 3)),
    BatchNormalization(),  # normalize conv outputs before the activation
    Activation('relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3)),
    BatchNormalization(),  # normalize conv outputs before the activation
    Activation('relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])
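
The model can then be compiled and trained as usual; the snippet below assumes a CIFAR-10-style dataset of 32x32 RGB images with ten integer-labelled classes (x_train and y_train are placeholders).

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
# model.fit(x_train, y_train, epochs=10, validation_split=0.1)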

Applying Batch Normalization in CNN model using PyTorch

In PyTorch, we can easily apply batch normalization in a CNN model.

To apply batch normalization in a 1D convolutional neural network model, we use ‘nn.BatchNorm1d()’.

import torch
import torch.nn as nn

class CNN1D(nn.Module):
    def __init__(self):
        super(CNN1D, self).__init__()
        self.conv1 = nn.Conv1d(3, 16, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm1d(16)
        self.conv2 = nn.Conv1d(16, 32, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm1d(32)
        self.fc = nn.Linear(32 * 28, 10)  # fully connected layer, assuming input sequences of length 28

    def forward(self, x):
        x = torch.relu(self.bn1(self.conv1(x)))
        x = torch.relu(self.bn2(self.conv2(x)))
        x = x.view(-1, 32 * 28)  # Reshape for fully connected layer
        x = self.fc(x)
        return x

# Instantiate the model
model = CNN1D()
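
A quick sanity check with a dummy batch (the fully connected layer above assumes 3-channel input sequences of length 28):

x = torch.randn(8, 3, 28)  # batch of 8 sequences, 3 channels, length 28
out = model(x)
print(out.shape)           # torch.Size([8, 10])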

To apply batch normalization in a 2D convolutional neural network model, we use ‘nn.BatchNorm2d()’.

import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(16)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(32)
        self.fc = nn.Linear(32 * 28 * 28, 10)  # fully connected layer, assuming 3-channel 28x28 input images

    def forward(self, x):
        x = torch.relu(self.bn1(self.conv1(x)))
        x = torch.relu(self.bn2(self.conv2(x)))
        x = x.view(-1, 32 * 28 * 28)  # Reshape for fully connected layer
        x = self.fc(x)
        return x

# Instantiate the model
model = CNN()
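
Again, a quick sanity check with a dummy batch (the fully connected layer above assumes 3-channel 28x28 inputs):

x = torch.randn(8, 3, 28, 28)  # batch of 8 RGB images of size 28x28
out = model(x)
print(out.shape)               # torch.Size([8, 10])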

Advantages of Batch Normalization in CNN

  • Faster convergence: training stabilizes and reaches good accuracy in fewer epochs.
  • Improved generalization: the noise from mini-batch statistics acts as a mild regularizer, helping to reduce overfitting.
  • Reduced sensitivity to weight initialization and to the scale of layer inputs.
  • Higher learning rates can be used without training diverging.
  • Improvement in model accuracy across many CNN architectures.

Conclusion

In conclusion, batch normalization stands as a pivotal technique in enhancing the training and performance of convolutional neural networks (CNNs). Its implementation addresses critical challenges such as internal covariate shift, thereby stabilizing training, accelerating convergence, and facilitating deeper network architectures.