Fundamentals of Batch Normalization
In this section, we discuss the steps involved in performing batch normalization.
Step 1: Compute the Mean and Variance of Mini-Batches
For a mini-batch of activations [Tex]x_1, x_2, \ldots, x_m[/Tex], the mean [Tex]\mu_B[/Tex] and variance [Tex]\sigma_{B}^{2}[/Tex] of the mini-batch are computed.
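The mean and variance over the mini-batch are given by:
[Tex]\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i[/Tex]
[Tex]\sigma_{B}^{2} = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2[/Tex]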
Step 2: Normalization
Each activation [Tex]x_i[/Tex] is normalized using the computed mean and variance of the mini-batch.
The normalization process subtracts the mean [Tex]\mu_B[/Tex] from each activation and divides by the square root of the variance [Tex]\sigma_{B}^{2}[/Tex], so that the normalized activations have approximately zero mean and unit variance.
Additionally, a small constant [Tex]\epsilon[/Tex] is added to the denominator for numerical stability, particularly to prevent division by zero.
[Tex]\widehat{x_i} = \frac{x_i - \mu_B}{\sqrt{\sigma_{B}^{2} + \epsilon}}[/Tex]
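For example, if an activation [Tex]x_i = 4[/Tex] comes from a mini-batch with mean [Tex]\mu_B = 2[/Tex] and variance [Tex]\sigma_{B}^{2} = 4[/Tex] (with [Tex]\epsilon[/Tex] negligibly small), the normalized value is [Tex]\widehat{x_i} = (4 - 2)/\sqrt{4} = 1[/Tex].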
Step 3: Scale and Shift the Normalized Activations
The normalized activations [Tex]\widehat{x_i}[/Tex] are then scaled by a learnable parameter [Tex]\gamma[/Tex] and shifted by another learnable parameter [Tex]\beta[/Tex]. These parameters allow the model to learn the optimal scaling and shifting of the normalized activations, giving the network additional flexibility.
[Tex]y_i = \gamma \widehat{x_i} + \beta[/Tex]
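To tie the three steps together, below is a minimal NumPy sketch of the batch normalization forward pass for a fully connected layer. The function name batch_norm_forward, the array shapes, and the default eps value are illustrative assumptions, not taken from any particular library.

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Minimal batch normalization forward pass over a mini-batch.

    x     : activations of shape (batch_size, num_features)
    gamma : learnable scale, shape (num_features,)
    beta  : learnable shift, shape (num_features,)
    """
    # Step 1: per-feature mean and variance over the mini-batch
    mu = x.mean(axis=0)
    var = x.var(axis=0)

    # Step 2: normalize to (approximately) zero mean and unit variance
    x_hat = (x - mu) / np.sqrt(var + eps)

    # Step 3: scale and shift with the learnable parameters
    return gamma * x_hat + beta

# Example usage: a mini-batch of 4 samples with 3 features each
x = np.random.randn(4, 3)
y = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0))  # close to 0
print(y.std(axis=0))   # close to 1

With gamma initialized to ones and beta to zeros, the output simply equals the normalized activations; during training, these parameters are updated by gradient descent along with the other weights.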
What is Batch Normalization In Deep Learning?
Internal covariate shift is a major challenge encountered while training deep learning models. Batch normalization was introduced to address this issue. In this article, we learn the fundamentals of batch normalization and why it is needed, and we also see how to perform batch normalization in TensorFlow and PyTorch.
Table of Contents
- What is Batch Normalization?
- Need for Batch Normalization
- Fundamentals of Batch Normalization
- Batch Normalization in TensorFlow
- Batch Normalization in PyTorch
- Benefits of Batch Normalization
- Conclusion