Architecture of the PSPNet
The architecture of PSPNet is moderately complex and is broken down below:
- Input and Feature Extraction
- The process begins with an input image, which undergoes feature extraction using a pretrained ResNet model with a dilated network strategy.
- The dilated network strategy helps extract detailed information from the image, and the final feature map size is 1/8 of the input image.
- Pyramid Pooling Module
- The Pyramid Pooling Module is introduced to gather contextual information on top of the extracted feature map.
- A 4-level pyramid pools the feature map into 1×1, 2×2, 3×3, and 6×6 bins, covering the whole image, large sub-regions, and small portions. These levels serve as a global prior for understanding the scene.
- The pooling kernels at different levels capture various contextual scales.
- The information from the pyramid is fused as the global prior and concatenated with the original feature map from the ResNet model.
- Final Prediction
- The concatenated features are then processed through a convolutional layer to generate the final prediction map.
- The convolutional layer refines the combined information, yielding a pixel-level scene-parsing prediction that is upsampled back to the input resolution.
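The Pyramid Pooling Module described above can be sketched in PyTorch. This is an illustrative implementation, not the authors' reference code; the class name and channel-reduction choice (each branch gets `in_channels // 4`) are assumptions, while the bin sizes (1, 2, 3, 6) follow the PSPNet paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingModule(nn.Module):
    """4-level pyramid pooling: adaptive-pool the feature map into 1x1,
    2x2, 3x3, and 6x6 bins, reduce channels with 1x1 convs, upsample each
    level back to the feature-map size, and concatenate with the input."""
    def __init__(self, in_channels, bin_sizes=(1, 2, 3, 6)):
        super().__init__()
        out_channels = in_channels // len(bin_sizes)  # e.g. 2048 -> 512 per branch
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(bin_size),           # pool to bin_size x bin_size
                nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for bin_size in bin_sizes
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        priors = [x]  # keep the original feature map
        for branch in self.branches:
            pooled = branch(x)
            # upsample each pooled prior back to the feature-map resolution
            priors.append(F.interpolate(pooled, size=(h, w),
                                        mode="bilinear", align_corners=False))
        # concatenated channels: in + 4 * (in // 4) = 2 * in
        return torch.cat(priors, dim=1)
```

Because every branch is upsampled back to the feature-map size before concatenation, the module doubles the channel count while preserving spatial resolution, which is what lets it be fused with the original ResNet features.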
In short, the architecture leverages a pretrained ResNet with a dilated network strategy for feature extraction, enhances contextual understanding through the Pyramid Pooling Module, and efficiently generates pixel-level scene predictions.
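The full pipeline (backbone at 1/8 resolution, pyramid pooling, final prediction head) can be sketched end to end. Note the hedges: a tiny dilated conv stack stands in for the pretrained ResNet purely for brevity, and `TinyPSPNet`, the channel widths, and the head layout are illustrative assumptions; the 1/8 feature-map size and the (1, 2, 3, 6) bins follow the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPSPNet(nn.Module):
    """End-to-end PSPNet-style sketch. A small conv stack (stand-in for a
    pretrained dilated ResNet) downsamples to 1/8 resolution, pyramid
    pooling adds a global prior, and a conv head predicts per-pixel classes."""
    def __init__(self, num_classes, feat_ch=64, bins=(1, 2, 3, 6)):
        super().__init__()
        # Stand-in "backbone": three stride-2 convs reach 1/8 resolution,
        # then a dilated conv enlarges the receptive field without striding.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
        )
        # Pyramid pooling branches (the global prior).
        branch_ch = feat_ch // len(bins)
        self.branches = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(feat_ch, branch_ch, 1, bias=False),
                          nn.ReLU(inplace=True))
            for b in bins
        ])
        # Final prediction head on the concatenated features.
        self.head = nn.Sequential(
            nn.Conv2d(feat_ch * 2, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, num_classes, 1),
        )

    def forward(self, x):
        size = x.shape[2:]
        feat = self.backbone(x)                       # 1/8 resolution features
        h, w = feat.shape[2:]
        priors = [feat] + [
            F.interpolate(branch(feat), size=(h, w),
                          mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        logits = self.head(torch.cat(priors, dim=1))  # per-pixel class scores
        # Upsample the prediction map back to the input resolution.
        return F.interpolate(logits, size=size, mode="bilinear",
                             align_corners=False)
```

In practice the stand-in backbone would be replaced by a pretrained ResNet with the strides of its last two stages converted to dilation, which is what keeps the feature map at 1/8 of the input as described above.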
PSPNet (Pyramid Scene Parsing Network) for Image Segmentation
Within the landscape of semantic segmentation, the Pyramid Scene Parsing Network, or PSPNet, has emerged as a formidable architecture, showing strong performance on complex scenes. In this article, we will discuss PSPNet and implement it.