What is Sliding Window Attention?
Sliding window attention is an attention pattern inspired by the way a CNN slides a fixed-size filter over an m x n image with a fixed step size. Instead of letting every token attend to every other token, each token attends only to a fixed-size window of w neighbouring tokens. It is used to improve the efficiency of the Longformer: compared to a fully connected (full self-attention) model, the per-layer cost drops from O(n²) to O(n × w), which makes the sliding window pattern far more efficient on long sequences.
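To make the pattern concrete, here is a minimal NumPy sketch of the banded attention mask it produces. The function name and the `seq_len`/`window` values are illustrative, not taken from the Longformer implementation:

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where position i may attend only to positions
    within window // 2 tokens on either side of i."""
    half = window // 2
    idx = np.arange(seq_len)
    # Allowed pairs (i, j) satisfy |i - j| <= half, giving a banded matrix.
    return np.abs(idx[:, None] - idx[None, :]) <= half

# Each token scores only `window` neighbours instead of all seq_len
# positions, so cost grows as O(seq_len * window), not O(seq_len**2).
mask = sliding_window_mask(seq_len=8, window=3)
print(mask.astype(int))
```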
There are two types of sliding window attention models:
- Dilated Sliding Window Attention
- Global Sliding Window Attention
Dilated and Global Sliding Window Attention
"Dilated" and "Global Sliding Window" attention are adaptations of the basic attention mechanism used in neural networks, applied mainly in natural language processing and computer vision. Both have been proposed to improve the performance and efficiency of transformer-based models.
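The two variants can be described as modifications of the banded mask above. In dilated sliding window attention, the window contains gaps of size d between attended positions, which enlarges the receptive field without extra computation; with global attention, a few designated tokens (such as a [CLS]-style token) attend to, and are attended by, every position in the sequence. Below is a hedged NumPy sketch of both mask patterns; the function names and parameter values are illustrative:

```python
import numpy as np

def dilated_window_mask(seq_len: int, window: int, dilation: int) -> np.ndarray:
    """Window attention with gaps: token i attends to j when
    |i - j| <= half * dilation and (i - j) is a multiple of `dilation`."""
    half = window // 2
    idx = np.arange(seq_len)
    diff = idx[:, None] - idx[None, :]
    return (np.abs(diff) <= half * dilation) & (diff % dilation == 0)

def add_global_tokens(mask: np.ndarray, global_positions) -> np.ndarray:
    """Global tokens attend everywhere and are attended to by everyone."""
    mask = mask.copy()
    mask[global_positions, :] = True   # global tokens attend to all positions
    mask[:, global_positions] = True   # all positions attend to global tokens
    return mask

local = dilated_window_mask(seq_len=10, window=3, dilation=2)
full = add_global_tokens(local, global_positions=[0])  # e.g. a [CLS]-style token
print(full.astype(int))
```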
Prerequisites: Attention Mechanism, Sliding Window Attention, Dilated CNN
Transformer-based models such as BERT and SpanBERT have been used to carry out numerous natural language processing tasks, but their full self-attention mechanism limits their potential: they frequently fail to process and comprehend data containing lengthy texts. In 2020, the Longformer (Long-Document Transformer) entered the scene to fill this gap. Longformer seeks to resolve the problems posed by sequences longer than 512 tokens, the limit of standard BERT-style models. To achieve this, it adapts a CNN-like attention pattern called sliding window attention, which covers lengthy input texts efficiently, and combines it with sparse (dilated and global) attention to manage long sequences.
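As a short usage sketch, the published allenai/longformer-base-4096 checkpoint can be loaded through the Hugging Face transformers library. The input text, repetition factor, and `max_length` below are illustrative; only the first ([CLS]-style) token is marked global here, which is a common but not mandatory choice:

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

text = "Long documents need long context. " * 400  # well past BERT's 512-token limit
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

# All tokens get sliding-window (local) attention by default; mark the
# first token as global so it can attend to the whole sequence.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```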