What is Sliding Window Attention?

Sliding window attention is an attention pattern inspired by the way a convolution slides a fixed-size kernel over an m x n image with a fixed step size. It is used to improve the efficiency of the Longformer: instead of attending to every other token, each token attends only to a fixed-size window of neighbouring tokens. Comparing the sliding window pattern with fully connected (all-pairs) self-attention, it is easy to see that the windowed version is far more efficient.
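
To make the pattern concrete, here is a minimal NumPy sketch (the function name and sizes are illustrative, not taken from any particular library) that builds the banded mask a sliding window of width w induces over a sequence of n tokens: position i may attend to position j only when they are at most w // 2 steps apart.

```python
import numpy as np

def sliding_window_mask(n_tokens: int, window: int) -> np.ndarray:
    """True where token i is allowed to attend to token j, i.e. |i - j| <= window // 2."""
    idx = np.arange(n_tokens)
    # Broadcasting produces an (n_tokens, n_tokens) grid of pairwise distances.
    return np.abs(idx[:, None] - idx[None, :]) <= window // 2

print(sliding_window_mask(n_tokens=8, window=3).astype(int))  # a band of 1s along the diagonal
```

Full self-attention corresponds to a mask of all ones; the band is what makes the cost grow linearly rather than quadratically with sequence length.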

There are two types of sliding window attention models:

  1. Dilated Sliding Window Attention
  2. Global Sliding Window Attention

Both variants have been proposed to improve the performance and efficiency of transformer-based models in natural language processing tasks.

Dilated and Global Sliding Window Attention

“Dilated” and “global sliding window” attention are adaptations of the standard attention mechanism used in neural networks, most notably in natural language processing and computer vision.

Transformer-based models such as BERT and SpanBERT have been used for many natural language processing tasks, but their full self-attention mechanism limits their potential: its cost grows quadratically with sequence length, so these models frequently fail to handle inputs longer than 512 tokens. In 2020, the Longformer (Long-Document Transformer) was introduced to fill this gap. To cover lengthy input texts efficiently, it adapts a CNN-like pattern called sliding window attention, combining sparse attention with sliding windows so that long sequences can be managed efficiently.
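
A quick back-of-the-envelope comparison shows why this matters. The numbers below assume a 4096-token input and a 512-token window (the defaults commonly quoted for Longformer-base); they count attention score computations, not wall-clock time.

```python
# Full self-attention scores every pair of tokens: n * n.
# Sliding window attention scores only ~ n * w pairs.
n, w = 4096, 512
full_pairs = n * n            # 16,777,216
window_pairs = n * w          # 2,097,152
print(full_pairs // window_pairs)  # 8x fewer scores here, and the gap widens as n grows
```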

What is Longformer?

Longformer is a transformer-based model designed to handle long sequences more efficiently. Its sliding window attention mechanism lessens the quadratic complexity of conventional self-attention by allowing each token to attend to only a portion of the other tokens. To preserve wider context, Longformer also includes a global attention component: a few selected tokens attend to, and are attended by, the entire sequence, capturing dependencies that fall outside the window. Longformer is a scalable technique for handling long-range dependencies in natural language processing and has been successfully applied to a variety of tasks, including document classification, question answering, and text generation.
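
As an illustration, the sketch below uses the Hugging Face transformers library with the public allenai/longformer-base-4096 checkpoint (an assumption about the reader's setup, not part of the original text). Sliding window attention is applied to every token by default; global attention is requested only where the mask is set to 1.

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

text = "a very long document " * 1000   # far beyond BERT's 512-token limit
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

# 0 = local (sliding window) attention, 1 = additional global attention.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1         # give the <s> (CLS-style) token a global view

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```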

Dilated Sliding Window Attention in Deep Learning

Dilated sliding window attention, sometimes described as a form of sparse or fixed-pattern attention, introduces sparsity into the transformer's self-attention mechanism by bypassing specific attention connections. The attention pattern is dilated: fixed-size gaps are left between the attended positions inside each window, so not all tokens pay attention to each other and every token covers a wider span at the same cost.
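
A minimal sketch of that idea, again with illustrative NumPy code rather than Longformer's actual CUDA kernels: the window is stretched by a dilation factor d, and only every d-th position inside it is attended.

```python
import numpy as np

def dilated_window_mask(n_tokens: int, window: int, dilation: int) -> np.ndarray:
    """Token i attends to token j only if j is on the dilation grid and within the stretched window."""
    idx = np.arange(n_tokens)
    dist = np.abs(idx[:, None] - idx[None, :])
    in_window = dist <= (window // 2) * dilation   # reach grows with the dilation factor
    on_grid = dist % dilation == 0                 # skip the positions in between (the "gaps")
    return in_window & on_grid

print(dilated_window_mask(n_tokens=10, window=3, dilation=2).astype(int))
```

With the same number of attended positions per token, the receptive field becomes d times wider, which is exactly the trade-off dilation buys.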

Global Sliding Window Attention in Deep Learning

Global sliding window attention is an attention mechanism used in transformer-based models to address the quadratic complexity of traditional self-attention, in which attention weights are computed for all pairs of tokens in the sequence. It limits the attention window size by having each token attend to a fixed-size window that slides across the sequence, which reduces the computational complexity while still capturing contextual information within that local window. On top of this, a small number of designated tokens are given global attention, so information can still flow across the whole document despite the limited window.
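
The sketch below (illustrative NumPy, hypothetical function name) combines the two ingredients: a local band for every token plus full, symmetric attention for a handful of designated global positions.

```python
import numpy as np

def global_sliding_window_mask(n_tokens, window, global_positions):
    """Local banded attention, plus full two-way attention for the chosen global tokens."""
    idx = np.arange(n_tokens)
    local = np.abs(idx[:, None] - idx[None, :]) <= window // 2
    is_global = np.zeros(n_tokens, dtype=bool)
    is_global[list(global_positions)] = True
    # Global tokens attend to everything, and everything attends to them.
    return local | is_global[:, None] | is_global[None, :]

# Position 0 plays the role of a [CLS]-style classification token.
print(global_sliding_window_mask(n_tokens=8, window=3, global_positions=[0]).astype(int))
```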

Advantages and Disadvantages

Both global sliding window attention and dilated attention aim to increase the scalability and effectiveness of the self-attention process in transformer-based models. They provide alternatives to the traditional self-attention mechanism, balancing computational requirements against the need to capture long-range dependencies in the input sequences. The price is that many token pairs never attend to each other directly, so information between them must flow through intermediate tokens or through the globally attended positions.
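
To show how any of these masks plugs into attention, here is a dense reference implementation of masked scaled dot-product attention (a sketch for clarity; production implementations such as Longformer's custom kernels avoid computing the masked scores in the first place, which is where the real savings come from).

```python
import numpy as np

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention; pairs where mask is False get a score of -inf."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 8, 16
Q, K, V = rng.normal(size=(3, n, d))
mask = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]) <= 1  # sliding window of width 3
print(masked_attention(Q, K, V, mask).shape)  # (8, 16)
```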

Dilated and Global Sliding Window Attention - FAQs

Q. What is an attention mechanism?

An attention mechanism lets a neural network assign different weights to different parts of its input when computing each output, so the model can focus on the most relevant tokens. Self-attention, as used in transformers, computes these weights between every pair of tokens in a sequence, and it is exactly this quadratic cost that sliding window variants are designed to reduce.