Model Building in Pytorch

Before we dive into model building, let's first understand how the Embedding layer syntax works in PyTorch. It is given as:

torch.nn.Embedding(num_embeddings,
 embedding_dim,
 padding_idx=None,
 max_norm=None,
 norm_type=2.0,
 scale_grad_by_freq=False,
 sparse=False)

where,

  1. num_embeddings (int) is the size of the vocabulary, i.e. the number of rows in the embedding lookup table.
  2. embedding_dim (int) is the size of each embedding vector.
  3. padding_idx (int, optional) treats the given index as the padding token: its embedding vector is initialised to zeros and does not receive gradient updates.
  4. max_norm (float, optional) renormalises any embedding vector whose norm exceeds this value down to max_norm, keeping the embedding magnitudes bounded (see the sketch after this list).
  5. norm_type (float, optional) is the p of the p-norm used with max_norm (e.g. 1.0 for the L1 norm, 2.0 for the L2 norm; the default is 2.0).
  6. scale_grad_by_freq (bool, optional), if True, scales the gradients by the inverse of the frequency of the words in the mini-batch.
  7. sparse (bool, optional), when set to True, uses sparse gradients for updating the embedding weight matrix.
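As a quick, hedged illustration of a couple of these arguments, the minimal sketch below (the vocabulary size, embedding dimension and indices are arbitrary choices for demonstration) shows how padding_idx and max_norm behave:

Python

import torch
import torch.nn as nn

# 10-word vocabulary, 4-dimensional embeddings; index 0 is the padding token
# and any looked-up vector with an L2 norm above 1.0 is renormalised to 1.0
emb = nn.Embedding(10, 4, padding_idx=0, max_norm=1.0, norm_type=2.0)

tokens = torch.LongTensor([0, 3, 7])
vectors = emb(tokens)

print(vectors)               # the first row (padding index 0) is all zeros
print(vectors.norm(dim=1))   # no norm exceeds 1.0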

Now let's look at some fuller examples to see how this works.

Example 1:

Firstly, we’ll import all the libraries required.

Python
import torch
import torch.nn as nn
 
# Embedding layer with a vocabulary size of 5 and 40-dimensional embeddings
embedding = nn.Embedding(5, 40)
# passing the input (the index of a single word)
embed = embedding(torch.LongTensor([1]))
# print the embedding and its shape
print(embed)
print(embed.shape)


Output:

tensor([[ 2.5257,  1.0342, -0.3173, -0.6847,  1.1305, -1.1096, -1.1943, -0.7296,
         -0.3663,  0.0923, -0.4928,  0.6728,  0.3144, -0.1297, -0.4178,  0.5037,
          1.0004, -0.2568,  0.0439, -0.0526, -0.4425, -0.8101, -1.4096,  0.3209,
         -0.4986, -0.2673, -0.5162, -0.7360, -0.3854,  0.4884,  1.0126, -0.5779,
         -0.4810, -0.1298, -0.4205, -0.6634,  0.5938,  1.9682,  0.1999,  1.2953]],
       grad_fn=<EmbeddingBackward0>)
torch.Size([1, 40])

The above code creates a lookup table named embedding with 5 rows and 40 columns. Each row represents a single word embedding, initialised randomly from a standard normal distribution.
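To make the lookup-table picture concrete, here is a short, hedged sketch (it simply reuses the embedding layer defined above) showing that calling the layer just returns a row of its trainable weight matrix:

Python

# the lookup table itself is a trainable 5 x 40 parameter
print(embedding.weight.shape)   # torch.Size([5, 40])

# looking up index 1 returns (a copy of) row 1 of the weight matrix
idx = torch.LongTensor([1])
print(torch.equal(embedding(idx)[0], embedding.weight[1]))   # True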

You can also see the shape of the vector printed in the first output; it matches the embedding_dim we defined. These vectors get optimised during training so that they become more meaningful. Now let's look at an example that's present in the docs.

Code:

Firstly, we'll import all the required libraries; for this, we need torch and its nn module. Then we'll set the manual seed to 1 to control the randomness.

Python
import torch
import torch.nn as nn
 
torch.manual_seed(1)


Then, we'll create a dictionary which holds the numerical mappings of the words and initialise the embedding layer using nn.Embedding with shape (3, 5), since the vocabulary has 3 words.

Python
# creating the dictionary
word_to_ix = {"geeks": 0, "for": 1, "code": 2}
# creating embedding layer - 3 words in vocab, 5-dimensional embeddings
embeds = nn.Embedding(3, 5)


The next step is to convert the numerical mapping of the word we want to embed into a tensor of dtype long. We use LongTensors because embedding lookups expect integer indices, the same type used to represent labels/categories. We can then access the embedding of the word “geeks” through embeds, as shown below.

Python
#converting to tensor
lookup_tensor = torch.tensor([word_to_ix["geeks"]], dtype=torch.long)
#accessing the embeddings of the word "geeks"
pytorch_embed = embeds(lookup_tensor)


Finally, we’ll print the embeddings.

Python
#print the embeddings
print(pytorch_embed)


Output:

tensor([[ 0.6614,  0.2669,  0.0617,  0.6213, -0.4519]],
       grad_fn=<EmbeddingBackward0>)

As these are randomly initialised vectors, they are not of much significance yet; they only become meaningful once they are optimised as part of a training loop, as sketched briefly below. After that, to showcase how padding affects an embedding, we'll take a look at another example.
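The following is a minimal, hedged sketch of a single optimisation step (the target vector and the MSE loss are purely illustrative placeholders, not a real training objective), just to show that the embedding weights are ordinary trainable parameters:

Python

# a fresh toy embedding layer: 3 words, 5-dimensional vectors
emb_layer = nn.Embedding(3, 5)
optimizer = torch.optim.SGD(emb_layer.parameters(), lr=0.1)

target = torch.ones(1, 5)                  # made-up "desired" vector
pred = emb_layer(torch.LongTensor([0]))    # current embedding of word 0

loss = nn.MSELoss()(pred, target)
loss.backward()        # gradients flow into emb_layer.weight
optimizer.step()       # row 0 of the weight matrix moves towards the target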

Example 2:

We will take a dummy sentence and, using Counter, create a dictionary whose keys are the words and whose values are their frequencies.

Python3
from collections import Counter
 
# This is going to be the dummy sentence :
sentences = "this is the second example showing for the article at gfg. and doing this is actually really fun"
 
words = sentences.split(' ')
 
# count word frequencies and sort the words, most frequent first
vocab = Counter(words)
vocab = sorted(vocab, key=vocab.get, reverse=True)
vocab_size = len(vocab)

# create a word-to-index dictionary from our vocab
word2idx = {word: ind for ind, word in enumerate(vocab)}
 
encoded_sentences = [word2idx[word] for word in words]
 
# assign a value to your embedding_dim
e_dim = 5


After creating the dictionary and assigning a value for our embedding dimension, we are ready to initialise the embedding and see its output, but this time we are going to introduce a padding index at position 3. This tells the layer to treat index 3 as the padding token: its embedding vector is initialised to all zeros and receives no gradient updates during training.

Python3
# initialise an Embedding layer from Torch
emb = nn.Embedding(vocab_size, e_dim, padding_idx = 3)
word_vectors = emb(torch.LongTensor(encoded_sentences))
 
#print the word_vectors
print(word_vectors)


Output:

tensor([[-0.6125,  2.1841, -0.5777, -0.4984, -1.1440],
        [-0.2335, -0.4090,  0.9648, -0.4256,  0.8362],
        [ 1.1355,  0.1626,  2.7858,  0.2537, -1.0708],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [-0.1541,  0.9336,  0.5681,  0.2360,  0.2519],
        [-1.1530,  0.6917, -1.9961, -0.6608,  0.4884],
        [ 1.8149,  1.0138,  1.4318, -0.3035,  1.3098],
        [ 1.1355,  0.1626,  2.7858,  0.2537, -1.0708],
        [-1.1981,  2.6248, -0.4739, -0.6791, -0.0200],
        [ 0.8023,  1.0044, -0.9132, -0.0626, -0.7896],
        [-1.1518, -0.6600,  1.0331,  0.9817,  0.0572],
        [-0.7707, -1.9172,  0.1438, -0.3755, -0.4840],
        [-1.4134, -0.1180,  1.7339,  2.1844, -1.2160],
        [-0.6125,  2.1841, -0.5777, -0.4984, -1.1440],
        [-0.2335, -0.4090,  0.9648, -0.4256,  0.8362],
        [-0.6005, -0.7831,  1.0127,  1.6974, -1.9878],
        [-0.7868, -0.7832,  0.8435, -0.8540, -0.2374],
        [-1.8418,  0.3408, -1.8767,  1.2411,  1.2132]],
       grad_fn=<EmbeddingBackward0>)

As you can see, every occurrence of the word mapped to index 3 in the vocabulary is embedded as an all-zero vector, because index 3 was set as the padding index.
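As a quick check (reusing the emb layer defined above), you can inspect the weight row for the padding index directly; it is initialised to all zeros and, since its gradient is always zero, it is not updated by gradient descent:

Python3

# row 3 of the weight matrix is the padding embedding - all zeros
print(emb.weight[3])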

(Note: The Embedding layer has a single trainable parameter called weight, whose requires_grad attribute is True by default. When using a pre-trained embedding, the layer can be frozen with "emb.weight.requires_grad = False", but a full treatment of pre-trained embeddings is beyond the scope of this article.)
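For reference, here is a minimal, hedged sketch of that idea (the pretrained matrix below is just a random placeholder standing in for real vectors such as GloVe, and it reuses vocab_size and e_dim from the example above):

Python3

# placeholder standing in for a real pre-trained matrix (e.g. loaded GloVe vectors)
pretrained = torch.randn(vocab_size, e_dim)

# option 1: build the layer directly from the matrix and keep it frozen
frozen_emb = nn.Embedding.from_pretrained(pretrained, freeze=True)
print(frozen_emb.weight.requires_grad)   # False

# option 2: freeze an existing layer in place
emb.weight.requires_grad = False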

Other than this, the nn.Embedding layer is a key component of the transformer architecture, where it is used to convert input tokens into continuous representations. In conclusion, nn.Embedding is a fundamental building block of many NLP models, and understanding how it works is an important step towards building natural language processing models effectively with PyTorch.
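To illustrate that last point, below is a minimal, hedged sketch of how token indices are typically mapped to continuous vectors at the input of a transformer (the class name, sizes and the sqrt(d_model) scaling convention are illustrative choices, not a reference implementation):

Python3

class TokenEmbedding(nn.Module):
    """Maps token indices to scaled embedding vectors, as done at the
    input of many transformer models."""

    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.d_model = d_model

    def forward(self, token_ids):
        # scale by sqrt(d_model), a common convention from the
        # "Attention Is All You Need" paper
        return self.embed(token_ids) * (self.d_model ** 0.5)


tokens = torch.LongTensor([[1, 5, 9, 2]])           # a batch of one sequence
token_emb = TokenEmbedding(vocab_size=100, d_model=16)
print(token_emb(tokens).shape)                      # torch.Size([1, 4, 16])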



