BERT Embeddings
Another important pre-trained transformer-based model is BERT (Bidirectional Encoder Representations from Transformers), developed by Google. It can be used to extract high-quality language features from raw text, or it can be fine-tuned on your own data to perform specific tasks.
BERT’s architecture consists only of transformer encoder layers. Its input is a sequence of tokens, and each token is represented as the sum of three embeddings: a token embedding, a segment embedding and a positional embedding. The main idea behind its pre-training is to mask a few words in a sentence and task the model with predicting the masked words.
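To make the masking idea concrete, here is a minimal sketch of how training pairs for masked-word prediction could be built. The function name `mask_tokens`, the `mask_prob` and `seed` parameters and the token list are illustrative assumptions, not part of any library. The sketch is also a simplification: BERT's original recipe selects roughly 15% of tokens and replaces only most of them with `[MASK]`, leaving some unchanged or swapping in a random token.

```python
import random

MASK_TOKEN = "[MASK]"
SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hypothetical helper: hide a few tokens and remember what they were.

    Returns (masked_tokens, labels). labels holds the original token at
    every masked position and None everywhere else.
    """
    rng = random.Random(seed)
    # Special tokens are never masked.
    candidates = [i for i, t in enumerate(tokens) if t not in SPECIAL_TOKENS]
    # Mask at least one token so short sentences still give a training signal.
    n_to_mask = max(1, round(len(candidates) * mask_prob))
    chosen = set(rng.sample(candidates, n_to_mask))
    masked = [MASK_TOKEN if i in chosen else t for i, t in enumerate(tokens)]
    labels = [t if i in chosen else None for i, t in enumerate(tokens)]
    return masked, labels


tokens = ["[CLS]", "this", "blog", "post", "explains", "pre", "trained",
          "word", "em", "##bed", "##ding", "##s", "[SEP]"]
masked, labels = mask_tokens(tokens)
print(masked)   # the model would see this sequence
print(labels)   # and be trained to predict these tokens at the masked positions
```

During pre-training the model only receives the masked sequence, so it has to use the context on both sides of each `[MASK]` to recover the hidden token.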
First, install the transformers library, since we’ll be using PyTorch and transformers to implement this.
!pip install transformers
Python
import torch
from transformers import BertTokenizer

# Load the pre-trained model tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

text = "This blog post explains pre trained word embeddings"
# Add the special tokens BERT expects at the start and end of a sentence
marked_text = "[CLS] " + text + " [SEP]"

# Tokenize the sentence with the BERT tokenizer
tokenized_text = tokenizer.tokenize(marked_text)

# Map each token to its index in the vocabulary
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

# Print out the tokens
print(tokenized_text)
Output:
['[CLS]', 'this', 'blog', 'post', 'explains', 'pre', 'trained', 'word', 'em', '##bed', '##ding', '##s', '[SEP]']
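Notice that the word "embeddings" is not in BERT's vocabulary as a whole word, so the tokenizer splits it into WordPiece tokens, with `##` marking a piece that continues the previous token. If you later want one vector per word instead of one per piece, you first need to know which token positions belong to which word. The sketch below is one possible way to do that. The helper name `group_wordpieces` is an assumption for illustration and not part of the transformers library.

```python
def group_wordpieces(tokens):
    """Hypothetical helper: group WordPiece tokens back into whole words.

    Returns a list of (word, token_indices) pairs and skips [CLS] and [SEP].
    """
    words = []
    for i, tok in enumerate(tokens):
        if tok in ("[CLS]", "[SEP]"):
            continue
        if tok.startswith("##") and words:
            # Continuation piece: glue it onto the previous word.
            word, idxs = words[-1]
            words[-1] = (word + tok[2:], idxs + [i])
        else:
            words.append((tok, [i]))
    return words


tokenized_text = ['[CLS]', 'this', 'blog', 'post', 'explains', 'pre', 'trained',
                  'word', 'em', '##bed', '##ding', '##s', '[SEP]']

for word, idxs in group_wordpieces(tokenized_text):
    print(word, idxs)
```

The index lists can then be used to pick out, and for example average, the rows of the embedding tensor that correspond to a single word.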
Python
from transformers import BertModel

# Mark every token as belonging to a single sentence (segment 0)
segments_ids = [0] * len(tokenized_text)

# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])

# Load the pre-trained model and ask it to return all hidden states
model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)

# Put the model in evaluation mode (disables dropout)
model.eval()

# Pass the segment ids by keyword: the second positional argument of
# BertModel's forward pass is attention_mask, not token_type_ids
with torch.no_grad():
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    hidden_states = outputs[2]

# Concatenate the last four hidden layers to get the token embeddings
word_embedding = torch.cat([hidden_states[i] for i in [-1, -2, -3, -4]], dim=-1)
print(word_embedding)
Output:
tensor([[[ 0.1508, -0.0126, -0.0503, ..., 0.0346, 0.4191, 0.2692],
[-0.2833, -0.4473, -0.1290, ..., 0.1606, 0.5159, 0.2478],
[ 0.6687, -0.4654, 0.3076, ..., 0.2321, -0.0784, -0.7501],
...,
[-0.1049, 0.6510, -0.3414, ..., 0.0136, 0.3559, 0.0941],
[-0.2663, -0.0465, -0.2842, ..., -0.4947, 0.0606, 0.1420],
[ 0.8174, 0.2086, -0.4486, ..., -0.0698, -0.0547, -0.0229]]])
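For bert-base-uncased, each hidden state has 768 values per token, so concatenating the last four layers gives 4 × 768 = 3072 values per token and a tensor of shape [1, 13, 3072] for the 13 tokens above. To show what the concatenation does without downloading a model, the sketch below repeats it on plain Python lists. The helper name `concat_last_layers` and the toy sizes are assumptions for illustration, and the batch dimension is left out for brevity.

```python
def concat_last_layers(hidden_states, layers=(-1, -2, -3, -4)):
    """Hypothetical helper mirroring torch.cat(..., dim=-1) on nested lists.

    hidden_states[layer][token] is a list of floats. The result holds one
    vector per token, built by joining that token's vector from each
    requested layer, in the order given by `layers`.
    """
    n_tokens = len(hidden_states[0])
    return [
        [x for layer in layers for x in hidden_states[layer][tok]]
        for tok in range(n_tokens)
    ]


# Toy stand-in for the model output: 13 hidden states (the embedding output
# plus 12 encoder layers), 3 tokens, hidden size 4. Every value equals its
# layer index so the ordering of the concatenation is easy to see.
num_states, n_tokens, hidden_size = 13, 3, 4
hidden_states = [
    [[float(layer)] * hidden_size for _ in range(n_tokens)]
    for layer in range(num_states)
]

vectors = concat_last_layers(hidden_states)
print(len(vectors), len(vectors[0]))  # number of tokens, values per token
print(vectors[0])
```

Concatenating is only one option. Summing or averaging the same layers keeps the vector at the original hidden size, at the cost of blending information from the different layers.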
Conclusion
Generating word embeddings is a crucial technique for solving natural language problems, and pre-trained embeddings offer a powerful alternative to the complexity of training word embeddings from scratch.
Pre-Trained Word Embedding in NLP
Word embedding is an important concept in Natural Language Processing and a significant breakthrough in deep learning that solved many problems. In this article, we’ll look at what pre-trained word embeddings in NLP are.
Table of Contents
- Word Embeddings
- Challenges in building word embedding from scratch
- Pre-Trained Word Embeddings
- Word2Vec
- GloVe
- BERT Embeddings