Text Summarization with HuggingFace’s Transformers
Let’s demonstrate a text summarization task using HuggingFace’s transformers library and the T5 model.
- Installation: We start by installing the necessary libraries, including transformers and torch.
- Import Libraries: We import the required classes from the transformers library.
- Load Model and Tokenizer: We load a pre-trained T5 model and its corresponding tokenizer.
- Prepare Input Text: We prepare the text we want to summarize, ensuring it’s in a suitable format.
- Preprocess Text: We format the text according to the T5 model’s requirements, adding the task prefix (e.g., “summarize:”).
- Tokenize Text: We convert the input text into tokens that the model can process.
- Generate Summary: We use the model to generate a summary, specifying parameters like `num_beams` for beam search, and constraints on length and repetition.
- Print Summary: Finally, we decode the generated tokens back into human-readable text and print the summary.
1. Install HuggingFace Transformers
```bash
pip install transformers torch sentencepiece  # sentencepiece is required by the T5 tokenizer
```
2. Import Libraries
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
```
3. Load the Pre-trained Model and Tokenizer
```python
model_name = "t5-small"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)
```
4. Prepare the Input Text
```python
input_text = """
The quick brown fox jumps over the lazy dog. This is a classic example used in various typing exercises.
The sentence contains every letter in the English alphabet, making it a pangram.
"""
```
5. Preprocess the Input Text
```python
# Replace newlines with spaces (not empty strings) so adjacent sentences don't run together.
preprocess_text = input_text.strip().replace("\n", " ")
t5_input_text = f"summarize: {preprocess_text}"
```
6. Tokenize the Input Text
```python
tokenized_text = tokenizer.encode(t5_input_text, return_tensors="pt")
```
7. Generate the Summary
```python
summary_ids = model.generate(
    tokenized_text,
    num_beams=4,
    no_repeat_ngram_size=2,
    min_length=30,
    max_length=100,
    early_stopping=True,
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:", summary)
```
Output:
```
Summary: the quick brown fox jumps over the lazy dog. the sentence contains every letter in the English alphabet, making it a pangram.
```
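As an alternative to the step-by-step approach above, the same task can be done with HuggingFace's high-level `pipeline` API. The following is a minimal sketch using the standard `summarization` pipeline with the same `t5-small` checkpoint; the exact wording of the output may differ slightly from the manual approach.

```python
from transformers import pipeline

# The summarization pipeline wraps tokenization, generation, and decoding in one call.
summarizer = pipeline("summarization", model="t5-small")

text = (
    "The quick brown fox jumps over the lazy dog. "
    "This is a classic example used in various typing exercises. "
    "The sentence contains every letter in the English alphabet, making it a pangram."
)

# min_length and max_length bound the summary in tokens, as in the manual example.
result = summarizer(text, min_length=20, max_length=60)
print(result[0]["summary_text"])
```

Note that the pipeline adds the `summarize:` task prefix for T5 automatically, so the input text can be passed as-is.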
Text2Text Generation Using HuggingFace Models
Text2Text generation is a versatile and powerful approach in Natural Language Processing (NLP) that involves transforming one piece of text into another. This can include tasks such as translation, summarization, question answering, and more. HuggingFace, a leading provider of NLP tools, offers a robust pipeline for Text2Text generation using its Transformers library. This article will delve into the functionalities, applications, and technical details of the Text2Text generation pipeline provided by HuggingFace.
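To make this concrete, here is a minimal sketch of the `text2text-generation` pipeline using the `t5-small` checkpoint. T5 handles multiple tasks through input prefixes; translation from English to French is one of its built-in tasks, so the prefix `translate English to French:` below follows the model's documented convention.

```python
from transformers import pipeline

# Load a Text2Text generation pipeline backed by a small T5 checkpoint.
generator = pipeline("text2text-generation", model="t5-small")

# The task is selected via a text prefix rather than a separate model head.
result = generator("translate English to French: How old are you?")
print(result[0]["generated_text"])
```

Swapping the prefix (e.g., `summarize:` or `translate English to German:`) switches the task without changing the model, which is what makes the Text2Text framing so versatile.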
Table of Contents
- Understanding Text2Text Generation
- Setting Up the Text2Text Generation Pipeline
- Applications of Text2Text Generation
- 1. Question Answering
- 2. Translation
- 3. Paraphrasing
- 4. Summarization
- 5. Sentiment Classification
- 6. Sentiment Span Extraction
- Text Summarization with HuggingFace’s Transformers
- Technical Differences Between TextGeneration and Text2TextGeneration
- Customizing Text Generation