Text Summarization with HuggingFace’s Transformers
Let’s demonstrate a text summarization task using HuggingFace’s transformers library and the T5 model.
- Installation: We start by installing the necessary libraries, including transformers and torch.
- Import Libraries: We import the required classes from the transformers library.
- Load Model and Tokenizer: We load a pre-trained T5 model and its corresponding tokenizer.
- Prepare Input Text: We prepare the text we want to summarize, ensuring it’s in a suitable format.
- Preprocess Text: We format the text according to the T5 model’s requirements, adding the task prefix (e.g., “summarize:”).
- Tokenize Text: We convert the input text into tokens that the model can process.
- Generate Summary: We use the model to generate a summary, specifying parameters like `num_beams` for beam search, and constraints on length and repetition.
- Print Summary: Finally, we decode the generated tokens back into human-readable text and print the summary.
1. Install HuggingFace Transformers
```bash
pip install transformers torch sentencepiece  # sentencepiece is required by the T5 tokenizer
```
2. Import Libraries
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
```
3. Load the Pre-trained Model and Tokenizer
```python
model_name = "t5-small"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)
```
4. Prepare the Input Text
```python
input_text = """
The quick brown fox jumps over the lazy dog. This is a classic example used in various typing exercises.
The sentence contains every letter in the English alphabet, making it a pangram.
"""
```
5. Preprocess the Input Text
```python
# Replace newlines with spaces (not empty strings) so adjacent sentences don't run together.
preprocess_text = input_text.strip().replace("\n", " ")
t5_input_text = f"summarize: {preprocess_text}"
```
6. Tokenize the Input Text
```python
tokenized_text = tokenizer.encode(t5_input_text, return_tensors="pt")
```
7. Generate the Summary
```python
summary_ids = model.generate(
    tokenized_text,
    num_beams=4,
    no_repeat_ngram_size=2,
    min_length=30,
    max_length=100,
    early_stopping=True,
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:", summary)
```
Output:
```
Summary: the quick brown fox jumps over the lazy dog. the sentence contains every letter in the English alphabet, making it a pangram.
```
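As an alternative to the step-by-step approach above, the same task can be done with HuggingFace's high-level `pipeline` API. The following is a minimal sketch using the standard `summarization` pipeline with the same `t5-small` checkpoint; the exact wording of the output may differ slightly from the manual approach.

```python
from transformers import pipeline

# The summarization pipeline wraps tokenization, generation, and decoding in one call.
summarizer = pipeline("summarization", model="t5-small")

text = (
    "The quick brown fox jumps over the lazy dog. "
    "This is a classic example used in various typing exercises. "
    "The sentence contains every letter in the English alphabet, making it a pangram."
)

# min_length and max_length bound the summary in tokens, as in the manual example.
result = summarizer(text, min_length=20, max_length=60)
print(result[0]["summary_text"])
```

Note that the pipeline adds the `summarize:` task prefix for T5 automatically, so the input text can be passed as-is.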
Text2Text Generation Using HuggingFace Models
Text2Text generation is a versatile and powerful approach in Natural Language Processing (NLP) that involves transforming one piece of text into another. This can include tasks such as translation, summarization, question answering, and more. HuggingFace, a leading provider of NLP tools, offers a robust pipeline for Text2Text generation using its Transformers library. This article will delve into the functionalities, applications, and technical details of the Text2Text generation pipeline provided by HuggingFace.
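To make this concrete, here is a minimal sketch of the `text2text-generation` pipeline using the `t5-small` checkpoint. T5 handles multiple tasks through input prefixes; translation from English to French is one of its built-in tasks, so the prefix `translate English to French:` below follows the model's documented convention.

```python
from transformers import pipeline

# Load a Text2Text generation pipeline backed by a small T5 checkpoint.
generator = pipeline("text2text-generation", model="t5-small")

# The task is selected via a text prefix rather than a separate model head.
result = generator("translate English to French: How old are you?")
print(result[0]["generated_text"])
```

Swapping the prefix (e.g., `summarize:` or `translate English to German:`) switches the task without changing the model, which is what makes the Text2Text framing so versatile.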
Table of Contents
- Understanding Text2Text Generation
- Setting Up the Text2Text Generation Pipeline
- Applications of Text2Text Generation
- 1. Question Answering
- 2. Translation
- 3. Paraphrasing
- 4. Summarization
- 5. Sentiment Classification
- 6. Sentiment Span Extraction
- Text Summarization with HuggingFace’s Transformers
- Technical Differences Between TextGeneration and Text2TextGeneration
- Customizing Text Generation