Implementing Recurrent Neural Networks in PyTorch

Recurrent Neural Networks (RNNs) are a class of neural networks that are particularly effective for sequential data. Unlike traditional feedforward networks, RNNs have connections that form directed cycles, allowing them to maintain a hidden state that captures information from previous inputs. This makes them well suited to tasks such as time series prediction and natural language processing. In this article, we will explore how to implement RNNs in PyTorch.

Table of Contents

  • Introduction to Recurrent Neural Networks
  • Building an RNN from Scratch in PyTorch
    • Setting Up the Environment
    • Steps to Build an RNN
  • Example 1: Predicting Sequential Data: An RNN Approach Using PyTorch
  • Example 2: Sentiment Analysis with RNN: Classifying Movie Reviews Using PyTorch

Introduction to Recurrent Neural Networks

RNNs are designed to recognize patterns in sequences of data, such as time series or text. They achieve this by maintaining a hidden state that is updated at each time step based on the current input and the previous hidden state, which allows them to capture temporal dependencies in the data (a minimal sketch of this update rule follows the list below). The basic structure of an RNN consists of:

  • Input Layer: Takes the input data at each time step.
  • Hidden Layer: Maintains the hidden state and updates it based on the input and the previous hidden state.
  • Output Layer: Produces the output at each time step.
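
As a rough illustration (a hedged sketch, not PyTorch's actual nn.RNN code), a single time step of a vanilla RNN can be written as follows, where W_ih, W_hh, b_ih, and b_hh are the learned input-to-hidden and hidden-to-hidden weights and biases:

Python
import torch

# One vanilla RNN step: the new hidden state mixes the current input with the
# previous hidden state through learned weights and a tanh nonlinearity:
# h_t = tanh(x_t @ W_ih.T + b_ih + h_prev @ W_hh.T + b_hh)
def rnn_step(x_t, h_prev, W_ih, W_hh, b_ih, b_hh):
    return torch.tanh(x_t @ W_ih.T + b_ih + h_prev @ W_hh.T + b_hh)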

Building an RNN from Scratch in PyTorch

Setting Up the Environment

Before we start implementing the RNN, we need to set up our environment. Ensure you have PyTorch installed. You can install it using pip:

pip install torch
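
To confirm that the installation worked (an optional quick check), print the installed version:

Python
import torch

print(torch.__version__)          # the installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA-capable GPU can be used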

Steps to Build an RNN

  1. Import Libraries: Bring in the required libraries: torch, torch.nn, and torch.optim.
  2. Define the RNN Model: Create a class for your RNN model by subclassing torch.nn.Module.
  3. Prepare the Data: RNNs expect data in sequential form, so preprocessing typically involves steps such as tokenization for text data and normalization for time series data.
  4. DataLoader in PyTorch: PyTorch provides the DataLoader class to handle batching, shuffling, and loading data in parallel, which is crucial for training RNNs efficiently.
  5. Train the Model: Use a loss function and an optimizer to fit the model to your dataset. The training loop iterates over the data for several epochs, updating the weights with backpropagation through time (BPTT).
  6. Evaluate the Model: Test the model to see how well it performs on unseen data. (A minimal skeleton combining these steps is sketched after this list.)
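
The sketch below is a minimal, generic skeleton of this workflow (hedged: the sizes, data, and loss are placeholders, not the examples that follow). It shows step 2 (subclassing nn.Module) and step 5 (the training loop, where loss.backward() performs backpropagation through time):

Python
import torch
import torch.nn as nn
import torch.optim as optim

class MyRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)        # hidden state defaults to zeros
        return self.fc(out)         # one output per time step

model = MyRNN(input_size=1, hidden_size=16, output_size=1)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):              # a few epochs on dummy data
    x = torch.randn(8, 10, 1)       # (batch, seq_len, input_size)
    target = torch.randn(8, 10, 1)
    optimizer.zero_grad()
    loss = criterion(model(x), target)
    loss.backward()                  # backpropagation through time
    optimizer.step()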

Example 1: Predicting Sequential Data: An RNN Approach Using PyTorch

To understand how RNNs work and how to implement them in PyTorch, we will build a small synthetic dataset and train an RNN to predict the next value in a sequence of numbers.

Step-by-Step Implementation:

Step 1: Import Libraries

First, we need to import the necessary libraries.

Python
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

Step 2: Create Synthetic Dataset

We will create a simple sine wave dataset. The goal is to predict the next value in the sine wave sequence.

Python
# Generate sine wave data
def generate_data(seq_length, num_samples):
    X = []
    y = []
    for i in range(num_samples):
        x = np.linspace(i * 2 * np.pi, (i + 1) * 2 * np.pi, seq_length + 1)
        sine_wave = np.sin(x)
        X.append(sine_wave[:-1])  # input sequence
        y.append(sine_wave[1:])   # target sequence
    return np.array(X), np.array(y)

seq_length = 50
num_samples = 1000
X, y = generate_data(seq_length, num_samples)

# Convert to PyTorch tensors
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32)

print(X.shape, y.shape)  # torch.Size([1000, 50]) torch.Size([1000, 50])

Output:

torch.Size([1000, 50]) torch.Size([1000, 50])

Step 3: Define the RNN Model

Next, we will define the RNN model.

Python
class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        # Initial hidden state: (num_layers, batch, hidden_size)
        h0 = torch.zeros(1, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.rnn(x, h0)
        out = self.fc(out)  # apply the linear layer at every time step
        return out

input_size = 1
hidden_size = 20
output_size = 1
model = SimpleRNN(input_size, hidden_size, output_size)
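
As a quick sanity check (not part of the original steps), a dummy batch can be passed through the model to confirm that the output shape matches the target shape:

Python
# A batch of 8 sequences of length seq_length with one feature per step
dummy = torch.randn(8, seq_length, 1)
print(model(dummy).shape)  # expected: torch.Size([8, 50, 1])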

Step 4: Train the Model

Now, we will train the model using Mean Squared Error (MSE) loss and the Adam optimizer.

Python
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 100
for epoch in range(num_epochs):
    model.train()
    outputs = model(X.unsqueeze(2))  # Add a dimension for input size
    loss = criterion(outputs, y.unsqueeze(2))
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Output:

Epoch [10/100], Loss: 0.3548
Epoch [20/100], Loss: 0.2653
Epoch [30/100], Loss: 0.1757
Epoch [40/100], Loss: 0.0921
Epoch [50/100], Loss: 0.0592
Epoch [60/100], Loss: 0.0421
Epoch [70/100], Loss: 0.0306
Epoch [80/100], Loss: 0.0222
Epoch [90/100], Loss: 0.0151
Epoch [100/100], Loss: 0.0093

Step 5: Visualize the Results

Finally, we will visualize the predictions made by the model.

Python
# Make predictions
model.eval()
with torch.no_grad():
    predictions = model(X.unsqueeze(2)).squeeze(2).numpy()

# Plot results
plt.figure(figsize=(10, 6))
plt.plot(y[0].numpy(), label='True')
plt.plot(predictions[0], label='Predicted')
plt.legend()
plt.show()

Output:

[Plot: the true and predicted values for the first sine-wave sequence]
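
As an optional extension (a hedged sketch, not part of the original tutorial), the trained model can also be rolled forward autoregressively: each prediction is appended to the input window and fed back in to generate values beyond the known sequence.

Python
# Generate 50 future values by feeding predictions back in as inputs
model.eval()
history = X[0].tolist()            # seed with the first training sequence
generated = []
with torch.no_grad():
    for _ in range(50):
        window = torch.tensor(history[-seq_length:], dtype=torch.float32).view(1, seq_length, 1)
        next_val = model(window)[0, -1, 0].item()   # prediction for the next step
        generated.append(next_val)
        history.append(next_val)

plt.plot(generated, label='Generated continuation')
plt.legend()
plt.show()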

Example 2: Sentiment Analysis with RNN: Classifying Movie Reviews Using PyTorch

In this example, we will use a public dataset to perform binary text classification with an RNN, following the same workflow as a movie-review sentiment classifier: each piece of text is assigned one of two classes.

Step-by-Step Implementation:

Step 1: Import Libraries

Python
import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from torch.utils.data import Dataset, DataLoader

Step 2: Load and Preprocess the Dataset

We will load a small labeled text dataset directly from a URL and preprocess it. The URL below points to the SMS Spam Collection (ham/spam labels) hosted in the justmarkham/DAT8 repository; the same preprocessing steps apply to movie-review datasets such as IMDB.

Python
# Load dataset
url = "https://raw.githubusercontent.com/justmarkham/DAT8/master/data/sms.tsv"
df = pd.read_csv(url, delimiter='\t', header=None, names=['label', 'text'])

# Preprocess dataset
def preprocess_text(text):
    return text.lower().split()

df['text'] = df['text'].apply(preprocess_text)
df = df[['text', 'label']]

# Encode labels
le = LabelEncoder()
df['label'] = le.fit_transform(df['label'])

# Split dataset (take copies so the later column assignments do not raise warnings)
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)
train_data, test_data = train_data.copy(), test_data.copy()

# Vocabulary and indexing
vocab = set([word for phrase in df['text'] for word in phrase])
word_to_idx = {word: idx for idx, word in enumerate(vocab, 1)}

def encode_phrase(phrase):
    return [word_to_idx[word] for word in phrase]

train_data['text'] = train_data['text'].apply(encode_phrase)
test_data['text'] = test_data['text'].apply(encode_phrase)

# Padding sequences
max_length = max(df['text'].apply(len))

def pad_sequence(seq, max_length):
    return seq + [0] * (max_length - len(seq))

train_data['text'] = train_data['text'].apply(lambda x: pad_sequence(x, max_length))
test_data['text'] = test_data['text'].apply(lambda x: pad_sequence(x, max_length))
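
As a quick sanity check (not part of the original steps), the first training example can be inspected after encoding and padding; every sequence should now have length max_length:

Python
example_ids = train_data['text'].iloc[0]
print(len(example_ids), max_length)   # both values should be equal
print(example_ids[:10])               # first ten token ids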

Step 3: Create Dataset and DataLoader

Python
class SentimentDataset(Dataset):
    def __init__(self, data):
        self.texts = data['text'].values
        self.labels = data['label'].values
    
    def __len__(self):
        return len(self.texts)
    
    def __getitem__(self, idx):
        text = self.texts[idx]
        label = self.labels[idx]
        return torch.tensor(text, dtype=torch.long), torch.tensor(label, dtype=torch.long)

train_dataset = SentimentDataset(train_data)
test_dataset = SentimentDataset(test_data)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
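
As a quick sanity check (not part of the original steps), one batch can be drawn from the DataLoader to confirm the tensor shapes:

Python
texts, labels = next(iter(train_loader))
print(texts.shape, labels.shape)  # e.g. torch.Size([32, max_length]) and torch.Size([32])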

Step 4: Define the RNN Model

The SentimentRNN maps each token id to a learned embedding, passes the embedded sequence through an RNN, and uses the hidden state at the final time step to decide between the two output classes. Intuitively, it reads the text word by word and accumulates evidence for one class or the other.

Python
class SentimentRNN(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, output_size):
        super(SentimentRNN, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.RNN(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        x = self.embedding(x)                    # (batch, seq_len, embed_size)
        h0 = torch.zeros(1, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.rnn(x, h0)
        out = self.fc(out[:, -1, :])             # classify from the last time step
        return out

vocab_size = len(vocab) + 1
embed_size = 128
hidden_size = 128
output_size = 2  # For binary classification
model = SentimentRNN(vocab_size, embed_size, hidden_size, output_size)
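
As a quick sanity check (not part of the original steps), a dummy batch of token ids should produce one pair of class logits per example:

Python
dummy = torch.randint(1, vocab_size, (4, max_length))  # 4 fake token-id sequences
print(model(dummy).shape)                               # expected: torch.Size([4, 2])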

Step 5: Train the Model

Python
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    epoch_loss = 0
    for texts, labels in train_loader:
        outputs = model(texts)
        loss = criterion(outputs, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
    
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss / len(train_loader):.4f}')

Output:

Epoch [1/10], Loss: 0.4016
Epoch [2/10], Loss: 0.3999
Epoch [3/10], Loss: 0.4004
Epoch [4/10], Loss: 0.3954
Epoch [5/10], Loss: 0.3969
Epoch [6/10], Loss: 0.3978
Epoch [7/10], Loss: 0.3960
Epoch [8/10], Loss: 0.3959
Epoch [9/10], Loss: 0.3967
Epoch [10/10], Loss: 0.3953

Step 6: Evaluate the Model

Python
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for texts, labels in test_loader:
        outputs = model(texts)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f'Accuracy: {accuracy:.2f}%')

Output:

Accuracy: 86.64%
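
As an optional extension (a hedged sketch using the helpers defined above), the trained model can classify a single new message; the returned label is one of the original dataset labels. Since word_to_idx has no entry for unseen words, they are mapped to the padding index 0 here:

Python
def predict_text(text):
    tokens = preprocess_text(text)
    ids = [word_to_idx.get(word, 0) for word in tokens]   # 0 for unknown words
    ids = pad_sequence(ids, max_length)[:max_length]       # pad (or truncate) to max_length
    with torch.no_grad():
        logits = model(torch.tensor([ids], dtype=torch.long))
    return le.inverse_transform([logits.argmax(dim=1).item()])[0]

print(predict_text("Congratulations! You have won a free prize, call now"))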

Step 7: Visualize Training Loss

Python
# Continue training for another num_epochs, this time recording the
# average loss per epoch so that it can be plotted
losses = []

for epoch in range(num_epochs):
    model.train()
    epoch_loss = 0
    for texts, labels in train_loader:
        outputs = model(texts)
        loss = criterion(outputs, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
    
    losses.append(epoch_loss / len(train_loader))
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss / len(train_loader):.4f}')

# Plot training loss
plt.figure(figsize=(10, 6))
plt.plot(range(1, num_epochs + 1), losses, marker='o')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss')
plt.show()

Output:

Epoch [1/10], Loss: 0.3946
Epoch [2/10], Loss: 0.3990
Epoch [3/10], Loss: 0.3968
Epoch [4/10], Loss: 0.3988
Epoch [5/10], Loss: 0.3949
Epoch [6/10], Loss: 0.3983
Epoch [7/10], Loss: 0.3997
Epoch [8/10], Loss: 0.3991
Epoch [9/10], Loss: 0.3991
Epoch [10/10], Loss: 0.3956

[Plot: average training loss per epoch]


We have used two examples to show how to build Recurrent Neural Networks with PyTorch.

  • In the first example, an RNN predicted the next value in a sequence using a synthetic sine-wave dataset; in the second, an RNN classified short texts into one of two categories.
  • By following these steps, you can build a solid understanding of RNNs and apply them to a range of sequence tasks.

Conclusion

In this article, we have implemented a simple RNN from scratch using PyTorch. We covered the basics of RNNs, built an RNN class, trained it on a sine wave prediction task, and evaluated its performance. This implementation serves as a foundation for more complex RNN architectures and tasks.