Data Preprocessing

Plot the trend of each feature over time using Matplotlib and Seaborn

Python3




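def data_plot(df):
    df_plot = df.copy()

    ncols = 2
    # Round up so an odd number of columns still gets enough subplots
    nrows = math.ceil(df_plot.shape[1] / ncols)

    fig, axes = plt.subplots(nrows=nrows, ncols=ncols,
                             sharex=True, figsize=(14, 7), squeeze=False)
    axes = axes.ravel()
    # Draw one line plot per column of the DataFrame
    for i in range(df_plot.shape[1]):
        ax = axes[i]
        sns.lineplot(data=df_plot.iloc[:, i], ax=ax)
        ax.tick_params(axis="x", rotation=30, labelsize=10, length=0)
        ax.xaxis.set_major_locator(mdates.AutoDateLocator())
    # Hide any unused subplot left over in the grid
    for ax in axes[df_plot.shape[1]:]:
        ax.set_visible(False)
    fig.tight_layout()
    plt.show()

data_plot(df)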


Output:

Line plots showing the features of Apple Inc. stock through time

Splitting the dataset into test and train

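We follow the common practice of splitting the data into a training set and a test set, here in an 80:20 ratio. The rows are not shuffled: the first 80% form the training set and the remaining 20% form the test set, and only the first column of the DataFrame is kept. We compute the number of training rows, slice the DataFrame, and print both shapes to confirm the split.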

Python3




# Train-test split
# Use the first 80 percent of the rows for training
training_data_len = math.ceil(len(df) * .8)
training_data_len

# Split the dataset in order, keeping only the first column
train_data = df[:training_data_len].iloc[:, :1]
test_data = df[training_data_len:].iloc[:, :1]
print(train_data.shape, test_data.shape)


Output:

(6794, 1) (1698, 1)
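
The arithmetic behind these shapes can be checked with a small stand-alone sketch. The helper name `chronological_split` and the placeholder list are illustrative assumptions, not part of the article's code; the row count 8492 is simply the sum of the two shapes printed above.

```python
import math


def chronological_split(values, train_fraction=0.8):
    """Split an ordered sequence into (train, test) without shuffling.

    Hypothetical helper for illustration; it mirrors the slicing used above.
    """
    train_len = math.ceil(len(values) * train_fraction)
    return values[:train_len], values[train_len:]


# Placeholder data standing in for 8492 ordered rows
rows = list(range(8492))
train_rows, test_rows = chronological_split(rows)
print(len(train_rows), len(test_rows))  # 6794 1698
```

Because the split is a plain slice, every training row comes before every test row, so the model is never trained on observations that come after the ones it is evaluated on.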

Preparing Training and Testing Dataset

Here we select the feature to model (the ‘Open’ price), reshape it from a 1D array into a 2D array of shape (n_samples, 1), and check the resulting shape. This prepares the data for scaling and, later, for use as neural network input.

Training Data

Python3




# Selecting Open Price values
dataset_train = train_data.Open.values
# Reshaping 1D to 2D array
dataset_train = np.reshape(dataset_train, (-1,1))
dataset_train.shape


Output:

(6794, 1)

Testing Data

Python3




# Selecting Open Price values
dataset_test = test_data.Open.values
# Reshaping 1D to 2D array
dataset_test = np.reshape(dataset_test, (-1,1))
dataset_test.shape


Output:

(1698, 1)

Both arrays now have a single column, with 6794 rows for training and 1698 rows for testing. The sequences and labels that turn this into a supervised learning problem are created after normalization.
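
As a small illustration of the reshape step, the sketch below uses made-up prices (not values from the dataset) to show how `(-1, 1)` turns a 1D array into a single-column 2D array. NumPy infers the `-1` dimension from the array's length.

```python
import numpy as np

# Made-up prices for illustration only
prices = np.array([10.0, 10.5, 11.0, 10.8])
print(prices.shape)  # (4,)

# -1 lets NumPy infer the number of rows; 1 fixes a single column
prices_2d = np.reshape(prices, (-1, 1))
print(prices_2d.shape)  # (4, 1)
```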

Normalization

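We apply Min-Max scaling, a standard preprocessing step in machine learning and time series analysis, to both dataset_train and dataset_test. It rescales the values to the range [0, 1], which generally helps neural networks converge faster and train more stably. The normalized values are stored in the scaled_train and scaled_test arrays, ready to be used in modeling or analysis. Note that the code calls fit_transform on each set, so the test set is scaled by its own minimum and maximum rather than by the training range; calling scaler.transform(dataset_test) on the already fitted scaler is the usual way to keep both sets on the same scale.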

Python3




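from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
# Fit the scaler on the training data and scale it to [0, 1]
scaled_train = scaler.fit_transform(dataset_train)

print(scaled_train[:5])
# Scale the test data to [0, 1].
# Caution: fit_transform refits the scaler on the test set's own min/max;
# scaler.transform(dataset_test) would reuse the training range instead.
scaled_test = scaler.fit_transform(dataset_test)
print(*scaled_test[:5])  # prints the first 5 rows of scaled_test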


Output:

[0.] [0.00162789] [0.00062727] [0.00203112] [0.00212074]
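
For reference, with `feature_range=(0, 1)` Min-Max scaling maps each value x to (x - min) / (max - min). The NumPy-only sketch below uses made-up numbers and hypothetical helper names (`fit_min_max`, `apply_min_max`) to show the variant in which the minimum and maximum are learned from the training data and reused for the test data. It is an illustrative alternative, not the code used in this article.

```python
import numpy as np


def fit_min_max(train):
    """Return the column-wise (min, max) learned from the training data."""
    return train.min(axis=0), train.max(axis=0)


def apply_min_max(data, data_min, data_max):
    """Scale data with previously learned statistics: (x - min) / (max - min)."""
    return (data - data_min) / (data_max - data_min)


# Made-up values for illustration only
train = np.array([[2.0], [4.0], [6.0]])
test = np.array([[5.0], [8.0]])

data_min, data_max = fit_min_max(train)
train_scaled = apply_min_max(train, data_min, data_max)
test_scaled = apply_min_max(test, data_min, data_max)

print(train_scaled.ravel())  # values 0.0, 0.5, 1.0
print(test_scaled.ravel())   # values 0.75, 1.5
```

Test values above the training maximum scale to more than 1 under this variant, which is expected whenever the test period contains higher prices than the training period.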

Transforming the data into Sequence

In this step we split the scaled series into X_train and y_train for the training set and X_test and y_test for the testing set. This turns the time series into a supervised learning problem: a loop slides a window over the data and pairs each input sequence with a label sequence of the same length, shifted one time step ahead. The window length is 50 for the training data and 30 for the test data. Training on these pairs lets the model predict future values while taking into account the data’s temporal dependence on earlier observations.

The resulting lists of sequences and labels are then converted to NumPy arrays and finally to PyTorch tensors.

Training Data

Python3




# Create sequences and labels for training data
sequence_length = 50  # Number of time steps to look back
X_train, y_train = [], []
for i in range(len(scaled_train) - sequence_length):
    # Input window of 50 consecutive values
    X_train.append(scaled_train[i:i+sequence_length])
    # Label window: the same 50 steps shifted one step ahead
    y_train.append(scaled_train[i+1:i+sequence_length+1])
X_train, y_train = np.array(X_train), np.array(y_train)

# Convert data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
X_train.shape, y_train.shape


Output:

(torch.Size([6744, 50, 1]), torch.Size([6744, 50, 1]))

Testing Data

Python3




# Create sequences and labels for testing data
sequence_length = 30  # Number of time steps to look back
X_test, y_test = [], []
for i in range(len(scaled_test) - sequence_length):
    # Input window of 30 consecutive values
    X_test.append(scaled_test[i:i+sequence_length])
    # Label window: the same 30 steps shifted one step ahead
    y_test.append(scaled_test[i+1:i+sequence_length+1])
X_test, y_test = np.array(X_test), np.array(y_test)

# Convert data to PyTorch tensors
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)
X_test.shape, y_test.shape


Output:

(torch.Size([1668, 30, 1]), torch.Size([1668, 30, 1]))
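
Since the training and testing loops differ only in the input array and the window length, they could be factored into one function. The sketch below is a NumPy-only illustration; `make_sequences` is a hypothetical helper name and the input is a made-up ramp, not the stock data.

```python
import numpy as np


def make_sequences(series, sequence_length):
    """Build (inputs, labels) where each label window is the input window
    shifted one time step ahead. `series` has shape (n_samples, 1)."""
    inputs, labels = [], []
    for i in range(len(series) - sequence_length):
        inputs.append(series[i:i + sequence_length])
        labels.append(series[i + 1:i + sequence_length + 1])
    return np.array(inputs), np.array(labels)


# Made-up series 0, 1, ..., 9 as a column vector
series = np.arange(10, dtype=np.float32).reshape(-1, 1)
X, y = make_sequences(series, sequence_length=3)

print(X.shape, y.shape)  # (7, 3, 1) (7, 3, 1)
print(X[0].ravel(), y[0].ravel())  # first input window and its shifted label
```

The returned arrays can then be converted with `torch.tensor(X, dtype=torch.float32)` in the same way as in the code above.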

After conversion, both the inputs and the labels are float32 PyTorch tensors of shape (number of sequences, sequence length, 1), which makes them compatible with our deep learning model.

Time Series Forecasting using PyTorch

Time series forecasting plays a major role in data analysis, with applications ranging from anticipating stock market trends to forecasting weather patterns. In this article, we’ll dive into the field of time series forecasting using PyTorch and LSTM (Long Short-Term Memory) neural networks. We’ll uncover the critical preprocessing procedures that underpin the accuracy of our forecasts along the way.
