Data Preprocessing

Plot the time series trend using Matplotlib

Python3

def data_plot(df):
    df_plot = df.copy()
 
    ncols = 2
    nrows = int(round(df_plot.shape[1] / ncols, 0))
 
    fig, ax = plt.subplots(nrows=nrows, ncols=ncols,
                           sharex=True, figsize=(14, 7))
    for i, ax in enumerate(fig.axes):
        sns.lineplot(data=df_plot.iloc[:, i], ax=ax)
        ax.tick_params(axis="x", rotation=30, labelsize=10, length=0)
        ax.xaxis.set_major_locator(mdates.AutoDateLocator())
    fig.tight_layout()
    plt.show()
 
data_plot(df)

Output :

Line plots showing the features of Apple Inc. stock through time

Splitting the dataset into test and train

We follow the common practice of splitting the data into training and testing set. We calculate the length of the training datasets and print their respective shapes to confirm the split. Generally, the split is 80:20 for training and test set.

Python3

# Train-Test Split 
# Setting 80 percent data for training
training_data_len = math.ceil(len(df) * .8)
training_data_len
 
#Splitting the dataset
train_data = df[:training_data_len].iloc[:,:1]
test_data = df[training_data_len:].iloc[:,:1]
print(train_data.shape, test_data.shape)

Output:

(6794, 1) (1698, 1)

Preparing Training and Testing Dataset

Here, we are choosing the feature (‘Open’ prices), reshaping it into the necessary 2D format, and validating the resulting shape to make sure it matches the anticipated format for model input, this method prepares the training data for use in a neural network.

Training Data

Python3

# Selecting Open Price values
dataset_train = train_data.Open.values
# Reshaping 1D to 2D array
dataset_train = np.reshape(dataset_train, (-1,1))
dataset_train.shape

Output:

(6794, 1)

Testing Data

Python3

# Selecting Open Price values
dataset_test = test_data.Open.values
# Reshaping 1D to 2D array
dataset_test = np.reshape(dataset_test, (-1,1))
dataset_test.shape

Output:

(1698, 1)

We carefully prepared the training and testing datasets to guarantee that our model could produce accurate predictions. We made the issue one that was suited for supervised learning by creating sequences with the proper lengths and their related labels.

Normalization

We have applied Min-Max scaling which is a standard preprocessing step in machine learning and time series analysis, to the dataset_test data. It adjusts the values to be between [0, 1], allowing neural networks and other models to converge more quickly and function better. The normalized values are contained in the scaled_test array as a consequence, ready to be used in modeling or analysis.

Python3

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0,1))
# scaling dataset
scaled_train = scaler.fit_transform(dataset_train)
 
print(scaled_train[:5])
# Normalizing values between 0 and 1
scaled_test = scaler.fit_transform(dataset_test)
print(*scaled_test[:5]) #prints the first 5 rows of scaled_test

Output:

[0.] [0.00162789] [0.00062727] [0.00203112] [0.00212074]

Transforming the data into Sequence

In this step, it is necessary to separate the time-series data into X_train and y_train from the training set and X_test and y_test from the testing set. Time series data are transformed into a supervised learning problem that may be used to develop the model. While iterating through the time series data, the loop generates input/output sequences of length 50 for training data and sequences of length 30 for the test data. We can predict future values using this technique while taking into account the data’s temporal dependence on earlier observations.

We prepare the training and testing data for a neural network by generating sequences of a given length and their related labels. It then converts these sequences to NumPy arrays and PyTorch tensors.

Training Data

Python3

# Create sequences and labels for training data
sequence_length = 50  # Number of time steps to look back
X_train, y_train = [], []
for i in range(len(scaled_train) - sequence_length):
    X_train.append(scaled_train[i:i+sequence_length])
    y_train.append(scaled_train[i+1:i+sequence_length+1])
X_train, y_train = np.array(X_train), np.array(y_train)
 
# Convert data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
X_train.shape,y_train.shape

Output:

(torch.Size([6744, 50, 1]), torch.Size([6744, 50, 1]))

Testing Data

Python3

# Create sequences and labels for testing data
sequence_length = 30  # Number of time steps to look back
X_test, y_test = [], []
for i in range(len(scaled_test) - sequence_length):
    X_test.append(scaled_test[i:i+sequence_length])
    y_test.append(scaled_test[i+1:i+sequence_length+1])
X_test, y_test = np.array(X_test), np.array(y_test)
 
# Convert data to PyTorch tensors
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)
X_test.shape, y_test.shape

Output:

(torch.Size([1668, 30, 1]), torch.Size([1668, 30, 1]))

To make the sequences compatible with our deep learning model, the data was subsequently transformed into NumPy arrays and PyTorch tensors.

Time Series Forecasting using Pytorch

Time series forecasting plays a major role in data analysis, with applications ranging from anticipating stock market trends to forecasting weather patterns. In this article, we’ll dive into the field of time series forecasting using PyTorch and LSTM (Long Short-Term Memory) neural networks. We’ll uncover the critical preprocessing procedures that underpin the accuracy of our forecasts along the way.

Table of Content

Time Series Forecasting
Implementation of Time Series Forecasting:
Step 1: Import the necessary libraries
Step2: Loading the Dataset
Step 3: Data Preprocessing
Step 4: Define LSTM class model
Step 5: Creating Data Loader for batch training
Step 6: Model Training & Evaluations
Step 7: Forecasting

Data Preprocessing

Plot the time series trend using Matplotlib

Python3

Splitting the dataset into test and train

Python3

Preparing Training and Testing Dataset

Training Data

Python3

Testing Data

Python3

Normalization

Python3

Transforming the data into Sequence

Training Data

Python3

Testing Data

Python3

Time Series Forecasting using Pytorch

Table of Content

Similar Reads