Data Preprocessing
Plot the time series trend using Matplotlib
Python3
def data_plot(df):
    df_plot = df.copy()
    ncols = 2
    # Round up so that every column gets its own subplot
    nrows = math.ceil(df_plot.shape[1] / ncols)
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols,
                             sharex=True, figsize=(14, 7))
    # zip stops at the last column, so a spare subplot is simply left empty
    for ax, col in zip(fig.axes, df_plot.columns):
        sns.lineplot(data=df_plot[col], ax=ax)
        ax.tick_params(axis="x", rotation=30, labelsize=10, length=0)
        ax.xaxis.set_major_locator(mdates.AutoDateLocator())
    fig.tight_layout()
    plt.show()

data_plot(df)
Output:
Splitting the dataset into test and train
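We follow the common practice of holding out part of the data for testing. Because this is a time series, the split is chronological: the first 80% of the rows form the training set and the remaining 20% form the test set. We compute the length of the training set, slice the DataFrame, keep only the first column (the ‘Open’ price), and print both shapes to confirm the split.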
Python3
# Train-test split: use the first 80 percent of the rows for training
training_data_len = math.ceil(len(df) * .8)

# Split the dataset chronologically, keeping only the first column
train_data = df[:training_data_len].iloc[:, :1]
test_data = df[training_data_len:].iloc[:, :1]
print(train_data.shape, test_data.shape)
Output:
(6794, 1) (1698, 1)
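The same splitting rule can be checked on a toy series. The following is a minimal sketch, assuming a hypothetical NumPy array `values` in place of the article's DataFrame; it only illustrates that the split is positional, so all training rows come before all test rows.

```python
import math
import numpy as np

# Hypothetical stand-in for the article's data: 10 ordered observations
values = np.arange(10, dtype=float).reshape(-1, 1)

# Same rule as above: the first 80% of the rows (rounded up) go to training
training_data_len = math.ceil(len(values) * 0.8)
train = values[:training_data_len]
test = values[training_data_len:]

print(train.shape, test.shape)   # (8, 1) (2, 1)
# Every training observation precedes every test observation
print(train.max() < test.min())  # True
```

No shuffling is involved, which matters for time series: shuffling before the split would let the model see observations from the future of the test period.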
Preparing Training and Testing Dataset
Here, we select the feature (‘Open’ prices) and reshape it from a 1D array into a 2D array of shape (n_samples, 1). Checking the resulting shape confirms that the data is in the layout expected by the preprocessing steps that follow.
Training Data
Python3
# Select the Open price values
dataset_train = train_data.Open.values
# Reshape the 1D array into a 2D array with a single column
dataset_train = np.reshape(dataset_train, (-1, 1))
dataset_train.shape
Output:
(6794, 1)
Testing Data
Python3
# Select the Open price values
dataset_test = test_data.Open.values
# Reshape the 1D array into a 2D array with a single column
dataset_test = np.reshape(dataset_test, (-1, 1))
dataset_test.shape
Output:
(1698, 1)
Both arrays are now 2D, with one row per observation and a single column holding the ‘Open’ price. This is the layout that scikit-learn's MinMaxScaler expects as input.
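As a small illustration of the reshape step, the sketch below uses a hypothetical three-element array `prices` (not taken from the dataset). Passing -1 as the first dimension lets NumPy infer the number of rows.

```python
import numpy as np

prices = np.array([10.0, 10.5, 11.0])  # hypothetical 1D price series
column = np.reshape(prices, (-1, 1))   # -1 lets NumPy infer the row count

print(prices.shape, column.shape)      # (3,) (3, 1)
```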
Normalization
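We apply Min-Max scaling, a standard preprocessing step in machine learning and time series analysis, to both dataset_train and dataset_test. It rescales the values to the range [0, 1], which generally helps neural networks converge faster and more stably. The normalized values are stored in the scaled_train and scaled_test arrays. Note that the code below calls fit_transform on each set, so the test set is scaled with its own minimum and maximum. To keep both sets on the same scale and avoid leaking test statistics, the scaler can instead be fit on the training data only and applied to the test data with scaler.transform.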
Python3
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))

# Scale the training data to [0, 1]
scaled_train = scaler.fit_transform(dataset_train)
print(scaled_train[:5])

# Scale the test data to [0, 1] (this re-fits the scaler on the test set)
scaled_test = scaler.fit_transform(dataset_test)
print(*scaled_test[:5])  # prints the first 5 rows of scaled_test
Output:
[0.] [0.00162789] [0.00062727] [0.00203112] [0.00212074]
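Min-Max scaling maps each value x to (x - min) / (max - min). The sketch below is a hand-written NumPy version of that formula, not the scikit-learn implementation; the helper names `fit_min_max` and `apply_min_max` and the sample values are hypothetical. It shows the variant in which the minimum and maximum come from the training data only and are then reused for the test data, which corresponds to calling `scaler.fit` on the training set and `scaler.transform` on the test set.

```python
import numpy as np

def fit_min_max(train):
    """Return the column-wise minimum and maximum of the training data."""
    return train.min(axis=0), train.max(axis=0)

def apply_min_max(data, data_min, data_max):
    """Scale data using statistics computed elsewhere (here: on the training set)."""
    return (data - data_min) / (data_max - data_min)

train = np.array([[10.0], [20.0], [30.0]])  # hypothetical training values
test = np.array([[25.0], [40.0]])           # hypothetical test values

data_min, data_max = fit_min_max(train)
scaled_train_demo = apply_min_max(train, data_min, data_max)
scaled_test_demo = apply_min_max(test, data_min, data_max)

print(scaled_train_demo.ravel())  # training values land in [0, 1]
print(scaled_test_demo.ravel())   # test values may fall outside [0, 1]
```

Here the test value 40 is larger than anything seen in training, so it scales to 1.5. That is expected with this approach: the test set is measured on the training scale rather than on its own.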
Transforming the data into Sequence
In this step, we turn each scaled series into a supervised learning problem by cutting it into fixed-length windows: X_train and y_train from the training set, and X_test and y_test from the testing set. The loop slides over the series one step at a time and builds input sequences of length 50 for the training data and of length 30 for the test data. Each label sequence is the input window shifted one step ahead, so at every position the target is the next observation. This lets the model predict future values while taking into account the data’s temporal dependence on earlier observations.
The resulting lists of sequences are then converted to NumPy arrays and finally to PyTorch tensors.
Training Data
Python3
# Create sequences and labels for training data
sequence_length = 50  # Number of time steps to look back
X_train, y_train = [], []
for i in range(len(scaled_train) - sequence_length):
    X_train.append(scaled_train[i:i + sequence_length])
    # The label window is the input window shifted one step ahead
    y_train.append(scaled_train[i + 1:i + sequence_length + 1])
X_train, y_train = np.array(X_train), np.array(y_train)

# Convert data to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
X_train.shape, y_train.shape
Output:
(torch.Size([6744, 50, 1]), torch.Size([6744, 50, 1]))
Testing Data
Python3
# Create sequences and labels for testing data
sequence_length = 30  # Number of time steps to look back
X_test, y_test = [], []
for i in range(len(scaled_test) - sequence_length):
    X_test.append(scaled_test[i:i + sequence_length])
    # The label window is the input window shifted one step ahead
    y_test.append(scaled_test[i + 1:i + sequence_length + 1])
X_test, y_test = np.array(X_test), np.array(y_test)

# Convert data to PyTorch tensors
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)
X_test.shape, y_test.shape
Output:
(torch.Size([1668, 30, 1]), torch.Size([1668, 30, 1]))
The resulting tensors have shape (number of sequences, sequence length, 1), that is, one feature per time step, and are ready to be passed to our deep learning model.
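The two loops above differ only in the input array and the window length, so they could be factored into one helper. The following is a minimal sketch under that assumption; `create_sequences` is a hypothetical name, and the example stops at NumPy arrays so that it runs without PyTorch (the arrays can be converted afterwards with `torch.tensor(..., dtype=torch.float32)` as in the code above).

```python
import numpy as np

def create_sequences(series, sequence_length):
    """Build (input, label) windows; each label is its input shifted one step ahead."""
    X, y = [], []
    for i in range(len(series) - sequence_length):
        X.append(series[i:i + sequence_length])
        y.append(series[i + 1:i + sequence_length + 1])
    return np.array(X), np.array(y)

# Hypothetical toy series 0, 1, ..., 5 with a single feature column
series = np.arange(6, dtype=np.float32).reshape(-1, 1)
X, y = create_sequences(series, sequence_length=3)

print(X.shape, y.shape)  # (3, 3, 1) (3, 3, 1)
print(X[0].ravel())      # first input window: values 0, 1, 2
print(y[0].ravel())      # its label window:   values 1, 2, 3
```

A series of length n yields n - sequence_length windows, which matches the shapes above: 6794 - 50 = 6744 training sequences and 1698 - 30 = 1668 test sequences.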
Time Series Forecasting using PyTorch
Time series forecasting plays a major role in data analysis, with applications ranging from anticipating stock market trends to forecasting weather patterns. In this article, we’ll dive into the field of time series forecasting using PyTorch and LSTM (Long Short-Term Memory) neural networks. We’ll uncover the critical preprocessing procedures that underpin the accuracy of our forecasts along the way.
Table of Contents
- Time Series Forecasting
- Implementation of Time Series Forecasting:
- Step 1: Import the necessary libraries
- Step 2: Loading the Dataset
- Step 3: Data Preprocessing
- Step 4: Define LSTM class model
- Step 5: Creating Data Loader for batch training
- Step 6: Model Training & Evaluations
- Step 7: Forecasting