Netflix Stock Price Prediction & Forecasting using Machine Learning in R

In this article, we will predict and forecast the Netflix stock price using machine learning in R. The tutorial is aimed at complete beginners and builds the model step by step.

Recently, many people have been paying attention to the stock market because it offers high returns along with high risk. In simple words, a “stock” is ownership of a small part of a company; the more stock you hold, the larger your share of ownership. Using machine learning algorithms to predict a company’s stock price means forecasting the future value of that stock. Because stock prices are dynamic and volatile, driven by many interacting factors, predicting them is challenging.

Table of Contents

DataSet Used for Netflix Stock Price Prediction
Model Used for Netflix Stock Price Prediction
How to Predict Netflix Stock Price using Machine Learning in R

Step 1: Importing the required libraries
Step 2: Loading the Netflix Stock Price Dataset
Step 3: Checking the dimension and missing values of our data
Step 4: Taking the summary of the data
Step 5: Plotting the data
Step 6: Model building
Step 7: Model Fitting

Executing and Checking the Model Summary

Checking Accuracy of Netflix Stock Price Prediction Model
Performance Comparison on Netflix Stock Price Prediction Model on Training vs Test Data Set

Predict Netflix Stock Price

Calculate Test accuracy score

DataSet Used for Netflix Stock Price Prediction

For this R Machine Learning Project, we request Netflix (NFLX) daily stock price data for the period “2002-01-01” to “2024-01-01”. The first available row is 2002-05-23, as the output below shows. This data can be fetched from either of the sources below:

- Finance websites (such as Yahoo Finance). To import this dataset, we can use the external package “quantmod” and get the required data with the help of the getSymbols() function.
- A CSV file containing Netflix stock price data (NFLX.csv).

Model Used for Netflix Stock Price Prediction

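Here we will use only the Close price of the Netflix stock and fit an ARIMA(p, d, q) model to it, where p is the number of autoregressive terms, d is the number of times the series is differenced to make it stationary, and q is the number of moving-average terms.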
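To make the roles of p, d, and q concrete, the following minimal sketch simulates a series from a known ARIMA(0,1,1) process and fits the same order back to it. It uses only base R (arima.sim() and arima() from the stats package) rather than the forecast package used later. The seed, series length, and MA coefficient of 0.5 are arbitrary values chosen for illustration.

```r
# Illustrative sketch on simulated data (not the Netflix series)
set.seed(42)  # arbitrary seed for reproducibility

# Simulate 500 steps of ARIMA(0,1,1): d = 1 means the first differences
# follow an MA(1) process, here with coefficient 0.5 (assumed value)
sim <- arima.sim(model = list(order = c(0, 1, 1), ma = 0.5), n = 500)

# Fit the same order back; the estimate of ma1 should be close to 0.5
fit <- arima(sim, order = c(0, 1, 1))
ma1_hat <- unname(coef(fit)["ma1"])
print(fit)

# Differencing once (d = 1) is the step that removes the stochastic trend
sim_diff <- diff(sim)
```

The differenced series sim_diff is what the MA(1) part of the model actually describes; the same idea applies when d = 1 is chosen for a price series.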

How to Predict Netflix Stock Price using Machine Learning in R

Step 1: Importing the required libraries

Below is the list of packages that we will need for this R Machine Learning Project:

Package	Uses
smooth	Smoothing techniques and forecasting models for time series analysis.
forecast	Used for forecasting time series data.
xts	Used for handling and manipulating time series data.
imputeTS	Functions to handle missing values in time series data.
fpp2	Provides datasets and additional forecasting tools.
tseries	Functions for time series analysis, including tests for stationarity.
ggfortify	Used for easy visualization of time series objects
ggplot2	A popular package for creating complex and customizable plots in R.
quantmod	This package provides tools to fetch financial market data, analyze, and visualize it.

#Install and load libraries
#Smoothing techniques for time series analysis.
install.packages("smooth")
library(smooth)

# Used for forecasting time series data.
install.packages("forecast")
library(forecast)

#Used for handling and manipulating time series data
install.packages("xts")
library(xts)

#handle missing values in time series data
install.packages("imputeTS")
library(imputeTS)

#provides datasets
install.packages("fpp2")
library(fpp2)

#functions for time series analysis
install.packages("tseries")
library(tseries)

#visualization of time series objects 
install.packages("ggfortify")
library(ggfortify)

#customizable plots in R
install.packages("ggplot2")
library(ggplot2)

# fetch financial market data
install.packages("quantmod")
library(quantmod)

Step 2: Loading the Netflix Stock Price Dataset

Here we load the dataset from one of the two sources discussed above. The outputs shown in the rest of this article come from the getSymbols() route.

Loading dataset from CSV file

# Load the data from a local CSV file (use this if you have an external dataset)
df = read.csv("/content/NFLX.csv")

Loading dataset from Finance websites

# Use the getSymbols() function to collect the data from Yahoo Finance
getSymbols('NFLX', from = '2002-01-01', to = '2024-01-01')
df = NFLX

# View dataset
head(df)

Output:

           NFLX.Open NFLX.High NFLX.Low NFLX.Close NFLX.Volume NFLX.Adjusted
2002-05-23  1.156429  1.242857 1.145714   1.196429   104790000      1.196429
2002-05-24  1.214286  1.225000 1.197143   1.210000    11104800      1.210000
2002-05-28  1.213571  1.232143 1.157143   1.157143     6609400      1.157143
2002-05-29  1.164286  1.164286 1.085714   1.103571     6757800      1.103571
2002-05-30  1.107857  1.107857 1.071429   1.071429    10154200      1.071429
2002-05-31  1.078571  1.078571 1.071429   1.076429     8464400      1.076429

Step 3: Checking the dimension and missing values of our data

Here we check the dimensions of the dataset and count the missing values in each column.

# Check the dimension of the dataset
dim(df)

# Check the missing values of all the columns of the dataset
colSums(is.na(df))

Output:

[1] 5439    6

    NFLX.Open     NFLX.High      NFLX.Low    NFLX.Close   NFLX.Volume NFLX.Adjusted 
            0             0             0             0             0             0
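
The output above shows no missing values, so no imputation is needed here. If a series did contain gaps, linear interpolation is one common way to fill them. The sketch below uses base R's approx() on a small made-up vector; the values are hypothetical, and the imputeTS package loaded in Step 1 offers na_interpolation() for the same purpose.

```r
# Hypothetical closing prices with gaps (made-up values for illustration)
close_with_gaps <- c(10, NA, 12, 13, NA, NA, 16)

# Count missing values, as colSums(is.na(df)) does per column
n_missing <- sum(is.na(close_with_gaps))

# Linearly interpolate the gaps from the neighbouring observed points
idx <- seq_along(close_with_gaps)
observed <- !is.na(close_with_gaps)
close_filled <- approx(x = idx[observed],
                       y = close_with_gaps[observed],
                       xout = idx)$y

print(n_missing)
print(close_filled)
```

Interpolation of this kind only makes sense for short gaps inside the series; it cannot fill values before the first or after the last observation.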

Step 4: Taking the summary of the data

We print summary statistics to get a basic idea of the dataset.

# Checking the summary of the data
summary(df)

Output:

     Index              NFLX.Open          NFLX.High           NFLX.Low       
 Min.   :2002-05-23   Min.   :  0.3779   Min.   :  0.4107   Min.   :  0.3464  
 1st Qu.:2007-10-16   1st Qu.:  4.1143   1st Qu.:  4.1936   1st Qu.:  4.0400  
 Median :2013-03-13   Median : 33.9957   Median : 34.5543   Median : 33.5100  
 Mean   :2013-03-11   Mean   :132.3833   Mean   :134.4291   Mean   :130.2730  
 3rd Qu.:2018-08-04   3rd Qu.:255.3800   3rd Qu.:261.5600   3rd Qu.:249.5550  
 Max.   :2023-12-29   Max.   :692.3500   Max.   :700.9900   Max.   :686.0900  
   NFLX.Close        NFLX.Volume        NFLX.Adjusted     
 Min.   :  0.3729   Min.   :   285600   Min.   :  0.3729  
 1st Qu.:  4.1214   1st Qu.:  5922600   1st Qu.:  4.1214  
 Median : 33.9600   Median : 10018000   Median : 33.9600  
 Mean   :132.4029   Mean   : 15907149   Mean   :132.4029  
 3rd Qu.:255.1150   3rd Qu.: 18833300   3rd Qu.:255.1150  
 Max.   :691.6900   Max.   :323414000   Max.   :691.6900

Step 5: Plotting the data

We will use the chartSeries() function from the quantmod package, which is typically used to visualize financial and stock market data. With type = 'auto', it automatically selects an appropriate chart type based on the data provided.

chartSeries(df, type = 'auto')

Output:

Predicting Stock Prices in R

Now we will look at the distribution of the Close price as a first, informal check on the data.

ggplot(df, aes(x = NFLX.Close))+
  geom_density(alpha = 0.5, fill = "blue") +
  geom_histogram(aes(y = after_stat(density)), 
                 color = "black", 
                 fill = "lightgray", 
                 bins = 30, alpha = 0.4) +
  labs(title = "Density and Histogram of Close Price",
       x = "Close Price",
       y = "Density") +
  theme_minimal()

Output:

Predicting Stock Prices in R

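The Close price is clearly not normally distributed: it is strongly right-skewed, with a median of about 34 against a mean of about 132 in the summary above. A histogram alone cannot establish stationarity, but a price that rises from under 1 in 2002 to almost 700 has an obvious trend, so the series is non-stationary and needs differencing before an ARIMA model can be fitted.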
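
A histogram shows how the values are distributed but not how they evolve over time, so a unit-root test is a more direct check for stationarity. The following minimal sketch applies the Augmented Dickey-Fuller test (adf.test() from the tseries package loaded in Step 1) to a simulated random walk that stands in for a price series; the seed and drift value are arbitrary assumptions.

```r
library(tseries)  # provides adf.test(); installed in Step 1

# Simulated random walk with drift standing in for a price series
set.seed(1)  # arbitrary seed
walk <- cumsum(rnorm(500, mean = 0.1))

# Null hypothesis of the ADF test: the series has a unit root (non-stationary)
adf_level <- adf.test(walk)        # levels: typically a large p-value
adf_diff  <- adf.test(diff(walk))  # first differences are white noise here

print(adf_level$p.value)
print(adf_diff$p.value)
```

On the article's data, the equivalent calls would be adf.test(as.numeric(df.close)) and adf.test(diff(as.numeric(df.close))). A small p-value only after differencing is consistent with choosing d = 1.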

Step 6: Model building

We extract the closing price into df.close and split it 80:20 in time order: the first 80% of the observations are used for training and the remaining 20% for testing (validation).

After the split, we will fit an ARIMA model on the training data to predict stock prices.

# df.close holds only the closing price (the name is just a convention)
df.close = df[, 4] # the 4th column of the getSymbols() output is the Close price

# Chronological train/test split (time series data must not be shuffled)
n.train = floor(0.8 * length(df.close)) # whole number of training rows

df.close.train = df.close[1:n.train]

# Start the test set one row after the training set so the two do not overlap
df.close.test = df.close[(n.train + 1):length(df.close)]

Step 7: Model Fitting

# Let auto.arima() search for the best (p, d, q) on the training data
# (df.close.arima is just a naming convention)
df.close.arima = auto.arima(df.close.train,
                            seasonal = TRUE,
                            stepwise = TRUE,
                            nmodels = 100,
                            trace = TRUE,
                            biasadj = TRUE)

Output:

 Fitting models using approximations to speed things up...

 ARIMA(2,1,2) with drift         : 21853.71
 ARIMA(0,1,0) with drift         : 21847.69
 ARIMA(1,1,0) with drift         : 21848.52
 ARIMA(0,1,1) with drift         : 21847.56
 ARIMA(0,1,0)                    : 21847.87
 ARIMA(1,1,1) with drift         : 21848.77
 ARIMA(0,1,2) with drift         : 21849.32
 ARIMA(1,1,2) with drift         : 21850.01
 ARIMA(0,1,1)                    : 21847.64

 Now re-fitting the best model(s) without approximations...

 ARIMA(0,1,1) with drift         : 21849.74

 Best model: ARIMA(0,1,1) with drift

Executing and Checking the Model Summary

Now we will check the summary of the model.

# Summary of the model
summary(df.close.arima)

Output:

Series: df.close.train 
ARIMA(0,1,1) with drift 

Coefficients:
         ma1   drift
      0.0220  0.0667
s.e.  0.0151  0.0462

sigma^2 = 8.883:  log likelihood = -10921.87
AIC=21849.74   AICc=21849.74   BIC=21868.87

Training set error measures:
                       ME     RMSE      MAE       MPE     MAPE     MASE         ACF1
Training set 1.070102e-05 2.979396 1.175391 -1.252125 2.838547 1.008832 0.0001708062

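This ARIMA model can be read as follows:

The in-sample errors are small in absolute terms (RMSE about 2.98, MAE about 1.18), but the MASE of about 1.01 shows that the model is roughly as accurate as a naive "last value" forecast on the training data.
The ma1 and drift coefficients are both small and less than two standard errors from zero, which indicates only a slight moving average component and a small linear drift.
The log-likelihood, AIC, and BIC values are reported to assess the quality of the fit and to compare this model with other candidates.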
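
A common follow-up to the summary is to check whether the residuals still contain autocorrelation. The sketch below runs a Ljung-Box test with base R's Box.test() on a model fitted to simulated data; the lag of 10 is an arbitrary choice, and fitdf = 1 accounts for the single estimated MA coefficient.

```r
# Illustrative sketch on simulated data (not the Netflix series)
set.seed(7)  # arbitrary seed
sim <- arima.sim(model = list(order = c(0, 1, 1), ma = 0.3), n = 400)
fit <- arima(sim, order = c(0, 1, 1))

# Ljung-Box test: the null hypothesis is that the residuals are uncorrelated
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
print(lb)

# Lag-1 autocorrelation of the residuals, the quantity reported as ACF1
acf1 <- acf(residuals(fit), plot = FALSE)$acf[2]
print(acf1)
```

For the model in this article, the same test would be applied to residuals(df.close.arima); the forecast package also provides checkresiduals() as a one-call alternative.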

Checking Accuracy of Netflix Stock Price Prediction Model

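Comparing the training and test accuracy of the Netflix stock price prediction model:

# Forecast as many steps ahead as there are test observations
df.close.forecast = forecast(df.close.arima, h = length(df.close.test))

accuracy(df.close.forecast, df.close.test)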

Output:

                       ME       RMSE        MAE       MPE      MAPE       MASE         ACF1
Training set 1.070102e-05   2.979396   1.175391 -1.252125  2.838547   1.008832 0.0001708062
Test set     8.253246e+01 150.853167 122.694874 11.009547 29.253149 105.308391           NA

where:

ME (Mean Error) indicates the average error.
RMSE (Root Mean Square Error) shows the square root of the average squared errors.
MAE (Mean Absolute Error) is the average of the absolute errors.
MPE (Mean Percentage Error) represents the average percentage error.
MAPE (Mean Absolute Percentage Error) is the average of the absolute percentage errors.
MASE (Mean Absolute Scaled Error) measures the accuracy of a model compared to a naive forecasting method.
ACF1 (Autocorrelation at Lag 1) is the lag-1 autocorrelation of the errors; a value near zero indicates that consecutive errors are almost unrelated.
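
To show how these measures are computed, here is a minimal sketch in base R on made-up numbers. The vectors train, actual, and predicted are hypothetical, and the error is taken as actual minus predicted, which is the convention the forecast package uses.

```r
# Made-up values for illustration only
train     <- c(95, 97, 96, 99, 100)     # hypothetical training series
actual    <- c(100, 102, 101, 105, 107) # hypothetical test values
predicted <- c(99, 103, 100, 103, 108)  # hypothetical forecasts

err <- actual - predicted

me   <- mean(err)                      # Mean Error (bias)
rmse <- sqrt(mean(err^2))              # Root Mean Square Error
mae  <- mean(abs(err))                 # Mean Absolute Error
mpe  <- mean(100 * err / actual)       # Mean Percentage Error
mape <- mean(abs(100 * err / actual))  # Mean Absolute Percentage Error

# MASE: MAE divided by the in-sample MAE of the naive "last value" forecast
naive_mae <- mean(abs(diff(train)))
mase <- mae / naive_mae

print(c(ME = me, RMSE = rmse, MAE = mae, MPE = mpe, MAPE = mape, MASE = mase))
```

A MASE below 1 means the forecasts are more accurate than the naive method was on the training data, and a MASE above 1 means they are less accurate.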

The training set has much lower error values than the test set across all metrics. The two rows are not directly comparable, however: the training errors are one-step-ahead in-sample errors, while the test errors come from forecasting the entire test period, more than 1,000 trading days ahead, without updating the model.

Performance Comparison on Netflix Stock Price Prediction Model on Training vs Test Data Set

The training set errors are small because each fitted value uses the actual price from the previous day. With only two estimated parameters, the model is unlikely to be overfitting in the usual sense.
The test set errors are much higher (MAPE about 29% versus about 2.8%), which shows that a single long-horizon forecast from this model does not track the later price movements.
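
One way to see how much of the gap between training and test errors comes from the forecast horizon is to compare a single multi-step forecast with a rolling one-step-ahead forecast that is refitted as each new observation arrives. The sketch below does this on a simulated random walk with base R's arima() and predict(). The series, split sizes, and model order are illustrative assumptions, not the article's data.

```r
# Illustrative sketch on simulated data (not the Netflix series)
set.seed(123)  # arbitrary seed
y <- cumsum(rnorm(250, mean = 0.1))  # simulated random walk with drift
n_train <- 200
train <- y[1:n_train]
test  <- y[(n_train + 1):length(y)]
h <- length(test)

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# (a) One multi-step forecast made at the end of the training period
fit <- arima(train, order = c(0, 1, 1))
multi_step <- as.numeric(predict(fit, n.ahead = h)$pred)

# (b) Rolling forecast: refit on all data seen so far, predict one step ahead
one_step <- numeric(h)
for (i in seq_len(h)) {
  history <- y[1:(n_train + i - 1)]
  fit_i <- arima(history, order = c(0, 1, 1))
  one_step[i] <- as.numeric(predict(fit_i, n.ahead = 1)$pred)
}

rmse_multi <- rmse(test, multi_step)
rmse_one   <- rmse(test, one_step)
print(c(multi_step = rmse_multi, one_step = rmse_one))
```

The rolling errors are usually much smaller because each forecast only looks one step ahead, which is the same setting the training set error measures are computed in.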

Predict Netflix Stock Price

Using the Arima() function, we can fit models with different values of (p, d, q), compare their accuracy, and try to find better predictions. Here we try ARIMA(0,2,1).

df.arima1 = Arima(df.close.train, order = c(0, 2, 1))
pred1 = predict(df.arima1, n.ahead = 1088)
summary(df.arima1)

Output:

Series: df.close.train 
ARIMA(0,2,1) 

Coefficients:
          ma1
      -0.9994
s.e.   0.0014

sigma^2 = 8.89:  log likelihood = -10924.97
AIC=21853.93   AICc=21853.94   BIC=21866.69

Training set error measures:
                    ME     RMSE      MAE        MPE     MAPE     MASE      ACF1
Training set 0.0380327 2.980599 1.165301 0.03106153 2.352286 1.000172 0.0219196

Calculate Test accuracy score

accuracy(pred1$pred, df.close.test)

Output:

               ME     RMSE      MAE     MPE     MAPE
Test set 60.76618 144.3566 118.3292 4.91788 29.72001

Comparing the two models on the test set, df.arima1 has a lower RMSE (144.36 vs 150.85) and MAE (118.33 vs 122.69) but a slightly higher MAPE (29.72 vs 29.25), so neither model is clearly better and both are far from accurate. A likely reason is that we are using a very simple model for a complex task, stock price prediction. The results may be improved by tuning the parameters or by systematically searching for appropriate values of (p, d, q).
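
As one possible way to search for (p, d, q) systematically, the sketch below fits a small grid of candidate orders and ranks them by AIC. It uses base R's arima() on simulated data; the grid limits, the fixed d = 1, and the simulated series are assumptions for illustration, not settings taken from this article.

```r
# Illustrative sketch on simulated data (not the Netflix series)
set.seed(2024)  # arbitrary seed
y <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = 400)

# Candidate orders: p and q from 0 to 2, with d fixed at 1
grid <- expand.grid(p = 0:2, d = 1, q = 0:2)
grid$aic <- NA_real_

for (i in seq_len(nrow(grid))) {
  order_i <- c(grid$p[i], grid$d[i], grid$q[i])
  # Some orders may fail to fit; keep NA for those and move on
  fit_i <- tryCatch(arima(y, order = order_i), error = function(e) NULL)
  if (!is.null(fit_i)) grid$aic[i] <- fit_i$aic
}

# Rank candidates by AIC (lower is better); NA rows sort last
ranked <- grid[order(grid$aic), ]
best <- ranked[1, ]
print(head(ranked, 3))
```

The value of d is held fixed because AIC values are only comparable between models fitted to the same differenced series. A low AIC on the training data also does not guarantee a low test error, so the chosen order should still be checked against the test set.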
