How to Predict Netflix Stock Price using Machine Learning in R
Step 1: Importing the required libraries
Below is the list of packages that we will require for this R Machine Learning project:
Package | Uses |
---|---|
smooth | Smoothing techniques and forecasting models for time series analysis. |
forecast | Used for forecasting time series data. |
xts | Used for handling and manipulating time series data. |
imputeTS | Provides functions to handle missing values in time series data. |
fpp2 | Provides datasets and additional forecasting tools |
tseries | Used for functions for time series analysis, including tests for stationarity. |
ggfortify | Used for easy visualization of time series objects |
ggplot2 | A popular package for creating complex and customizable plots in R. |
quantmod | This package provides tools to fetch financial market data, analyze, and visualize it. |
#Install and load libraries
#Smoothing techniques for time series analysis.
install.packages("smooth")
library(smooth)
# Used for forecasting time series data.
install.packages("forecast")
library(forecast)
#Used for handling and manipulating time series data
install.packages("xts")
library(xts)
#handle missing values in time series data
install.packages("imputeTS")
library(imputeTS)
#provides datasets
install.packages("fpp2")
library(fpp2)
#functions for time series analysis
install.packages("tseries")
library(tseries)
#visualization of time series objects
install.packages("ggfortify")
library(ggfortify)
#customizable plots in R
install.packages("ggplot2")
library(ggplot2)
# fetch financial market data
install.packages("quantmod")
library(quantmod)
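Calling install.packages() on every run downloads packages again even when they are already installed. As a minimal sketch, one could install only what is missing and then load everything in one pass. The helper name missing_packages is hypothetical (it is not part of any package), and the demo uses packages that ship with R so that nothing is downloaded here.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Demo with packages that ship with R, so nothing is downloaded here.
# For this tutorial, one would use instead:
# c("smooth", "forecast", "xts", "imputeTS", "fpp2",
#   "tseries", "ggfortify", "ggplot2", "quantmod")
pkgs <- c("stats", "utils")

# Install only the packages that are not yet available.
to_install <- missing_packages(pkgs)
if (length(to_install) > 0) {
  install.packages(to_install)
}

# Attach every package; library() needs character.only = TRUE for strings.
invisible(lapply(pkgs, library, character.only = TRUE))
```

This keeps the install step out of the way on later runs while still loading the same set of packages.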
Step 2: Loading the Netflix Stock Price Dataset
Here we load the dataset in one of two ways, depending on the data source chosen: from a local CSV file or directly from Yahoo Finance.
- Loading dataset from CSV file
# Loading the required data
df = read.csv("/content/NFLX.csv") #if you use external data set
- Loading dataset from Finance websites
# Here we use the getSymbols() function to collect the data from Yahoo Finance
getSymbols('NFLX', from = '2002-01-01', to = '2024-01-01')
df = NFLX
# View dataset
head(df)
Output:
NFLX.Open NFLX.High NFLX.Low NFLX.Close NFLX.Volume NFLX.Adjusted
2002-05-23 1.156429 1.242857 1.145714 1.196429 104790000 1.196429
2002-05-24 1.214286 1.225000 1.197143 1.210000 11104800 1.210000
2002-05-28 1.213571 1.232143 1.157143 1.157143 6609400 1.157143
2002-05-29 1.164286 1.164286 1.085714 1.103571 6757800 1.103571
2002-05-30 1.107857 1.107857 1.071429 1.071429 10154200 1.071429
2002-05-31 1.078571 1.078571 1.071429 1.076429 8464400 1.076429
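If the data comes from a CSV file rather than getSymbols(), df is a plain data frame, and its column names will differ from the NFLX.* names shown above. A minimal base R sketch of preparing such a file follows. The column names (Date, Open, High, Low, Close, Volume) are an assumption about the CSV layout, and the inline rows reuse values from the output above in place of a real file.

```r
# Stand-in for read.csv("/content/NFLX.csv"); the header is an assumed layout.
csv_text <- "Date,Open,High,Low,Close,Volume
2002-05-24,1.214286,1.225000,1.197143,1.210000,11104800
2002-05-23,1.156429,1.242857,1.145714,1.196429,104790000
2002-05-28,1.213571,1.232143,1.157143,1.157143,6609400"

df_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse the date column and sort chronologically (the rows above are out of order).
df_csv$Date <- as.Date(df_csv$Date, format = "%Y-%m-%d")
df_csv <- df_csv[order(df_csv$Date), ]

# Closing prices in time order.
close_prices <- df_csv$Close
str(df_csv)
```

With the xts package loaded, xts(df_csv[, -1], order.by = df_csv$Date) would then give a time series object comparable to the one returned by getSymbols().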
Step 3: Checking the dimension and missing values of our data
Here we measure the dimension of the dataset and check the missing values.
# Check the dimension of the dataset
dim(df)
# Check the missing values of all the columns of the dataset
colSums(is.na(df))
Output:
[1] 5439 6
NFLX.Open NFLX.High NFLX.Low NFLX.Close NFLX.Volume NFLX.Adjusted
0 0 0 0 0 0
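This dataset has no missing values, so no imputation is needed. If a series did contain gaps, linear interpolation is one common option. The sketch below uses base R's approx() on a made-up vector (the values are illustrative, not Netflix data). The imputeTS package loaded in Step 1 offers na_interpolation() for the same purpose.

```r
# Illustrative series with gaps (made-up values, not Netflix prices).
close <- c(1.20, 1.21, NA, 1.10, NA, NA, 1.08)

idx <- seq_along(close)
observed <- !is.na(close)

# Linearly interpolate between the observed points.
# rule = 2 carries the nearest observed value into leading or trailing gaps.
filled <- approx(x = idx[observed], y = close[observed],
                 xout = idx, rule = 2)$y

# Count the remaining missing values.
sum(is.na(filled))
```

Interpolation is only reasonable for short gaps; long runs of missing prices are usually better handled by dropping the affected period.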
Step 4: Taking the summary of the data
We check the summary of the data and get the basic idea of the dataset.
# Checking the summary of the data
summary(df)
Output:
Index NFLX.Open NFLX.High NFLX.Low
Min. :2002-05-23 Min. : 0.3779 Min. : 0.4107 Min. : 0.3464
1st Qu.:2007-10-16 1st Qu.: 4.1143 1st Qu.: 4.1936 1st Qu.: 4.0400
Median :2013-03-13 Median : 33.9957 Median : 34.5543 Median : 33.5100
Mean :2013-03-11 Mean :132.3833 Mean :134.4291 Mean :130.2730
3rd Qu.:2018-08-04 3rd Qu.:255.3800 3rd Qu.:261.5600 3rd Qu.:249.5550
Max. :2023-12-29 Max. :692.3500 Max. :700.9900 Max. :686.0900
NFLX.Close NFLX.Volume NFLX.Adjusted
Min. : 0.3729 Min. : 285600 Min. : 0.3729
1st Qu.: 4.1214 1st Qu.: 5922600 1st Qu.: 4.1214
Median : 33.9600 Median : 10018000 Median : 33.9600
Mean :132.4029 Mean : 15907149 Mean :132.4029
3rd Qu.:255.1150 3rd Qu.: 18833300 3rd Qu.:255.1150
Max. :691.6900 Max. :323414000 Max. :691.6900
Step 5: Plotting the data
We will use the chartSeries() function from the quantmod package, which is typically used for visualizing financial and stock market data. With type = 'auto', it automatically selects an appropriate chart type based on the data provided.
chartSeries(df, type = 'auto')
Output:
Now we will check whether the data is stationary, starting with a visual look at the distribution of the closing price.
ggplot(df, aes(x = NFLX.Close))+
geom_density(alpha = 0.5, fill = "blue") +
geom_histogram(aes(y = after_stat(density)),
color = "black",
fill = "lightgray",
bins = 30, alpha = 0.4) +
labs(title = "Density and Histogram of Close Price",
x = "Close Price",
y = "Density") +
theme_minimal()
Output:
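The data is clearly not normally distributed: it is strongly right-skewed. On its own this does not prove non-stationarity, but prices that range from under 1 to almost 700 over the period suggest a non-stationary series, which is consistent with auto.arima() selecting one order of differencing in Step 7.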
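A histogram is only an informal check of stationarity. A more direct check is a unit-root test such as the Augmented Dickey-Fuller (ADF) test, available as adf.test() in the tseries package loaded in Step 1. The sketch below runs it on a simulated random walk (an assumption standing in for the price series, so that the example runs without downloading data) and on its first difference. The null hypothesis of the ADF test is that the series has a unit root, so a small p-value is evidence in favour of stationarity.

```r
library(tseries)

set.seed(123)
# Simulated random walk standing in for a non-stationary price series.
random_walk <- cumsum(rnorm(500))

# adf.test() warns when the p-value falls outside its tabulated range,
# so the warnings are suppressed here.

# ADF test on the level series.
adf_level <- suppressWarnings(adf.test(random_walk))

# ADF test on the first difference (white noise here, which is stationary).
adf_diff <- suppressWarnings(adf.test(diff(random_walk)))

print(adf_level$p.value)
print(adf_diff$p.value)
```

On the tutorial's data, the equivalent call would be adf.test(as.numeric(df$NFLX.Close)).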
Step 6: Model building
We extract the closing price into df.close and split it in an 80:20 ratio, where the first 80% of the observations are used for training and the remaining 20% for testing or validation.
After splitting the data into train and test sets, we will use an ARIMA model to predict stock prices.
# df.close is just the name of the object holding the closing price; you can choose any name
df.close = df[,4] # just taking the 4th column i.e. Close price
# Train test split (chronological, with no overlap between train and test)
n.train = floor(0.8*length(df.close))
df.close.train = df.close[1:n.train]
df.close.test = df.close[(n.train+1):length(df.close)]
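Time series data should be split in time order rather than at random, so that the model is always tested on observations that come after the ones it was trained on. A minimal sketch of such a split as a reusable function follows (split_chronological is a hypothetical name). It rounds the cut point down to a whole number and keeps the two parts from overlapping.

```r
# Hypothetical helper: split a series chronologically into train and test parts.
split_chronological <- function(x, train_frac = 0.8) {
  stopifnot(train_frac > 0, train_frac < 1)
  n <- length(x)
  n_train <- floor(train_frac * n)  # whole number of training observations
  list(
    train = x[seq_len(n_train)],    # first n_train observations
    test  = x[seq(n_train + 1, n)]  # everything after the cut point
  )
}

# Toy series with the same number of rows as the dataset (5439).
toy <- seq_len(5439)
parts <- split_chronological(toy)

length(parts$train)  # 4351
length(parts$test)   # 1088
```

The two lengths add up to the full series, and the last training index is strictly before the first test index.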
Step 7: Model Fitting
# Fit an ARIMA model on the training data (df.close.arima is just a naming convention)
df.close.arima = auto.arima(df.close.train,
seasonal = TRUE,
stepwise = TRUE,
nmodels = 100,
trace = TRUE,
biasadj = TRUE)
Output:
Fitting models using approximations to speed things up...
ARIMA(2,1,2) with drift : 21853.71
ARIMA(0,1,0) with drift : 21847.69
ARIMA(1,1,0) with drift : 21848.52
ARIMA(0,1,1) with drift : 21847.56
ARIMA(0,1,0) : 21847.87
ARIMA(1,1,1) with drift : 21848.77
ARIMA(0,1,2) with drift : 21849.32
ARIMA(1,1,2) with drift : 21850.01
ARIMA(0,1,1) : 21847.64
Now re-fitting the best model(s) without approximations...
ARIMA(0,1,1) with drift : 21849.74
Best model: ARIMA(0,1,1) with drift
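The selected model, ARIMA(0,1,1) with drift, says that the day-to-day change in price equals a constant (the drift) plus the current random shock plus a multiple theta of the previous shock. To make this concrete, the sketch below simulates such a series with made-up parameters (drift 0.05 and theta 0.3 are illustrative assumptions, not the values estimated from the Netflix data) and recovers them with base R's arima(). Passing a time index as xreg is one way to include a drift term when d = 1.

```r
set.seed(42)

n <- 2000
drift <- 0.05   # assumed constant daily change (illustrative)
theta <- 0.3    # assumed MA(1) coefficient (illustrative)

# Random shocks; one extra value so every step has a previous shock.
e <- rnorm(n + 1)

# Daily changes: drift + current shock + theta * previous shock.
d <- drift + e[-1] + theta * e[-(n + 1)]

# Cumulative sum of the changes gives the level series (one order of integration).
y <- cumsum(d)

# Fit ARIMA(0,1,1); the time index as a regressor plays the role of the drift.
fit <- arima(y, order = c(0, 1, 1), xreg = cbind(drift = seq_along(y)))

coef(fit)
```

The estimated ma1 and drift coefficients should land close to the simulated values, which is a useful sanity check on how to read the auto.arima() result.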
Netflix Stock Price Prediction & Forecasting using Machine Learning in R
Recently, many people have been paying attention to the stock market because it offers high risk and high returns. In simple words, a "stock" is ownership of a small part of a company. The more stock you have, the bigger your ownership is. Using machine learning algorithms to predict a company's stock price means forecasting the future value of the company's stock. Because stock prices are dynamic and volatile, driven by many factors, predicting them is challenging.
Table of Contents
- DataSet Used for Netflix Stock Price Prediction
- Model Used for Netflix Stock Price Prediction
- How to Predict Netflix Stock Price using Machine Learning in R
- Step 1: Importing the required libraries
- Step 2: Loading the Netflix Stock Price Dataset
- Step 3: Checking the dimension and missing values of our data
- Step 4: Taking the summary of the data
- Step 5: Plotting the data
- Step 6: Model building
- Step 7: Model Fitting
- Executing and Checking the Model Summary
- Checking Accuracy of Netflix Stock Price Prediction Model
- Performance Comparison on Netflix Stock Price Prediction Model on Training vs Test Data Set
- Predict Netflix Stock Price
- Calculate Test accuracy score