Box-Jenkins Method

The Box-Jenkins method is a methodology for analyzing and forecasting time series data with ARIMA models. It consists of five steps: identification, estimation, diagnostic checking, model refinement and forecasting. The method is iterative: steps 1 to 4, from identification to model refinement, are often repeated until a suitable model that passes the diagnostic checks is obtained. It is important to note that the method assumes the underlying time series is generated by a linear process that is stationary, or that can be made stationary by differencing. The stages of the Box-Jenkins method are described below:

Identification:

Identification is the first step of the Box-Jenkins method. It determines the orders of the autoregressive (AR), differencing (I), and moving average (MA) components that are appropriate for a given time series, that is, the values of p, d and q. Let’s see the key stages involved in this phase:

Stationarity Check: This check is carried out before ARIMA modelling. A time series is stationary when its statistical properties, such as mean and variance, do not change over time. If the data is not stationary, differencing is applied until it becomes stationary. Stationarity can be assessed visually and through statistical tests, such as the Augmented Dickey-Fuller (ADF) test.
Autocorrelation and Partial Autocorrelation Analysis: Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots are the main tools for identifying the orders of the AR and MA components. The ACF plot shows the correlation between the current observation and its past observations at various lags. The PACF plot shows the correlation between the current observation and a past observation after removing the effects of the intermediate observations. As a rule of thumb, a PACF that cuts off after lag p suggests an AR(p) component, while an ACF that cuts off after lag q suggests an MA(q) component.
Seasonality Check: If the time series data has seasonality, it is important to account for it in the model. Seasonality can be identified through visual inspection of the time series plot or by using seasonal decomposition techniques.
Differencing Order: Differencing is often required to make the time series stationary. The order of differencing (d) is determined based on the number of differences needed to achieve stationarity.
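
The identification checks above can be sketched in a few lines of NumPy. This is only an illustration on synthetic data: the helper names sample_acf and sample_pacf are hypothetical, the PACF is computed with the Durbin-Levinson recursion, and a formal test such as ADF (available, for example, as adfuller in statsmodels) is not reproduced here.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations r_0..r_nlags of a 1-D series."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[k:], xc[:len(xc) - k]) / denom
                     for k in range(nlags + 1)])

def sample_pacf(x, nlags):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, nlags)
    pacf = [1.0]
    phi_prev = np.array([])  # AR coefficients of the order k-1 fit
    for k in range(1, nlags + 1):
        num = r[k] - np.dot(phi_prev, r[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, r[1:k])
        phi_kk = num / den
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf.append(phi_kk)
    return np.array(pacf)

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)
n = 2000
eps = rng.normal(size=n)

# Stationary AR(1): y_t = 0.7 * y_{t-1} + eps_t
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.7 * ar1[t - 1] + eps[t]

# Non-stationary random walk: y_t = y_{t-1} + eps_t
walk = np.cumsum(eps)
diffed = np.diff(walk)  # one difference (d = 1) removes the unit root

print("AR(1) ACF, lags 0-4      :", np.round(sample_acf(ar1, 4), 2))
print("AR(1) PACF, lags 0-4     :", np.round(sample_pacf(ar1, 4), 2))
print("Random walk ACF, lags 0-4:", np.round(sample_acf(walk, 4), 2))
print("Differenced ACF, lags 0-4:", np.round(sample_acf(diffed, 4), 2))
```

In this sketch the AR(1) series should show an ACF that decays gradually and a PACF that is large only at lag 1, while the random walk's ACF stays close to 1 until the series is differenced.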

Estimation:

Estimation is the second stage in the Box-Jenkins methodology for ARIMA modeling. In this stage, the coefficients of the identified ARIMA model, namely the autoregressive (AR) and moving average (MA) terms, are estimated from the historical time series data, with the differencing order (d) held fixed at the value chosen during identification. The primary goal is to fit the chosen ARIMA model to the observed data. Let’s see the key stages involved in this phase:

Model Selection: Once the order (p, d, q) of the ARIMA model has been identified, the next step is to settle on the exact model. The autoregressive (AR) and moving average (MA) lags are chosen from the patterns seen in the autocorrelation function (ACF) and partial autocorrelation function (PACF) during the identification phase. Because these plots may not point to a single clear choice of orders, several candidate models can be compared using criteria such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). The model with the lowest AIC or BIC is preferred, since these criteria balance goodness of fit against model complexity.
Parameter Estimation: Once the ARIMA model is specified, the next step is to estimate its parameters. Estimation involves finding the values of the autoregressive coefficients (φ_1, ..., φ_p), the moving average coefficients (θ_1, ..., θ_q), and any other parameters in the model, such as the constant term and the error variance. This is typically done by maximum likelihood or least squares.
Model Fitting: With the parameter estimates in hand, the ARIMA model is fitted to the historical data. The model is used to generate predicted values, and the fit is assessed by comparing these predictions to the actual observed values.
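
To make the estimation and model comparison steps concrete, here is a minimal sketch restricted to pure AR candidates fitted by least squares. The restriction is an assumption made to keep the example short: models with MA terms are normally estimated by maximum likelihood in a statistics library. The function names fit_ar and info_criteria are hypothetical, the data is synthetic, and the AIC and BIC are the Gaussian versions up to an additive constant.

```python
import numpy as np

def fit_ar(y, p, max_p):
    """Least-squares fit of y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p} + e_t.

    The first max_p observations are dropped for every candidate so that
    all models are compared on the same sample.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    target = y[max_p:]
    lags = [y[max_p - i:n - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(n - max_p)] + lags)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return coef, resid  # coef = [c, phi_1, ..., phi_p]

def info_criteria(resid, n_params):
    """Gaussian AIC and BIC, up to an additive constant."""
    m = len(resid)
    rss = float(np.dot(resid, resid))
    aic = m * np.log(rss / m) + 2 * n_params
    bic = m * np.log(rss / m) + n_params * np.log(m)
    return aic, bic

# Synthetic AR(2) data (an assumption for illustration only)
rng = np.random.default_rng(0)
n = 1000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + eps[t]

max_p = 4
aics, bics, coefs = {}, {}, {}
for p in range(max_p + 1):
    coef, resid = fit_ar(y, p, max_p)
    # Parameters counted: p AR coefficients, the constant, the error variance
    aics[p], bics[p] = info_criteria(resid, n_params=p + 2)
    coefs[p] = coef
    print(f"AR({p}): AIC={aics[p]:.1f}  BIC={bics[p]:.1f}")

best_p = min(aics, key=aics.get)
print("Order with lowest AIC:", best_p)
print("Its coefficients [c, phi_1, ...]:", np.round(coefs[best_p], 3))
```

Dropping the same number of initial observations for every candidate keeps the criteria comparable, because each model is then scored on identical data.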

Diagnostic Checking:

Diagnostic checking is an important step in the Box-Jenkins methodology for ARIMA modeling. It involves evaluating the adequacy of the fitted ARIMA model by examining the residuals, which are the differences between the observed and predicted values. The goal is to ensure that the residuals behave like random noise and do not contain any remaining patterns or structure. Now, let’s discuss the key aspects of diagnostic checking in Box-Jenkins:

Residual Analysis: Residuals are the differences between the actual observations and the values predicted by the ARIMA model. Analyzing the residuals helps identify any remaining patterns or systematic errors in the model.
Ljung-Box Test: The Ljung-Box test checks whether the residuals of the model are autocorrelated. Its null hypothesis is that the residuals have no autocorrelation up to a chosen lag. In simpler terms, it tests whether the leftover errors after modeling are random and don’t follow a specific pattern. A small p-value (for example, below 0.05) rejects this hypothesis and indicates that the model has left some structure unexplained.
Mean and Variance Check: We have to ensure that the residuals have a mean close to zero and a constant variance. A mean that is significantly different from zero suggests that the model is biased, and a variance that is not constant suggests that the size of its errors changes over time.
Iterative Refinement: Diagnostic checking is often an iterative process. If the initial diagnostic checks reveal issues, such as autocorrelation, non-constant variance, or outliers, the model may need to be refined.
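
The residual checks above can be illustrated with a small from-scratch sketch. The helper names ljung_box_q and chi2_sf_even_df are hypothetical, the two residual series are synthetic, and the p-value uses a closed form that only holds for an even number of degrees of freedom. When the test is applied to residuals of a fitted ARMA(p, q) model, the degrees of freedom are usually reduced to h - p - q; this sketch uses h directly because no model is fitted here.

```python
import math
import numpy as np

def ljung_box_q(resid, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    e = np.asarray(resid, dtype=float)
    n = len(e)
    ec = e - e.mean()
    denom = np.dot(ec, ec)
    r = np.array([np.dot(ec[k:], ec[:n - k]) / denom for k in range(1, h + 1)])
    return n * (n + 2) * np.sum(r**2 / (n - np.arange(1, h + 1)))

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i)
                                 for i in range(df // 2))

# Synthetic "residuals" (an assumption for illustration only)
rng = np.random.default_rng(1)
n, h = 500, 10
good_resid = rng.normal(size=n)   # behaves like white noise
shocks = rng.normal(size=n)
bad_resid = np.zeros(n)           # still contains AR(1) structure
for t in range(1, n):
    bad_resid[t] = 0.8 * bad_resid[t - 1] + shocks[t]

for name, e in [("white noise", good_resid), ("autocorrelated", bad_resid)]:
    q = ljung_box_q(e, h)
    p_value = chi2_sf_even_df(q, h)
    # Rough constant-variance check: compare the two halves of the series
    v1, v2 = e[:n // 2].var(), e[n // 2:].var()
    print(f"{name}: mean={e.mean():.3f}  var halves=({v1:.2f}, {v2:.2f})  "
          f"Q={q:.1f}  p-value={p_value:.4f}")
```

A large Q with a small p-value, as expected for the autocorrelated series, is the signal that the model needs to be refined.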

Model Refinement:

The model refinement stage in the Box-Jenkins method involves a thorough evaluation of the estimated ARIMA model to ensure that it meets the required statistical assumptions and adequately captures the patterns in the time series data. If the diagnostics reveal problems, the model is refined by changing the orders of the autoregressive, integrated and moving average components, or by including additional factors that were not considered earlier, such as seasonality. After the orders have been revised or the additional elements included, the diagnostic checks are performed again.

Once a satisfactory model is identified and validated, it can be used to forecast future values of the time series. Now let’s discuss the application of the Box-Jenkins method.