Vector Autoregression (VAR) for Multivariate Time Series


Vector Autoregression (VAR) is a statistical tool used to investigate the dynamic relationships between multiple time series variables. Unlike univariate autoregressive models, which only forecast a single variable based on its previous values, VAR models investigate the interconnectivity of many variables. They accomplish this by modeling each variable as a function of not only its previous values but also of the past values of other variables in the system. In this article, we are going to explore the fundamentals of Vector Autoregression.

Table of Contents

What is Vector Autoregression?
Mathematical Intuition of VAR Equations
Assumptions underlying the VAR model
Steps to Implement VAR on Time Series Model

Step 1: Importing necessary libraries
Step 2: Generate Sample Data
Step 3: Function to plot time series
Step 4: Function to check stationarity
Step 5: VAR analysis

Output Explanation

Applications of VAR Models

What is Vector Autoregression?

Vector Autoregression was introduced to macroeconomics by Christopher Sims in 1980 as an alternative to large structural models. It builds on earlier work on dynamic relationships among economic time series, including Clive Granger's 1969 notion of causality, which is commonly tested within a VAR. VAR models gained significant momentum in econometrics and macroeconomics during the 1980s.

Vector Autoregression (VAR) is a multivariate extension of autoregression (AR) models. While traditional AR models analyze the relationship between a single variable and its lagged values, VAR models consider multiple variables simultaneously. In a VAR model, each variable is regressed on its own lagged values as well as lagged values of other variables in the system.
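The phrase "each variable is regressed on its own lagged values as well as lagged values of other variables" can be made concrete with ordinary least squares. The following is a minimal sketch, assuming a two-variable system with one lag; the coefficient values, noise scale, and variable names are illustrative assumptions and not part of the article's example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" system: two variables, one lag (illustrative values).
true_intercept = np.array([0.1, 0.2])
true_coef = np.array([[0.5, 0.1],
                      [0.2, 0.3]])

# Simulate 2000 observations from that system.
n_obs = 2000
y = np.zeros((n_obs, 2))
for t in range(1, n_obs):
    y[t] = true_intercept + true_coef @ y[t - 1] + rng.normal(scale=0.5, size=2)

# Regressors: a constant plus the previous value of *both* variables.
X = np.column_stack([np.ones(n_obs - 1), y[:-1]])
targets = y[1:]

# One least-squares regression per variable, solved in a single call.
beta, *_ = np.linalg.lstsq(X, targets, rcond=None)

est_intercept = beta[0]   # one intercept per equation
est_coef = beta[1:].T     # row i holds the lag coefficients of equation i

print(np.round(est_intercept, 2))
print(np.round(est_coef, 2))
```

With enough data, the estimated coefficients should land close to the values used in the simulation, which is the sense in which a VAR is a set of regressions sharing the same lagged regressors.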

Mathematical Intuition of VAR Equations

VAR models are mathematically represented as a system of simultaneous equations, where each equation describes the behavior of one variable as a function of its own lagged values and the lagged values of all other variables in the system.

Mathematically, a VAR(p) model with ‘p’ lags can be represented as:

[Tex]Y_t = c + \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \dots + \Phi_p Y_{t-p} + \varepsilon_t [/Tex]

Here,

[Tex]Y_t[/Tex]: The k × 1 vector holding the values of all k time series at time t, where k is the number of variables.
c: The k × 1 vector of constant intercept terms, one per equation.
[Tex]\Phi_1, \Phi_2, \dots, \Phi_p[/Tex]: The k × k autoregressive coefficient matrices for lags 1, 2, …, p. Entry (i, j) of a matrix measures how variable j at that lag affects variable i.
[Tex]Y_{t-1}, Y_{t-2}, \dots, Y_{t-p}[/Tex]: The vectors of values of the time series at lags 1, 2, …, p before time t.
[Tex]\varepsilon_t[/Tex]: The k × 1 vector of error terms at time t, assumed to be white noise.
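To see how the equation combines the variables, here is a small worked example for a VAR(1) with two variables. All numbers are made up for illustration, and the error term is set to zero so the arithmetic is easy to follow.

```python
import numpy as np

# Illustrative values only; none of these numbers come from the article's data.
c = np.array([0.1, 0.2])          # intercept vector, one entry per variable
phi_1 = np.array([[0.5, 0.1],     # row 0: equation for variable 0
                  [0.2, 0.3]])    # row 1: equation for variable 1
y_prev = np.array([1.0, 2.0])     # Y_{t-1}
eps_t = np.zeros(2)               # error term set to zero for clarity

# Matrix form: Y_t = c + Phi_1 Y_{t-1} + eps_t
y_t = c + phi_1 @ y_prev + eps_t

# The same computation written out equation by equation.
y0 = c[0] + phi_1[0, 0] * y_prev[0] + phi_1[0, 1] * y_prev[1] + eps_t[0]
y1 = c[1] + phi_1[1, 0] * y_prev[0] + phi_1[1, 1] * y_prev[1] + eps_t[1]

print(y_t)   # approximately [0.8, 1.0]
```

Each row of the coefficient matrix is one regression equation: the off-diagonal entries (0.1 and 0.2 here) are what let one variable's past feed into another variable's present.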

To ensure the validity and trustworthiness of the results from VAR analysis, various assumptions and requirements must be met.

Assumptions underlying the VAR model

VAR analysis is subject to several assumptions and requirements to ensure the validity and reliability of the results:

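Linearity: Each variable is a linear function of the lagged values in the system.
Stationarity: Each time series is stationary; trending or unit-root series are usually differenced before fitting.
No Perfect Multicollinearity: No variable is an exact linear combination of the others.
No Autocorrelation in Residuals: Residuals are not serially correlated; remaining autocorrelation usually signals too few lags.
Homoscedasticity: Residual variance is constant over time.
No Endogeneity: No relevant variable is omitted in a way that makes the errors correlated with the lagged regressors.
Exogeneity: The lagged regressors are uncorrelated with the current error term. The variables themselves are all treated as endogenous within the system.
Sufficient Observations: There is enough data to estimate the parameters, whose number grows with both the number of variables and the number of lags.
Weak Exogeneity: Any variable that enters only as an external regressor must not be contemporaneously correlated with the errors.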
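For a VAR, the stationarity assumption has a concrete algebraic form: the process is stable when every eigenvalue of its companion matrix lies strictly inside the unit circle. The sketch below checks this with NumPy; the helper names `companion_matrix` and `is_stable` are hypothetical and the coefficient matrices are illustrative.

```python
import numpy as np

def companion_matrix(phis):
    """Build the (k*p) x (k*p) companion matrix from lag matrices Phi_1..Phi_p."""
    k = phis[0].shape[0]
    p = len(phis)
    top = np.hstack(phis)                      # [Phi_1 ... Phi_p]
    if p == 1:
        return top
    # Identity blocks that shift the stacked lags down by one period.
    bottom = np.hstack([np.eye(k * (p - 1)), np.zeros((k * (p - 1), k))])
    return np.vstack([top, bottom])

def is_stable(phis):
    """Return True when every companion eigenvalue has modulus below 1."""
    eigenvalues = np.linalg.eigvals(companion_matrix(phis))
    return bool(np.all(np.abs(eigenvalues) < 1.0))

stable_phi = [np.array([[0.5, 0.1], [0.2, 0.3]])]
random_walk = [np.eye(2)]                      # unit roots: not stationary

print(is_stable(stable_phi))    # True
print(is_stable(random_walk))   # False
```

A pair of independent random walks fails the check because its eigenvalues sit exactly on the unit circle, which is the case that differencing is meant to fix.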

Steps to Implement VAR on Time Series Model

The code conducts Vector Autoregression (VAR) analysis on randomly generated time series data, including stationarity testing, VAR modeling, forecasting, and visualization of the forecasted outcomes.

Step 1: Importing necessary libraries

Python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import adfuller

Step 2: Generate Sample Data

Python

# Sample data generation
np.random.seed(0)
dates = pd.date_range(start='2024-01-01', periods=100)
data = pd.DataFrame(np.random.randn(100, 3), index=dates, columns=['A', 'B', 'C'])

Step 3: Function to plot time series

Python

# Function to plot each column of a DataFrame in its own subplot
def plot_series(data):
    n_series = len(data.columns)
    # squeeze=False keeps axes two-dimensional even for a single column
    fig, axes = plt.subplots(nrows=n_series, ncols=1, figsize=(10, 8), squeeze=False)
    for ax, col in zip(axes[:, 0], data.columns):
        data[col].plot(ax=ax, title=col)
        ax.set_ylabel('Values')
        ax.set_xlabel('Date')
    plt.tight_layout()
    plt.show()

plot_series(data)

Output:

Generated Sample Data

Step 4: Function to check stationarity

Checking for stationarity is crucial for VAR (Vector Autoregression) modeling because VAR assumes that the time series variables are stationary. Stationarity means that the statistical properties of a series, such as its mean, variance, and autocorrelation, remain constant over time. The Augmented Dickey-Fuller (ADF) test used below has non-stationarity (a unit root) as its null hypothesis, so a small p-value is evidence that the series is stationary.

Python

# Check stationarity of time series using ADF test
def check_stationarity(timeseries):
    result = adfuller(timeseries)
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))

Step 5: VAR analysis

This part defines a function var_analysis(data) that conducts Vector Autoregression (VAR) analysis on the given dataset. It consists of four steps: checking stationarity, fitting the VAR model, forecasting future values, and visualizing the forecast. Finally, it calls the var_analysis() function with the provided data to execute the analysis.

In the third step, the code reads the lag order of the fitted model (lag_order) and passes the last lag_order observations to results.forecast() to generate forecasts for the next 10 steps (steps=10). In the fourth step, the forecasted values are visualized. A new set of date indices (forecast_index) is created for the 10 periods that immediately follow the last observation. The sample ends on 2024-04-09, so the forecast starts on 2024-04-10.

Python

# Section for VAR analysis
def var_analysis(data):
    # Step 1: Check stationarity of each series with the ADF test
    print("Step 1: Checking stationarity")
    for col in data.columns:
        print('Stationarity test for', col)
        check_stationarity(data[col])

    # Step 2: Fit the VAR model
    print("\nStep 2: Applying VAR model")
    model = VAR(data)
    results = model.fit()

    # Step 3: Forecast from the last lag_order observations
    print("\nStep 3: Forecasting")
    lag_order = results.k_ar
    forecast = results.forecast(data.values[-lag_order:], steps=10)

    # Step 4: Visualize the original data followed by the forecast
    print("\nStep 4: Visualizing forecast")
    # Start the forecast index on the day after the last observation
    forecast_start = data.index[-1] + pd.Timedelta(days=1)
    forecast_index = pd.date_range(start=forecast_start, periods=10)
    forecast_data = pd.DataFrame(forecast, index=forecast_index, columns=data.columns)
    plot_series(pd.concat([data, forecast_data]))

# Perform VAR analysis
var_analysis(data)

Output:

Step 1: Checking stationarity
Stationarity test for A
ADF Statistic: -8.43759993424834
p-value: 1.7990274249398063e-13
Critical Values:
    1%: -3.498
    5%: -2.891
    10%: -2.583
Stationarity test for B
ADF Statistic: -11.229664527662438
p-value: 1.9214648218450937e-20
Critical Values:
    1%: -3.498
    5%: -2.891
    10%: -2.583
Stationarity test for C
ADF Statistic: -9.028783852793346
p-value: 5.516998045646418e-15
Critical Values:
    1%: -3.498
    5%: -2.891
    10%: -2.583

Step 2: Applying VAR model

Step 3: Forecasting

Step 4: Visualizing forecast

Forecasting for period of next 10 steps
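Conceptually, a multi-step VAR point forecast is produced by feeding each forecast back in as the next lagged value. The sketch below shows that recursion for a VAR(1) with illustrative coefficients; `forecast_var1` is a hypothetical helper that mirrors the idea and is not a description of statsmodels' internals.

```python
import numpy as np

def forecast_var1(c, phi, y_last, steps):
    """Iterate Y_hat = c + Phi @ Y_prev for `steps` periods, with no noise added."""
    forecasts = np.empty((steps, len(c)))
    y_prev = np.asarray(y_last, dtype=float)
    for h in range(steps):
        y_prev = c + phi @ y_prev
        forecasts[h] = y_prev
    return forecasts

# Illustrative values, not estimated from the article's data.
c = np.array([0.1, 0.2])
phi = np.array([[0.5, 0.1],
                [0.2, 0.3]])
y_last = np.array([1.0, 2.0])

path = forecast_var1(c, phi, y_last, steps=50)

# For a stable VAR(1), forecasts approach the long-run mean (I - Phi)^{-1} c.
long_run_mean = np.linalg.solve(np.eye(2) - phi, c)
print(path[0], path[-1], long_run_mean)
```

This convergence explains why forecasts of weakly dependent data flatten out quickly: when the fitted coefficients are close to zero, as is likely for random noise, the forecast settles near the sample mean after a step or two.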

Output Explanation

The output reports the results of the Augmented Dickey-Fuller (ADF) test for each variable in the dataset:

Stationarity test for A: The ADF statistic is -8.438, and the p-value is approximately 1.799e-13. Since the p-value is much smaller than 0.05 (a common significance level), we reject the null hypothesis of non-stationarity. The critical values at 1%, 5%, and 10% significance levels are also provided for reference.
Stationarity test for B: The ADF statistic is -11.230, and the p-value is approximately 1.921e-20. Again, since the p-value is much smaller than 0.05, we reject the null hypothesis of non-stationarity. The critical values at different significance levels are also provided.
Stationarity test for C: The ADF statistic is -9.029, and the p-value is approximately 5.517e-15. Similar to variables A and B, the small p-value indicates that we reject the null hypothesis of non-stationarity for variable C. Critical values at different significance levels are also provided.

All three variables (A, B, and C) in the dataset are stationary based on the results of the Augmented Dickey-Fuller test. This is expected, because each series was generated as independent Gaussian white noise.

Applications of VAR Models

Economic Forecasting: VAR models are widely used in economics to forecast the behavior of economic variables such as GDP, inflation, and interest rates.
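Causal Inference: Impulse responses generated by VAR models trace how a shock to one variable propagates to the others over time. Reading them as causal effects requires additional identifying assumptions, such as an ordering of the variables or a structural VAR. With such assumptions in place, they are particularly valuable in policy evaluation.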
Financial Markets: VAR models can be used to predict financial indices, stocks and asset prices.
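The impulse responses mentioned above have a simple form for a VAR(1): the response h periods after a unit shock is the h-th power of the coefficient matrix. The sketch below computes these non-orthogonalized responses with NumPy; the function name and the coefficient values are illustrative assumptions.

```python
import numpy as np

def impulse_responses_var1(phi, horizons):
    """Return responses[h] = Phi^h for h = 0..horizons (non-orthogonalized)."""
    k = phi.shape[0]
    responses = np.empty((horizons + 1, k, k))
    responses[0] = np.eye(k)
    for h in range(1, horizons + 1):
        responses[h] = phi @ responses[h - 1]
    return responses

phi = np.array([[0.5, 0.1],
                [0.2, 0.3]])
irf = impulse_responses_var1(phi, horizons=5)

# irf[h][i, j]: effect on variable i, h periods after a unit shock to variable j.
print(irf[1][1, 0])   # 0.2: one-period-ahead response of variable 1 to variable 0
```

These raw responses treat each shock in isolation. When the error terms are correlated across equations, applied work usually orthogonalizes the shocks first, and that step is where the identifying assumptions enter.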
