Vector Autoregression (VAR) for Multivariate Time Series

Vector Autoregression (VAR) is a statistical tool used to investigate the dynamic relationships between multiple time series variables. Unlike univariate autoregressive models, which only forecast a single variable based on its previous values, VAR models investigate the interconnectivity of many variables. They accomplish this by modeling each variable as a function of not only its previous values but also of the past values of other variables in the system. In this article, we are going to explore the fundamentals of Vector Autoregression.

Table of Contents

  • What is Vector Autoregression?
  • Mathematical Intuition of VAR Equations
  • Assumptions underlying the VAR model
  • Steps to Implement VAR on Time Series Model
    • Step 1: Importing necessary libraries
    • Step 2: Generate Sample Data
    • Step 3: Function to plot time series
    • Step 4: Function to check stationarity
    • Step 5: VAR analysis
      • Output Explanation
  • Applications of VAR Models

What is Vector Autoregression?

Vector Autoregression was popularized in econometrics by Christopher Sims, whose 1980 paper “Macroeconomics and Reality” proposed VAR models as an alternative to large structural macroeconomic models. The approach builds on earlier work on dynamic relationships between time series, including Clive Granger’s 1969 definition of causality based on predictability. VAR models gained significant momentum in econometrics and macroeconomics during the 1980s.

Vector Autoregression (VAR) is a multivariate extension of autoregression (AR) models. While traditional AR models analyze the relationship between a single variable and its lagged values, VAR models consider multiple variables simultaneously. In a VAR model, each variable is regressed on its own lagged values as well as lagged values of other variables in the system.

Mathematical Intuition of VAR Equations

VAR models are mathematically represented as a system of simultaneous equations, where each equation describes the behavior of one variable as a function of its own lagged values and the lagged values of all other variables in the system.

Mathematically, a VAR(p) model with ‘p’ lags can be represented as:

[Tex]Y_t = c + \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \dots + \Phi_p Y_{t-p} + \varepsilon_t [/Tex]

Here,

  • [Tex]Y_t[/Tex]: A k × 1 vector holding the values of the k time series at time t.
  • c: A k × 1 vector of constant intercept terms, one per equation.
  • [Tex]\Phi_1, \Phi_2, \dots, \Phi_p[/Tex]: The k × k autoregressive coefficient matrices for lags 1, 2, …, p. Row i of [Tex]\Phi_j[/Tex] holds the effect of every variable’s lag-j value on variable i.
  • [Tex]Y_{t-1}, Y_{t-2}, \dots, Y_{t-p}[/Tex]: The vectors of values of the time series at lags 1, 2, …, p before time t.
  • [Tex]\varepsilon_t[/Tex]: A k × 1 vector of error terms at time t, assumed to have zero mean and no serial correlation.
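
To make the equation concrete, the sketch below simulates a two-variable VAR(1), that is, [Tex]Y_t = c + \Phi_1 Y_{t-1} + \varepsilon_t[/Tex]. The function name simulate_var1, the coefficient values, and the standard normal errors are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_var1(c, phi, n_obs, seed=0):
    """Simulate Y_t = c + phi @ Y_{t-1} + eps_t with standard normal errors."""
    rng = np.random.default_rng(seed)
    k = len(c)
    y = np.zeros((n_obs, k))          # Y_0 is set to zero
    for t in range(1, n_obs):
        eps = rng.standard_normal(k)  # error vector for time t
        y[t] = c + phi @ y[t - 1] + eps
    return y

# Hypothetical intercepts and coefficient matrix (illustration only)
c = np.array([0.1, -0.2])
phi = np.array([[0.5, 0.2],   # variable 1 depends on its own lag and on lag of variable 2
                [0.0, 0.3]])  # variable 2 depends only on its own lag

y = simulate_var1(c, phi, n_obs=200)
print(y.shape)  # (200, 2)
```

The off-diagonal entry 0.2 is what makes this a VAR rather than two separate AR(1) models: the past value of the second variable enters the equation of the first.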

To ensure the validity and trustworthiness of the results from VAR analysis, various assumptions and requirements must be met.

Assumptions underlying the VAR model

The main assumptions are:

  1. Linearity: Relationships between variables are linear.
  2. Stationarity: Time series data are stationary.
  3. No Perfect Multicollinearity: No perfect linear relationships exist between variables.
  4. No Autocorrelation in Residuals: Residuals are not serially correlated.
  5. Homoscedasticity: Residual variance is constant.
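  6. No Omitted Variables: All relevant variables are included in the system, so the errors are not correlated with the lagged regressors.
  7. Predetermined Regressors: Every variable in a VAR is treated as endogenous. The right-hand side contains only lagged values, which must be uncorrelated with the current error term.
  8. Sufficient Observations: Adequate data for parameter estimation. A VAR(p) with k variables has k²p lag coefficients plus k intercepts, so the parameter count grows quickly.
  9. Weak Exogeneity: If additional exogenous regressors are added to the system, they should not be contemporaneously correlated with the errors.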
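
The stationarity assumption has a concrete algebraic form: a VAR(p) is stable when every eigenvalue of its companion matrix lies strictly inside the unit circle. The following is a minimal sketch of that check using only NumPy. The helper name is_stable and the example matrices are hypothetical.

```python
import numpy as np

def is_stable(coef_matrices):
    """Return True if all companion-matrix eigenvalues have modulus < 1.

    coef_matrices is a list [Phi_1, ..., Phi_p] of k x k arrays.
    """
    p = len(coef_matrices)
    k = coef_matrices[0].shape[0]
    companion = np.zeros((k * p, k * p))
    companion[:k, :] = np.hstack(coef_matrices)      # top block row: Phi_1 ... Phi_p
    if p > 1:
        companion[k:, :-k] = np.eye(k * (p - 1))     # identity blocks below the top row
    eigenvalues = np.linalg.eigvals(companion)
    return bool(np.all(np.abs(eigenvalues) < 1))

# Hypothetical coefficient matrices (illustration only)
stable_phi = [np.array([[0.5, 0.2], [0.0, 0.3]])]    # eigenvalues 0.5 and 0.3
unit_root_phi = [np.eye(2)]                          # two independent random walks

print(is_stable(stable_phi))     # True
print(is_stable(unit_root_phi))  # False
```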

Steps to Implement VAR on Time Series Model

The code conducts Vector Autoregression (VAR) analysis on randomly generated time series data, including stationarity testing, VAR modeling, forecasting, and visualization of the forecasted outcomes.

Step 1: Importing necessary libraries

Python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import adfuller


Step 2: Generate Sample Data

Python

# Sample data generation
np.random.seed(0)
dates = pd.date_range(start='2024-01-01', periods=100)
data = pd.DataFrame(np.random.randn(100, 3), index=dates, columns=['A', 'B', 'C'])


Step 3: Function to plot time series

Python

# Function to plot time series
def plot_series(data):
    fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10, 8))
    for i, col in enumerate(data.columns):
        data[col].plot(ax=axes[i], title=col)
        axes[i].set_ylabel('Values')
        axes[i].set_xlabel('Date')
    plt.tight_layout()
    plt.show()

plot_series(data)

Output:


Figure: Generated sample data (series A, B, and C)


Step 4: Function to check stationarity

Checking for stationarity is crucial for VAR modeling because VAR assumes that the time series variables are stationary. Stationarity means that statistical properties of the series, such as the mean, variance, and autocorrelation, remain constant over time. The function below applies the Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series has a unit root and is therefore non-stationary.

Python

# Check stationarity of time series using ADF test
def check_stationarity(timeseries):
    result = adfuller(timeseries)
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))


Step 5: VAR analysis

This part defines a function var_analysis(data) that runs the VAR analysis on the given dataset in four steps: checking stationarity, fitting the VAR model, forecasting future values, and visualizing the forecast. The last line calls var_analysis() on the sample data.

With no arguments, fit() estimates the model with its default lag order. In the third step, the code reads the fitted lag order from results.k_ar (lag_order) and passes the last lag_order observations to forecast() to generate forecasts for the next 10 steps (steps=10). In the fourth step, the forecasted values are visualized. A new date index (forecast_index) covering the 10 days after the last observation is created. The sample data end on 2024-04-09, so the forecast starts on 2024-04-10.

Python

# Section for VAR analysis
def var_analysis(data):
    # Step 1: Check stationarity of each series
    print("Step 1: Checking stationarity")
    for col in data.columns:
        print('Stationarity test for', col)
        check_stationarity(data[col])

    # Step 2: Applying VAR model
    print("\nStep 2: Applying VAR model")
    model = VAR(data)
    results = model.fit()

    # Step 3: Forecasting
    print("\nStep 3: Forecasting")
    lag_order = results.k_ar
    forecast = results.forecast(data.values[-lag_order:], steps=10)

    # Step 4: Visualizing forecast
    print("\nStep 4: Visualizing forecast")
    # Start the forecast index one day after the last observation
    forecast_index = pd.date_range(start=data.index[-1] + pd.Timedelta(days=1), periods=10)
    forecast_data = pd.DataFrame(forecast, index=forecast_index, columns=data.columns)
    plot_series(pd.concat([data, forecast_data]))

# Perform VAR analysis
var_analysis(data)

Output:

Step 1: Checking stationarity
Stationarity test for A
ADF Statistic: -8.43759993424834
p-value: 1.7990274249398063e-13
Critical Values:
    1%: -3.498
    5%: -2.891
    10%: -2.583
Stationarity test for B
ADF Statistic: -11.229664527662438
p-value: 1.9214648218450937e-20
Critical Values:
    1%: -3.498
    5%: -2.891
    10%: -2.583
Stationarity test for C
ADF Statistic: -9.028783852793346
p-value: 5.516998045646418e-15
Critical Values:
    1%: -3.498
    5%: -2.891
    10%: -2.583

Step 2: Applying VAR model

Step 3: Forecasting

Step 4: Visualizing forecast


Figure: Forecast for the next 10 steps, appended to the original series


Output Explanation

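The output reports the results of the Augmented Dickey-Fuller (ADF) test for each variable in the dataset. The null hypothesis of the test is that the series has a unit root, meaning it is non-stationary.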

  • Stationarity test for A: The ADF statistic is -8.438, and the p-value is approximately 1.799e-13. Since the p-value is much smaller than 0.05 (a common significance level), we reject the null hypothesis of non-stationarity. The statistic is also well below the 1% critical value of -3.498.
  • Stationarity test for B: The ADF statistic is -11.230, and the p-value is approximately 1.921e-20. Again, the p-value is much smaller than 0.05, so we reject the null hypothesis of non-stationarity.
  • Stationarity test for C: The ADF statistic is -9.029, and the p-value is approximately 5.517e-15. As with A and B, the small p-value leads us to reject the null hypothesis of non-stationarity.

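All three variables (A, B, and C) in the dataset are stationary based on the results of the Augmented Dickey-Fuller test. This is expected, because each series consists of independent draws from a standard normal distribution.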

Applications of VAR Models

  1. Economic Forecasting: VAR models are widely used in economics to forecast the behavior of economic variables such as GDP, inflation, and interest rates.
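  2. Causal Inference: Granger causality tests and impulse response functions derived from VAR models show how one variable helps predict another and how a shock propagates through the system over time. Interpreting these effects as causal requires additional identifying assumptions, as in structural VAR models. With such assumptions in place, this is particularly valuable in policy evaluation.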
  3. Financial Markets: VAR models can be used to predict financial indices, stocks and asset prices.