Partial Autocorrelation Functions using Python
Using a Custom-Generated Dataset
Let's compute the Partial Autocorrelation Function (PACF) using the statsmodels library in Python.
Importing Libraries:
Python3
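import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import pacf
from statsmodels.graphics.tsaplots import plot_pacf
- pandas as pd: Imports the Pandas library with the alias pd. Pandas is commonly used for handling structured data.
- numpy as np: Imports the NumPy library with the alias np. NumPy is used for numerical computations.
- matplotlib.pyplot as plt: Imports Matplotlib's pyplot module with the alias plt. It is used later to create and label the PACF plot.
- from statsmodels.tsa.stattools import pacf: Imports the pacf function from the statsmodels library. This function is used to compute the Partial Autocorrelation Function (PACF) values.
- from statsmodels.graphics.tsaplots import plot_pacf: Imports the plot_pacf function, which draws the PACF values together with a confidence band.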
Generating Time Series Data:
Python3
np.random.seed(42)
time_steps = np.linspace(0, 10, 100)
data = np.sin(time_steps) + np.random.normal(scale=0.2, size=len(time_steps))
- np.random.seed(42): Sets the seed for random number generation in NumPy to ensure reproducibility.
- time_steps = np.linspace(0, 10, 100): Creates an array of 100 evenly spaced numbers from 0 to 10.
- data = np.sin(time_steps) + np.random.normal(scale=0.2, size=len(time_steps)): Generates a sine wave using np.sin(time_steps) and adds Gaussian noise with a standard deviation of 0.2 using np.random.normal() to create a synthetic time series. The result mimics a sine wave pattern with added noise.
Computing and Plotting PACF:
- pacf_values = pacf(data, nlags=20): Calculates the Partial Autocorrelation Function (PACF) values using the pacf function from statsmodels. It computes PACF values for the provided data with a specified number of lags (nlags=20). Change nlags according to the length of your time series data or the number of lags you want to investigate.
- PACF Plotting: Create a plot representing the PACF values against lags to visualize partial correlations. Set title, labels for axes, and display the PACF plot.
- for lag, pacf_val in enumerate(pacf_values): Iterates through the computed PACF values. The enumerate() function provides both the lag number (lag) and the corresponding PACF value (pacf_val), which are then printed for each lag.
Python3
pacf_values = pacf(data, nlags=20)

# Print PACF values
print("Partial Autocorrelation Function (PACF) values:")
for lag, pacf_val in enumerate(pacf_values):
    print(f"Lag {lag}: {pacf_val}")

# Plot PACF on an explicit axes so that plot_pacf does not open a second, empty figure
fig, ax = plt.subplots(figsize=(10, 5))
plot_pacf(data, lags=20, ax=ax)  # Change lags according to your data
ax.set_title('Partial Autocorrelation Function (PACF)')
ax.set_xlabel('Lags')
ax.set_ylabel('PACF')
ax.grid(True)
plt.show()
Output:
Partial Autocorrelation Function (PACF) values:
Lag 0: 1.0
Lag 1: 0.9277779190634952
Lag 2: 0.39269022809503606
Lag 3: 0.15463548623480705
Lag 4: -0.03886302489844257
Lag 5: -0.042933753723446405
Lag 6: -0.3632570559137871
Lag 7: -0.2817338901669104
Lag 8: -0.3931692351265865
Lag 9: -0.16550939301708287
Lag 10: -0.27973978478073214
Lag 11: 0.1370484695314932
Lag 12: -0.20445377972909687
Lag 13: -0.12087299096297043
Lag 14: 0.046229571707022764
Lag 15: -0.3654906423192799
Lag 16: -0.36058859364402557
Lag 17: -0.4949891744857339
Lag 18: -0.3466588099640611
Lag 19: -0.30607850279663795
Lag 20: -0.3277911710431029
These values represent the Partial Autocorrelation Function (PACF) values calculated for each lag.
Each line in the output indicates the lag number and its corresponding PACF value. Positive or negative values indicate positive or negative correlations respectively, while values close to zero suggest weaker correlations at that lag.
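To judge which of the printed values are "close to zero", a common rule of thumb compares each one against an approximate 95% threshold of ±1.96/√n, where n is the number of observations (100 here). The following is a minimal sketch of that check. The array holds the lag 1 to lag 20 values copied from the output above and rounded to four decimals, and the names threshold and significant_lags are illustrative only.

```python
import numpy as np

# PACF values for lags 1..20, copied from the printed output (rounded to 4 decimals)
pacf_values = np.array([
    0.9278, 0.3927, 0.1546, -0.0389, -0.0429,
    -0.3633, -0.2817, -0.3932, -0.1655, -0.2797,
    0.1370, -0.2045, -0.1209, 0.0462, -0.3655,
    -0.3606, -0.4950, -0.3467, -0.3061, -0.3278,
])

n_obs = 100  # length of the synthetic series generated earlier

# Approximate 95% threshold under the assumption of no partial autocorrelation
threshold = 1.96 / np.sqrt(n_obs)

# Collect the lags whose absolute PACF value exceeds the threshold
significant_lags = [
    lag for lag, value in enumerate(pacf_values, start=1)
    if abs(value) > threshold
]

print(f"Approximate 95% threshold: +/-{threshold:.3f}")
print("Lags outside the threshold:", significant_lags)
```

Lags 3, 4, and 5, for example, fall inside the threshold here. The threshold is only an approximation, and estimates at high lags on a series this short can be unstable, so the flagged lags are a starting point for inspection rather than a firm conclusion.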
Using a Real-World Dataset
Importing Required Libraries and Dataset Retrieval
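- Imports: Import the necessary libraries: Pandas for data manipulation, Matplotlib for plotting, pacf from statsmodels.tsa.stattools for PACF computation, and get_rdataset from statsmodels.datasets to obtain the "AirPassengers" dataset.
- Loading Dataset: Retrieve the "AirPassengers" dataset using get_rdataset, which downloads it from the Rdatasets repository and therefore needs an internet connection. The 'time' column stores decimal years (for example 1949.0833), which pd.to_datetime would misread as nanosecond offsets from 1970. Because the series is monthly and starts in January 1949, build a monthly datetime index instead.
Python3
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import pacf
from statsmodels.datasets import get_rdataset

# Load the 'AirPassengers' dataset (fetched from the Rdatasets repository)
data = get_rdataset('AirPassengers').data

# The 'time' column holds decimal years, so build a month-start datetime index
# beginning in January 1949 rather than parsing that column directly
data.index = pd.date_range(start='1949-01-01', periods=len(data), freq='MS')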
Plotting Time Series Data
- Time Series Plotting: Create a figure and plot the "AirPassengers" time series data using Matplotlib. Set title, labels for axes, and display the plot.
Python3
# Plot the time series data
plt.figure(figsize=(10, 5))
plt.plot(data['value'])
plt.title('Airline Passengers Over Time')
plt.xlabel('Year')
plt.ylabel('Passenger Count')
plt.grid(True)
plt.show()
Output:
Calculating and Plotting PACF
- PACF Computation: Compute the Partial Autocorrelation Function (PACF) values for the "AirPassengers" dataset using pacf from statsmodels. Define the number of lags as 20.
- PACF Plotting: Create a bar plot representing the PACF values against lags to visualize partial correlations. Set title, labels for axes, and display the PACF plot.
Python3
# Calculate PACF using statsmodels pacf function
pacf_values = pacf(data['value'], nlags=20)

# Plot PACF
plt.figure(figsize=(10, 5))
plt.bar(range(len(pacf_values)), pacf_values)
plt.title('Partial Autocorrelation Function (PACF)')
plt.xlabel('Lags')
plt.ylabel('PACF')
plt.grid(True)
plt.show()
Output:
Interpreting a PACF plot means identifying significant spikes, that is, partial autocorrelations that lie clearly outside an approximate 95% threshold of about ±1.96/√n for a series of n observations. A significant spike at a particular lag indicates a direct correlation between the series and its value at that lag after the effects of the shorter lags have been removed. For instance, a PACF plot with a significant spike at lag 1 but no significant spikes at subsequent lags suggests a first-order autoregressive process, often denoted AR(1) in time series analysis.
Understanding Partial Autocorrelation Functions (PACF) in Time Series Data
Partial autocorrelation functions (PACF) play a central role in time series analysis. The PACF at lag k measures the direct correlation between a series and its value k steps earlier, after removing the effects of the intermediate lags 1 to k-1. It is used across disciplines such as economics, finance, and meteorology, mainly to identify the order of autoregressive models and to support forecasting.
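One way to make "removing the effects of the intermediate lags" concrete is through regression: the partial autocorrelation at lag k can be estimated as the coefficient on the lag-k term when the series is regressed on its first k lags. The sketch below implements that idea with NumPy on the synthetic sine series from earlier. The helper name pacf_by_regression is hypothetical, and this is only one of several estimators, so its numbers will differ somewhat from the values statsmodels printed above.

```python
import numpy as np

def pacf_by_regression(x, max_lag):
    """Estimate the PACF by fitting one least-squares regression per lag (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    values = [1.0]  # the lag-0 value is 1 by convention
    for k in range(1, max_lag + 1):
        # Columns: intercept, x[t-1], ..., x[t-k] for t = k .. n-1
        lagged = [x[k - j : n - j] for j in range(1, k + 1)]
        design = np.column_stack([np.ones(n - k)] + lagged)
        target = x[k:]
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        # The last coefficient belongs to x[t-k]: the lag-k partial autocorrelation
        values.append(coef[-1])
    return np.array(values)

# Same synthetic series as in the custom-dataset example
np.random.seed(42)
time_steps = np.linspace(0, 10, 100)
data = np.sin(time_steps) + np.random.normal(scale=0.2, size=len(time_steps))

vals = pacf_by_regression(data, max_lag=5)
for lag, value in enumerate(vals):
    print(f"Lag {lag}: {value:.4f}")
```

At lag 1 there are no intermediate lags to remove, so the estimate is simply the slope of a regression of each value on the previous one. From lag 2 onward, the shorter lags sit in the regression as controls, which is what separates the partial from the ordinary autocorrelation.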