Implementation of Removing Non-Stationarity
This section presents essential data preprocessing techniques for achieving stationarity in time series analysis. Techniques include detrending, seasonal adjustment, logarithmic transformation, and differencing, followed by stationarity tests to validate the transformations, ensuring robust and accurate analysis of the data.
Importing Necessary Libraries and Creating Sample Data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Sample data
date_rng = pd.date_range(start='2022-01-01', end='2022-12-31', freq='D')
ts = pd.Series(np.random.randn(len(date_rng)), index=date_rng)
Detrending Using a Rolling Window
The detrended series is obtained by subtracting the rolling mean from the original time series ts; ts.rolling(window=30).mean() computes the rolling mean over a 30-day window. The code below performs the detrending and plots both the original and detrended series using matplotlib.
# Detrending using a rolling window
ts_detrended = ts - ts.rolling(window=30).mean()
# Plot original and detrended series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_detrended, label='Detrended', linestyle='--')
plt.legend()
plt.show()
Output: (plot of the original and detrended series)
Test for Stationarity
from statsmodels.tsa.stattools import adfuller
# Test for stationarity after detrending
result_detrended = adfuller(ts_detrended.dropna())
print(f'ADF Statistic (Detrended): {result_detrended[0]}')
print(f'p-value (Detrended): {result_detrended[1]}')
print(f'Critical Values (Detrended): {result_detrended[4]}')
Output:
ADF Statistic (Detrended): -18.559254822829608
p-value (Detrended): 2.0882820619850462e-30
Critical Values (Detrended): {'1%': -3.4500219858626227, '5%': -2.870206553997666, '10%': -2.571387268879483}
The p-value is very small, providing strong evidence against the null hypothesis of the ADF test, which is that the series has a unit root (i.e., is non-stationary). We can therefore reject the null hypothesis and conclude that the detrended series is stationary. The strongly negative ADF statistic, well below all three critical values, supports the same conclusion.
Seasonal Adjustment
from statsmodels.tsa.seasonal import STL
# Seasonal adjustment
stl = STL(ts, seasonal=13)  # seasonal smoother length; the period is inferred from the daily index
res = stl.fit()
ts_seasonal_adj = ts - res.seasonal
# Plot original and seasonally adjusted series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_seasonal_adj, label='Seasonally Adjusted', linestyle='--')
plt.legend()
plt.show()
Output: (plot of the original and seasonally adjusted series)
Test for Stationarity
# Test for stationarity after seasonal adjustment
result_seasonal_adj = adfuller(ts_seasonal_adj.dropna())
print(f'ADF Statistic (Seasonally Adjusted): {result_seasonal_adj[0]}')
print(f'p-value (Seasonally Adjusted): {result_seasonal_adj[1]}')
print(f'Critical Values (Seasonally Adjusted): {result_seasonal_adj[4]}')
Output:
ADF Statistic (Seasonally Adjusted): -4.651034555303582
p-value (Seasonally Adjusted): 0.00010390367939221074
Critical Values (Seasonally Adjusted): {'1%': -3.4491725955218655, '5%': -2.8698334971428574, '10%': -2.5711883591836733}
As before, the small p-value lets us reject the unit-root null hypothesis and conclude that the seasonally adjusted series is stationary. The ADF statistic (-4.65) is less extreme than in the detrended case but still lies below even the 1% critical value.
Logarithmic Transformation
# Transformation (e.g., logarithmic)
# Note: np.log is undefined for values <= 0. Since the sample data contains
# negative values, the resulting NaNs are dropped before the ADF test below.
ts_log = np.log(ts)
# Plot original and transformed series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_log, label='Log Transformed', linestyle='--')
plt.legend()
plt.show()
Output: (plot of the original and log-transformed series)
Test for Stationarity
# Test for stationarity after variance stabilization (log transformation)
result_log = adfuller(ts_log.dropna())
print(f'ADF Statistic (Log Transformed): {result_log[0]}')
print(f'p-value (Log Transformed): {result_log[1]}')
print(f'Critical Values (Log Transformed): {result_log[4]}')
Output:
ADF Statistic (Log Transformed): -14.60629558553864
p-value (Log Transformed): 4.08969119294649e-27
Critical Values (Log Transformed): {'1%': -3.467004502498507, '5%': -2.8776444997243558, '10%': -2.575355189707274}
Again, the very small p-value and the strongly negative ADF statistic (-14.61, well below all critical values) lead us to reject the unit-root null hypothesis: the log-transformed series is stationary. Note, however, that this result rests only on the positive observations that survived the transformation.
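When a series contains zero or negative values, a common workaround before taking logs is to shift it so that every value is strictly positive. This is a sketch under that assumption; the shift amount is an illustrative choice, not part of the original:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ts = pd.Series(rng.standard_normal(365))  # contains negative values

# np.log is only defined for strictly positive values, so shift first
shift = 1 - ts.min()        # makes min(ts + shift) equal to 1
ts_log = np.log(ts + shift)
print(ts_log.isna().sum())  # no NaNs after shifting
```

The shift changes the scale of the transformed series, so it should be recorded if forecasts need to be mapped back to the original units.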
Differencing to Remove Autocorrelation
# Differencing to reduce autocorrelation
ts_diff = ts.diff().dropna()
# Plot original and differenced series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_diff, label='Differenced', linestyle='--')
plt.legend()
plt.show()
Output: (plot of the original and differenced series)
Test for Stationarity
# Test for stationarity after differencing
result_diff = adfuller(ts_diff.dropna())
print(f'ADF Statistic (Differenced): {result_diff[0]}')
print(f'p-value (Differenced): {result_diff[1]}')
print(f'Critical Values (Differenced): {result_diff[4]}')
Output:
ADF Statistic (Differenced): -8.439660110734907
p-value (Differenced): 1.7773358987173984e-13
Critical Values (Differenced): {'1%': -3.4492815848836296, '5%': -2.8698813715275406, '10%': -2.5712138845950587}
Once more, the very small p-value and the strongly negative ADF statistic (-8.44, below all critical values) allow us to reject the unit-root null hypothesis and conclude that the differenced series is stationary.
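First-order differencing, as used above, targets a trend; differencing at the seasonal lag targets a repeating pattern, and the two can be combined. A sketch on a toy series whose trend and weekly cycle are assumptions for illustration:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2022-01-01', periods=365, freq='D')
t = np.arange(365)
rng = np.random.default_rng(0)
# Linear trend plus weekly cycle plus noise
ts = pd.Series(0.05 * t + np.sin(2 * np.pi * t / 7)
               + 0.1 * rng.standard_normal(365), index=idx)

first_diff = ts.diff().dropna()     # removes the linear trend
weekly_diff = ts.diff(7).dropna()   # removes the weekly seasonal pattern
both = ts.diff().diff(7).dropna()   # apply both when both are present
print(len(first_diff), len(weekly_diff), len(both))
```

Each differencing pass costs observations at the start of the series (1 for a first difference, 7 for a weekly difference), which is why dropna() is applied after each step.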
How to Remove Non-Stationarity in Time Series Forecasting
Removing non-stationarity in time series data is crucial for accurate forecasting because many time series forecasting models assume stationarity, where the statistical properties of the time series do not change over time. Non-stationarity can manifest as trends, seasonality, or other forms of irregular patterns in the data.
The article comprehensively covers techniques for removing non-stationarity in time series data, crucial for accurate forecasting: detrending, seasonal adjustment, logarithmic transformation, and differencing, each validated with the ADF test for stationarity.