Implementation of Removing Non-Stationarity
This section presents essential data preprocessing techniques for achieving stationarity in time series analysis. Techniques include detrending, seasonal adjustment, logarithmic transformation, and differencing, followed by stationarity tests to validate the transformations, ensuring robust and accurate analysis of the data.
Importing Necessary Libraries and Creating Sample Data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Sample data
date_rng = pd.date_range(start='2022-01-01', end='2022-12-31', freq='D')
ts = pd.Series(np.random.randn(len(date_rng)), index=date_rng)
Detrending Using a Rolling Window
The detrended series is obtained by subtracting the rolling mean from the original time series ts; ts.rolling(window=30).mean() computes the rolling mean over a 30-day window. The code below performs the detrending and plots both the original and detrended series using matplotlib.
# Detrending using a rolling window
ts_detrended = ts - ts.rolling(window=30).mean()
# Plot original and detrended series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_detrended, label='Detrended', linestyle='--')
plt.legend()
plt.show()
Output: (plot of the original and detrended series)
Test for Stationarity
from statsmodels.tsa.stattools import adfuller
# Test for stationarity after detrending
result_detrended = adfuller(ts_detrended.dropna())
print(f'ADF Statistic (Detrended): {result_detrended[0]}')
print(f'p-value (Detrended): {result_detrended[1]}')
print(f'Critical Values (Detrended): {result_detrended[4]}')
Output:
ADF Statistic (Detrended): -18.559254822829608
p-value (Detrended): 2.0882820619850462e-30
Critical Values (Detrended): {'1%': -3.4500219858626227, '5%': -2.870206553997666, '10%': -2.571387268879483}
The p-value is very small, providing strong evidence against the null hypothesis of the ADF test, which is that the series has a unit root (i.e., is non-stationary). We can therefore reject the null hypothesis and conclude that the detrended series is stationary. The strongly negative ADF statistic, well below all three critical values, supports the same conclusion.
Seasonal Adjustment
from statsmodels.tsa.seasonal import STL
# Seasonal adjustment
stl = STL(ts, seasonal=13)  # seasonal smoother length; the period is inferred from the daily index
res = stl.fit()
ts_seasonal_adj = ts - res.seasonal
# Plot original and seasonally adjusted series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_seasonal_adj, label='Seasonally Adjusted', linestyle='--')
plt.legend()
plt.show()
Output: (plot of the original and seasonally adjusted series)
Test for Stationarity
# Test for stationarity after seasonal adjustment
result_seasonal_adj = adfuller(ts_seasonal_adj.dropna())
print(f'ADF Statistic (Seasonally Adjusted): {result_seasonal_adj[0]}')
print(f'p-value (Seasonally Adjusted): {result_seasonal_adj[1]}')
print(f'Critical Values (Seasonally Adjusted): {result_seasonal_adj[4]}')
Output:
ADF Statistic (Seasonally Adjusted): -4.651034555303582
p-value (Seasonally Adjusted): 0.00010390367939221074
Critical Values (Seasonally Adjusted): {'1%': -3.4491725955218655, '5%': -2.8698334971428574, '10%': -2.5711883591836733}
As before, the small p-value lets us reject the unit-root null hypothesis and conclude that the seasonally adjusted series is stationary. The ADF statistic (-4.65) is less extreme than in the detrended case but still lies below even the 1% critical value.
Logarithmic Transformation
# Transformation (e.g., logarithmic)
# Note: np.log is undefined for values <= 0. Since the sample data contains
# negative values, the resulting NaNs are dropped before the ADF test below.
ts_log = np.log(ts)
# Plot original and transformed series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_log, label='Log Transformed', linestyle='--')
plt.legend()
plt.show()
Output: (plot of the original and log-transformed series)
Test for Stationarity
# Test for stationarity after variance stabilization (log transformation)
result_log = adfuller(ts_log.dropna())
print(f'ADF Statistic (Log Transformed): {result_log[0]}')
print(f'p-value (Log Transformed): {result_log[1]}')
print(f'Critical Values (Log Transformed): {result_log[4]}')
Output:
ADF Statistic (Log Transformed): -14.60629558553864
p-value (Log Transformed): 4.08969119294649e-27
Critical Values (Log Transformed): {'1%': -3.467004502498507, '5%': -2.8776444997243558, '10%': -2.575355189707274}
Again, the very small p-value and the strongly negative ADF statistic (-14.61, well below all critical values) lead us to reject the unit-root null hypothesis: the log-transformed series is stationary. Note, however, that this result rests only on the positive observations that survived the transformation.
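When a series contains zero or negative values, a common workaround before taking logs is to shift it so that every value is strictly positive. This is a sketch under that assumption; the shift amount is an illustrative choice, not part of the original:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ts = pd.Series(rng.standard_normal(365))  # contains negative values

# np.log is only defined for strictly positive values, so shift first
shift = 1 - ts.min()        # makes min(ts + shift) equal to 1
ts_log = np.log(ts + shift)
print(ts_log.isna().sum())  # no NaNs after shifting
```

The shift changes the scale of the transformed series, so it should be recorded if forecasts need to be mapped back to the original units.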
Differencing to Remove Autocorrelation
# Differencing to reduce autocorrelation
ts_diff = ts.diff().dropna()
# Plot original and differenced series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_diff, label='Differenced', linestyle='--')
plt.legend()
plt.show()
Output: (plot of the original and differenced series)
Test for Stationarity
# Test for stationarity after differencing
result_diff = adfuller(ts_diff.dropna())
print(f'ADF Statistic (Differenced): {result_diff[0]}')
print(f'p-value (Differenced): {result_diff[1]}')
print(f'Critical Values (Differenced): {result_diff[4]}')
Output:
ADF Statistic (Differenced): -8.439660110734907
p-value (Differenced): 1.7773358987173984e-13
Critical Values (Differenced): {'1%': -3.4492815848836296, '5%': -2.8698813715275406, '10%': -2.5712138845950587}
Once more, the very small p-value and the strongly negative ADF statistic (-8.44, below all critical values) allow us to reject the unit-root null hypothesis and conclude that the differenced series is stationary.
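First-order differencing, as used above, targets a trend; differencing at the seasonal lag targets a repeating pattern, and the two can be combined. A sketch on a toy series whose trend and weekly cycle are assumptions for illustration:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2022-01-01', periods=365, freq='D')
t = np.arange(365)
rng = np.random.default_rng(0)
# Linear trend plus weekly cycle plus noise
ts = pd.Series(0.05 * t + np.sin(2 * np.pi * t / 7)
               + 0.1 * rng.standard_normal(365), index=idx)

first_diff = ts.diff().dropna()     # removes the linear trend
weekly_diff = ts.diff(7).dropna()   # removes the weekly seasonal pattern
both = ts.diff().diff(7).dropna()   # apply both when both are present
print(len(first_diff), len(weekly_diff), len(both))
```

Each differencing pass costs observations at the start of the series (1 for a first difference, 7 for a weekly difference), which is why dropna() is applied after each step.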
How to Remove Non-Stationarity in Time Series Forecasting
Removing non-stationarity in time series data is crucial for accurate forecasting because many time series forecasting models assume stationarity, where the statistical properties of the time series do not change over time. Non-stationarity can manifest as trends, seasonality, or other forms of irregular patterns in the data.
The article comprehensively covers techniques for removing non-stationarity in time series data, crucial for accurate forecasting: detrending, seasonal adjustment, logarithmic transformation, and differencing, each validated with the ADF test for stationarity.