What is Cross-correlation Analysis in Python?

In this article, we will learn how to perform cross-correlation analysis in Python.

Cross-correlation analysis is a powerful technique in signal processing and time series analysis used to measure the similarity between two series at different time lags. It reveals how one series (reference) is correlated with the other (target) when shifted by a specific amount. This information is valuable in various domains, including finance (identifying stock market correlations), neuroscience (analyzing brain activity), and engineering (evaluating system responses).

In this article, we’ll explore four methods for performing cross-correlation analysis in Python, providing clear explanations and illustrative examples.

Understanding Cross-correlation
Implementation of Cross-correlation Analysis in Python

Method 1. Cross-correlation Analysis Using Python
Method 2. Cross-correlation Analysis Using Numpy
Method 3. Cross-correlation Analysis Using Scipy
Method 4. Cross-correlation Analysis Using Statsmodels

Understanding Cross-correlation

Cross-correlation measures the similarity between two sequences as a function of the displacement of one relative to the other. It is denoted by [Tex]R_{XY}(\tau)[/Tex], where [Tex]\tau[/Tex] is the time or spatial lag between the two datasets. A useful picture is a pair of combs: sliding one along the other shows how well their teeth line up at each offset. The key properties of cross-correlation are:

Works on time series data: This means data that’s collected over time, like stock prices, temperature readings, or sound waves.
Compares similarity at different lags: By shifting one series relative to the other (like sliding one comb along the other), it shows how well the two align at each offset.
Ranges from -1 to 1 when normalized: A value of 1 means the two series move together perfectly (the teeth of the combs line up), 0 means no linear correlation, and -1 means they move in exactly opposite directions (the teeth of one comb line up with the gaps of the other).
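
The lag-shifting idea described above can be sketched in plain Python. This is a minimal illustration rather than a definitive implementation: the helper names pearson and lagged_correlation, the test signal, and the 3-sample delay are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def lagged_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag], using only the overlapping samples."""
    if lag >= 0:
        return pearson(x[:len(x) - lag], y[lag:])
    return pearson(x[-lag:], y[:len(y) + lag])

# Hypothetical test data: y is a copy of x delayed by 3 samples
x = [math.sin(0.3 * i) + 0.5 * math.sin(0.11 * i) for i in range(200)]
delay = 3
y = [0.0] * delay + x[:-delay]

# Evaluate the correlation at every lag from -10 to 10 and keep the best one
scores = {lag: lagged_correlation(x, y, lag) for lag in range(-10, 11)}
best_lag = max(scores, key=scores.get)
print('Best lag:', best_lag)
```

The lag with the highest score is the shift at which the two series line up best, which for this constructed data should recover the delay that was applied.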

Implementation of Cross-correlation Analysis in Python

There are four main methods to perform cross-correlation analysis in Python. The examples below compare the two signals without shifting them, that is, at lag zero:

Python-Manual Function: Using basic Python functions and loops to compute cross-correlation.
NumPy: Utilizing NumPy’s fast numerical operations for efficient cross-correlation computation.
SciPy: Leveraging SciPy’s signal processing library for advanced cross-correlation calculations.
Statsmodels: Employing Statsmodels for statistical analysis, including cross-correlation.

Method 1. Cross-correlation Analysis Using Python

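To demonstrate the implementation, let’s generate a dataset comprising two time series signals, signal1 and signal2, using a combination of sine and cosine functions with added Gaussian noise. Both signals share the same underlying waveform and differ only in their noise, so they should be strongly correlated. This dataset simulates real-world scenarios where signals often exhibit complex patterns and noise.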

In the code, we define two functions: mean, which calculates the average of a list, and cross_correlation, which takes two signals x and y and works as follows:

mean(x) and mean(y): Calculates the mean of each signal.
sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)): Calculates the numerator of the cross-correlation formula by summing the product of the differences between corresponding elements of x and y, centered around their means.
x_sq_diff and y_sq_diff calculate the sum of squared differences for each signal.
math.sqrt(x_sq_diff * y_sq_diff): Calculates the denominator of the cross-correlation formula by taking the square root of the product of the squared differences.

Python

import math
import random

# Generate signals
t = [i * 0.1 for i in range(100)]
signal1 = [math.sin(2 * math.pi * 2 * i) + 0.5 * math.cos(2 * math.pi * 3 * i) + random.normalvariate(0, 0.1) for i in t]
signal2 = [math.sin(2 * math.pi * 2 * i) + 0.5 * math.cos(2 * math.pi * 3 * i) + random.normalvariate(0, 0.1) for i in t]

# Function to calculate the mean of a sequence
def mean(arr):
    return sum(arr) / len(arr)


# Function to calculate the correlation coefficient at lag zero
def cross_correlation(x, y):
    # Calculate means
    x_mean = mean(x)
    y_mean = mean(y)

    # Numerator: sum of the products of the mean-centered values
    numerator = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))

    # Denominator: square root of the product of the sums of squared differences
    x_sq_diff = sum((a - x_mean) ** 2 for a in x)
    y_sq_diff = sum((b - y_mean) ** 2 for b in y)
    denominator = math.sqrt(x_sq_diff * y_sq_diff)

    return numerator / denominator


correlation = cross_correlation(signal1, signal2)
print('Manual Correlation:', correlation)

Output (the exact value varies between runs because the noise is random):

Manual Correlation: 0.9837294963190838

Method 2. Cross-correlation Analysis Using Numpy

NumPy’s corrcoef function returns the correlation coefficient matrix of its inputs. The element at [0, 1] is the correlation between signal1 and signal2 at lag zero.

Python

import numpy as np

# time array
t = np.arange(0, 10, 0.1)

# Generate signals
signal1 = np.sin(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 3 * t) + np.random.normal(0, 0.1, len(t))
signal2 = np.sin(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 3 * t) + np.random.normal(0, 0.1, len(t))

numpy_correlation = np.corrcoef(signal1, signal2)[0, 1]
print('NumPy Correlation:', numpy_correlation)

Output:

NumPy Correlation: 0.9796920509627758
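
corrcoef only compares the signals at lag zero. To look at other lags, one option is np.correlate with mode='full', which returns one value per possible shift. The sketch below is illustrative only: it assumes hypothetical white-noise data in which signal2 is signal1 delayed by 5 samples, because a purely periodic signal would produce several equally high peaks.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical data: signal2 is signal1 delayed by 5 samples, plus a little noise
n, delay = 200, 5
signal1 = rng.normal(size=n)
signal2 = np.roll(signal1, delay) + rng.normal(0, 0.1, n)

# Standardize both signals so the zero-lag value equals the Pearson coefficient
a = (signal1 - signal1.mean()) / signal1.std()
b = (signal2 - signal2.mean()) / signal2.std()

# Full cross-correlation: one value per lag from -(n - 1) to n - 1
xcorr = np.correlate(b, a, mode='full') / n
lags = np.arange(-(n - 1), n)

best_lag = lags[np.argmax(xcorr)]
print('Lag with the highest correlation:', best_lag)
print('Value at lag zero:', xcorr[n - 1])
```

With this argument order, a positive lag means the first argument (signal2 here) trails the second. Dividing by n keeps the zero-lag value on the same scale as corrcoef; values at large lags are computed from fewer overlapping samples and shrink accordingly.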

Method 3. Cross-correlation Analysis Using Scipy

SciPy’s pearsonr function calculates the Pearson correlation coefficient between signal1 and signal2, which measures the linear relationship between two datasets. It returns the coefficient together with a p-value, which is discarded here.

Python

import numpy as np
from scipy.stats import pearsonr

# time array
t = np.arange(0, 10, 0.1)

# Generate signals
signal1 = np.sin(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 3 * t) + np.random.normal(0, 0.1, len(t))
signal2 = np.sin(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 3 * t) + np.random.normal(0, 0.1, len(t))

# pearsonr returns the coefficient and a p-value; only the coefficient is used
scipy_correlation, _ = pearsonr(signal1, signal2)
print('SciPy Correlation:', scipy_correlation)

Output:

SciPy Correlation: 0.9865169592702046
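
For lagged analysis, SciPy’s signal module provides correlate together with correlation_lags (available in SciPy 1.6 and later), which returns the lag belonging to each output value. The following is a sketch under assumptions: the data are hypothetical, with signal2 constructed as signal1 delayed by 8 samples.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable

# Hypothetical data: signal2 is signal1 delayed by 8 samples, plus a little noise
n, delay = 300, 8
signal1 = rng.normal(size=n)
signal2 = np.concatenate([np.zeros(delay), signal1[:-delay]]) + rng.normal(0, 0.1, n)

# Cross-correlate the mean-centered signals over all possible lags
xcorr = signal.correlate(signal2 - signal2.mean(), signal1 - signal1.mean(), mode='full')
lags = signal.correlation_lags(len(signal2), len(signal1), mode='full')

# The lag at the peak estimates how far signal2 trails signal1
estimated_delay = lags[np.argmax(xcorr)]
print('Estimated delay (samples):', estimated_delay)
```

The values returned by correlate are not normalized to the range -1 to 1, so this sketch uses them only to locate the peak, not to report a correlation coefficient.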

Method 4. Cross-correlation Analysis Using Statsmodels

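Statsmodels’ OLS class fits a linear regression of signal1 on signal2, and the rsquared attribute of the fitted model reports the coefficient of determination. Note that this is not the correlation coefficient itself: because no intercept is added, it is the uncentered R-squared, which for roughly zero-mean signals like these is close to the square of the Pearson correlation.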

Python

import numpy as np
import statsmodels.api as sm

# time array
t = np.arange(0, 10, 0.1)

# Generate signals
signal1 = np.sin(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 3 * t) + np.random.normal(0, 0.1, len(t))
signal2 = np.sin(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 3 * t) + np.random.normal(0, 0.1, len(t))

# Regress signal1 on signal2 (no intercept) and read the R-squared of the fit
statsmodels_r_squared = sm.OLS(signal1, signal2).fit().rsquared
print('Statsmodels R-squared:', statsmodels_r_squared)

Output:

Statsmodels R-squared: 0.9730755677920275

Conclusion

The manual implementation, NumPy, and SciPy all yield correlation coefficients close to 1, and the Statsmodels regression gives a similarly high R-squared, indicating a strong positive correlation between signal1 and signal2. The values differ slightly because each example generates its own random noise. This underscores the versatility of Python in performing cross-correlation analysis, catering to a wide range of requirements and complexities.