Shrinkage Covariance Estimation in Scikit Learn
Ledoit and Wolf proposed a shrinkage formula that is widely used for regularizing the usual maximum-likelihood covariance estimate; the result is known as the Ledoit-Wolf covariance estimator. The formula computes an asymptotically optimal shrinkage parameter by minimizing a mean-squared-error criterion.
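To make the idea of shrinkage concrete, here is a minimal sketch (not part of the original example) showing that scikit-learn's ShrunkCovariance is a convex combination of the empirical covariance and a scaled identity matrix. The variable names (alpha, emp, manual) and the random data are our own for illustration.

```python
import numpy as np
from sklearn.covariance import ShrunkCovariance, empirical_covariance

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))  # 60 samples, 5 features (illustrative data)

# Empirical (maximum-likelihood) covariance estimate
emp = empirical_covariance(X)

# Shrunk covariance: (1 - alpha) * emp + alpha * (trace(emp) / p) * I
alpha = 0.1
p = emp.shape[0]
manual = (1 - alpha) * emp + alpha * (np.trace(emp) / p) * np.eye(p)

# scikit-learn's ShrunkCovariance applies the same convex combination
model = ShrunkCovariance(shrinkage=alpha).fit(X)
print(np.allclose(manual, model.covariance_))  # True
```

Shrinkage pulls the eigenvalues of the empirical covariance toward their mean, which makes the estimate well-conditioned even when there are fewer samples than features.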
OAS Estimator: Chen et al. proposed an improvement on the Ledoit-Wolf shrinkage parameter, the Oracle Approximating Shrinkage (OAS) estimator. Its convergence is significantly better, under the assumption that the data are Gaussian.
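Both estimators choose their shrinkage coefficient automatically from the data. A short sketch (with illustrative random data of our own choosing) shows how to read the fitted coefficient off each estimator via the `shrinkage_` attribute:

```python
import numpy as np
from sklearn.covariance import LedoitWolf, OAS

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))  # 40 samples, 10 features

# Each estimator computes its own shrinkage coefficient in [0, 1]
lw = LedoitWolf().fit(X)
oas = OAS().fit(X)

print(f"Ledoit-Wolf shrinkage: {lw.shrinkage_:.3f}")
print(f"OAS shrinkage:         {oas.shrinkage_:.3f}")
```

On Gaussian data such as this, OAS typically gives a shrinkage estimate with lower mean squared error than Ledoit-Wolf, especially for small sample sizes.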
Importing Libraries and generating datasets
By using Python libraries like NumPy, Matplotlib, scikit-learn, and SciPy, it becomes easy to handle the datasets and perform complex computations in a few lines of code.
Python3
# importing libraries
from sklearn.covariance import ShrunkCovariance, \
    empirical_covariance, log_likelihood
from scipy import linalg
from sklearn.model_selection import GridSearchCV
from sklearn.covariance import LedoitWolf, OAS
import matplotlib.pyplot as plt
import numpy as np

# Generating sample datasets
noOfFeatures, noOfSamples = 38, 22
np.random.seed(50)
X_train_baseline = np.random.normal(size=(noOfSamples, noOfFeatures))
X_test_baseline = np.random.normal(size=(noOfSamples, noOfFeatures))

# Color the samples: multiplying by a random matrix
# introduces correlations between the features
colorMatrix = np.random.normal(size=(noOfFeatures, noOfFeatures))
X_train = np.dot(X_train_baseline, colorMatrix)
X_test = np.dot(X_test_baseline, colorMatrix)
Defining the Range of Shrinkage Values and Finding the Optimum
Now we will define a span of all possible shrinkage coefficient values and perform a grid search to identify the optimal shrinkage coefficient.
Python3
# defining a spanning range of all possible shrinkage coefficient values
shrinkageFactor = np.logspace(-2, 0, 32)
negative_logliks = [-ShrunkCovariance(shrinkage=s).fit(X_train).score(X_test)
                    for s in shrinkageFactor]

realCovariance = np.dot(colorMatrix.T, colorMatrix)
empiricalCovariance = empirical_covariance(X_train)
logRealLikelihood = -log_likelihood(empiricalCovariance,
                                    linalg.inv(realCovariance))

# Grid search for an optimal shrinkage coefficient
tunedParameters = [{"shrinkage": shrinkageFactor}]
cv = GridSearchCV(ShrunkCovariance(), tunedParameters)
cv.fit(X_train)
Estimating the Optimal Shrinkage Coefficient with Ledoit-Wolf and OAS
Now we will estimate the shrinkage coefficients for Ledoit-Wolf and OAS. But first, let's look at what maximum-likelihood estimation is.
Maximum-Likelihood Estimation: It is a method that searches for the parameter values under which the observed data are most probable. For a Gaussian model, it finds the mean and standard deviation (sd) that are most likely to have generated the data. It is a probabilistic approach that can be applied to data from any parametric distribution.
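As a quick illustration (not part of the original example), for Gaussian data the maximum-likelihood estimates have a closed form: the sample mean and the biased sample standard deviation. The sketch below, with synthetic data of our own choosing, verifies numerically that perturbing either parameter lowers the log-likelihood:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # true mean 5, sd 2

# For a Gaussian, the log-likelihood is maximized in closed form by the
# sample mean and the biased (ddof=0) sample standard deviation
mu_mle = data.mean()
sd_mle = data.std()

def loglik(mu, sd):
    # total log-likelihood of the data under a Normal(mu, sd) model
    return stats.norm.logpdf(data, loc=mu, scale=sd).sum()

best = loglik(mu_mle, sd_mle)
print(f"MLE mean: {mu_mle:.3f}, MLE sd: {sd_mle:.3f}")
print(best >= loglik(mu_mle + 0.1, sd_mle))  # True: nudging mu hurts
print(best >= loglik(mu_mle, sd_mle * 1.1))  # True: nudging sd hurts
```

The `score` methods used below report exactly this kind of quantity: the (average) log-likelihood of the held-out test data under the fitted covariance model.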
Python3
# Ledoit-Wolf optimal shrinkage coefficient estimate
ledoitWolf = LedoitWolf()
logLikelihoodLedoitWolf = ledoitWolf.fit(X_train).score(X_test)

# OAS coefficient estimate
oas = OAS()
logLikelihoodOAS = oas.fit(X_train).score(X_test)

fig = plt.figure()
Defining Shrinkage Curve Range
Now we will define the range of the shrinkage curve and adjust the view of the graph so the output is easy to read. Finally, we will plot the likelihood estimates for Ledoit-Wolf, OAS, and the best covariance estimator found by cross-validation to visualize the comparison.
Python3
# defining range of shrinkage curve
plt.loglog(shrinkageFactor, negative_logliks, "m--",
           label="Negative log-likelihood")
plt.plot(plt.xlim(), 2 * [logRealLikelihood], "b-.",
         label="Real Covariance Likelihood")

# Adjusting the view of the graph
maxLikelihood = np.amax(negative_logliks)
minLikelihood = np.amin(negative_logliks)
min_y = minLikelihood - 7.0 * np.log(plt.ylim()[1] - plt.ylim()[0])
max_y = maxLikelihood + 16.0 * np.log(maxLikelihood - minLikelihood)
min_x = shrinkageFactor[0]
max_x = shrinkageFactor[-1]

# Ledoit-Wolf likelihood
plt.vlines(
    ledoitWolf.shrinkage_,
    min_y,
    -logLikelihoodLedoitWolf,
    color="cyan",
    linewidth=3,
    label="Ledoit-Wolf Estimate",
)

# OAS likelihood
plt.vlines(
    oas.shrinkage_,
    min_y,
    -logLikelihoodOAS,
    color="green",
    linewidth=3,
    label="OAS Estimate",
)

# Best covariance estimator likelihood (from cross-validation)
plt.vlines(
    cv.best_estimator_.shrinkage,
    min_y,
    -cv.best_estimator_.score(X_test),
    color="yellow",
    linewidth=3,
    label="Cross-validation Best Estimate",
)

# Plotting the graph
plt.title("Regularized Covariance: Likelihood & Shrinkage Coefficient")
plt.xlabel("Regularization parameter: shrinkage coefficient")
plt.ylabel("Error: negative log-likelihood on test data")
plt.ylim(min_y, max_y)
plt.xlim(min_x, max_x)
plt.legend()
plt.show()
Output: