Shrinkage Covariance Estimation in Scikit Learn
Ledoit and Wolf proposed a shrinkage formula that is widely used for regularizing the usual maximum-likelihood covariance estimate; the result is known as the Ledoit-Wolf covariance estimator. The formula computes an asymptotically optimal shrinkage parameter by minimizing a mean-squared-error criterion.
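To make the idea of shrinkage concrete, here is a minimal sketch (not part of the original example) showing that scikit-learn's ShrunkCovariance is a convex combination of the empirical covariance and a scaled identity matrix. The variable names (alpha, emp, manual) and the random data are our own for illustration.

```python
import numpy as np
from sklearn.covariance import ShrunkCovariance, empirical_covariance

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))  # 60 samples, 5 features (illustrative data)

# Empirical (maximum-likelihood) covariance estimate
emp = empirical_covariance(X)

# Shrunk covariance: (1 - alpha) * emp + alpha * (trace(emp) / p) * I
alpha = 0.1
p = emp.shape[0]
manual = (1 - alpha) * emp + alpha * (np.trace(emp) / p) * np.eye(p)

# scikit-learn's ShrunkCovariance applies the same convex combination
model = ShrunkCovariance(shrinkage=alpha).fit(X)
print(np.allclose(manual, model.covariance_))  # True
```

Shrinkage pulls the eigenvalues of the empirical covariance toward their mean, which makes the estimate well-conditioned even when there are fewer samples than features.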
OAS Estimator: Chen et al. proposed an improvement on the Ledoit-Wolf shrinkage parameter, the Oracle Approximating Shrinkage (OAS) estimator. Its convergence is significantly better, under the assumption that the data are Gaussian.
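Both estimators choose their shrinkage coefficient automatically from the data. A short sketch (with illustrative random data of our own choosing) shows how to read the fitted coefficient off each estimator via the `shrinkage_` attribute:

```python
import numpy as np
from sklearn.covariance import LedoitWolf, OAS

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))  # 40 samples, 10 features

# Each estimator computes its own shrinkage coefficient in [0, 1]
lw = LedoitWolf().fit(X)
oas = OAS().fit(X)

print(f"Ledoit-Wolf shrinkage: {lw.shrinkage_:.3f}")
print(f"OAS shrinkage:         {oas.shrinkage_:.3f}")
```

On Gaussian data such as this, OAS typically gives a shrinkage estimate with lower mean squared error than Ledoit-Wolf, especially for small sample sizes.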
Importing Libraries and generating datasets
By using Python libraries like NumPy, Matplotlib, scikit-learn, and SciPy, it becomes easy to handle the datasets and perform complex computations in a few lines of code.
Python3
# importing libraries
from sklearn.covariance import ShrunkCovariance, \
    empirical_covariance, log_likelihood
from scipy import linalg
from sklearn.model_selection import GridSearchCV
from sklearn.covariance import LedoitWolf, OAS
import matplotlib.pyplot as plt
import numpy as np

# Generating sample datasets
noOfFeatures, noOfSamples = 38, 22
np.random.seed(50)
X_train_baseline = np.random.normal(size=(noOfSamples, noOfFeatures))
X_test_baseline = np.random.normal(size=(noOfSamples, noOfFeatures))

# Color the samples: multiplying by a random matrix
# introduces correlations between the features
colorMatrix = np.random.normal(size=(noOfFeatures, noOfFeatures))
X_train = np.dot(X_train_baseline, colorMatrix)
X_test = np.dot(X_test_baseline, colorMatrix)
Defining the Range of Shrinkage Values and Finding the Optimum
Now we will define a span of all possible shrinkage coefficient values and perform a grid search to identify the optimal shrinkage coefficient.
Python3
# defining a spanning range of all possible shrinkage coefficient values
shrinkageFactor = np.logspace(-2, 0, 32)
negative_logliks = [-ShrunkCovariance(shrinkage=s).fit(X_train).score(X_test)
                    for s in shrinkageFactor]

realCovariance = np.dot(colorMatrix.T, colorMatrix)
empiricalCovariance = empirical_covariance(X_train)
logRealLikelihood = -log_likelihood(empiricalCovariance,
                                    linalg.inv(realCovariance))

# Grid search for an optimal shrinkage coefficient
tunedParameters = [{"shrinkage": shrinkageFactor}]
cv = GridSearchCV(ShrunkCovariance(), tunedParameters)
cv.fit(X_train)
Estimating the Optimal Shrinkage Coefficient with Ledoit-Wolf and OAS
Now we will estimate the shrinkage coefficients for Ledoit-Wolf and OAS. But first, let's look at what maximum-likelihood estimation is.
Maximum-Likelihood Estimation: It is a method that searches for the parameter values under which the observed data are most probable. For a Gaussian model, it finds the mean and standard deviation (sd) that are most likely to have generated the data. It is a probabilistic approach that can be applied to data from any parametric distribution.
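As a quick illustration (not part of the original example), for Gaussian data the maximum-likelihood estimates have a closed form: the sample mean and the biased sample standard deviation. The sketch below, with synthetic data of our own choosing, verifies numerically that perturbing either parameter lowers the log-likelihood:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # true mean 5, sd 2

# For a Gaussian, the log-likelihood is maximized in closed form by the
# sample mean and the biased (ddof=0) sample standard deviation
mu_mle = data.mean()
sd_mle = data.std()

def loglik(mu, sd):
    # total log-likelihood of the data under a Normal(mu, sd) model
    return stats.norm.logpdf(data, loc=mu, scale=sd).sum()

best = loglik(mu_mle, sd_mle)
print(f"MLE mean: {mu_mle:.3f}, MLE sd: {sd_mle:.3f}")
print(best >= loglik(mu_mle + 0.1, sd_mle))  # True: nudging mu hurts
print(best >= loglik(mu_mle, sd_mle * 1.1))  # True: nudging sd hurts
```

The `score` methods used below report exactly this kind of quantity: the (average) log-likelihood of the held-out test data under the fitted covariance model.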
Python3
# Ledoit-Wolf optimal shrinkage coefficient estimate
ledoitWolf = LedoitWolf()
logLikelihoodLedoitWolf = ledoitWolf.fit(X_train).score(X_test)

# OAS coefficient estimate
oas = OAS()
logLikelihoodOAS = oas.fit(X_train).score(X_test)

fig = plt.figure()
Defining Shrinkage Curve Range
Now we will define the range of the shrinkage curve and adjust the view of the graph so the output is easy to read. Finally, we will plot the likelihood estimates for Ledoit-Wolf, OAS, and the best covariance estimator found by cross-validation to visualize the comparison.
Python3
# defining range of shrinkage curve
plt.loglog(shrinkageFactor, negative_logliks, "m--",
           label="Negative log-likelihood")
plt.plot(plt.xlim(), 2 * [logRealLikelihood], "b-.",
         label="Real Covariance Likelihood")

# Adjusting the view of the graph
maxLikelihood = np.amax(negative_logliks)
minLikelihood = np.amin(negative_logliks)
min_y = minLikelihood - 7.0 * np.log(plt.ylim()[1] - plt.ylim()[0])
max_y = maxLikelihood + 16.0 * np.log(maxLikelihood - minLikelihood)
min_x = shrinkageFactor[0]
max_x = shrinkageFactor[-1]

# Ledoit-Wolf likelihood
plt.vlines(
    ledoitWolf.shrinkage_,
    min_y,
    -logLikelihoodLedoitWolf,
    color="cyan",
    linewidth=3,
    label="Ledoit-Wolf Estimate",
)

# OAS likelihood
plt.vlines(
    oas.shrinkage_,
    min_y,
    -logLikelihoodOAS,
    color="green",
    linewidth=3,
    label="OAS Estimate",
)

# Best covariance estimator likelihood (from cross-validation)
plt.vlines(
    cv.best_estimator_.shrinkage,
    min_y,
    -cv.best_estimator_.score(X_test),
    color="yellow",
    linewidth=3,
    label="Cross-validation Best Estimate",
)

# Plotting the graph
plt.title("Regularized Covariance: Likelihood & Shrinkage Coefficient")
plt.xlabel("Regularization parameter: shrinkage coefficient")
plt.ylabel("Error: negative log-likelihood on test data")
plt.ylim(min_y, max_y)
plt.xlim(min_x, max_x)
plt.legend()
plt.show()
Output: