Linear and Quadratic Discriminant Analysis using Sklearn

Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) are two well-known classification methods used in machine learning to find patterns and assign observations to groups. They are especially helpful when you have labeled data and want to classify new observations into pre-defined categories.

In this article, we will implement both of these techniques, Linear and Quadratic Discriminant Analysis, using Sklearn.

Table of Contents

  • Understanding Linear and Quadratic Discriminant Analysis
  • Implementing Linear and Quadratic Discriminant Analysis with Scikit-Learn
    • Applying Linear Discriminant Analysis (LDA)
    • Applying Quadratic Discriminant Analysis (QDA)
    • Visualizing Linear and Quadratic Discriminant Analysis

Understanding Linear and Quadratic Discriminant Analysis

Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis assumes that the data in each class is normally distributed and that all classes share the same covariance matrix. It finds a linear combination of features that best separates the classes, sometimes referred to as Fisher's linear discriminant. The idea is to maximize the separation between class means while projecting the data into a lower-dimensional space.

Under these assumptions, LDA determines the best linear decision boundary by maximizing the ratio of between-class variance to within-class variance.

Conceptually, LDA involves the following steps (a minimal NumPy sketch follows the list):

  • Compute the mean vectors for each class.
  • Compute the within-class and between-class scatter matrices.
  • Compute the eigenvalues and eigenvectors of the product of the inverse within-class scatter matrix and the between-class scatter matrix.
  • Select the top k eigenvectors that correspond to the k largest eigenvalues to form a new feature space.
  • Project the data onto the new feature space.
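
To make these steps concrete, here is a minimal NumPy sketch of the same procedure. It is for illustration only; the function name lda_projection and its signature are our own, and in practice you would use Scikit-Learn's LinearDiscriminantAnalysis, shown later in this article.

Python
import numpy as np

def lda_projection(X, y, n_components):
    """Project X onto the top LDA directions (illustrative sketch)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]

    # Within-class and between-class scatter matrices
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for c in classes:
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        S_w += (X_c - mean_c).T @ (X_c - mean_c)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += len(X_c) * (diff @ diff.T)

    # Eigen-decomposition of S_w^{-1} S_b
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:n_components]].real  # top-k eigenvectors

    # Project the data onto the new feature space
    return X @ W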

Quadratic Discriminant Analysis (QDA)

QDA is similar to LDA but does not assume that the covariance matrices of the classes are equal. This allows QDA to build more flexible decision boundaries by modeling each class with its own covariance matrix.

Conceptually, QDA involves the following steps (a small NumPy sketch follows the list):

  • Compute the mean vector and covariance matrix for each class.
  • Use the quadratic form of the discriminant function to classify new observations.
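
The sketch below illustrates this idea with plain NumPy: it estimates a prior, mean vector, and covariance matrix per class, then scores new samples with the quadratic discriminant function. The helper names qda_fit and qda_predict are our own and are for illustration only; Scikit-Learn's QuadraticDiscriminantAnalysis, used later, handles this for you.

Python
import numpy as np

def qda_fit(X, y):
    """Estimate per-class priors, means and covariance matrices (illustrative sketch)."""
    params = {}
    for c in np.unique(y):
        X_c = X[y == c]
        params[c] = {
            "prior": len(X_c) / len(X),
            "mean": X_c.mean(axis=0),
            "cov": np.cov(X_c, rowvar=False),
        }
    return params

def qda_predict(X, params):
    """Assign each sample to the class with the largest quadratic discriminant score."""
    scores = []
    for c, p in params.items():
        inv_cov = np.linalg.inv(p["cov"])
        diff = X - p["mean"]
        # Score: -1/2 log|Sigma_c| - 1/2 (x - mu_c)^T Sigma_c^{-1} (x - mu_c) + log(prior_c)
        score = (-0.5 * np.log(np.linalg.det(p["cov"]))
                 - 0.5 * np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
                 + np.log(p["prior"]))
        scores.append(score)
    classes = list(params.keys())
    return np.array(classes)[np.argmax(np.vstack(scores), axis=0)]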

Implementing Linear and Quadratic Discriminant Analysis with Scikit-Learn

Scikit-Learn is a well-known Python machine learning package that offers efficient implementations of Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) through the LinearDiscriminantAnalysis and QuadraticDiscriminantAnalysis classes. To use LDA or QDA in Scikit-Learn, follow the steps below.

1. Import the Necessary Modules

Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

2. Generate Data

Python
# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Applying Linear Discriminant Analysis (LDA)

Python
# Initialize and train the LDA model
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
y_pred_lda = lda.predict(X_test)

print("LDA Accuracy:", accuracy_score(y_test, y_pred_lda))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_lda))
print("Classification Report:\n", classification_report(y_test, y_pred_lda))

Output:

LDA Accuracy: 0.8266666666666667
Confusion Matrix:
 [[ 75   4  22]
 [ 16  71   0]
 [  0  10 102]]
Classification Report:
               precision    recall  f1-score   support

           0       0.82      0.74      0.78       101
           1       0.84      0.82      0.83        87
           2       0.82      0.91      0.86       112

    accuracy                           0.83       300
   macro avg       0.83      0.82      0.82       300
weighted avg       0.83      0.83      0.83       300
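
Beyond the accuracy score, the fitted model exposes the quantities LDA estimates. The optional snippet below, which assumes the lda model trained above, prints a few of them.

Python
# Inspect the fitted LDA model (optional)
print("Class means:\n", lda.means_)    # per-class mean vectors
print("Class priors:", lda.priors_)    # estimated class proportions
print("Coefficients:\n", lda.coef_)    # weights of the linear decision function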

Applying Quadratic Discriminant Analysis (QDA)

Python
# Initialize and train the QDA model
qda = QuadraticDiscriminantAnalysis()
qda.fit(X_train, y_train)

# Make predictions
y_pred_qda = qda.predict(X_test)

# Evaluate the model
print("QDA Accuracy:", accuracy_score(y_test, y_pred_qda))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_qda))
print("Classification Report:\n", classification_report(y_test, y_pred_qda))

Output:

QDA Accuracy: 0.93
Confusion Matrix:
 [[ 96   2   3]
 [ 10  77   0]
 [  4   2 106]]
Classification Report:
               precision    recall  f1-score   support

           0       0.87      0.95      0.91       101
           1       0.95      0.89      0.92        87
           2       0.97      0.95      0.96       112

    accuracy                           0.93       300
   macro avg       0.93      0.93      0.93       300
weighted avg       0.93      0.93      0.93       300
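
As with LDA, the fitted QDA model can be inspected directly. The optional snippet below assumes the qda model trained above and also shows class posterior probabilities via predict_proba.

Python
# Inspect the fitted QDA model (optional)
print("Class means:\n", qda.means_)    # per-class mean vectors
print("Class priors:", qda.priors_)    # estimated class proportions
print("Posterior probabilities for the first 5 test samples:\n",
      qda.predict_proba(X_test[:5]))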

Visualizing Linear and Quadratic Discriminant Analysis

For visualization, let's plot the decision boundaries. A decision boundary is the line that divides the regions assigned to different classes of data points. The goal of a classifier is to predict the class of a new data point based on its features, and the decision boundary shows the classifier's rule for separating the classes.

Python
def plot_decision_boundaries(X, y, model, title, subplot_index):
    plt.subplot(subplot_index)
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
    plt.title(title)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')


plt.figure(figsize=(10, 4))
# Plot decision boundaries for LDA
plot_decision_boundaries(X_test, y_test, lda, "LDA Decision Boundary", 121)

# Plot decision boundaries for QDA
plot_decision_boundaries(X_test, y_test, qda, "QDA Decision Boundary", 122)

plt.tight_layout()
plt.show()

Output:

Decision Boundary Plots for LDA and QDA

In the plots, each dot is a test sample colored by its true class, and the shaded regions show the class the model predicts in that part of the feature space.

LDA projects data from a higher-dimensional space onto a lower-dimensional space in a way that maximizes the separation between classes, so the boundaries it draws between classes are straight lines. QDA, by modeling each class with its own covariance matrix, allows curved (quadratic) boundaries. The QDA decision boundary is therefore more flexible than the LDA decision boundary, which can help it fit the data better in some cases, as the higher QDA accuracy above suggests.

Conclusion

Finally, for supervised classification problems, Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) are effective methods. LDA assumes that all classes share the same covariance matrix, while QDA relaxes this condition and allows each class to have its own covariance matrix. Both approaches are practical and have their merits, and Scikit-Learn offers handy implementations that make integrating them into machine learning pipelines simple.

Linear and Quadratic Discriminant Analysis using Sklearn- FAQs

When is it better to employ LDA than QDA?

If you want a simpler model and the classes have comparable covariance matrices, use LDA. When the decision boundary is non-linear or the classes have distinct covariance matrices, use QDA.
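
If you are unsure, a practical option is to compare both models with cross-validation on your data. The small sketch below assumes the X, y arrays and imports from the example above and uses cross_val_score; the choice of 5 folds is illustrative.

Python
from sklearn.model_selection import cross_val_score

# Compare LDA and QDA with 5-fold cross-validation on the full dataset
lda_scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
qda_scores = cross_val_score(QuadraticDiscriminantAnalysis(), X, y, cv=5)
print("LDA mean CV accuracy:", lda_scores.mean())
print("QDA mean CV accuracy:", qda_scores.mean())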

Can high-dimensional data be handled by LDA and QDA?

Yes, both QDA and LDA can handle high-dimensional data; however, if there are significantly more features than samples, overfitting may occur.
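
One common mitigation is regularization. LDA supports covariance shrinkage (with the 'lsqr' or 'eigen' solvers), and QDA exposes a reg_param that shrinks each class covariance matrix toward the identity. The values below are illustrative, not tuned.

Python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis

# Shrinkage gives a better-conditioned covariance estimate when features outnumber samples
lda_shrunk = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')

# reg_param shrinks each per-class covariance matrix toward the identity, reducing overfitting
qda_reg = QuadraticDiscriminantAnalysis(reg_param=0.1)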