Model Reduction Methods with Examples
1. Feature Selection
Concept: Feature selection is the process of choosing the most relevant features (input variables) for training a model while discarding less important ones. Techniques such as statistical tests (e.g., the ANOVA F-statistic) rank features by their relevance to the target variable, and keeping only the top-ranked features reduces the dimensionality of the data and can improve model performance.
Python Example:
Python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.feature_selection import SelectKBest, f_classif

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Function to plot decision boundaries
def plot_decision_boundary(model, X, y):
    h = 0.02  # Step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu, alpha=0.6)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu)
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.title("Decision Boundary")

# Create a complex decision tree model
# (we use only the first two features so the boundary can be visualized)
complex_model = DecisionTreeClassifier()
complex_model.fit(X_train[:, :2], y_train)

# Feature selection: select the top two most relevant features
selector = SelectKBest(score_func=f_classif, k=2)
X_train_reduced = selector.fit_transform(X_train, y_train)
X_test_reduced = selector.transform(X_test)

# Create a decision tree model using the reduced features
model_with_reduced_features = DecisionTreeClassifier()
model_with_reduced_features.fit(X_train_reduced, y_train)

# Plot decision boundaries
plt.figure(figsize=(15, 4))

plt.subplot(1, 3, 1)
plot_decision_boundary(complex_model, X_train[:, :2], y_train)
plt.title("Complex Decision Tree (All Features)")

plt.subplot(1, 3, 2)
plot_decision_boundary(model_with_reduced_features, X_train_reduced, y_train)
plt.title("Decision Tree with Reduced Features")

# Calculate accuracy for both models
y_pred_complex = complex_model.predict(X_test[:, :2])
accuracy_complex = accuracy_score(y_test, y_pred_complex)

y_pred_reduced = model_with_reduced_features.predict(X_test_reduced)
accuracy_reduced = accuracy_score(y_test, y_pred_reduced)

plt.subplot(1, 3, 3)
plt.bar(['Complex Model', 'Reduced Model'], [accuracy_complex, accuracy_reduced], color=['blue', 'orange'])
plt.ylim(0, 1)
plt.ylabel('Accuracy')
plt.title('Model Accuracy')

plt.tight_layout()
plt.show()
Output:
You’ll see two side-by-side plots comparing the decision boundaries of the complex and the simplified models. The simpler model typically has smoother, easier-to-interpret decision boundaries, whereas the complex model tends to produce convoluted ones.
This visualization demonstrates how model reduction simplifies the decision boundaries, making the model easier to understand and potentially better suited for practical use.
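Beyond the plots, it can also help to inspect which features the selector actually kept and how they scored. The minimal sketch below assumes the `selector` and `data` objects from the example above and prints the ANOVA F-score for each original feature together with a flag showing whether it was selected.
Python
# Inspect the ANOVA F-scores computed by SelectKBest
# (assumes `selector` and `data` from the example above)
scores = selector.scores_               # one F-score per original feature
selected_mask = selector.get_support()  # boolean mask of the kept features

for name, score, kept in zip(data.feature_names, scores, selected_mask):
    print(f"{name:25s} F-score = {score:8.2f}  selected = {kept}")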
Bonus Example:
Python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
import matplotlib.pyplot as plt

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Perform feature selection using the ANOVA F-statistic
num_features = X.shape[1]
selected_features = []

# Plot the data for several values of k to show feature selection in action
for k in range(1, min(3, num_features) + 1):  # k = 1, 2, 3
    selector = SelectKBest(score_func=f_classif, k=k)
    X_new = selector.fit_transform(X, y)
    selected_features.append(selector.get_support())

    plt.figure(figsize=(12, 4))

    plt.subplot(1, 2, 1)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
    plt.xlabel('Feature 0')
    plt.ylabel('Feature 1')
    plt.title('Original Data')

    plt.subplot(1, 2, 2)
    if X_new.shape[1] > 1:  # a 2-D scatter plot needs at least two selected features
        plt.scatter(X_new[:, 0], X_new[:, 1], c=y, cmap=plt.cm.coolwarm)
        plt.xlabel(f'Selected Features (k={k})')
        plt.ylabel(f'Selected Features (k={k})')
    plt.title(f'Selected Features (k={k})')

    plt.tight_layout()
    plt.show()
Output:
2. Dimensionality Reduction (PCA):
Concept: Dimensionality reduction projects high-dimensional data into a lower-dimensional space while preserving critical information, which streamlines the model without seriously compromising its accuracy. Principal Component Analysis (PCA) does this by identifying a set of orthogonal axes (principal components) along which the data varies most and projecting the data onto the first few of these components.
Python Example:
Python
from sklearn.decomposition import PCA

# Continues from the feature selection example above
# (reuses X_train, X_test, y_train, y_test, complex_model,
#  accuracy_complex and plot_decision_boundary)

# Apply PCA for dimensionality reduction (reduce to 2 components)
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# Create a decision tree model using the PCA-reduced data
model_with_pca = DecisionTreeClassifier()
model_with_pca.fit(X_train_pca, y_train)

# Plot decision boundaries
plt.figure(figsize=(15, 4))

plt.subplot(1, 3, 1)
plot_decision_boundary(complex_model, X_train[:, :2], y_train)
plt.title("Complex Decision Tree (All Features)")

plt.subplot(1, 3, 2)
plot_decision_boundary(model_with_pca, X_train_pca, y_train)
plt.title("Decision Tree with PCA-Reduced Features")

# Calculate accuracy for the PCA-based model
y_pred_pca = model_with_pca.predict(X_test_pca)
accuracy_pca = accuracy_score(y_test, y_pred_pca)

plt.subplot(1, 3, 3)
plt.bar(['Complex Model', 'PCA-Reduced Model'], [accuracy_complex, accuracy_pca], color=['blue', 'green'])
plt.ylim(0, 1)
plt.ylabel('Accuracy')
plt.title('Model Accuracy')

plt.tight_layout()
plt.show()
Output:
In this code, we apply PCA to reduce the dimensionality of the data to just two principal components. We then train a decision tree model using these reduced components and visualize the decision boundary.
By comparing the decision boundaries in both cases (feature selection and PCA), you can see how these model reduction methods simplify the model while retaining the essential patterns in the data.
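You can also quantify how much information the PCA projection retains. A fitted PCA object exposes the explained_variance_ratio_ attribute; the short sketch below reuses the `pca` object fitted in the example above to print the fraction of the total variance captured by each of the two principal components.
Python
# Fraction of the dataset's variance captured by each principal component
# (assumes the `pca` object fitted in the example above)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Total variance retained: {:.1%}".format(pca.explained_variance_ratio_.sum()))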
Bonus Example:
Python
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Perform PCA for dimensionality reduction
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Plot the original features next to the PCA projection
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('Feature 0')
plt.ylabel('Feature 1')
plt.title('Original Data')

plt.subplot(1, 2, 2)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA-Reduced Data (k=2)')

plt.tight_layout()
plt.show()
Output:
3. Regularization
Concept: Regularization involves adding constraints to a model to prevent it from becoming too complex. One common approach is to limit the maximum depth of a decision tree, effectively simplifying the tree’s structure.
Python Example:
Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate the dataset
X, y = make_moons(n_samples=300, noise=0.25, random_state=42)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a complex decision tree model
complex_model = DecisionTreeClassifier(random_state=42)
complex_model.fit(X_train, y_train)

# Create a regularized decision tree model with limited depth
regularized_model = DecisionTreeClassifier(max_depth=3, random_state=42)
regularized_model.fit(X_train, y_train)

# Calculate accuracy for both models
y_pred_complex = complex_model.predict(X_test)
accuracy_complex = accuracy_score(y_test, y_pred_complex)

y_pred_regularized = regularized_model.predict(X_test)
accuracy_regularized = accuracy_score(y_test, y_pred_regularized)

# Plot decision boundaries
def plot_decision_boundary(model, X, y, ax, title):
    h = 0.02
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu, alpha=0.6)
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu)
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")
    ax.set_title(title)

plt.figure(figsize=(12, 4))

plt.subplot(1, 3, 1)
plot_decision_boundary(complex_model, X, y, plt.gca(), "Complex Decision Tree (No Regularization)")

plt.subplot(1, 3, 2)
plot_decision_boundary(regularized_model, X, y, plt.gca(), "Regularized Decision Tree (Max Depth = 3)")

# Display accuracy comparison
plt.subplot(1, 3, 3)
plt.bar(['Complex Model', 'Regularized Model'], [accuracy_complex, accuracy_regularized], color=['blue', 'purple'])
plt.ylim(0, 1)
plt.ylabel('Accuracy')
plt.title('Model Accuracy Comparison')

plt.tight_layout()
plt.show()
Output:
In this example, we use the make_moons dataset, which is not linearly separable. Notice that the complex model (without regularization) overfits the training data, while the regularized model (with limited depth) produces a more general decision boundary. This demonstrates how regularization simplifies the model and improves its ability to generalize to unseen data.
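The simplification can also be measured directly: scikit-learn decision trees expose get_depth() and get_n_leaves(), so the size of the unregularized and regularized trees fitted above can be compared. A minimal sketch, assuming the `complex_model` and `regularized_model` objects from this example:
Python
# Compare the structural complexity of the two trees
# (assumes `complex_model` and `regularized_model` from the example above)
print("Complex tree:     depth =", complex_model.get_depth(),
      " leaves =", complex_model.get_n_leaves())
print("Regularized tree: depth =", regularized_model.get_depth(),
      " leaves =", regularized_model.get_n_leaves())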
Regularization (L1 and L2)
Regularization is a method to reduce model complexity by adding penalties to the model’s loss function based on the magnitude of its parameters (weights). L1 regularization (Lasso) adds a penalty proportional to the absolute values of the parameters, which can lead some parameters to become exactly zero. L2 regularization (Ridge) adds a penalty proportional to the square of parameter values, discouraging large parameter values.
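Written out, for a model with weights $w$ and an unpenalized loss $L(w)$, the two objectives take the following generic form (exact scaling conventions, such as a factor of $1/2$ or a per-sample average, vary by library):

$$\mathcal{L}_{\text{L1}}(w) = L(w) + \alpha \sum_{j} |w_j|, \qquad \mathcal{L}_{\text{L2}}(w) = L(w) + \alpha \sum_{j} w_j^{2}$$

Note that in the scikit-learn example below, the regularization strength is specified through the parameter C, which is the inverse of this $\alpha$.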
Python Implementation
Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset (first two features for simplicity)
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# Standardize the features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Regularization paths
alphas = np.logspace(-2, 2, 50)
coefs_lasso = []
coefs_ridge = []

for alpha in alphas:
    # Note: `multi_class='ovr'` is deprecated in recent scikit-learn releases
    lasso = LogisticRegression(penalty='l1', C=1 / alpha, solver='liblinear', multi_class='ovr')
    ridge = LogisticRegression(penalty='l2', C=1 / alpha, solver='lbfgs', multi_class='ovr')
    lasso.fit(X, y)
    ridge.fit(X, y)
    coefs_lasso.append(lasso.coef_.ravel())
    coefs_ridge.append(ridge.coef_.ravel())

# Create subplots for the L1 and L2 regularization paths
# (with 3 classes and 2 features there are 6 coefficient paths per plot)
plt.figure(figsize=(12, 6))

plt.subplot(121)
plt.plot(alphas, coefs_lasso)
plt.title('L1 Regularization Path (Lasso)')
plt.xlabel('Alpha (Regularization Strength)')
plt.ylabel('Coefficient Values')
plt.legend(iris.feature_names[:2])

plt.subplot(122)
plt.plot(alphas, coefs_ridge)
plt.title('L2 Regularization Path (Ridge)')
plt.xlabel('Alpha (Regularization Strength)')
plt.ylabel('Coefficient Values')
plt.legend(iris.feature_names[:2])

plt.tight_layout()
plt.show()
Output:
4. t-Distributed Stochastic Neighbor Embedding (t-SNE)
Concept: t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction technique that is widely used for visualizing high-dimensional data in a lower-dimensional space, typically 2D or 3D. Unlike linear techniques like Principal Component Analysis (PCA), t-SNE focuses on preserving the pairwise similarities between data points, making it particularly effective for visualizing complex, nonlinear structures in the data.
Python Example:
Python
from sklearn.manifold import TSNE
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Perform t-SNE for dimensionality reduction
tsne = TSNE(n_components=2)
X_tsne = tsne.fit_transform(X)

# Plot the 2-D t-SNE embedding
plt.figure(figsize=(5, 4))
plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('t-SNE Dimension 1')
plt.ylabel('t-SNE Dimension 2')
plt.title('t-SNE for Dimensionality Reduction')
plt.show()
Output:
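Keep in mind that t-SNE's output is sensitive to its hyperparameters, particularly perplexity (roughly, the effective number of neighbors each point considers, typically between 5 and 50). The sketch below, which reuses the X and y arrays loaded above, shows one way to compare embeddings for a few perplexity values side by side.
Python
# t-SNE embeddings for a few perplexity values
# (assumes `X` and `y` from the example above)
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
for i, perp in enumerate([5, 30, 50], start=1):
    emb = TSNE(n_components=2, perplexity=perp, random_state=42).fit_transform(X)
    plt.subplot(1, 3, i)
    plt.scatter(emb[:, 0], emb[:, 1], c=y, cmap=plt.cm.coolwarm)
    plt.title(f"perplexity = {perp}")
plt.tight_layout()
plt.show()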
Benefits of Model Reduction
Model reduction techniques serve several essential purposes:
- Improved Interpretability: Simplifying complex models makes them easier to understand and explain, which is crucial for stakeholders.
- Computational Efficiency: Reduced complexity leads to faster model training and prediction, critical in real-time applications.
- Generalization: Simpler models are less prone to overfitting and often generalize better to unseen data.
Model Reduction Methods: An Overview
Machine learning models are now more powerful and sophisticated than ever, able to tackle challenging problems and enormous datasets. But with that power comes complexity, and these models sometimes grow too complicated to deploy in the real world. Model reduction methods address this situation. This article discusses the idea of model reduction in machine learning in beginner-friendly terms, clarifies the essential concepts, and provides concrete Python examples showing how it works, introducing common dimensionality reduction techniques and applying them to machine learning models in Python.