Gaussian Process Classification (GPC) on Iris Dataset
Gaussian process classification (GPC) is a probabilistic machine learning method for classification problems; its counterpart for continuous targets is Gaussian process regression. It is built on the Gaussian process, a probabilistic model that defines a distribution over functions. Given a set of input data, this distribution can be used to predict a function's output together with an estimate of how uncertain that prediction is.
In the classification setting, GPC predicts the class label of a new data point from its features. The Gaussian process is used to model the probability of each class label for the data point, and the label with the highest probability is returned as the prediction.
Gaussian Process Classification
GPC extends Gaussian processes to classification problems. It models a probability distribution over possible functions, which gives a probabilistic approach to class label prediction. GPC can be helpful for problems with complex decision boundaries or imbalanced datasets. Besides the predicted label, it reports how uncertain each prediction is, which matters in applications where the confidence or reliability of the model's output must be known. Because prior knowledge can be encoded through the choice of kernel, GPC adapts to many kinds of data.
A Gaussian process (GP) is a stochastic process in which every finite collection of its random variables has a multivariate normal distribution. Equivalently, every finite linear combination of those random variables is normally distributed. GPs are widely used in statistics, machine learning, and Bayesian inference. A GP model is described by the following components:
- Mean function: The mean function gives the expected value of the GP at any given input.
- Covariance function: This function measures how similar two inputs are to one another. It determines how strongly the GP's value at one input depends on its value at another.
- Hyperparameters: These parameters control the shape of the covariance function, for example its length scale. They are usually tuned by maximizing the marginal likelihood.
- Marginal likelihood: The marginal likelihood measures how well a GP model fits a particular set of data. It is used to choose hyperparameters and to compare different models.
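The components above can be made concrete with a small NumPy sketch. It is only an illustration: the helper `rbf_kernel`, the input grid `x`, and the zero mean function are assumptions chosen for the example, not part of the Iris workflow. It evaluates a covariance function on a grid and draws a few functions from the resulting GP prior.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two 1-D arrays of inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)        # illustrative 1-D inputs

mean = np.zeros_like(x)              # mean function: zero everywhere (assumption)
cov = rbf_kernel(x, x, length_scale=1.0)  # covariance function on the grid
cov += 1e-8 * np.eye(len(x))         # small jitter for numerical stability

# Each row is one function drawn from the GP prior, evaluated at x
samples = rng.multivariate_normal(mean, cov, size=3)
print(samples.shape)  # (3, 50)
```

Changing `length_scale` (a hyperparameter) changes how quickly the sampled functions vary: smaller values give wigglier draws.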
Concepts of Gaussian Process Classification
- Kernel functions: A kernel function, also called a covariance function, determines the form of the GP's prior and posterior distributions. It measures how similar two data points are to each other, which is how the GP captures the relationship between input features and target values.
- Prior Distribution: The prior distribution of a Gaussian process (GP) describes our assumptions about the underlying function before any data are observed. It defines the set of functions that could plausibly describe the underlying process, so the choice of prior reflects our expectations before the data are taken into account. Because this prior can express a wide range of function behaviors, the GP is an effective tool for modeling complex and uncertain systems.
- Posterior Distribution: The posterior distribution represents our updated belief about the underlying function after observing data. It is obtained by combining the prior with the likelihood of the observed data, and it is again a probability distribution over functions. Compared with the prior, the posterior is concentrated on functions that agree with the evidence, so uncertainty is reduced where data are available.
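As a rough illustration of how a kernel scores similarity, the sketch below evaluates scikit-learn's `RBF` kernel on three hand-picked points with two different length scales. The points and the length-scale values are arbitrary choices for the example.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

# Three illustrative 1-D points: two close together, one far away
points = np.array([[0.0], [0.5], [3.0]])

K_short = RBF(length_scale=0.5)(points)  # similarity decays quickly with distance
K_long = RBF(length_scale=2.0)(points)   # similarity decays slowly with distance

print(np.round(K_short, 3))
print(np.round(K_long, 3))
```

In both matrices the diagonal is 1 (each point is fully similar to itself) and nearby points score higher than distant ones; the longer length scale treats the same pair of points as more similar.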
Implementation of Gaussian process classification (GPC) on Iris dataset
Importing Libraries
Python3
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris
Load Iris Dataset
Python3
# Load the iris dataset
iris = load_iris()
X = iris.data[:, :2]  # Using only the first two features for visualization
y = iris.target
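This loads the Iris dataset with load_iris() from Scikit-Learn's sklearn.datasets module. The loaded dataset contains the features (iris flower measurements) and the target labels (iris species). Only the first two features, sepal length and sepal width, are kept so that the decision regions can be plotted in two dimensions.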
Split Data into Training and Test Sets
Python3
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=50)
This code uses sklearn.model_selection.train_test_split() to divide the data into training and test sets. With test_size=0.2, the test set (20%) is used to assess the GPC model's performance, while the training set (80%) is used to fit it. Fixing the random_state parameter makes the split reproducible.
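A quick check of the resulting sizes can catch a mismatch between the intended and the actual split. The sketch below repeats the split and also shows a stratified variant; the `stratify=y` call and the `_s` variable names are additions for illustration and are not used elsewhere in this walkthrough.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X = X[:, :2]  # first two features, as in the walkthrough

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=50)
print(X_train.shape, X_test.shape)  # (120, 2) (30, 2)

# Optional variant (assumption, not in the original): keep class
# proportions identical in both sets by stratifying on the labels.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=50, stratify=y)
print(np.bincount(y_test_s))  # [10 10 10]
```

Stratifying is worth considering on small datasets, where a purely random split can leave one class under-represented in the test set.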
Standardize Features
Python3
# Standardize the features
scaler = StandardScaler()
X_train_standardized = scaler.fit_transform(X_train)
X_test_standardized = scaler.transform(X_test)
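This code uses sklearn.preprocessing.StandardScaler() to standardize the features. Each feature is scaled to have zero mean and unit standard deviation. The scaler is fitted on the training set only and then applied to the test set, so no information from the test data leaks into training. Standardizing usually helps the GPC model, because a kernel with a single length scale treats all features as if they were on the same scale.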
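The effect of the scaler can be verified directly. The following sketch, with the shortened variable names `X_train_std` and `X_test_std` chosen for the example, prints the per-feature mean and standard deviation after scaling.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = X[:, :2]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=50)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learn mean/std on training data
X_test_std = scaler.transform(X_test)        # reuse the training statistics

print(X_train_std.mean(axis=0))  # approximately 0 for each feature
print(X_train_std.std(axis=0))   # approximately 1 for each feature
print(X_test_std.mean(axis=0))   # close to 0, but not exactly
```

The test set is only approximately centered because it is transformed with the training set's statistics, which is the intended behavior.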
Define Kernel
Python3
# Define the kernel
kernel = 1.0 * RBF(length_scale=1.0)  # RBF kernel with default parameters
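In GPC, the kernel function determines how similarity between data points is measured. The Radial Basis Function (RBF) kernel used here and the Matérn kernel are two popular choices. The factor 1.0 is a constant kernel that scales the RBF term; both it and the length scale are initial values that are tuned during fitting.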
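For comparison, a Matérn kernel can be defined in the same way. The sketch below is not part of the original workflow; the value `nu=1.5` and the sample points are assumptions picked for illustration.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, Matern

rbf_kernel = 1.0 * RBF(length_scale=1.0)
# nu controls smoothness: smaller values give rougher functions
matern_kernel = 1.0 * Matern(length_scale=1.0, nu=1.5)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 2.0]])  # illustrative 2-D points
K_rbf = rbf_kernel(pts)
K_matern = matern_kernel(pts)

print(np.round(K_rbf, 3))
print(np.round(K_matern, 3))
```

Either kernel object can be passed to `GaussianProcessClassifier(kernel=...)`; which one works better depends on how smooth the decision boundary is expected to be.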
Fit the Model to the Training Data
Python3
# Create the Gaussian process classifier
gp = GaussianProcessClassifier(kernel=kernel)

# Fit the model to the training data
gp.fit(X_train_standardized, y_train)
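Using gp.fit(X_train_standardized, y_train), this code fits the GPC model to the standardized training data. Fitting builds the underlying probabilistic model (scikit-learn uses a Laplace approximation to the posterior) and tunes the kernel hyperparameters by maximizing the log marginal likelihood. Because Iris has three classes, scikit-learn trains one binary classifier per class (one-vs-rest) by default.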
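After fitting, the tuned kernel and the achieved log marginal likelihood can be inspected. This is a minimal, self-contained sketch that repeats the earlier steps with shortened variable names; `random_state=0` on the classifier is an addition for reproducibility and is not in the original code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = X[:, :2]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=50)
X_train_std = StandardScaler().fit_transform(X_train)

gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
                               random_state=0)
gp.fit(X_train_std, y_train)

print(gp.classes_)                        # class labels seen during fit
print(gp.kernel_)                         # fitted kernel(s), one per binary problem
print(gp.log_marginal_likelihood_value_)  # averaged over the binary problems
```

Comparing `gp.kernel_` with the initial `kernel` shows how far the optimizer moved the constant factor and the length scale.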
Make Predictions on the Test Data
Python3
# Make predictions on the test data
y_pred = gp.predict(X_test_standardized)
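Using gp.predict(X_test_standardized), this code predicts class labels for the test data. The model is not refitted here: the already fitted GPC model is applied to the standardized test features and returns the most probable class for each sample.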
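Since GPC is probabilistic, the class probabilities behind each prediction are also available through `predict_proba`. The sketch below is self-contained and uses illustrative names (`proba`, `confidence`, `least_certain`); it lists the test samples the model is least sure about.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = X[:, :2]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=50)
scaler = StandardScaler().fit(X_train)

gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
gp.fit(scaler.transform(X_train), y_train)

# One row per test sample, one column per class; rows sum to 1
proba = gp.predict_proba(scaler.transform(X_test))
print(proba.shape)  # (30, 3)

confidence = proba.max(axis=1)               # probability of the top class
least_certain = np.argsort(confidence)[:5]   # indices of the 5 least confident samples
for i in least_certain:
    print(i, np.round(proba[i], 3), "true label:", y_test[i])
```

Low-confidence samples typically lie near the boundary between versicolor and virginica, which overlap heavily in the two sepal features.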
Evaluate the Model
Python3
# Evaluate the model (accuracy_score was imported earlier)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Output:
Accuracy: 0.9
This code uses accuracy to assess the model's performance. The accuracy score is the fraction of test samples whose predicted class label (y_pred) matches the true class label (y_test).
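Accuracy alone does not show which classes are confused with each other. A confusion matrix, sketched below with scikit-learn's `confusion_matrix`, is one way to see this; it is an addition to the original evaluation, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = X[:, :2]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=50)
scaler = StandardScaler().fit(X_train)

gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
gp.fit(scaler.transform(X_train), y_train)
y_pred = gp.predict(scaler.transform(X_test))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)

# Per-class recall: correct predictions divided by true samples of that class
per_class_recall = cm.diagonal() / np.maximum(cm.sum(axis=1), 1)
print(np.round(per_class_recall, 3))
```

The diagonal of the matrix counts correct predictions, so its sum divided by the total number of test samples equals the accuracy score.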
Mesh Grid Visualization
Create a Mesh Grid
Python3
# Create a mesh grid for visualization
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))
This code creates a mesh grid that covers the range of the two selected features, with a margin of 0.5 on each side. The bounds (x_min, x_max, y_min, y_max) are computed from the unstandardized data, and np.meshgrid() builds a grid of points with a step of 0.1.
Predict on the Mesh Grid
Python3
# Predict on the mesh grid
X_grid = np.c_[xx.ravel(), yy.ravel()]
X_grid_standardized = scaler.transform(X_grid)
y_pred_grid = gp.predict(X_grid_standardized)
y_pred_grid = y_pred_grid.reshape(xx.shape)
This code predicts a class label for every point of the mesh grid defined by xx and yy. The coordinate matrices are flattened and stacked column-wise into X_grid, one row per grid point. Because the model was trained on standardized features, the grid is transformed with the same fitted scaler. The fitted classifier (gp) then predicts a class label for each grid point, and the predictions are reshaped to the mesh grid's shape so they can be drawn as decision regions.
Plotting the Mesh Grid Visualization
Python3
# Plot the mesh grid visualization
plt.contourf(xx, yy, y_pred_grid, cmap='coolwarm', alpha=0.8)
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train,
            edgecolors='k', cmap=plt.cm.Paired)
# The grid and the scatter points are in the original units (cm),
# so the axes are labeled accordingly rather than as standardized.
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.title('Gaussian Process Classification on Iris Dataset')
plt.show()
Output:
Running the example prints an accuracy score, which summarizes the model's performance on the test set, and draws a mesh grid plot of the decision regions over the two features (sepal length and sepal width). The model's prediction at every grid point is shown as filled contours, with the training points overlaid. Together, the accuracy gives a quantitative measure of the model's correctness and the plot gives a qualitative view of its decision boundaries.