What is Gaussian Process Classification (GPC) on Iris Dataset?

In this article, we will learn how to apply Gaussian Process Classification (GPC) to the Iris dataset using Python and scikit-learn.

Gaussian Process Classification (GPC) on Iris Dataset

Gaussian process classification (GPC) is a probabilistic machine learning approach to classification problems. It builds on the Gaussian process, a probabilistic model that defines a distribution over functions and can be used for both regression and classification. Given a set of input data, this distribution can be used to predict a function’s output.

In the classification setting, GPC predicts the class label of a new data point from its features. The Gaussian process is used to model the probability of each class label for that point, and the label with the highest probability is returned as the prediction.
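The idea of picking the most probable label can be sketched with scikit-learn's predict_proba. This is a minimal illustration, not the tutorial's final model: it assumes all four Iris features and fits on the full dataset, and the names X_demo, y_demo, clf_demo, proba and most_likely are placeholders chosen for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Assumption: fit on the full Iris data purely for illustration
X_demo, y_demo = load_iris(return_X_y=True)
clf_demo = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
                                     random_state=0)
clf_demo.fit(X_demo, y_demo)

# One row per sample, one column per class; each row sums to 1
proba = clf_demo.predict_proba(X_demo[:5])
print(proba.round(3))

# Pick the class with the highest predicted probability for each sample
most_likely = clf_demo.classes_[np.argmax(proba, axis=1)]
print(most_likely)
```

The probabilities themselves are often as useful as the label, since a row such as 0.4 / 0.35 / 0.25 signals a much less certain prediction than one dominated by a single class.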

Gaussian Process Classification

GPC extends Gaussian processes to classification problems. To enable a probabilistic approach to class label prediction, GPC models a probability distribution over possible functions. It can be helpful for problems with complex decision boundaries or imbalanced datasets. Beyond the predictions themselves, it quantifies the uncertainty around them, which matters in applications where the confidence or reliability of the model’s output must be understood. GPC is a flexible tool for classification because it can incorporate prior knowledge and adapt to many kinds of data.

A Gaussian process (GP) is a stochastic process in which every finite collection of its random variables has a multivariate normal distribution. Equivalently, every finite linear combination of those random variables is normally distributed. GPs are widely used in statistics, machine learning, and Bayesian inference. A GP model is described by the following components:

Mean function: Gives the expected value of the GP at any given input.
Covariance function: Measures how similar two inputs are to one another, and so determines how strongly the GP’s value at one input depends on its value at another.
Hyperparameters: Parameters that control the shape of the covariance function. They are usually tuned by maximizing the marginal likelihood.
Marginal likelihood: Measures how well a GP model fits a given dataset. It is used to select hyperparameters and to compare different models.
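The link between hyperparameters and the marginal likelihood can be sketched in scikit-learn as follows. This is an illustration under stated assumptions: it uses a two-class subset of Iris (versicolor vs. virginica) so that the fitted kernel prints as a single expression, and the names X_bin, y_bin, gp_fixed and gp_tuned are placeholders for this example. Passing optimizer=None keeps the initial hyperparameters, while the default optimizer tunes them by maximizing the log marginal likelihood.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Assumption: a binary subset (classes 1 and 2) keeps the example small
X_all, y_all = load_iris(return_X_y=True)
mask = y_all != 0
X_bin, y_bin = X_all[mask], y_all[mask]

kernel_demo = 1.0 * RBF(length_scale=1.0)

# optimizer=None: hyperparameters stay at their initial values
gp_fixed = GaussianProcessClassifier(kernel=kernel_demo, optimizer=None)
gp_fixed.fit(X_bin, y_bin)

# Default optimizer: hyperparameters are tuned during fit
gp_tuned = GaussianProcessClassifier(kernel=kernel_demo, random_state=0)
gp_tuned.fit(X_bin, y_bin)

lml_fixed = gp_fixed.log_marginal_likelihood_value_
lml_tuned = gp_tuned.log_marginal_likelihood_value_

print("Initial kernel:", gp_fixed.kernel_, "log marginal likelihood:", lml_fixed)
print("Tuned kernel:  ", gp_tuned.kernel_, "log marginal likelihood:", lml_tuned)
```

A higher (less negative) log marginal likelihood for the tuned model indicates that the optimized hyperparameters explain the training data better than the initial ones.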

Concepts of Gaussian Process Classification

Kernel functions: A kernel function, also called a covariance function, determines the form of the GP’s prior and posterior distributions. It measures how similar two data points are to each other, which makes it central to capturing the relationship between input features and target values.
Posterior Distribution: The posterior distribution represents our updated belief about the underlying function after observing data. It is obtained by combining the prior, which encodes initial beliefs, with the likelihood of the observed data, and the result is again a probability distribution over functions. Because it incorporates the evidence at hand, the posterior is typically less uncertain than the prior near the observed data and supports better-informed decisions.
Prior Distribution: The prior distribution describes our assumptions about the underlying function before any data are observed. It defines the set of functions that could plausibly describe the underlying process, and the choice of prior reflects our expectations before seeing data. Because a GP prior can express a wide range of function behaviors, GPs are effective for modeling complex and uncertain systems in statistics and machine learning.
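To make the prior concrete, the sketch below draws a few functions from a zero-mean GP prior with an RBF kernel. It is an illustration only: the input grid, the zero mean, the small jitter term added for numerical stability, and the names x_grid, K and prior_samples are all assumptions made for this example.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# 50 evenly spaced one-dimensional inputs
x_grid = np.linspace(-3, 3, 50).reshape(-1, 1)

# Covariance matrix: entry (i, j) is the kernel similarity of inputs i and j
kernel_prior = RBF(length_scale=1.0)
K = kernel_prior(x_grid)

# Zero mean function; jitter keeps the covariance numerically positive definite
mean = np.zeros(len(x_grid))
jitter = 1e-8 * np.eye(len(x_grid))

# Each row is one function drawn from the prior, evaluated on x_grid
prior_samples = rng.multivariate_normal(mean, K + jitter, size=3)
print(prior_samples.shape)  # (3, 50)
```

A smaller length_scale makes distant inputs less correlated, so the sampled functions vary more quickly; a larger one produces smoother samples.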

Implementation of Gaussian process classification (GPC) on Iris dataset

Importing Libraries

Python3

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

Load Iris Dataset

Python3

# Load the iris dataset
iris = load_iris()
X = iris.data[:, :2]  # Using only the first two features for visualization
y = iris.target

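This loads the Iris dataset from Scikit-Learn’s sklearn.datasets module. The loaded dataset contains the features (iris flower measurements) and the target labels (iris species). Only the first two features, sepal length and sepal width, are kept so that the decision regions can be plotted in two dimensions.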

Split Data into Training and Test Sets

Python3

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=50)

This code uses sklearn.model_selection.train_test_split() to divide the data into training and test sets. The test set (20%, as set by test_size=0.2) will be used to assess the GPC model’s performance, while the training set (80%) will be used to fit it. The random_state parameter makes the split reproducible.
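As a quick check of the split sizes, the sketch below repeats the split and counts the samples. One difference from the code above is an assumption added for illustration: stratify=y_demo, which keeps the class proportions equal in both sets. The names X_tr, X_te, y_tr and y_te are placeholders.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True)

# Assumption: stratify=y_demo is an optional addition, not part of the original split
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=50, stratify=y_demo)

# 150 samples with test_size=0.2 give 120 training and 30 test samples
print(len(X_tr), len(X_te))  # 120 30

# With stratification, each of the three species appears equally often in the test set
print(np.bincount(y_te))
```

Stratification matters more for small or imbalanced datasets, where a purely random split can leave a class under-represented in the test set.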

Standardize Features

Python3

# Standardize the features
scaler = StandardScaler()
X_train_standardized = scaler.fit_transform(X_train)
X_test_standardized = scaler.transform(X_test)

This code uses sklearn.preprocessing.StandardScaler() to standardize the features. Each feature is scaled to zero mean and unit standard deviation, using statistics computed on the training set only and then applied to the test set. This can improve the GPC model’s performance.
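What the scaler does can be sketched on synthetic data. The arrays train and test below are hypothetical values generated for this example, not Iris measurements. The point is that transform() reuses the mean and standard deviation learned from the training data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical two-feature data, generated only for this illustration
train = rng.normal(loc=[5.0, 3.0], scale=[0.8, 0.4], size=(100, 2))
test = rng.normal(loc=[5.0, 3.0], scale=[0.8, 0.4], size=(20, 2))

scaler_demo = StandardScaler()
train_std = scaler_demo.fit_transform(train)  # learns mean_ and scale_ from train
test_std = scaler_demo.transform(test)        # reuses the training statistics

# Training features now have zero mean and unit standard deviation
print(train_std.mean(axis=0).round(6), train_std.std(axis=0).round(6))

# transform() is equivalent to subtracting the training mean and dividing by its std
manual = (test - scaler_demo.mean_) / scaler_demo.scale_
print(np.allclose(manual, test_std))  # True
```

Fitting the scaler on the training set only avoids leaking information about the test set into the model.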

Define Kernel

Python3

# Define the kernel
kernel = 1.0 * RBF(length_scale=1.0)  # RBF kernel with default parameters

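In GPC, the kernel function measures the similarity between data points. The Radial Basis Function (RBF) kernel used here and the Matérn kernel are popular choices. The factor 1.0 is a constant kernel that scales the RBF output; both it and length_scale are starting values that are tuned during fitting.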
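The sketch below evaluates both kernels on three hand-picked points to show how similarity decays with distance. The points and the choice nu=1.5 for the Matérn kernel are assumptions made for this illustration.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, Matern

# Three illustrative points at distances 0, 1 and 3 from the first one
points = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])

rbf = RBF(length_scale=1.0)
matern = Matern(length_scale=1.0, nu=1.5)  # nu controls smoothness

# Calling a kernel on an array returns the pairwise similarity matrix
K_rbf = rbf(points)
K_matern = matern(points)

print(K_rbf.round(3))
print(K_matern.round(3))
```

In both matrices the diagonal is 1 (each point is maximally similar to itself) and the entries shrink as the points move apart. Either kernel can be passed to GaussianProcessClassifier in place of the other.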

Fit the Model to the Training Data

Python3

# Create the Gaussian process classifier
gp = GaussianProcessClassifier(kernel=kernel)
 
# Fit the model to the training data
gp.fit(X_train_standardized, y_train)

Using gp.fit(X_train_standardized, y_train), this code fits the GPC model to the standardized training data. Fitting builds the underlying probabilistic model and learns the kernel hyperparameters.

Make Predictions on the Test Data

Python3

# Make predictions on the test data
y_pred = gp.predict(X_test_standardized)

Using gp.predict(X_test_standardized), this code predicts class labels for the test data by applying the fitted GPC model to the standardized test features.

Evaluate the Model

Python3

# Evaluate the model
from sklearn.metrics import accuracy_score
 
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Output:

Accuracy: 0.9

This code uses accuracy to assess the model’s performance. The accuracy score is the fraction of test samples whose predicted class label (y_pred) matches the true class label (y_test).
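Accuracy alone does not show which species are confused with each other. As a complement, the sketch below rebuilds the same pipeline in a self-contained form and prints a confusion matrix. The variable names (X_tr, clf_demo, y_hat, cm) and random_state=0 on the classifier are assumptions for this example.

```python
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same setup as the tutorial: first two features, 80/20 split
iris_demo = load_iris()
X_demo, y_demo = iris_demo.data[:, :2], iris_demo.target
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=50)

scaler_demo = StandardScaler().fit(X_tr)
clf_demo = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
                                     random_state=0)
clf_demo.fit(scaler_demo.transform(X_tr), y_tr)
y_hat = clf_demo.predict(scaler_demo.transform(X_te))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, y_hat, labels=[0, 1, 2])
print(cm)

# Accuracy equals the diagonal (correct predictions) divided by the total
print(cm.trace() / cm.sum(), accuracy_score(y_te, y_hat))
```

Off-diagonal entries show which pairs of species are mixed up. With only the two sepal features, versicolor and virginica overlap noticeably, so errors are most likely between those two.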

Mesh Grid Visualization

Create a Mesh Grid

Python3

# Create a mesh grid for visualization
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
 
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))

This code creates a mesh grid covering the range of the two selected features, padded by 0.5 on each side. The bounds (x_min, x_max, y_min, y_max) are computed from the data, and np.meshgrid() builds a grid of points spaced 0.1 apart.

Predict on the Mesh Grid

Python3

# Predict on the mesh grid
X_grid = np.c_[xx.ravel(), yy.ravel()]
X_grid_standardized = scaler.transform(X_grid)
y_pred_grid = gp.predict(X_grid_standardized)
y_pred_grid = y_pred_grid.reshape(xx.shape)

The code predicts a class label at every point of the mesh grid defined by xx and yy. The coordinate matrices are flattened and stacked column-wise into X_grid, which is then standardized with the scaler fitted on the training data. The Gaussian process classifier (gp) predicts a class label for each standardized grid point, and the predictions are reshaped to the shape of the mesh grid, yielding a surface of predicted labels.
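The flatten, predict, reshape pattern can be sketched on a tiny grid. In this illustration the rule "label 1 where x > y" is a hypothetical stand-in for gp.predict, and the names xs, ys, grid and label_surface are placeholders.

```python
import numpy as np

xs = np.arange(4.0)  # 4 x-coordinates: 0, 1, 2, 3
ys = np.arange(3.0)  # 3 y-coordinates: 0, 1, 2

# meshgrid returns matrices of shape (len(ys), len(xs))
xx, yy = np.meshgrid(xs, ys)
print(xx.shape)  # (3, 4)

# Flatten and stack column-wise: one (x, y) row per grid point
grid = np.c_[xx.ravel(), yy.ravel()]
print(grid.shape)  # (12, 2)

# Hypothetical stand-in for a classifier: label 1 where x > y, else 0
labels = (grid[:, 0] > grid[:, 1]).astype(int)

# Reshape back so that label_surface[i, j] belongs to the point (xx[i, j], yy[i, j])
label_surface = labels.reshape(xx.shape)
print(label_surface)
```

Because ravel() and reshape() use the same element order, each predicted label lands back at the grid position it was computed for, which is what contour plotting functions expect.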

Plotting the Mesh Grid Visualization

Python3

# Plot the mesh grid visualization
plt.contourf(xx, yy, y_pred_grid, cmap='coolwarm', alpha=0.8)
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train,
            edgecolors='k', cmap=plt.cm.Paired)
# The grid and the scatter points are in the original units (cm), not standardized
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.title('Gaussian Process Classification on Iris Dataset')
plt.show()

Output:

Gaussian process classification

Applying Gaussian Process Classification to the Iris dataset yields an accuracy score that summarizes the model’s performance on the test set. In addition, the mesh grid visualization shows the decision boundaries and how the model assigns regions of the feature space (sepal length and sepal width) to classes. The model’s predictions for every grid point are drawn as filled contours, with the training points overlaid. Together, the output evaluates the model quantitatively and shows its decision boundaries qualitatively.