Mathematical Concept of Gaussian Process Regression (GPR)

Gaussian Process (GP) regression is a non-parametric, probabilistic machine learning model used for regression tasks. It is a powerful tool for modeling complex, uncertain relationships between input and output variables. GP regression assumes that the data points are generated by a multivariate Gaussian distribution, and the objective is to infer this distribution from the data.

The GP regression model can be expressed mathematically as follows. Let [Tex]x_1, x_2, \ldots, x_n[/Tex] be the input data points, where each input is a real number ([Tex]x_i \in \mathbb{R}[/Tex]).

Let [Tex]y_1, y_2, \ldots, y_n[/Tex] be the corresponding output values, where each output is a real number ([Tex]y_i \in \mathbb{R}[/Tex]).

The GP regression model assumes that the function f connecting the inputs to the outputs is drawn from a Gaussian process with mean function [Tex]\mu[/Tex] and covariance function k.

Then, at a set of test locations x*, the prior distribution of f is given by:

[Tex]f(x^*) \sim \mathcal{N}(\mu(x^*), k(x^*, x^*))[/Tex]

The covariance function is typically defined through a kernel, while the mean function is often taken to be zero. A frequently used choice is the squared exponential (RBF) kernel, defined as follows (a short NumPy sketch of this kernel appears after the parameter list below):

[Tex]k(x_{i},x_{j}) = \sigma^2 \exp\left(-\frac{\|x_{i} - x_{j}\|^2}{2l^2}\right)[/Tex]

Where,

  • [Tex]k(x_{i}, x_{j})[/Tex]: The kernel function, which measures the similarity (correlation) between two input data points [Tex]x_i[/Tex] and [Tex]x_j[/Tex].
  • [Tex]\sigma^2[/Tex]: The kernel’s variance parameter. It sets the vertical scale of the kernel function and controls how strongly the data points are correlated; a larger [Tex]\sigma^2[/Tex] yields a kernel with greater variance.
  • exp: The exponential function, which raises e to the power of its argument.
  • [Tex]\|x_{i} - x_{j}\|^2[/Tex]: The squared Euclidean distance between the input data points [Tex]x_i[/Tex] and [Tex]x_j[/Tex], measuring their geometric separation in the feature space.
  • l: The kernel’s length scale (characteristic length), which appears squared in the denominator. It controls how quickly the kernel function decays as data points move farther apart; a smaller l makes the kernel decay faster.
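As a minimal sketch of this formula (separate from the scikit-learn implementation used later), the kernel can be evaluated directly with NumPy and used to draw sample functions from a zero-mean GP prior. The function name sq_exp_kernel, the input grid, and the jitter term are illustrative choices, not part of any library:

Python

import numpy as np

def sq_exp_kernel(xi, xj, sigma2=1.0, length_scale=1.0):
    # k(xi, xj) = sigma^2 * exp(-||xi - xj||^2 / (2 * l^2))
    sq_dist = np.sum((xi - xj) ** 2)
    return sigma2 * np.exp(-sq_dist / (2.0 * length_scale ** 2))

# Covariance matrix over a 1-D grid of inputs
x = np.linspace(0, 5, 50)
K = np.array([[sq_exp_kernel(a, b) for b in x] for a in x])

# Draw three sample functions from the zero-mean GP prior N(0, K);
# the small jitter keeps the covariance numerically positive definite
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(x)), K + 1e-8 * np.eye(len(x)), size=3)

Each row of samples is one function drawn from the prior; with a smaller length_scale, the draws vary more rapidly across the grid.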

Given a set of training data (x, y), the GP regression model applies Bayesian inference to determine the distribution over f that is most likely to have produced the data. This requires computing the posterior distribution of f given the data, defined as follows:

[Tex]p(f|x,y) = \frac{p(y|x,f)\,p(f)}{p(y|x)}[/Tex]

where p(y|x) is the marginal probability (evidence) of the data, p(f) is the prior distribution of f, and p(y|x,f) is the likelihood of the data given the function f.

After learning the posterior distribution of f, the model makes predictions at new test points x* by computing the posterior predictive distribution, defined as follows:

[Tex]p(f^*|x^*, y, x) = \int p(f^*|x^*, f)\, p(f|y, x)\, df[/Tex]

Where,

  • [Tex]p(f^*|x^*, y, x)[/Tex]: The conditional probability of the predicted function values f* at a new input point x*, given the training data y and x. In other words, it is the probability distribution over all possible function values at the new input location x*, conditioned on the observed data y and their corresponding input locations x.
  • [Tex]\int p(f^*|x^*, f)\, p(f|y,x)\, df[/Tex]: The integral used to compute the conditional probability; it marginalizes over all possible values of the function f.
  • [Tex]p(f^*|x^*, f)[/Tex]: The conditional probability distribution of the predicted function values f* at x*, given the function values f.
  • [Tex]p(f|y,x)[/Tex]: The conditional probability distribution of the function values f, given the observed data y and their input locations x.

This distribution provides a measure of the prediction’s uncertainty, which is helpful for tasks such as uncertainty-aware decision making and active learning.
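To make the predictive equations concrete, the following sketch computes the posterior predictive mean and covariance in closed form, assuming a squared exponential kernel and i.i.d. Gaussian observation noise. The helper rbf, the toy sine data, and the noise level noise_var are all illustrative assumptions:

Python

import numpy as np

def rbf(A, B, sigma2=1.0, l=1.0):
    # Pairwise squared exponential kernel between the rows of A and B
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return sigma2 * np.exp(-d2 / (2 * l**2))

# Toy training data: noisy sine observations (illustrative)
rng = np.random.default_rng(0)
X = np.sort(5 * rng.random((20, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(20)
noise_var = 0.1 ** 2          # assumed observation-noise variance

# Posterior predictive at test points x*
Xs = np.linspace(0, 5, 100)[:, None]
K = rbf(X, X) + noise_var * np.eye(len(X))    # k(x, x) + sigma_n^2 I
Ks = rbf(X, Xs)                               # k(x, x*)
Kss = rbf(Xs, Xs)                             # k(x*, x*)

mean = Ks.T @ np.linalg.solve(K, y)                # predictive mean
cov = Kss - Ks.T @ np.linalg.solve(K, Ks)          # predictive covariance
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))    # pointwise predictive std

These are the same quantities that scikit-learn’s GaussianProcessRegressor returns via predict(..., return_cov=True), up to its internal hyperparameter optimization and numerical details.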

Steps in Gaussian Process Regression

  • Data Collection: Gather the input-output data pairs for your regression problem.
  • Choose a Kernel Function: Select an appropriate covariance function (kernel) that suits your problem. The choice of kernel influences the shape of the functions that GPR can model.
  • Parameter Optimization: Estimate the hyperparameters of the kernel function by maximizing the (log) marginal likelihood of the data. This can be done using gradient-based optimization techniques.
  • Prediction: Given a new input, use the trained GPR model to make predictions. GPR provides both the predicted mean and the associated uncertainty (variance).

Implementation of Gaussian Process Regression (GPR)

Python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split

# Generate sample data
np.random.seed(0)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel()

# Add noise to the data
y += 0.1 * np.random.randn(80)

# Define the kernel (RBF kernel)
kernel = 1.0 * RBF(length_scale=1.0)

# Create a Gaussian Process Regressor with the defined kernel
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Fit the Gaussian Process model to the training data
gp.fit(X_train, y_train)

# Make predictions on the test data
y_pred, sigma = gp.predict(X_test, return_std=True)

# Visualize the results
x = np.linspace(0, 5, 1000)[:, np.newaxis]
y_mean, y_cov = gp.predict(x, return_cov=True)

plt.figure(figsize=(10, 5))
plt.scatter(X_train, y_train, c='r', label='Training Data')
plt.plot(x, y_mean, 'k', lw=2, zorder=9, label='Predicted Mean')
plt.fill_between(x[:, 0],
                 y_mean - 1.96 * np.sqrt(np.diag(y_cov)),
                 y_mean + 1.96 * np.sqrt(np.diag(y_cov)),
                 alpha=0.2, color='k', label='95% Confidence Interval')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

Output:

In this code, we first generate sample data points with added noise, then define an RBF kernel and create a Gaussian Process Regressor with it. The model is trained on the training data and used to make predictions on the test data. Finally, the results are visualized with a plot showing the training data, the predicted mean, and the 95% confidence interval.
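Since fitting also optimizes the kernel hyperparameters (the Parameter Optimization step above), it can be instructive to inspect what was learned. Continuing from the fitted gp object, a short check might look like the following; the exact printed values depend on the random data:

Python

# Kernel with optimized hyperparameters (set during fit)
print("Learned kernel:", gp.kernel_)

# Log marginal likelihood at the optimized hyperparameters
print("Log marginal likelihood:", gp.log_marginal_likelihood(gp.kernel_.theta))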

Conclusion

In conclusion, Gaussian Process Regression is a valuable tool for data analysis and prediction in situations where understanding the uncertainty in predictions is essential. By leveraging probabilistic modeling and kernel functions, GPR can provide accurate and interpretable results. However, it’s crucial to consider the computational cost and the need for expert input when implementing GPR in practice.