What is Gaussian Process Regression (GPR)?

Gaussian Process Regression (GPR) is a probabilistic approach to regression: it makes predictions about an unknown function by placing a probability distribution over the functions that could explain the data. This makes GPR flexible, since it can model complex, non-linear relationships without an explicit formula for the underlying function. Before seeing any data, GPR starts from a prior, a broad initial guess about what the unknown function might look like. As data accumulates, that belief is updated, the model adapts to the observations, and the predictions become more certain. Combining the prior with the data gives a posterior distribution, which concentrates on functions consistent with the observations. For new, unseen data points, GPR returns a predicted value together with an uncertainty estimate, typically reported as a confidence interval.
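The move from a prior to a posterior can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's own implementation: it assumes a zero-mean prior, a squared-exponential kernel with unit length scale, and a tiny noise term for numerical stability. The function names `rbf` and `gp_posterior` and the toy data are hypothetical.

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test)          # train/test covariances
    K_ss = rbf(x_test, x_test)          # prior covariance at the test inputs
    L = np.linalg.cholesky(K)           # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Toy data (assumed for illustration): three noiseless samples of sin(x).
x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.array([0.0, 4.0])   # one test point on the data, one far from it

prior_std = np.sqrt(np.diag(rbf(x_test, x_test)))   # uncertainty before data
mean, std = gp_posterior(x_train, y_train, x_test)  # uncertainty after data

print("prior std:    ", prior_std)
print("posterior std:", std)
```

Under these assumptions, the posterior standard deviation collapses towards zero at the test point that coincides with a training input and stays close to the prior value at the point far from all data.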

Kernel Function:

The kernel function is the most important component of Gaussian Process Regression. It defines the measure of similarity between data points and so shapes the behaviour of the resulting Gaussian process. The choice of kernel controls the flexibility of the model: different kernel functions allow the GPR model to capture different relationships between data points. The kernel also encodes, before any training, prior assumptions about the smoothness, periodicity and other properties of the function being modelled. Choosing the kernel is therefore an important step in any Gaussian process model. Let’s discuss some of the most common kernel functions for modelling data with a Gaussian process.

1. Radial Basis Function (RBF): The RBF kernel is often used when the function is expected to be smooth and continuous. It implies that data points close to each other in the input space will have similar output values. With an appropriate choice of hyperparameters and enough input data, a GP with the RBF kernel can approximate a very wide range of continuous functions. Functions drawn from a GP with the RBF kernel are infinitely differentiable, that is, derivatives of every order exist at every point of the input space. We can represent the RBF kernel as:

$$k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2l^2}\right)$$

Where,

  • k(x, x’) is the value of RBF kernel between two points x, x’.
  • ||x-x’|| is the Euclidean distance between two points x, x’.
  • l is the length scale parameter.

The length scale ‘l’ determines the width of the kernel, that is, how quickly the similarity between points decreases with distance. Its value is usually chosen by hyperparameter optimization during training.
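The effect of the length scale can be checked numerically. The sketch below assumes the standard RBF form k(x, x') = exp(-||x - x'||^2 / (2 l^2)); the function name `rbf_kernel` and the two sample points are hypothetical.

```python
import math

def rbf_kernel(x, x_prime, length_scale=1.0):
    """RBF similarity between two points given as equal-length sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / (2.0 * length_scale**2))

point_a, point_b = [0.0, 0.0], [1.0, 1.0]   # squared Euclidean distance = 2

# The same pair of points looks "far apart" under a short length scale
# and "close together" under a long one.
for ls in (0.5, 1.0, 5.0):
    print(f"l = {ls}: k = {rbf_kernel(point_a, point_b, ls):.4f}")
```

A larger ‘l’ gives a wider kernel, so the similarity between the same two points moves closer to 1.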

2. Matérn Kernel:

The Matérn kernel generalizes the RBF kernel by adding control over the smoothness of the function. Its shape parameter (ν) determines how smooth the modelled function is, while the length scale (l) determines how quickly the correlation between points decays with distance. The higher the value of ν, the smoother the function; as ν tends to infinity, the Matérn kernel converges to the RBF kernel. We can represent the Matérn kernel as:

$$k(x, x') = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\,\|x - x'\|}{l}\right)^{\nu} K_{\nu}\left(\frac{\sqrt{2\nu}\,\|x - x'\|}{l}\right)$$

Here,

  • k(x, x’) is the value of Matérn kernel between two points x, x’.
  • ||x-x’|| is the Euclidean distance between two points x, x’.
  • l is the length scale parameter.
  • ν is the shape parameter.
  • Γ is the gamma function.
  • K_ν is the modified Bessel function of the second kind.
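For half-integer values of the shape parameter, the Matérn kernel reduces to simple closed forms with no Bessel function. The sketch below uses the well-known forms for ν = 0.5, 1.5 and 2.5 and compares them with the RBF kernel at one distance. The function names and the chosen distance are assumptions made for illustration.

```python
import math

def matern(r, length_scale=1.0, nu=1.5):
    """Matérn kernel at distance r, for the half-integer nu values
    that have simple closed forms (0.5, 1.5, 2.5)."""
    s = r / length_scale
    if nu == 0.5:
        return math.exp(-s)
    if nu == 1.5:
        a = math.sqrt(3.0) * s
        return (1.0 + a) * math.exp(-a)
    if nu == 2.5:
        a = math.sqrt(5.0) * s
        return (1.0 + a + a * a / 3.0) * math.exp(-a)
    raise ValueError("this sketch only handles nu in {0.5, 1.5, 2.5}")

def rbf(r, length_scale=1.0):
    """RBF kernel at distance r, the limit of Matérn as nu grows."""
    return math.exp(-0.5 * (r / length_scale) ** 2)

r = 0.5
for nu in (0.5, 1.5, 2.5):
    gap = abs(matern(r, nu=nu) - rbf(r))
    print(f"nu = {nu}: matern = {matern(r, nu=nu):.4f}, gap to RBF = {gap:.4f}")
```

At this distance the gap to the RBF value shrinks as ν increases, which is consistent with the Matérn kernel converging to the RBF kernel for large ν.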

3. Linear Kernel: The linear kernel is useful for capturing a linear relationship between the input and output, that is, linear trends in the data. In its simplest form it is represented as:

$$k(x, x') = x^{\top} x'$$

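4. Periodic Kernel: The periodic kernel is useful for modelling repeating patterns, such as seasonal trends in a dataset. A common form of the periodic kernel is:

$$k(x, x') = \exp\left(-\frac{2\sin^2\left(\pi \|x - x'\| / p\right)}{l^2}\right)$$

Where p is the period of the pattern and l is the length scale parameter.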
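As a minimal sketch, assuming the common exp-sine-squared form k(x, x') = exp(-2 sin^2(π |x - x'| / p) / l^2) with period p and length scale l, the following checks that points exactly one period apart are treated as fully similar. The function name and the 12-unit period are hypothetical.

```python
import math

def periodic_kernel(x, x_prime, length_scale=1.0, period=1.0):
    """Exp-sine-squared kernel between two scalar inputs."""
    d = abs(x - x_prime)
    return math.exp(-2.0 * math.sin(math.pi * d / period) ** 2 / length_scale**2)

period = 12.0   # e.g. monthly data with a yearly cycle (assumed)

same_phase = periodic_kernel(0.3, 0.3 + period, period=period)      # one full period apart
half_phase = periodic_kernel(0.3, 0.3 + period / 2, period=period)  # half a period apart

print(f"one period apart:  {same_phase:.4f}")
print(f"half period apart: {half_phase:.4f}")
```

Two inputs a whole period apart get a similarity of essentially 1, no matter how far apart they are in absolute terms, while inputs half a period apart get the lowest similarity.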

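5. Constant Kernel: The constant kernel assigns the same covariance to every pair of data points, regardless of where they lie. It is typically combined with other kernels to scale them or to add a constant offset. The constant kernel can be represented as:

$$k(x, x') = c$$

Where c is a constant value.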

6. Rational Quadratic Kernel: This kernel can be seen as a scale mixture of RBF kernels with different length scales. It is especially useful when the data exhibits behaviour at multiple length scales. The Rational Quadratic kernel function for two points x and x’ can be defined as:

$$k(x, x') = \left(1 + \frac{\|x - x'\|^2}{2\alpha l^2}\right)^{-\alpha}$$

Where α is the scale mixture parameter, which controls the relative weighting of large-scale and small-scale variations in the data, and l is the length scale parameter.
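The role of the scale mixture parameter can be illustrated numerically. The sketch assumes the usual Rational Quadratic form k = (1 + r^2 / (2 α l^2))^(-α), where r is the distance between the two points. The function names are hypothetical.

```python
import math

def rational_quadratic(r, length_scale=1.0, alpha=1.0):
    """Rational Quadratic kernel at distance r."""
    return (1.0 + r * r / (2.0 * alpha * length_scale**2)) ** (-alpha)

def rbf(r, length_scale=1.0):
    """RBF kernel at distance r, for comparison."""
    return math.exp(-r * r / (2.0 * length_scale**2))

r = 3.0
for alpha in (0.5, 5.0, 1e6):
    print(f"alpha = {alpha:g}: RQ = {rational_quadratic(r, alpha=alpha):.6f}")
print(f"RBF reference:   {rbf(r):.6f}")
```

A small α keeps a noticeable correlation between distant points (a heavier tail), while a very large α makes the kernel numerically indistinguishable from a single RBF kernel.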

7. White Kernel: This kernel plays a special role in modelling the noise in the data. Unlike the other kernels, which define the shape or structure of the underlying function, the White kernel specifically addresses the noise properties of the observations. The White kernel can be represented as:

$$k(x, x') = \sigma^2 \, \delta(x, x')$$

Where σ² is the noise level or variance, and δ(x, x’) is 1 if x = x’ and 0 otherwise.
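In practice the white-noise term is usually added to another kernel, where it only changes the diagonal of the covariance matrix. The sketch below illustrates this with an RBF kernel. The function names, inputs and noise level are hypothetical, and comparing observation indices rather than input values is an assumption of this sketch, so that repeated measurements at the same input still get independent noise.

```python
import math

def rbf(x, x_prime, length_scale=1.0):
    """RBF kernel between two scalar inputs."""
    return math.exp(-(x - x_prime) ** 2 / (2.0 * length_scale**2))

def white(i, j, noise_level=0.1):
    """White-noise term: noise_level when i and j name the same
    observation, 0 otherwise."""
    return noise_level if i == j else 0.0

xs = [0.0, 0.5, 2.0]

# Covariance of the noisy observations: smooth signal part plus noise part.
K = [[rbf(a, b) + white(i, j) for j, b in enumerate(xs)]
     for i, a in enumerate(xs)]

for row in K:
    print([round(v, 4) for v in row])
```

Only the diagonal entries increase (from 1.0 to 1.1 here), so the noise affects each observation's own variance without changing how strongly different points are correlated.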

Gaussian Process Regression (GPR) on Mauna Loa CO2 data

This article explores the application of Gaussian Process Regression (GPR) to the Mauna Loa CO2 dataset using Scikit-Learn.


Gaussian Process Regression on Mauna Loa CO2 data

Now let’s apply GPR to the Mauna Loa CO2 data. Mauna Loa is a volcano in the US state of Hawaii and is considered Earth’s largest active volcano in terms of volume and area covered. Because of its remote location, Mauna Loa is known for atmospheric research, in particular the continuous monitoring of carbon dioxide (CO2) levels carried out there. The dataset records the increase in atmospheric CO2, which bears on global warming and the role of human activities in it....