SGD Regressor Loss Functions

The SGD regressor supports the following loss functions.

Squared Error: Ordinary Least Squares

The ordinary least squares loss is the square of the difference between the actual value and the value predicted by the model.

The loss function can be expressed mathematically as:

L(y, f(x)) = (y − f(x))²

Here,

  • y – the actual value
  • f(x) – the value predicted by the model

It penalizes the model more heavily for larger differences, thereby giving more weight to outliers.

Graphically, it can be represented for a single point as below:

[Figure: Squared Error]
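As a minimal sketch (not from the original article), the snippet below fits an SGDRegressor with the squared error loss; the synthetic dataset and hyperparameter values are illustrative assumptions.

    # Minimal sketch: SGDRegressor with the squared error (ordinary least squares) loss.
    # The synthetic dataset and hyperparameters below are illustrative choices.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import SGDRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # loss="squared_error" selects ordinary least squares
    reg = SGDRegressor(loss="squared_error", max_iter=1000, random_state=42)
    reg.fit(X_train, y_train)

    print("Test MSE:", mean_squared_error(y_test, reg.predict(X_test)))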

Huber Loss: Robust Regression

The mean squared error (MSE), or squared error, gives too much importance to outliers, while the mean absolute error (MAE), which takes the absolute value of the errors instead of squaring them, gives equal weight to all points. Huber loss combines MSE and MAE to get the best of both worlds: it is quadratic (like MSE) when the error is small and linear (like MAE) otherwise. For an error smaller than delta the quadratic term is used, and for an error larger than delta the linear term is used. The delta value is a hyperparameter.

The equation of the Huber loss is given by:

L_δ(y, f(x)) = ½ (y − f(x))²          if |y − f(x)| ≤ δ
L_δ(y, f(x)) = δ |y − f(x)| − ½ δ²    otherwise

Here,

  • y – the observed (actual) value
  • f(x) – the value predicted by the model

The delta term in the second part of the equation makes the loss continuous and differentiable at the point where the quadratic and linear pieces meet.

[Figure: Huber Loss]
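Below is a minimal sketch of using the Huber loss with SGDRegressor. Note that in SGDRegressor the switching threshold referred to as delta above is exposed through the epsilon parameter; the toy data with injected outliers and the parameter values are illustrative assumptions.

    # Minimal sketch: SGDRegressor with the Huber loss, which is less sensitive to outliers.
    # In SGDRegressor the delta threshold is controlled by the `epsilon` parameter;
    # the data and parameter values below are illustrative choices.
    import numpy as np
    from sklearn.linear_model import SGDRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=300)
    y[:10] += 50.0  # inject a few large outliers

    huber_reg = SGDRegressor(loss="huber", epsilon=1.35, max_iter=2000, random_state=0)
    huber_reg.fit(X, y)
    print("Coefficients:", huber_reg.coef_)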

Epsilon Insensitive: Linear Support Vector Regression

The epsilon-insensitive loss can be expressed mathematically as:

L(y, f(x)) = max(0, |y − f(x)| − ε)

Here,

  • y – the actual target value
  • f(x) – the value predicted by the model

The value of epsilon determines the distance within which errors are considered to be zero. The loss function ignores errors that are less than or equal to epsilon by treating them as zero.

Thus the loss function effectively forces the optimizer to find a hyperplane such that a tube of width epsilon around it contains as many of the data points as possible.
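As a minimal sketch, the snippet below uses the epsilon-insensitive loss, which makes SGDRegressor behave like a linear support vector regressor; the dataset, pipeline, and epsilon value are illustrative assumptions.

    # Minimal sketch: SGDRegressor with the epsilon-insensitive loss (linear SVR behaviour).
    # Errors smaller than `epsilon` contribute nothing to the loss; the dataset and
    # epsilon value below are illustrative choices.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import SGDRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_regression(n_samples=400, n_features=4, noise=5.0, random_state=1)

    svr_like = make_pipeline(
        StandardScaler(),
        SGDRegressor(loss="epsilon_insensitive", epsilon=0.5, max_iter=2000, random_state=1),
    )
    svr_like.fit(X, y)
    print("Training R^2:", svr_like.score(X, y))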

Different Loss Functions in SGD

In machine learning, optimizers and loss functions are two components that help improve the performance of a model. A loss function measures the performance of a model by measuring the difference between the output expected from the model and the output it actually produces. Mean squared error and log loss are examples of loss functions. The optimizer improves the model by adjusting its parameters so that the value of the loss function is minimized. SGD, Adam, and RMSProp are examples of optimizers. The focus of this article is the various loss functions supported by the SGD module of Sklearn. Sklearn provides two SGD classes: SGDClassifier for classification tasks and SGDRegressor for regression tasks.
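As a minimal illustration of that last point (assumed, not from the original text), both estimators expose a loss parameter that selects the loss function:

    # Minimal sketch: the two SGD estimators in scikit-learn; the `loss` argument
    # selects the loss function (the values shown are among those supported).
    from sklearn.linear_model import SGDClassifier, SGDRegressor

    clf = SGDClassifier(loss="hinge")           # classification (linear SVM-style loss)
    reg = SGDRegressor(loss="squared_error")    # regression (ordinary least squares)
    print(clf)
    print(reg)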
