SGD Classifier Loss Function

The SGD classifier supports the following loss functions:

Hinge Loss: Support Vector Machine

Hinge loss is a loss function used for training classifiers. It is employed specifically in 'maximum-margin' classification, with Support Vector Machines (SVMs) being the most prominent example.

Mathematically, hinge loss can be represented as:

L(t, y) = max(0, 1 - t·y)

Here,

  • t – the true class label required from the classifier, t ∈ {-1, +1}
  • y – the raw output (score) of the classifier, not the predicted class label

Let's understand it with the help of the graph below:


Hinge Loss


We can identify three cases in the loss function:

Case 1: Correct Classification and |y| ≥ 1

In this case the product t·y is positive and at least 1, so 1 - t·y is less than or equal to zero and the loss max(0, 1 - t·y) is zero. This is indicated by the green region in the graph above. Here there is no penalty to the model.

Case 2: Correct Classification and |y| < 1

In this case the product t·y is positive but less than 1, so 1 - t·y lies between 0 and 1 and the loss equals 1 - t·y. This is indicated by the yellow region in the graph above. Although the model has classified the point correctly, we still penalize it because the classification was not made with enough confidence: the score satisfies |y| < 1, i.e. the point falls inside the margin.

Case 3: Incorrect Classification

In this case the product t·y is negative, so 1 - t·y is always greater than 1 and the loss max(0, 1 - t·y) equals 1 - t·y. The loss increases linearly as t·y becomes more negative, i.e. the more confidently wrong the prediction is. This is indicated by the red region in the graph above.
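
The snippet below is a minimal NumPy sketch (our own helper, not Sklearn's internal implementation) that evaluates the hinge loss on one example from each of the three cases above:

import numpy as np

def hinge_loss(t, y):
    # Hinge loss for true labels t in {-1, +1} and raw classifier scores y
    return np.maximum(0, 1 - t * y)

t = np.array([1, 1, 1])         # true labels
y = np.array([2.0, 0.5, -0.5])  # raw classifier scores

# Case 1: t*y = 2.0  -> loss 0.0 (correct and confident)
# Case 2: t*y = 0.5  -> loss 0.5 (correct but inside the margin)
# Case 3: t*y = -0.5 -> loss 1.5 (incorrect)
print(hinge_loss(t, y))  # [0.  0.5 1.5]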

Modified Huber Loss: Smoothed Hinge Loss

Huber loss is a loss function used in regression. Its variant for classification is called the modified Huber loss.

Mathematically, the modified Huber loss can be expressed as:

L(t, y) = max(0, 1 - t·y)² if t·y ≥ -1, and -4·t·y otherwise.

Here,

  • t – the true class label required from the classifier, t ∈ {-1, +1}
  • y – the raw output (score) of the classifier, not the predicted class label

The modified Huber loss can be graphically represented as:



Modified Huber Loss



For values of t·y ≥ -1 (the light red, yellow and green areas in the graph) it is simply the squared hinge loss, max(0, 1 - t·y)².

For values of t·y < -1 the loss is the linear function -4·t·y, indicated by the dark red area in the graph.
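
As a quick check of the two branches, here is a minimal NumPy sketch of the modified Huber loss (again our own helper, not Sklearn's internals):

import numpy as np

def modified_huber_loss(t, y):
    # Squared hinge for t*y >= -1, linear branch -4*t*y otherwise
    z = t * y
    return np.where(z >= -1, np.maximum(0, 1 - z) ** 2, -4 * z)

t = np.ones(4)
y = np.array([2.0, 0.5, -0.5, -2.0])
print(modified_huber_loss(t, y))  # [0.   0.25 2.25 8.  ]

Note that the two branches agree at t·y = -1 (both give 4), which is what makes the transition between them smooth.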

Log Loss: Logistic Regression

Log loss, also called binary cross-entropy loss, is the loss function used for logistic regression.

Mathematically, it can be expressed as:

L(y, p) = -(y·log(p) + (1 - y)·log(1 - p))

Here, we have two classes – Class 1 and Class 0

  • p – the predicted probability of the data point belonging to Class 1
  • y – the class label (1 for Class 1 and 0 for Class 0)

Case Class 1 (y = 1): the second term in the equation becomes 0 and we are left with the first term only, so the loss is -log(p).

Case Class 0 (y = 0): the first term in the equation becomes 0 and we are left with the second term only, so the loss is -log(1 - p).

Let us understand log loss with the help of the following graph:


Log Loss for individual data point


Class 1 – The green line represents Class 1. When the predicted probability is close to 1, the loss approaches zero; when the predicted probability is close to 0, the loss approaches infinity.

Class 0 – The blue line represents Class 0. When the predicted probability is close to 0, the loss approaches zero; when the predicted probability is close to 1, the loss approaches infinity.

Note that the graph above shows the log loss for an individual data point, not for the whole dataset. The overall loss is obtained by combining (typically averaging) the per-point losses, each of which depends on the predicted probability p.
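
The per-point behaviour described above can be checked with a minimal NumPy sketch (the helper name and the clipping constant eps are our own illustrative choices):

import numpy as np

def log_loss_point(y, p, eps=1e-15):
    # Binary cross-entropy for each point; y in {0, 1}, p = P(class 1)
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 1, 0, 0])
p = np.array([0.9, 0.1, 0.9, 0.1])
per_point = log_loss_point(y, p)
print(per_point)         # small for confident correct predictions, large otherwise
print(per_point.mean())  # the dataset loss averages the per-point values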

Different Loss Functions in SGD

In machine learning, optimizers and loss functions are two components that help improve the performance of a model. A loss function measures a model's performance as the difference between the output expected from the model and the output it actually produces; mean squared error and log loss are two examples. An optimizer improves the model by adjusting its parameters so that the value of the loss function is minimized; SGD, Adam, and RMSProp are examples of optimizers. The focus of this article is the loss functions supported by the SGD module of Sklearn, which provides two SGD classes: SGDClassifier for classification tasks and SGDRegressor for regression tasks.
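
To make the connection to Sklearn concrete, here is a small usage sketch (assuming a recent scikit-learn release in which the logistic-regression loss is named 'log_loss'; older releases used 'log'):

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each loss discussed in this article is selected via the `loss` parameter
for loss in ["hinge", "modified_huber", "log_loss"]:
    clf = SGDClassifier(loss=loss, random_state=42).fit(X_train, y_train)
    print(loss, clf.score(X_test, y_test))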
