Understanding Loss Functions

Preeti Aggarwal · Published in Heartbeat · 6 min read · Nov 29, 2022

In machine learning and deep learning, the loss function is crucial. Suppose you have trained a model on a dataset and are ready to present it to your client. How can you be sure that this model will produce the best outcome? Is there a metric or technique you can use to quickly evaluate your model against the dataset?

Yes: this is exactly what loss functions are for in machine learning and deep learning. We will examine several loss functions in this post.

What is a Loss Function?

A loss or cost function (also known as an error function) is a function that maps an event, or the values of one or more variables, onto a real number that intuitively represents some “cost” associated with that event.

The loss function can be thought of as a way to assess how effectively your algorithm models your dataset. It is a mathematical function of the machine learning algorithm’s parameters.

In simple linear regression, predictions are calculated from the slope (m) and intercept (b). The loss function used is the squared error, Σ(yᵢ − ŷᵢ)², which makes the loss a function of the slope and the intercept.

(Image: a fitted linear regression line. Source: https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-linear-regression/)
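As a quick sketch of this idea, you can evaluate the squared-error loss for any candidate slope and intercept (the toy data below is invented purely for illustration):

```python
import numpy as np

# Toy data, invented purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

def squared_loss(m, b):
    """Sum of squared residuals for the line y_hat = m*x + b."""
    y_hat = m * x + b
    return np.sum((y - y_hat) ** 2)

print(squared_loss(2.0, 0.0))  # good fit  -> 0.10
print(squared_loss(0.5, 1.0))  # poor fit -> 40.70
```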

Why is the Loss Function crucial?

Loss functions are central to any statistical model: they define the objective against which the model’s performance is measured, and minimizing a given loss function yields the parameters that the model learns. A lower loss value indicates a better model; if the loss is high, we must adjust the model’s parameters to reduce it.

Loss function in Deep Learning

Classification Loss

Binary Cross Entropy Loss

Binary cross-entropy is used for classification tasks where the model outputs a probability between 0 and 1. Cross-entropy measures the average divergence between the predicted probabilities and the actual labels: BCE = −(1/n) Σ [yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ)].

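A minimal NumPy sketch of this calculation (the labels and predicted probabilities are invented for illustration):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average BCE; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0, 1.0])      # e.g., 1 = spam, 0 = not spam
y_pred = np.array([0.9, 0.1, 0.8, 0.4])      # predicted probabilities
print(binary_cross_entropy(y_true, y_pred))  # ~0.34
```

In Keras, the built-in tf.keras.losses.BinaryCrossentropy computes the same quantity.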

Where to use Binary Cross Entropy Loss?

A classic example is spam detection: predicting whether an email is spam or not.

Hinge Loss

The hinge loss is a type of cost function in which a margin, or distance from the classification boundary, is factored into the cost calculation. Even observations that are correctly classified incur a penalty if their margin from the decision boundary is insufficient: for a true label y ∈ {−1, 1} and a raw score ŷ, the loss is max(0, 1 − y·ŷ), which grows linearly beyond the margin.

It is used most prominently in SVM models.

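A minimal NumPy sketch (toy labels and scores invented for illustration):

```python
import numpy as np

def hinge_loss(y_true, y_pred):
    """Average hinge loss; labels are -1 or 1, predictions are raw scores."""
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))

y_true = np.array([1.0, -1.0, 1.0, -1.0])
y_pred = np.array([0.8, -0.5, -0.2, 0.3])  # raw (unsquashed) model scores
print(hinge_loss(y_true, y_pred))  # 0.8
```

Note that the first two points are classified correctly yet still contribute to the loss, because their margins fall short of 1.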

Where to use Hinge Loss?

The hinge loss function was developed primarily for Support Vector Machine (SVM) models and is an alternative to cross-entropy for binary classification problems. It is intended for binary classification where the target values are in the set {−1, 1}. The hinge loss assigns a greater error when the actual and predicted class values differ in sign, encouraging examples to have the correct sign.


Regression Loss

Mean Squared Error / Squared Loss / L2 Loss

The mean squared error (MSE) of a regression line indicates how close the line is to a set of points. It does this by squaring the distances between the points and the regression line (the “errors”). Squaring removes negative signs and gives larger differences more weight. It is called the mean squared error because you are taking the average of a set of squared errors. The lower the MSE, the more accurate the model.

Mean Squared Error: MSE = (1/n) Σ (yᵢ − ŷᵢ)²
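A minimal NumPy sketch (toy values invented for illustration):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared residuals; large errors are penalized quadratically."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print(mse(y_true, y_pred))  # 0.875
```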

Where to use MSE Loss?

The Mean Squared Error, or MSE, is the default loss for regression problems. Mathematically, it is the preferred loss function under the maximum likelihood inference framework when the distribution of the target variable is Gaussian. Evaluate MSE first, and switch to another loss only if there is a good reason to.

Mean Absolute Error / L1 Loss

The Mean Absolute Error (MAE) is one of the simplest loss functions. It is calculated by averaging the absolute difference between the actual values and the model’s predictions over the whole dataset.

Mean Absolute Error: MAE = (1/n) Σ |yᵢ − ŷᵢ|

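A minimal NumPy sketch, using the same toy values as the MSE example above:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean of absolute residuals; every error contributes linearly."""
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print(mae(y_true, y_pred))  # 0.75
```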

Note: Use a linear activation function at the final neuron in regression.

Where to use MAE?

In some regression problems, the distribution of the target variable may be mostly Gaussian but contain outliers. Because the MAE is more resistant to outliers, it is a suitable loss function in this scenario.

Huber Loss

The Huber loss, used in robust regression, is a loss function that is less sensitive to outliers in the data than the squared error loss.

It behaves quadratically for small residuals and linearly for large ones (source: https://lindevs.com/calculate-huber-loss-using-tensorflow-2):

Lδ(y, ŷ) = ½(y − ŷ)² if |y − ŷ| ≤ δ, and δ(|y − ŷ| − ½δ) otherwise, averaged over all n data points, where:

  • n — the number of data points.
  • y — the actual value of the data point, also known as the true value.
  • ŷ — the predicted value of the data point, returned by the model.
  • δ — the transition point from quadratic to linear.
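A minimal NumPy sketch of this piecewise definition (toy values invented for illustration; Keras also ships an equivalent built-in, tf.keras.losses.Huber):

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Quadratic for residuals within delta, linear beyond it."""
    r = y_true - y_pred
    quadratic = 0.5 * r ** 2
    linear = delta * (np.abs(r) - 0.5 * delta)
    return np.mean(np.where(np.abs(r) <= delta, quadratic, linear))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 10.0])      # last point is an outlier
print(huber_loss(y_true, y_pred, delta=1.0))  # ~0.91
```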

Mean Squared Logarithmic Error Loss

When we want to soften the penalty for large discrepancies between predicted and actual values, we can compute the mean squared error on the natural logarithm of the values rather than on the values themselves: MSLE = (1/n) Σ (log(1 + yᵢ) − log(1 + ŷᵢ))². This addresses a drawback of the plain mean squared error: the model is now penalized less severely for large absolute errors on large targets.

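A minimal NumPy sketch, using log1p (the log of 1 + value) so that zeros are handled safely (toy values invented for illustration):

```python
import numpy as np

def msle(y_true, y_pred):
    """MSE computed on log(1 + value); penalizes relative rather than absolute error."""
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

y_true = np.array([30.0, 50.0, 2.5, 7.0])
y_pred = np.array([25.0, 50.0, 4.0, 8.0])
print(msle(y_true, y_pred))  # ~0.04
```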

When to use Mean Squared Logarithmic Error Loss?

When predicting target values that span a wide range, we may not want to punish the model as harshly as the plain mean squared error would. Instead, we can take the natural log of each value before computing the MSE, so that the relative error matters more than the absolute error.

Conclusion

This article provided an overview of loss functions applied to classification and regression problems. Keep in mind that there is no one-size-fits-all solution: selecting a loss function is just as crucial as selecting the appropriate machine learning model for the task at hand.

The complete code of the above implementation is available at this notebook.

