What are cost functions?


Basic explanation

Every machine learning model we use needs a way to measure how well it fits the data. Cost functions do this: they are functions whose variables are the data points and the fitted parameters of the model (this matters because the parameters are the variables we'll optimize to get a more accurate model).
For each data point, the cost function uses the model to get a "predicted" value and measures the distance between the real and the predicted value (an error). Once this is done for every data point, the cost function combines these errors into a single final cost value. The higher this value, the larger the error, so finding an accurate model comes down to minimizing the value of the cost function.
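
The idea can be sketched in a few lines of Python. This is only an illustration: the names `cost`, `double` and `squared_error` are made up for this example, and averaging the per-point errors is one common way to combine them, not the only one.

```python
def cost(model, points, point_error):
    """Combine the per-point errors between real and predicted values."""
    errors = []
    for x, y in points:
        y_pred = model(x)                      # the "predicted" value for this point
        errors.append(point_error(y, y_pred))  # distance between real and predicted
    return sum(errors) / len(errors)           # one final cost value (here: the average)


# Hypothetical model and error measure, chosen only for illustration.
def double(x):
    return 2 * x


def squared_error(y, y_pred):
    return (y - y_pred) ** 2


perfect = cost(double, [(1, 2), (2, 4)], squared_error)  # predictions match the data
worse = cost(double, [(1, 3), (2, 6)], squared_error)    # predictions are off
print(perfect, worse)
```

The second call returns a higher value than the first, because its predictions are further from the real values.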

Step-by-step example

This might not be very clear yet, so let's go through an example step by step. For this example, we'll use the Mean Squared Error (MSE) function, which looks like this:

$$\mathrm{MSE} = \frac{1}{N} \sum_{k=1}^{N} (y_k - \hat{y}_k)^2$$

For each data point, yk is the real value and ŷk is the value predicted by the model. N is the number of data points.
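
As a rough sketch, the MSE can be written as a small Python function: take the difference between each real value and its prediction, square it, and average over all data points. The function name and the input checks are choices made for this example, not part of any particular library.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between real and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one data point is required")
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n


# Two points: the first is predicted exactly, the second is off by 2.
example = mean_squared_error([1, 2], [1, 4])
print(example)  # (0 + 4) / 2 = 2.0
```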

Now, let's introduce data points and a model. We'll use a classic example of linear regression.

The points we'll use are (1, 2), (3, 1.5) and (4, 4). As for the linear model, we'll choose the linear function y = 1.5x (this is definitely not the optimized model, but we can still calculate the value of the cost function). This is how it looks on a graph:

[Graph: the three data points and the line y = 1.5x]

To calculate the cost value of this model using the MSE function, we first have to find, for each point, the value predicted by the model. Take the first point (1, 2): the model is y = 1.5x, so for x = 1 it predicts y = 1.5, and the corresponding predicted point is (1, 1.5). More generally, ŷk = 1.5xk, so we can rewrite (yk - ŷk) as (yk - 1.5xk).
Now let's calculate this value for each data point:

point      x    y     (y - 1.5x)   (y - 1.5x)²
(1, 2)     1    2     0.5          0.25
(3, 1.5)   3    1.5   -3           9
(4, 4)     4    4     -2           4

Now, if we sum that last column, we get 0.25 + 9 + 4 = 13.25.
Finally, the formula divides by N, the number of data points (hence the name, mean squared error). Here, we get 13.25 / 3 ≈ 4.42.

And this is how you get a cost value step by step.
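
The same calculation can be reproduced in Python as a quick check. The variable names are only illustrative; the points and the model y = 1.5x are the ones from the example.

```python
points = [(1, 2), (3, 1.5), (4, 4)]


def model(x):
    """The (non-optimized) linear model from the example: y = 1.5x."""
    return 1.5 * x


squared_errors = []
for x, y in points:
    error = y - model(x)               # real value minus predicted value
    squared_errors.append(error ** 2)
    print(f"({x}, {y}): error = {error}, squared = {error ** 2}")

total = sum(squared_errors)            # sum of the last column
mse = total / len(points)              # divide by N, the number of points
print(f"sum = {total}, MSE = {mse:.2f}")
```

The last line printed is `sum = 13.25, MSE = 4.42`, matching the result obtained by hand.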

Minimizing a cost function

As we can see on the graph and from the cost value, the model is definitely not optimized, and the cost could be reduced. To see how this is done, read the article on gradient descent.
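
To get a feel for what "reducing the cost" means, here is a small sketch that evaluates the MSE for a few candidate slopes a in the model y = ax. This is not gradient descent, only a comparison over an arbitrary handful of values picked for illustration.

```python
points = [(1, 2), (3, 1.5), (4, 4)]


def mse_for_slope(a):
    """MSE of the model y = a * x on the example points."""
    return sum((y - a * x) ** 2 for x, y in points) / len(points)


# An arbitrary grid of candidate slopes, not an optimizer.
candidates = [0.5, 0.75, 1.0, 1.25, 1.5]
costs = {a: mse_for_slope(a) for a in candidates}
best = min(costs, key=costs.get)

for a, c in costs.items():
    print(f"a = {a}: MSE = {c:.2f}")
print("lowest cost among the candidates: a =", best)
```

Among these candidates, a = 0.75 gives the lowest cost, well below the 4.42 obtained with a = 1.5. The data points never change here; only the parameter does, and the cost value changes with it.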

Examples of cost functions

In the previous example, we used the MSE cost function, but there is a plethora of cost functions. Here are two other basic cost functions that can be used:

  • Mean absolute error
  • Root mean squared error
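
For comparison, here is a minimal sketch of both, using their usual definitions: the mean absolute error averages the absolute differences, and the root mean squared error is the square root of the MSE. The function names are chosen for this example, and the data is the same three points and model y = 1.5x used earlier.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute differences between real and predicted values."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error."""
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


y_true = [2, 1.5, 4]                    # real values of the example points
y_pred = [1.5 * x for x in (1, 3, 4)]   # predictions of the model y = 1.5x

mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}")
```

On the example data, this gives a mean absolute error of about 1.83 and a root mean squared error of about 2.10.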


The important notions to remember are:

  • Cost functions are multivariable functions whose variables are the data points (usually fixed) and the fitted parameters of the model (these are the variables that change and make the cost value vary)
  • The smaller the cost value is, the more accurate our model is
  • There are multiple cost functions to choose from
