Machine learning - examples of benchmarks for various business problems

Sages Team

4 November 2021

Machine learning has been successfully used to solve business problems in a variety of areas. Improving the effectiveness of algorithms is a natural part of developing them in practice: if the existing performance can be beaten, the solution can be updated (provided that the new version still meets the other requirements, such as speed of operation).

The challenge arises when we want to apply machine learning to solve a problem for the first time. In such a situation, we have no benchmark to compare against. This means that it is hard to judge whether a solution is of good quality or not, and in particular, it can be a problem to assess the very correctness of the results obtained. How, then, can one tell if a model is working or not?

The key to answering this question is to look for reference points for the business problem being solved. They are the basis for verifying the quality of the model. In practice, unfortunately, we do not always keep this in mind. People who are new to machine learning in particular sometimes skip this step, and the mere fact of having created a model overshadows reflection on its actual usefulness. Below we suggest how to check whether your model really works.

Machine learning: reference points

A benchmark is something against which we compare the effectiveness of our algorithm. It can be an existing solution to a given problem, but if none exists yet, we need to define one ourselves. Looking for reference points is a must if your goal is to create useful models, that is, models that give reliable answers to real problems. This protects those who will make decisions based on model results from serious errors. In some areas the consequences of an error are mild (recommending a movie to watch), while in others they can be very serious (investments in financial markets). Depending on the circumstances, defining a reference point can be more or less complicated, but it is always necessary to ensure the reliability of model results.

Machine learning: examples of reference points

You already know how important the search for reference points is in modeling. Now let's look at three scenarios that show how to do this in practice.

Scenario 1: Classification

For the sake of example, let's assume that an insurance company is struggling with the challenge of assessing whether submitted claims are legitimate or attempted fraud. In practice, the vast majority of submissions are not attempted fraud, and fraud attempts may account for, say, 1% of all cases.

Let's assume that an algorithm has been developed that the insurance company will use to evaluate submissions, and that the accuracy of this classifier (which assigns each submission to one of two "classes": genuine submission vs. fraud) is 99%. The question arises as to whether this is a "good" algorithm. While an accuracy of 99% may seem to indicate that the model is outright sensational, the real answer to the question of its quality is "It depends!".

And it depends precisely on the benchmark. Note that an algorithm that considers all submissions to be genuine will have an accuracy of... 99%! Thus, for the model to be useful, its accuracy should exceed the share of the most frequent class, which in this case is the 99% of submissions that are honest. It is this figure that will be the primary reference here. As an aside, note that in such imbalanced situations, evaluating an algorithm by the percentage of correct decisions may not be the best approach.
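The reference point described above can be sketched in a few lines of Python. This is only an illustration: the data is made up to match the 1% fraud rate from the example, and the helper names (`majority_class_baseline`, `accuracy`, `recall`) are hypothetical. Recall on the fraud class is included as one possible alternative to accuracy; the article itself does not prescribe a specific metric.

```python
from collections import Counter


def majority_class_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Share of truly positive cases that were predicted as positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in predictions_for_positives) / len(predictions_for_positives)


# Made-up data: 1000 submissions, 10 of them fraud (1%).
y_true = ["fraud"] * 10 + ["genuine"] * 990

baseline_label, baseline_accuracy = majority_class_baseline(y_true)

# A "classifier" that labels every submission with the majority class.
always_majority = [baseline_label] * len(y_true)
baseline_recall = recall(y_true, always_majority, positive="fraud")

print(f"Always predict '{baseline_label}': accuracy = {baseline_accuracy:.2%}")
print(f"Fraud cases caught by this baseline: {baseline_recall:.0%}")
```

On this toy data the baseline reaches 99% accuracy while catching none of the fraud cases, which is why a real model has to be judged against this number and, ideally, with a metric that looks at the rare class.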

Scenario 2: Time series

For the sake of example, let's assume that the goal is to build a model that will predict the price of bitcoin over time (e.g., the average price of bitcoin on a given day). To test the quality of the model, we need to compare the real price of bitcoin with the predicted price. Obviously, the closer these values are, the better the model will be. However, to determine whether the model has any value at all, again the question of a benchmark must be asked.

One very simple way to make a forecast without any model is to assume that the price of bitcoin on a given day will be the same as on the previous day. If the price changes little from day to day, this way of forecasting can prove to be quite accurate (for our example, we can assume that on average we will be wrong by PLN 1). However, this is not modeling, but actually "guessing".

So, if we want to create a useful intelligent algorithm that predicts the price of bitcoin, then in this particular case, the reference point for the model can be the described "guess" of the price. This means that the model will be considered valuable if it is wrong by less than PLN 1 on average.
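A minimal sketch of this comparison is shown below. The prices and the model's predictions are invented for illustration, and the choice of mean absolute error as the measure of "being wrong on average" is an assumption; the function names are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def previous_day_forecast(series):
    """Forecast each day as the previous day's value (no forecast for day 0)."""
    return series[:-1]


# Invented daily average prices in PLN, purely for illustration.
prices = [100.0, 101.0, 100.5, 102.0, 101.0]

# The days we can actually forecast: everything after the first observation.
actual = prices[1:]

baseline_predictions = previous_day_forecast(prices)
baseline_error = mean_absolute_error(actual, baseline_predictions)

# Hypothetical output of some model for the same days.
model_predictions = [100.8, 100.9, 101.5, 101.2]
model_error = mean_absolute_error(actual, model_predictions)

model_beats_baseline = model_error < baseline_error

print(f"Previous-day guess error: PLN {baseline_error:.2f}")
print(f"Model error:              PLN {model_error:.2f}")
print(f"Model beats the guess:    {model_beats_baseline}")
```

With these invented numbers the previous-day guess is wrong by PLN 1 on average, so the model is only worth keeping if its error stays below that.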

Scenario 3: Recommendation system

Let's assume that our goal is to build a model that, based on historical ratings, will recommend movies for viewers to watch. To be more precise: the goal is to predict how a person will rate a given movie (e.g. on a scale of 1-10). We can then use such an algorithm to suggest to viewers those titles that they are likely to enjoy. Let's assume that we are wondering how a person will rate a certain movie, and let's look at rating prediction from two perspectives:

Let's start by looking at this person's past ratings of other films. If, for example, this person has always rated the movies he or she has seen with an 8 or 9, it is very likely that he or she will rate the next movie similarly.
On the other hand, if a given film has always been rated 7 or 8 by the people watching it, then in all likelihood the next person will rate it similarly.

Thus, we can take two points of reference in this case:

the prediction for a given movie as the average of the previous ratings of that movie (a value independent of the user),
the prediction for a given user as the average of the ratings that user has given so far (a value independent of the movie).

We will consider a model successful if its predictions are more accurate than those of both approaches above. Otherwise (if its error is larger), the model adds nothing and is useless.
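The two reference points can be sketched as follows. The ratings are invented, the use of mean absolute error on a small held-out set is an assumption (the article does not name an error measure), and falling back to the global average for an unseen user or movie is an illustrative choice.

```python
from collections import defaultdict


def mean_rating_by(ratings, key_index):
    """Average rating grouped by user (key_index=0) or by movie (key_index=1)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in ratings:
        totals[row[key_index]] += row[2]
        counts[row[key_index]] += 1
    return {key: totals[key] / counts[key] for key in totals}


def mean_absolute_error(ratings, predict):
    """Average absolute gap between true ratings and predict(user, movie)."""
    errors = [abs(rating - predict(user, movie)) for user, movie, rating in ratings]
    return sum(errors) / len(errors)


# Invented (user, movie, rating) triples on a 1-10 scale.
train = [
    ("ann", "Alien", 8), ("ann", "Brazil", 9),
    ("bob", "Alien", 7), ("bob", "Casablanca", 6),
    ("eve", "Brazil", 8), ("eve", "Casablanca", 7),
]
held_out = [("ann", "Casablanca", 8), ("bob", "Brazil", 7)]

global_mean = sum(rating for _, _, rating in train) / len(train)
movie_means = mean_rating_by(train, key_index=1)
user_means = mean_rating_by(train, key_index=0)


def predict_from_movie(user, movie):
    """Reference point 1: the movie's average rating, independent of the user."""
    return movie_means.get(movie, global_mean)


def predict_from_user(user, movie):
    """Reference point 2: the user's average rating, independent of the movie."""
    return user_means.get(user, global_mean)


movie_baseline_error = mean_absolute_error(held_out, predict_from_movie)
user_baseline_error = mean_absolute_error(held_out, predict_from_user)

# A model has to beat both reference points, so the lower error is the bar.
error_to_beat = min(movie_baseline_error, user_baseline_error)

print(f"Movie-average error: {movie_baseline_error}")
print(f"User-average error:  {user_baseline_error}")
print(f"Error to beat:       {error_to_beat}")
```

Because the model must be more accurate than both approaches, the smaller of the two baseline errors is the number a candidate recommender is compared against.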

Machine learning for beginners

Those just starting out in machine learning should pay special attention to the search for benchmarks. It is worth adding that, in addition to simple but reasonable methods of "guessing", it is good practice to start with very simple models. These can themselves serve as reference points for more complex solutions created later. So before you start creating advanced tools based on neural networks, build a simple model and compare it against solid reference points. If it turns out to be good, upgrade it and check whether the improvements actually add any value to the results. Take care to develop this habit now, and it will pay off in the future.
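One way to keep this habit is to record the error of every step, from the simplest guess to the most complex model, and check whether each step improves on everything simpler. The sketch below assumes a lower-is-better error measure; the step names and error values are hypothetical, and `evaluate_ladder` is an illustrative helper rather than part of any library.

```python
def evaluate_ladder(steps):
    """Check each step against the best error achieved by simpler steps.

    `steps` is a list of (name, error) pairs ordered from simplest to most
    complex. The first step is the reference point, so it is always kept.
    Returns a list of (name, error, improved) tuples.
    """
    results = []
    best_error_so_far = None
    for name, error in steps:
        improved = best_error_so_far is None or error < best_error_so_far
        results.append((name, error, improved))
        if improved:
            best_error_so_far = error
    return results


# Hypothetical errors, ordered from the simplest approach to the most complex.
steps = [
    ("previous-day guess", 1.00),
    ("simple linear model", 0.80),
    ("neural network", 0.85),
]

report = evaluate_ladder(steps)
for name, error, improved in report:
    verdict = "keep" if improved else "no added value"
    print(f"{name:<20} error = {error:.2f} -> {verdict}")
```

In this invented example the simple model beats the guess, while the more complex model does not beat the simple one, so the upgrade would not be justified.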

Machine learning is applicable in virtually every industry. If you want to gain valuable knowledge in this area and gain the competencies of the future, sign up for the Machine Learning Masterclass. Machine learning training for beginners is a chance to learn the most important algorithms and learn how to apply them in practice to solve real business problems.
