Concept Drift, Data Drift, and Machine Learning Monitoring: How to Keep Your Model Accurate

Markus Odenthal

Machine learning models are powerful tools that can predict all sorts of things. However, they are not perfect, and they can become inaccurate over time. This is often due to something called concept drift or data drift. In this blog post, we will discuss what these terms mean and how monitoring helps you catch them.

Concept drift

You have finished a Machine Learning project. You have collected data, built a model, and evaluated it. Your boss is happy with the results. But then something happens in production: the accuracy of your model starts to decrease. This is not good! What went wrong?

One possible explanation is that the relationship between your input data and the thing you are trying to predict has changed over time. For example, if you are predicting whether or not a customer will buy a product, the customer’s behavior may have changed since you collected the training data: the same inputs now lead to different outcomes. This change in the underlying relationship is called concept drift.
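Here is a minimal sketch of what concept drift looks like, using synthetic data and scikit-learn (the feature, the flipped behavior, and the threshold are all made up for illustration, not taken from a real project):

```python
# Minimal sketch of concept drift with synthetic data.
# The relationship between the feature "discount" and the target "buys" flips over time,
# so a model trained on the old relationship loses accuracy on the new data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# Training period: customers tend to buy when the discount is high.
X_train = rng.uniform(0, 1, size=(1000, 1))
y_train = (X_train[:, 0] > 0.5).astype(int)

# Later production period: behavior has changed -- discounts no longer drive purchases.
X_prod = rng.uniform(0, 1, size=(1000, 1))
y_prod = (X_prod[:, 0] < 0.5).astype(int)

model = LogisticRegression().fit(X_train, y_train)

print("Accuracy on training-period data:", accuracy_score(y_train, model.predict(X_train)))
print("Accuracy after concept drift:    ", accuracy_score(y_prod, model.predict(X_prod)))
```

The model itself has not changed at all; only the world it is predicting has, which is exactly why the accuracy drop is so easy to miss without monitoring.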

Data Drift

Data drift is a related phenomenon. It occurs when the distribution of the data your model receives changes over time, even if the underlying relationship stays the same. For example, if you are using customer data to predict whether or not they will buy a product, the data may become stale: customers’ addresses, phone numbers, and other attributes change, so the inputs your model sees in production no longer look like the training data, making it harder to produce accurate predictions.
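One simple way to spot data drift is to compare the distribution of a feature at training time with its distribution in production. Here is a minimal sketch using a two-sample Kolmogorov-Smirnov test from scipy (the feature, the numbers, and the 0.01 cutoff are illustrative assumptions, not a recommendation from the original post):

```python
# Minimal sketch of data drift detection with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Feature distribution at training time vs. in production (e.g. customer age).
train_feature = rng.normal(loc=35, scale=8, size=5000)
prod_feature = rng.normal(loc=42, scale=8, size=5000)   # the distribution has shifted

stat, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.01:
    print(f"Possible data drift detected (KS statistic={stat:.3f}, p={p_value:.3g})")
else:
    print("No significant drift detected")
```

In practice you would run a check like this per feature and on a schedule, and treat a flagged feature as a prompt to investigate rather than as proof that the model is broken.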

Machine Learning Monitoring

One big problem with Machine Learning models is that they do not act like humans. If I show you a pair of shoes you’ve never seen before, you can simply say, “I don’t know what they are.” A Machine Learning model, however, doesn’t know its own limitations: it will happily make a (wrong) prediction on data it has never seen.
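You cannot make a model say “I don’t know,” but you can at least flag the predictions it is not confident about. Here is a minimal sketch, assuming scikit-learn and an arbitrary confidence threshold of 0.7 (both are illustrative choices, not part of the original post):

```python
# Minimal sketch: flag low-confidence predictions for human review
# instead of trusting every answer the model gives.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X)      # class probabilities per sample
confidence = proba.max(axis=1)      # confidence of the predicted class
uncertain = confidence < 0.7        # arbitrary threshold for "not sure"

print(f"{uncertain.sum()} of {len(X)} predictions flagged as uncertain")
```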

So how can you deal with concept drift and data drift? This is why it is essential to monitor your Machine Learning models over time. By monitoring, you can detect when the accuracy of your model starts to decrease and take action before it causes bigger problems in production.

There are many ways to monitor Machine Learning models, but some standard methods are:

  • Split your data into training and test sets, and evaluate the model on the test set periodically.
  • Use a validation set when training your model, and track the error on the validation set over time.
  • Keep track of the predictions your model makes in production and compare them against the actual outcomes once they are known, e.g. in a dashboarding tool like Datadog (a sketch of this idea follows below).
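Here is a minimal sketch of the third idea: log production predictions, join them with the actual outcomes once they arrive, and watch a rolling accuracy metric. The column names, window size, and toy data are assumptions for illustration; in a real setup the log would come from your serving layer and the metric would feed a dashboard or alert.

```python
# Minimal sketch of monitoring rolling accuracy of production predictions.
import pandas as pd

# Hypothetical log of predictions together with the outcomes that arrived later.
log = pd.DataFrame({
    "timestamp": pd.date_range("2022-01-01", periods=10, freq="D"),
    "prediction": [1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
    "actual":     [1, 0, 1, 0, 0, 0, 1, 0, 0, 1],
})

log["correct"] = (log["prediction"] == log["actual"]).astype(int)
log["rolling_accuracy"] = log["correct"].rolling(window=5, min_periods=1).mean()

print(log[["timestamp", "rolling_accuracy"]])
# A sustained drop in rolling_accuracy is the signal to investigate drift or retrain.
```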

Concept drift and data drift are two phenomena that can cause Machine Learning models to become inaccurate over time. Monitoring is one way to deal with these problems. Thanks for reading.

Please let me know if you have any questions, and stay tuned for future blog posts.
