TL;DR: Data and prediction drift often need contextual interpretation. In this blog, we walk you through possible scenarios for when you detect these types of drift together or independently.
Production machine learning systems run into issues. But sometimes, you cannot even learn about them right away: the model keeps making predictions without immediate feedback. In this case, it is common to keep tabs on the model inputs and outputs as a proxy for model performance.
But what if one of these alarms fires and the other does not? Let's look at how you can interpret it.
Let's first cover the bases.
Data drift is the change in the input data distributions. One can also call it feature or input drift. For example, your model uses weather data as inputs. It used to be freezing and now is tropical. Data drift detected!
Output drift is the change in the model predictions, recommendations, or whatever else it returns. For example, the model rarely suggested shoppers buy sunglasses, and now they appear in every recommendation block. Prediction drift detected!
You can evaluate both input and output drift with the help of statistical tests to compare the "new" data distributions to the "old." Or, you can rely on simpler descriptive statistics.
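For illustration, here is a minimal sketch of such a comparison using SciPy's two-sample Kolmogorov-Smirnov test. The file names, column names ("temperature", "prediction"), and the 0.05 threshold are assumptions for the example, not recommendations.

```python
# A minimal sketch of drift checks with a two-sample statistical test.
# Column names and the threshold are illustrative assumptions.
import pandas as pd
from scipy.stats import ks_2samp

reference = pd.read_csv("reference.csv")  # "old" data, e.g., what the model was validated on
current = pd.read_csv("current.csv")      # "new" production data

# Input (data) drift: compare the distribution of a numerical feature.
feature_test = ks_2samp(reference["temperature"], current["temperature"])

# Output (prediction) drift: compare the distribution of model outputs.
prediction_test = ks_2samp(reference["prediction"], current["prediction"])

for name, result in [("temperature", feature_test), ("prediction", prediction_test)]:
    drifted = result.pvalue < 0.05  # reject "same distribution" at the chosen threshold
    print(f"{name}: p-value={result.pvalue:.4f}, drift detected={drifted}")
```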
Assuming you have both tests set up, how can you interpret them together?
It is natural to treat output drift as the first-class citizen: many data issues would make the model predictions go rogue anyway.
But what if the model predictions are stable, but the data inputs shift significantly? Is it something you should worry about?
As usual with machine learning, it depends on the context.
There are two ways to interpret it: positive and negative.
Positive interpretation: the model is all set, and the drift does not matter!
In this case, there is no need to adjust or retrain the model. You might want to adjust the data drift detection approach, though. For example, you might limit drift detection only to the most important features, change the size of comparison windows, or pick a less sensitive statistical test, as in the sketch below.
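Here is a sketch of a deliberately less sensitive input drift check. The feature list, window size, and threshold are hypothetical values for illustration.

```python
# A sketch of a less sensitive data drift check.
# Feature names, window size, and threshold are illustrative assumptions.
import pandas as pd
from scipy.stats import ks_2samp

IMPORTANT_FEATURES = ["temperature", "humidity"]  # hypothetical top features by importance
COMPARISON_WINDOW = 10_000                        # compare larger batches to smooth out noise
ALPHA = 0.01                                      # stricter threshold -> fewer drift alerts

def detect_input_drift(reference: pd.DataFrame, current: pd.DataFrame) -> dict:
    """Run the drift test only on the most important features."""
    current_window = current.tail(COMPARISON_WINDOW)
    return {
        column: ks_2samp(reference[column], current_window[column]).pvalue < ALPHA
        for column in IMPORTANT_FEATURES
    }
```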
Negative interpretation: the model does not react to a meaningful change!
In this case, relying on the seeming stability of the model output would be a mistake. You need to intervene and retrain or rebuild the model!
Panic mode! Everything changed overnight.
It might not be as bad as it sounds, though. Let's consider both cases.
Positive interpretation: model handles drift well!
In this case, there is no need to intervene. Yes, the reality changed, and the model predictions too, but the model behavior follows the expectations. Like in our fictitious example, when an e-commerce system starts up-selling sunglasses in response to the sunny weather outside. If changes continue accumulating, you might need to calibrate or rebuild the model, but for now, it's good to go!
Negative interpretation: things have gone rogue!
That is probably the first idea to cross your mind when all alarms fire. And that might be true sometimes.
In this case, you need to intervene. Start by investigating the causes, then choose the appropriate action: fix the data quality issues, retrain, or rebuild the model.
Now, a more puzzling case: what if the predictions drifted while the features look stable?
Output drift is always a solid signal to dig deeper. In most cases, this situation points to a bug, a data quality issue, or a misconfigured drift detector.
In the first scenario, this is a symptom of an error. The drift detector does not work right.
For example, there might be a bug in the code that performs the drift test. You might accidentally point to the wrong reference dataset. Or the evaluation job fails silently, without raising an alert. It is best to pick reliable tooling!
If you run the prediction drift test after making some transformations to the model output, this transformation code or business logic might be the culprit, too.
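As an illustration, here is a sketch of simple guardrails around a drift evaluation job. The `evaluate` and `alert` callables are placeholders for your own drift test and notification code, not a specific library API.

```python
# A sketch of guardrails around a drift evaluation job.
# `evaluate` and `alert` stand in for your own test and notification code.
import logging
import pandas as pd

def run_drift_job(reference: pd.DataFrame, current: pd.DataFrame, evaluate, alert):
    # Guard against pointing at the wrong reference dataset.
    if reference.empty or list(reference.columns) != list(current.columns):
        raise ValueError("Reference dataset looks wrong: empty or schema mismatch")
    try:
        return evaluate(reference, current)
    except Exception:
        # Fail loudly: a silently failing job looks exactly like "no drift detected".
        logging.exception("Drift evaluation failed")
        alert("Drift evaluation job failed")
        raise
```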
In the second scenario, this is the signal to review the sensitivity of your drift test.
You often tune prediction drift and data drift thresholds independently. For example, you might adjust your input drift detector to only react to major shifts to avoid alert fatigue. In this case, it might not fire when there are many small changes in each feature. But this drift is real, and it already affects the model outputs.
In another situation, the opposite might be true. Your output drift detector might be overly sensitive and react even to a minor variation. If you don't mind a few false positives, that might still be the best approach for highly important use cases.
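For illustration, here is a sketch of input and output drift detectors tuned independently. The threshold values are assumptions for the example.

```python
# A sketch of independently tuned input and output drift detectors.
# Threshold values are illustrative, not recommendations.
from scipy.stats import ks_2samp

INPUT_ALPHA = 0.01   # less sensitive: only major feature shifts raise an alert
OUTPUT_ALPHA = 0.10  # more sensitive: smaller prediction shifts already raise an alert

def drifted(reference_values, current_values, alpha: float) -> bool:
    """Two-sample KS test: True if the 'new' distribution differs at the chosen threshold."""
    return ks_2samp(reference_values, current_values).pvalue < alpha

# Usage with pandas Series:
# input_drift = any(drifted(ref[c], cur[c], INPUT_ALPHA) for c in feature_columns)
# output_drift = drifted(ref["prediction"], cur["prediction"], OUTPUT_ALPHA)
```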
In any case, this discrepancy between data and prediction drift is a good signal to run an investigation or review your settings.
Finally, what if no drift is detected at all, neither in the data nor in the predictions? Positive scenario: awesome! Let's grab a coffee.
Negative scenario: did the drift detection job even work?
Okay, let's not dig into that too much. Nevertheless, it's always a good practice to re-evaluate your monitoring approaches every once in a while to avoid false negatives.
Drift detection is a nuanced thing! The key takeaway: data and prediction drift signals need contextual interpretation, both when they fire together and when they fire on their own.
Stay tuned for our upcoming blogs, where we discuss the statistical approaches behind drift detection!
Have a question about production machine learning? Join our Discord community and ask in the #ds-ml-questions channel.
Try our open-source library with over 20 million downloads, or sign up to Evidently Cloud to run no-code checks and bring the whole team to a single workspace to collaborate on AI quality.