Our CTO, Emeli Dral, was an instructor for the ML Monitoring module of MLOps Zoomcamp 2024, a free MLOps course from DataTalks.Club. In case you missed it or fancy a refresher, we put together the course notes for the ML monitoring module. Let’s recap!
MLOps Zoomcamp is a free hands-on MLOps course from DataTalks.Club, a global online community of data enthusiasts with over 40,000 members.
The course covers MLOps best practices and the ML model lifecycle end-to-end, from experiment tracking to building ML pipelines to monitoring machine learning models after deployment. The MLOps Zoomcamp curriculum is practice-oriented: students implement MLOps best practices from experimentation to production, working with open-source MLOps tools like MLflow, Prefect, Grafana, and Evidently.
MLOps Zoomcamp is built for data scientists and ML engineers, as well as software engineers and data engineers interested in production machine learning.
The course includes six modules. In this blog, we prepared a summary of the ML Monitoring Module. It is self-contained, so you can go through this module without taking the previous ones.
Machine learning model monitoring is an essential component of MLOps. Many things can go wrong once you deploy ML models to the real world. To detect and resolve issues before they affect your production ML service, you need an ML monitoring system.
The MLOps Zoomcamp ML monitoring module is all about that! It covers the basics of monitoring ML models in production and demonstrates, step by step, how to implement an ML monitoring system using open-source tools.
The module includes nine videos. The first video goes through the key ML monitoring concepts. It also introduces an ML monitoring architecture that uses Evidently for metric calculation and a Grafana dashboard for metric visualization. The following videos walk through the code implementation. They cover training a simple ML model, designing an ML monitoring dashboard, and going through the debugging process when drift is detected.
Below, we will summarize the course notes and link to all the practical videos.
Let’s take a look at the key monitoring concepts!
Video 1. MLOps Zoomcamp: Introduction to ML monitoring, by Emeli Dral.
When monitoring a production service, one usually keeps tabs on service health metrics like uptime, memory, or latency. While service health monitoring is a must, an extra layer is related to the data and the ML model itself.
ML model performance
ML model performance metrics help to ensure that ML models work as expected. The specific set of metrics depends on the use case: for example, precision and recall for a classification model, or mean absolute error for a regression model.
Data quality and integrity
Often, the ground truth is not immediately available to calculate ML model performance metrics. In this case, you can rely on proxy metrics. When something goes wrong with the model, it is often due to data quality and integrity issues. Useful metrics to track include the share of missing values, column types, and the value range of each column.
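For intuition, here is a minimal sketch of how such checks could look with plain pandas; the file path and dataframe are hypothetical. In the course, Evidently computes these metrics for you.

```python
import pandas as pd

# Hypothetical batch of recent production data
current = pd.read_parquet("data/current_batch.parquet")

# Share of missing values per column
missing_share = current.isna().mean()

# Column types, to catch unexpected schema changes
column_types = current.dtypes

# Min/max value range for each numeric column
value_ranges = current.describe().loc[["min", "max"]]

print(missing_share, column_types, value_ranges, sep="\n\n")
```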
Data drift and concept drift
Even if the data is fine, you can still face some problems. ML models work in the real world, and things change. To ensure ML models are still relevant, you can look at data and concept drift. Distribution changes between the current and reference data may signal potential problems with the model.
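To make the idea concrete, here is a minimal sketch of a distribution comparison for a single numeric column using a two-sample Kolmogorov-Smirnov test from SciPy. The file paths and the column name are hypothetical; Evidently ships this kind of drift test (and many others) out of the box.

```python
import pandas as pd
from scipy import stats

# Hypothetical reference and current datasets
reference = pd.read_parquet("data/reference.parquet")
current = pd.read_parquet("data/current_batch.parquet")

# Compare the distribution of one numeric feature between reference and current data
stat, p_value = stats.ks_2samp(reference["trip_distance"], current["trip_distance"])

# A small p-value suggests the distributions differ, i.e., possible data drift
drift_detected = p_value < 0.05
print(f"KS statistic={stat:.3f}, p-value={p_value:.3f}, drift={drift_detected}")
```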
To sum up. Service health, ML model performance, data quality and integrity, and data and concept drift are good starting points for monitoring ML models in production. Depending on the use case and available resources, you can introduce more comprehensive monitoring metrics like model fairness, bias, outliers, explainability, etc.
If you already monitor deployed production services or use business intelligence tools, consider reusing those existing systems as a starting point for ML monitoring, depending on the infrastructure and systems in place.
To sum up. Reusing the existing monitoring architecture for ML models can save time and resources as you don't need to build a new monitoring system from scratch. You can start by adding a couple of dashboards and expand to a more sophisticated system later.
The way we deploy our models influences how we implement ML monitoring.
Batch models allow you to calculate metrics in batch mode. For example, to calculate drift metrics, you compare two distributions: a reference dataset (e.g., validation data or a previous batch) and the most recent batch of data. Model quality metrics (e.g., precision and recall) can also be calculated on top of a data batch.
Non-batch models (e.g., models served as REST API services) are more complicated. While metrics like missing values or range violations can be calculated in real time, data drift and model performance metrics are best computed over an accumulated batch of data.
Pro-tip. You can use window functions for non-batch ML models to perform statistical tests on continuous data streams. Pick a window function (e.g., a moving window with or without a moving reference), choose the window and step size, and compare the windows.
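Here is a rough sketch of this idea for a stream of values, assuming a fixed reference window and a moving current window; the window size, step, and drift test are illustrative choices.

```python
import numpy as np
from scipy import stats

def detect_drift_on_stream(stream, reference, window_size=500, step=100, alpha=0.05):
    """Slide a window over the stream and compare each window to the reference."""
    results = []
    for start in range(0, len(stream) - window_size + 1, step):
        window = stream[start : start + window_size]
        _, p_value = stats.ks_2samp(reference, window)
        results.append((start, p_value < alpha))
    return results

# Illustrative data: the stream shifts away from the reference halfway through
rng = np.random.default_rng(42)
reference = rng.normal(0, 1, 1000)
stream = np.concatenate([rng.normal(0, 1, 1000), rng.normal(1, 1, 1000)])

for start, drifted in detect_drift_on_stream(stream, reference):
    if drifted:
        print(f"Drift detected in window starting at position {start}")
```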
As a practice for the MLOps Zoomcamp ML Monitoring module, we implemented an ML monitoring scheme that works for both batch and non-batch machine learning models. The following videos explain, step by step, how to build it.
It’s practice time! Let's now walk through an end-to-end example to connect the dots and implement the ML monitoring scheme using Evidently, Prefect, and Grafana.
You can find the complete code in the MLOps Zoomcamp GitHub repository.
Video 2. MLOps Zoomcamp: Environment setup, by Emeli Dral.
In this video, we set up the environment for the machine learning monitoring system.
Outline:
00:00 Create a working environment and requirements
02:00 Create and configure Docker Compose
03:35 Configure services: PostgreSQL, Adminer, and Grafana
07:15 Create and test services with Docker Compose
That’s it! You have successfully created your working environment, installed Python packages, and created a Docker Compose file.
Video 3. MLOps Zoomcamp: Prepare reference data and model, by Emeli Dral
In this part, we prepare a reference dataset and train a baseline model to use as a reference point in calculating ML monitoring metrics.
Outline:
01:31 Import libraries
04:28 Download and load data
11:30 Preprocess data, filter out outliers, check the target distribution
13:25 Select features and train a linear regression model
17:45 Evaluate ML model quality
18:50 Create a reference dataset
Done! Now, we have a reference dataset and a baseline ML model to simulate the production use of our prediction service.
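For reference, here is a minimal sketch of this step with scikit-learn. The dataset, feature names, target, and file paths are hypothetical stand-ins for what the video uses.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Hypothetical training data, features, and target
df = pd.read_parquet("data/jan_data.parquet")
features = ["passenger_count", "trip_distance", "fare_amount"]
target = "duration_min"

train = df.sample(frac=0.8, random_state=42)
val = df.drop(train.index)

# Train a simple baseline model and check its quality
model = LinearRegression()
model.fit(train[features], train[target])
print("Validation MAE:", mean_absolute_error(val[target], model.predict(val[features])))

# Save the validation data with predictions as the reference dataset for monitoring
reference = val.copy()
reference["prediction"] = model.predict(val[features])
reference.to_parquet("data/reference.parquet")
```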
Video 4. MLOps Zoomcamp: ML monitoring metrics calculation, by Emeli Dral
In this video, we use the Evidently open-source library to calculate ML monitoring metrics.
Outline:
00:00 Introduction to Evidently library: Reports and Metrics
04:40 Generate and display Evidently Report in HTML
06:00 How to interpret Evidently Report: data drift detection example
06:50 Display Evidently Report as a Python dictionary and derive selected values
That’s it! We calculated ML monitoring metrics and learned how to display an Evidently Report and derive values from it.
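Here is a condensed sketch of what this step can look like, assuming the Evidently Report API from the 0.4.x releases used at the time of the course (newer Evidently versions have a different API); the column mapping, metrics, and paths are illustrative.

```python
import pandas as pd
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metrics import ColumnDriftMetric, DatasetDriftMetric, DatasetMissingValuesMetric

# Hypothetical reference and current datasets
reference = pd.read_parquet("data/reference.parquet")
current = pd.read_parquet("data/current_batch.parquet")

column_mapping = ColumnMapping(
    prediction="prediction",
    numerical_features=["passenger_count", "trip_distance", "fare_amount"],
)

report = Report(metrics=[
    ColumnDriftMetric(column_name="prediction"),
    DatasetDriftMetric(),
    DatasetMissingValuesMetric(),
])
report.run(reference_data=reference, current_data=current, column_mapping=column_mapping)

report.save_html("report.html")  # full interactive HTML report
result = report.as_dict()        # or pull out individual values as a Python dictionary
print("Prediction drift score:", result["metrics"][0]["result"]["drift_score"])
```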
Video 5. MLOps Zoomcamp: Evidently Monitoring Dashboard, by Emeli Dral
In this video, we use the Evidently open-source library to build a monitoring dashboard for data and ML models.
Outline:
00:00 Import Evidently Data Quality preset
01:35 Create workspace and project
03:11 Build Evidently data quality Report and visualize the results
06:05 Add the report to our project and call Evidently UI
08:50 Configure Evidently monitoring dashboard and add panels
14:52 View the resulting dashboard in Evidently UI
That’s it! We created an ML monitoring dashboard and learned how to configure it and display it in the Evidently UI.
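For orientation, here is a minimal sketch of the workspace and project setup, assuming the Evidently 0.4.x UI API (the project name and paths are illustrative; dashboard panel configuration is omitted here):

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataQualityPreset
from evidently.ui.workspace import Workspace

# Hypothetical reference and current datasets
reference = pd.read_parquet("data/reference.parquet")
current = pd.read_parquet("data/current_batch.parquet")

# Create a local workspace and a project inside it
ws = Workspace.create("workspace")
project = ws.create_project("Data quality monitoring")

# Build a data quality report and attach it to the project
report = Report(metrics=[DataQualityPreset()])
report.run(reference_data=reference, current_data=current)
ws.add_report(project.id, report)

# Then launch the UI from the terminal to browse projects, reports, and dashboards:
#   evidently ui --workspace ./workspace
```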
Video 6. MLOps Zoomcamp: Dummy monitoring, by Emeli Dral
In this video, we create dummy metrics and set up a database for our Grafana dashboard.
Outline:
00:00 Create a Python script for dummy metrics calculation
03:10 Prepare a database and create a table for dummy metrics
06:00 Calculate dummy metrics and load them into the table
07:40 Create a cycle and define the main function
09:00 Add sending timeout to simulate production usage of the service
10:00 Test the script: access the PostgreSQL database and create a dashboard in Grafana
Congratulations! Our configuration files are now correct: we can access our database, load the data, and build a dashboard in Grafana.
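The core of the dummy-metrics script could look roughly like this, assuming the psycopg (v3) driver and placeholder connection details that should match your Docker Compose configuration:

```python
import datetime
import random
import time

import psycopg

SEND_TIMEOUT = 10  # seconds between inserts, to simulate a live service

def main():
    # Placeholder connection string; align it with your Docker Compose setup
    with psycopg.connect(
        "host=localhost port=5432 dbname=test user=postgres password=example",
        autocommit=True,
    ) as conn:
        conn.execute("DROP TABLE IF EXISTS dummy_metrics")
        conn.execute(
            "CREATE TABLE dummy_metrics (timestamp TIMESTAMP, value1 INTEGER, value2 VARCHAR, value3 FLOAT)"
        )
        for _ in range(100):
            conn.execute(
                "INSERT INTO dummy_metrics (timestamp, value1, value2, value3) VALUES (%s, %s, %s, %s)",
                (
                    datetime.datetime.now(),
                    random.randint(0, 100),
                    str(random.uniform(0, 1)),
                    random.random(),
                ),
            )
            time.sleep(SEND_TIMEOUT)  # sending timeout between rows

if __name__ == "__main__":
    main()
```

With the script running, Grafana can read from this table and plot the dummy values over time.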
Video 7. MLOps Zoomcamp: Data quality monitoring, by Emeli Dral
In this video, we create an actual dashboard for drift detection.
We will use Evidently to calculate the monitoring metrics.
We will use Prefect to orchestrate calculating and storing drift metrics. We will store these metrics in a PostgreSQL database and visualize them using Grafana. A schematic sketch of this pipeline follows the outline below.
Outline:
00:00 Alter the script and load the reference data and the model
02:45 Create Evidently data drift report and derive values of selected metrics
07:00 Test and debug the script
08:30 Transform the script to Prefect pipelines
10:40 Build and customize the Grafana dashboard
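Schematically, the Prefect part of the pipeline could look like the sketch below. It assumes Prefect 2.x, the Evidently 0.4.x Report API, and placeholder table names, paths, and database credentials; the exact metrics and result keys may differ from the course code.

```python
import datetime

import pandas as pd
import psycopg
from evidently import ColumnMapping
from evidently.metrics import ColumnDriftMetric, DatasetDriftMetric, DatasetMissingValuesMetric
from evidently.report import Report
from prefect import flow, task

@task
def calculate_metrics(reference: pd.DataFrame, current: pd.DataFrame) -> dict:
    """Run an Evidently report and extract the values we want to store."""
    report = Report(metrics=[
        ColumnDriftMetric(column_name="prediction"),
        DatasetDriftMetric(),
        DatasetMissingValuesMetric(),
    ])
    report.run(reference_data=reference, current_data=current,
               column_mapping=ColumnMapping(prediction="prediction"))
    result = report.as_dict()
    return {
        "prediction_drift": result["metrics"][0]["result"]["drift_score"],
        "num_drifted_columns": result["metrics"][1]["result"]["number_of_drifted_columns"],
        "share_missing_values": result["metrics"][2]["result"]["current"]["share_of_missing_values"],
    }

@task
def store_metrics(metrics: dict) -> None:
    """Insert one row of monitoring metrics into PostgreSQL (placeholder credentials and table)."""
    with psycopg.connect(
        "host=localhost port=5432 dbname=test user=postgres password=example",
        autocommit=True,
    ) as conn:
        conn.execute(
            "INSERT INTO drift_metrics (timestamp, prediction_drift, num_drifted_columns, share_missing_values) "
            "VALUES (%s, %s, %s, %s)",
            (datetime.datetime.now(), metrics["prediction_drift"],
             metrics["num_drifted_columns"], metrics["share_missing_values"]),
        )

@flow
def monitoring_flow():
    reference = pd.read_parquet("data/reference.parquet")   # assumed paths
    current = pd.read_parquet("data/current_batch.parquet")
    store_metrics(calculate_metrics(reference, current))

if __name__ == "__main__":
    monitoring_flow()
```

Grafana then queries the drift_metrics table to plot these values over time.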
Video 8. MLOps Zoomcamp: Save Grafana dashboard, by Emeli Dral
In this video, we show how to save the Grafana dashboard so we can load it every time we rerun the Docker container without rebuilding the dashboard from scratch.
Outline:
00:00 Save and reuse Grafana dashboard configs
03:30 Rerun Docker and access the saved dashboard in Grafana
Video 9. MLOps Zoomcamp: Debugging with Evidently Reports and Test Suites, by Emeli Dral
In this video, we use the open-source Evidently library to go through the debugging process when data drift is detected.
Here is a quick refresher on the Evidently components we will use: Reports calculate and visualize a set of metrics, while Test Suites run structured checks against data and models and return explicit pass or fail results. A sketch of the Test Suite part follows the outline below.
Outline:
00:00 How to use Evidently to debug ML models and data
04:20 Load data and model
06:50 Run Evidently data drift Test Suite and visualize the results
09:50 How to interpret the results and analyze data drift with Test Suites
13:30 Build Evidently data drift Report and visualize the results
14:15 How to interpret the results and analyze data drift with Reports
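As a compact illustration of the Test Suite component, a data drift Test Suite in the Evidently 0.4.x API looks roughly like this (the paths are placeholders, and result keys may vary across versions):

```python
import pandas as pd
from evidently.test_suite import TestSuite
from evidently.test_preset import DataDriftTestPreset

# Hypothetical reference and current datasets
reference = pd.read_parquet("data/reference.parquet")
current = pd.read_parquet("data/current_batch.parquet")

# Run a battery of per-column drift tests with explicit pass/fail outcomes
suite = TestSuite(tests=[DataDriftTestPreset()])
suite.run(reference_data=reference, current_data=current)

# Inspect the results programmatically, e.g., to decide which columns to debug further
summary = suite.as_dict()
print("All tests passed:", summary["summary"]["all_passed"])
for test in summary["tests"]:
    if test["status"] == "FAIL":
        print("Failed:", test["name"])
```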
ML monitoring is a crucial component of MLOps. It helps to ensure that machine learning models remain reliable and relevant to the environment in which they operate. By tracking data inputs, predictions, and outcomes, we gain visibility into how well the model is doing and can resolve issues before they affect the performance of the ML service.
Start small and add complexity as you scale. Service health, ML model performance, data quality and integrity, and data and concept drift are good starting points for monitoring ML models in production. From there, you can introduce more comprehensive monitoring metrics iteratively.
You can implement the complete ML monitoring workflow using open-source tools. This tutorial demonstrated how to build an ML monitoring scheme for batch and non-batch machine learning models using Evidently, Prefect, PostgreSQL, and Grafana.