In this tutorial, you will learn how to run batch ML model inference and deploy a model monitoring dashboard for production ML models.
It is a simple solution that uses two open-source tools: Evidently and Prefect. The tutorial contains an end-to-end code blueprint with all the steps: from model training to batch model serving and production ML monitoring.
You can copy the repository and adapt this reference architecture for your use case.
Code example: if you prefer to head straight to the code, open this example folder on GitHub.
Batch inference involves making predictions on a group of observations at once. You typically run batch prediction jobs on a schedule, such as hourly or daily. The predictions are stored in a database and are then accessible to consumers.
Batch inference is a good option when predictions are needed at set intervals rather than in real time, such as periodic demand forecasting.
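As a minimal illustration (not part of the tutorial repository; the file paths, model file, and feature names are hypothetical), a batch prediction job can be as simple as:

```python
# Minimal, hypothetical sketch of a batch prediction job. Paths, the model file,
# and feature names are placeholders, not code from the tutorial repository.
import joblib
import pandas as pd

FEATURES = ["trip_distance", "fare_amount", "passenger_count"]  # assumed features

def run_batch_inference(input_path: str, output_path: str) -> None:
    model = joblib.load("models/model.joblib")             # load the trained model
    batch = pd.read_parquet(input_path)                    # load the new batch of observations
    batch["prediction"] = model.predict(batch[FEATURES])   # score all rows at once
    batch.to_parquet(output_path)                          # store predictions for consumers

# A scheduler (cron, Prefect, etc.) would call run_batch_inference() hourly or daily.
```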
Production ML monitoring. Once you deploy an ML model in production, you must track its performance. This typically means tracking the ongoing model quality, like accuracy or mean error. However, it is not always possible due to delayed feedback. For instance, when forecasting next week's sales, you must wait until the end of that period to calculate the error.
Because of this, you might also need early monitoring signals that do not rely on ground truth labels, such as data quality, data drift, and prediction drift checks.
For batch model inference, you can implement ML monitoring as scheduled batch jobs. In this tutorial, you will learn how to run such data validation, drift detection and ML model quality checks in a production pipeline.
This tutorial shows how to run batch ML inference and monitoring jobs and deploy a dashboard to track model performance over time. By the end of this tutorial, you will know how to implement a batch ML monitoring architecture using:
You will be able to run the complete tutorial locally.
Here is a brief overview of the architecture. You will:
You can later expand and customize this reference architecture for your use case.
Note: you can also use the same batch monitoring architecture even if you deploy an ML model as a service. You can run a set of monitoring jobs over the prediction logs.
We expect that you:
To follow this tutorial, you'll need the following tools installed on your local machine:
Note: we tested this example on macOS/Linux.
This section explains the instructions in the example README. Check the original README file for more technical details and notes.
1. Fork / Clone the repository
First, clone the Evidently GitHub repository containing the example code:
2. Launch Prefect
Launch the Prefect application by using Docker Compose. This app is responsible for running monitoring pipelines that calculate the monitoring reports.
3. Train the model and prepare the “reference” dataset
This example is based on the NYC Taxi dataset. The data preparation and model training are out of scope for this tutorial, so we prepared a few scripts to download and preprocess the data and train a machine learning model.
This script generates a simple machine learning model that solves a regression problem: predicting the duration of the trip (in minutes) based on features like distance, fare price and number of passengers. We create a new forecast each hour and assume the ground truth is available with an hourly delay.
In this script, we also prepare a reference dataset: a representative dataset that shows expected feature behavior. It will serve as a baseline for data drift detection.
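In simplified form, this step boils down to training a regressor and saving a representative sample of the data as the reference. The sketch below is illustrative; the column names, file paths, and model choice are assumptions rather than the repository's actual code.

```python
# Simplified sketch of the training step; the real scripts in the repository differ in details.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["trip_distance", "fare_amount", "passenger_count"]  # assumed feature names
TARGET = "duration_min"                                          # assumed target column

df = pd.read_parquet("data/train.parquet")                       # hypothetical path

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(df[FEATURES], df[TARGET])
joblib.dump(model, "models/model.joblib")

# Keep a representative sample with predictions as the "reference" dataset for drift detection.
reference = df.sample(n=5_000, random_state=42)
reference["prediction"] = model.predict(reference[FEATURES])
reference.to_parquet("data/reference.parquet")
```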
4. Run inference and monitoring pipelines
Execute the scheduler script to run the Prefect flows for inference and monitoring.
The scheduler.py script runs the following pipelines:
For simplicity, the scheduler.py script uses the following hardcoded parameters to schedule other pipelines.
By fixing the parameters, we ensure reproducibility: when you run the tutorial, you should get the same visuals. We will discuss how to customize this example later.
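Conceptually, the scheduling logic looks like the sketch below; the flow names, timestamps, and intervals are illustrative stand-ins for the actual values hardcoded in scheduler.py.

```python
# Conceptual sketch of scheduler.py: it triggers the flows for a fixed set of timestamps,
# so every tutorial run produces the same results. Names and values are illustrative.
from datetime import datetime, timedelta

# Stand-ins for the Prefect flows defined in the example repository.
def predict(ts, interval): ...
def monitor_data(ts, interval): ...
def monitor_model(ts, interval): ...

START_TIME = datetime(2021, 2, 1, 2, 0)   # assumed fixed start timestamp
INTERVAL_MINUTES = 60                     # one batch per hour
NUM_BATCHES = 3                           # e.g., periods T-1, T, and T+1

for i in range(NUM_BATCHES):
    ts = START_TIME + timedelta(minutes=i * INTERVAL_MINUTES)
    predict(ts=ts, interval=INTERVAL_MINUTES)        # batch inference flow
    monitor_data(ts=ts, interval=INTERVAL_MINUTES)   # data quality / drift checks
    monitor_model(ts=ts, interval=INTERVAL_MINUTES)  # model quality checks
```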
5. Explore the scheduled pipelines in the Prefect UI
Access the Prefect UI by navigating to http://localhost:4200 in a web browser. The Prefect UI shows the executed pipelines and their current status.
You executed the batch model inference and monitoring pipelines in the previous step. This means you have all the relevant information about the model and data quality. Now, let’s take a look at the monitoring dashboard!
1. Launch the Evidently UI
Launch the Evidently application by using Docker Compose. This app is responsible for visualizing the metrics computed from the monitoring jobs.
2. Explore the model performance
Open the Evidently monitoring dashboards by visiting http://localhost:8001 in a web browser. You can now access the dashboards that show the model and data quality.
There are four different dashboards designed for this example.
Note: in this example, each dashboard exists as a separate Project. We did it for demonstration purposes. In practice, you can log all the data related to one model to a single project and visualize it on the same dashboard.
You can open each dashboard to get an overview of how the metrics change over time and access the underlying Reports that summarize the model performance for a specific period.
Here is an example of a Model Quality dashboard:
Here is an example of the Target Drift dashboard that shows the changes in the behavior of the model target.
Note: this example includes several dashboard types to demonstrate the tool's capabilities. This metric selection is not fixed: you can customize the metrics and visualizations for your specific use case.
Now, let’s look at the code in more detail to understand how the backend and frontend of this monitoring setup work together, and how you can customize it to your needs.
There are three Prefect pipelines to monitor input data quality, model predictions, and model performance.
Prefect executes three pipelines at different time intervals (T-1, T, and T+1). You make new predictions for each period, run input data checks, and monitor model performance.
The pipelines perform the following tasks:
You use the Evidently Python library to calculate metrics inside each monitoring job by generating a Report with the selected metrics and saving it in JSON format. The resulting JSON file, which contains a quality summary for a particular period, is called a snapshot.
Let’s explore this logging part in more detail!
To illustrate what happens inside, let’s look at the data quality monitoring pipeline that tracks the quality and stability of the input data.
The monitor_data Prefect flow orchestrates the data monitoring process. It takes a timestamp ts and an interval (in minutes) as input arguments. On every flow run, it calculates the JSON snapshots with the data quality metrics using Evidently and logs them to a specific directory.
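A simplified sketch of such a flow is shown below. The helper task names, file paths, and default interval are illustrative; the actual monitor_data flow in the repository includes more steps.

```python
# Simplified sketch of a data monitoring flow; the helpers are illustrative stand-ins
# for the tasks defined in the example repository.
from datetime import datetime, timedelta

import pandas as pd
from prefect import flow, task

@task
def load_current_data(ts: datetime, interval: int) -> pd.DataFrame:
    # Load the batch of observations that arrived between (ts - interval) and ts.
    data = pd.read_parquet("data/current.parquet")   # hypothetical path
    mask = (data["ts"] > ts - timedelta(minutes=interval)) & (data["ts"] <= ts)
    return data[mask]

@task
def generate_data_quality_report(current, reference):
    ...  # builds an Evidently Report; see the sketch further below

@task
def save_report_to_workspace(report):
    ...  # logs the Report as a JSON snapshot to the Evidently workspace; see below

@flow(name="monitor-data")
def monitor_data(ts: datetime, interval: int = 60) -> None:
    current = load_current_data(ts, interval)
    reference = pd.read_parquet("data/reference.parquet")  # baseline prepared earlier
    report = generate_data_quality_report(current, reference)
    save_report_to_workspace(report)
```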
Here is how the logging backend works:
To later display the metrics in the Evidently UI, you need to log the related JSON snapshots to the correct directory. This directory serves as the data source for the monitoring service.
You can create a Project inside an Evidently Workspace to easily group related snapshots. On every monitoring run, you generate a new Evidently Report and associate it with the same Project ID inside the Workspace. This way, you will automatically save the Report to the directory corresponding to this Project in the JSON snapshot format. Later, you can pull any metrics stored in the snapshots and visualize them over time.
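For example, creating a local Workspace and a Project can look like this (the workspace path, project name, and description are illustrative):

```python
# Illustrative sketch: create a local workspace and a project to group related snapshots.
from evidently.ui.workspace import Workspace

ws = Workspace.create("evidently_workspace")   # directory that will store the snapshots
project = ws.create_project("Data Quality")    # one project per dashboard in this example
project.description = "Data quality checks for the taxi trip duration model"
project.save()
```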
Each Project has its monitoring dashboard in the Evidently UI. Typically, you can create one Project per ML model. In this scenario, you'd save all sorts of metrics (data quality, drift, model quality, etc.) together and visualize them on different panels of the same monitoring dashboard. However, it is entirely up to you. For example, in this tutorial, we decided to create separate Projects for each type of monitoring and get different dashboards.
Want to understand the Projects and Workspaces better? Check out the dedicated Evidently documentation section on Monitoring.
To sum up, to create a snapshot that will serve as a data source for the monitoring dashboard, you need to:
The code snippet below from the src/pipelines/monitor_data.py shows how to generate a data quality report using Evidently and log it to the “Data Quality” Project.
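A simplified approximation of this logic is shown below; the actual file may differ in details, and the workspace path and project name are assumptions.

```python
# Simplified approximation of the snapshot logging step; details may differ from the repository.
from evidently.ui.workspace import Workspace

WORKSPACE_PATH = "evidently_workspace"   # assumed local workspace directory
PROJECT_NAME = "Data Quality"

def save_report_to_workspace(report) -> None:
    ws = Workspace.create(WORKSPACE_PATH)             # open (or create) the workspace
    # Reuse the project if it already exists, otherwise create it.
    found = ws.search_project(PROJECT_NAME)
    project = found[0] if found else ws.create_project(PROJECT_NAME)
    ws.add_report(project.id, report)                 # saves the Report as a JSON snapshot
```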
A workspace may have multiple Projects, each with its own monitoring dashboard. Every Project directory contains snapshots and metadata.
Now, let’s explore the generate_data_quality_report() function.
This task computes two metrics: the share of missing values and the share of drifted columns. It takes the current data, reference data, numerical features, categorical features, and the prediction column as input arguments and returns the Report.
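In simplified form, such a task could look like the sketch below; the exact argument names in the repository may differ.

```python
# Simplified sketch of a data quality report task; argument names may differ from the repository.
import pandas as pd
from evidently import ColumnMapping
from evidently.metrics import DatasetDriftMetric, DatasetMissingValuesMetric
from evidently.report import Report
from prefect import task

@task
def generate_data_quality_report(
    current_data: pd.DataFrame,
    reference_data: pd.DataFrame,
    num_features: list,
    cat_features: list,
    prediction_col: str,
) -> Report:
    column_mapping = ColumnMapping(
        prediction=prediction_col,
        numerical_features=num_features,
        categorical_features=cat_features,
    )
    report = Report(metrics=[DatasetDriftMetric(), DatasetMissingValuesMetric()])
    report.run(
        reference_data=reference_data,
        current_data=current_data,
        column_mapping=column_mapping,
    )
    return report
```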
Here is a visual representation of a single Report logged as a snapshot during the data monitoring flow:
Want a different set of Metrics? In this example, we picked two metrics. However, you can select other metrics or presets. For example, pick a DataDriftPreset to log individual feature drift scores. You can also log Test Suites instead of Reports and capture the pass or fail results for any checks executed in the monitoring pipeline. You can refresh your knowledge on Reports and Test Suites with the Get Started tutorial for Reports and Test Suites.
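As an illustration, here is a hedged sketch of the Test Suite alternative using a preset, with toy data standing in for the real batches:

```python
# Illustrative sketch: run a Test Suite instead of a Report to get explicit pass/fail results.
import pandas as pd
from evidently.test_preset import DataDriftTestPreset
from evidently.test_suite import TestSuite

# Toy data for illustration only.
reference_data = pd.DataFrame({"trip_distance": [1.0, 2.5, 3.2, 4.1], "fare_amount": [7.0, 12.5, 15.0, 18.2]})
current_data = pd.DataFrame({"trip_distance": [1.1, 2.4, 9.8, 4.0], "fare_amount": [7.5, 11.9, 40.0, 17.8]})

tests = TestSuite(tests=[DataDriftTestPreset()])
tests.run(reference_data=reference_data, current_data=current_data)
print(tests.as_dict()["summary"])   # overall pass/fail summary for the batch
```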
After generating the Report for each batch of data, you save it as a JSON snapshot to the Data Quality Project, as shown above.
This way, all the snapshots generated on every run are collected together. You can view the individual Reports for each batch in the UI.
The Model Monitoring pipeline follows the same logic but computes a different set of Metrics related to model quality.
After you log the snapshots, you must choose what to display on the dashboard. Each dashboard can contain multiple monitoring panels: you select which metrics to display and how.
The code snippet below from src/utils/evidently_monitoring.py demonstrates adding counters and line plots to a dashboard. To add each monitoring panel, you must choose the panel type and specify the metric values to pull from the saved snapshots:
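A hedged approximation of this step is below; the panel titles, legends, and the exact set of panels are illustrative rather than the repository's actual configuration.

```python
# Approximate sketch of adding dashboard panels; titles and legends are illustrative.
from evidently.ui.dashboards import (
    CounterAgg,
    DashboardPanelCounter,
    DashboardPanelPlot,
    PanelValue,
    PlotType,
    ReportFilter,
)

def add_data_quality_panels(project) -> None:
    # A counter panel that shows the latest share of missing values.
    project.dashboard.add_panel(
        DashboardPanelCounter(
            title="Share of missing values",
            filter=ReportFilter(metadata_values={}, tag_values=[]),
            value=PanelValue(
                metric_id="DatasetMissingValuesMetric",
                field_path="current.share_of_missing_values",
                legend="share",
            ),
            agg=CounterAgg.LAST,
        )
    )
    # A line plot that tracks the share of drifted columns over time.
    project.dashboard.add_panel(
        DashboardPanelPlot(
            title="Share of drifted features",
            filter=ReportFilter(metadata_values={}, tag_values=[]),
            values=[
                PanelValue(
                    metric_id="DatasetDriftMetric",
                    field_path="share_of_drifted_columns",
                    legend="share",
                ),
            ],
            plot_type=PlotType.LINE,
        )
    )
```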
Every time you add or update a dashboard, you must update the Project in the Evidently Workspace.
The code snippet below from the src/utils/evidently_monitoring.py demonstrates how to update the Data Quality Project:
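In essence, this means re-registering the panels and saving the Project, roughly as follows (building on the panel sketch above; the workspace path and project name are assumptions):

```python
# Illustrative sketch: rebuild the dashboard panels and save the updated Project.
from evidently.ui.workspace import Workspace

ws = Workspace.create("evidently_workspace")      # assumed workspace path
project = ws.search_project("Data Quality")[0]    # fetch the existing Project
project.dashboard.panels = []                     # drop the old panel configuration
add_data_quality_panels(project)                  # re-add panels (see the sketch above)
project.save()                                    # persist the updated dashboard configuration
```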
This example showed an end-to-end ML monitoring process implemented as a set of batch jobs. You logged the resulting metrics as JSON snapshots and created an ML monitoring dashboard to show values over time.
You can take this example as an inspiration and adapt it for your ML model by following these general guidelines:
By following these guidelines, you can adapt this example to suit your specific project needs. This will enable you to build a robust, scalable, and maintainable monitoring pipeline that ensures optimal model performance and reliability.
This tutorial demonstrated running batch ML monitoring jobs and designing an ML monitoring dashboard.
You can further work with this example: