In this code tutorial, you will learn how to set up an ML monitoring system for models deployed with FastAPI. You will use the Evidently open-source Python library to track production model quality and target drift.
This is a complete deployment blueprint for ML serving and monitoring using open-source tools. You can copy the repository and adapt the reference architecture for your use case.
Code example: if you prefer to head straight to the code, open this example folder on GitHub.
FastAPI is a high-performance web framework for building APIs with Python 3.7+ based on standard Python type hints. It is easy to use, robust, and fast, making it an excellent choice for serving ML models.
Evidently is an open-source Python library to evaluate, test, and monitor ML models. It has 100+ built-in metrics and tests on data quality, data drift, and model performance, and helps visualize them interactively.
Combining the two tools, you can deploy an ML model and monitor its performance.
Try our open-source library with over 25 million downloads, or sign up to Evidently Cloud to run no-code checks and bring the whole team to a single workspace to collaborate on AI quality.
In this tutorial, you will create a FastAPI application to serve ML model predictions, log them to the PostgreSQL database and create an Evidently monitoring dashboard to keep track of the model performance.
By the end of this tutorial, you will learn how to implement an ML monitoring architecture using FastAPI for model serving, Evidently for monitoring, PostgreSQL for logging predictions, and Streamlit for the user interface.
You will run the ML monitoring solution in a Docker environment for easy deployment and scalability.
We expect that you:
You also need the following tools installed on your local machine:
Note: we tested this example on macOS/Linux.
Let's prepare to run the demo! This section explains the instructions in the example README: head there for more details.
1. Fork / Clone the repository
Clone the Evidently GitHub repository with the example code. This repository provides the necessary files and scripts to integrate Evidently, FastAPI, PostgreSQL, and Streamlit.
2. Build the Docker images
Create Docker images for the example. Before building the Docker images, set the USER_ID environment variable to your user ID. You can use this environment variable (USER_ID) in Dockerfiles or docker-compose.yml files to set permissions or perform other user-specific tasks.
3. Launch the monitoring cluster
Launch the monitoring cluster by using Docker Compose.
The docker-compose.yaml specifies the cluster components:
4. Create the database table
Before storing predictions in the PostgreSQL database, create the necessary tables. Run the Python script provided in the repository to set up the database structure.
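Purely as an illustration, a minimal table-creation script might look like the sketch below; the table name, columns, and connection string are assumptions rather than the repository's exact schema.

```python
# create_db.py -- illustrative sketch, not the repository's exact script
import os

from sqlalchemy import Column, DateTime, Float, Integer, MetaData, Table, create_engine

# The connection string is an assumption; match it to your docker-compose settings.
DATABASE_URI = os.getenv(
    "DATABASE_URI", "postgresql://postgres:postgres@localhost:5432/monitoring_db"
)

engine = create_engine(DATABASE_URI)
metadata = MetaData()

# A single table to log model inputs and predictions (column names are illustrative).
predictions_table = Table(
    "predictions",
    metadata,
    Column("id", Integer, primary_key=True, autoincrement=True),
    Column("created_at", DateTime),
    Column("trip_distance", Float),
    Column("passenger_count", Float),
    Column("predicted_duration", Float),
    Column("target_duration", Float, nullable=True),
)

if __name__ == "__main__":
    metadata.create_all(engine)  # creates the table if it does not already exist
    print("Database tables created.")
```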
5. Download data & train model
This example is based on the NYC Taxi dataset. The data and the model training are out of the scope of this tutorial. We prepared a few scripts to download data, pre-process it and train a simple machine learning model.
6. Prepare the reference data for monitoring
Some ML monitoring tasks require a reference dataset. For example, you can use data from model validation. You will compare your production data against this reference.
For this demo, we pre-computed the reference dataset. We assume it remains static.
Execute the following command to prepare your reference data:
Now, let's launch the pre-built example that shows how to serve an ML model with FastAPI and monitor it with Evidently.
At this point, you should already have a running fastapi_app.
FastAPI provides interactive API documentation that you can access using a web browser. It shows all available endpoints, their expected parameters and even allows sending test requests directly from the browser.
To view the API methods available for your project, navigate to the following URL: http://0.0.0.0:5000/docs#/
You can explore the different API methods, check their functionality, and try them out. You will see the required request format and expected response format for each method.
Let's generate some predictions using a prepared script.
This script emulates requests sent to the machine learning model. It selects a batch of data entries at random from the data/features/green_tripdata_2021-02.parquet file. Then, it uses these entries to make requests to the /predict endpoint.
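Conceptually, the script does something like the following sketch; the endpoint URL, batch size, and payload format are assumptions based on this tutorial's setup rather than the exact script in the repository.

```python
# simulate_requests.py -- illustrative sketch of sending test traffic to /predict
import json

import pandas as pd
import requests

PREDICT_URL = "http://0.0.0.0:5000/predict"  # FastAPI endpoint from this tutorial
DATA_PATH = "data/features/green_tripdata_2021-02.parquet"

# Load the prepared features and sample a random batch of entries.
data = pd.read_parquet(DATA_PATH)
batch = data.sample(n=min(100, len(data)))

# Send each sampled row as a JSON payload; the exact schema depends on the app's request model.
for _, row in batch.iterrows():
    payload = json.loads(row.to_json())  # convert numpy types to plain JSON values
    response = requests.post(PREDICT_URL, json=payload)
    response.raise_for_status()

print(f"Sent {len(batch)} prediction requests.")
```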
We prepared two monitoring reports: a Model Performance Report and a Target Drift Report. You must invoke specific endpoints in the FastAPI application to generate them.
1. Model Performance Report:
This report provides insights into the model performance over time. You can access it via the following endpoint: http://0.0.0.0:5000/monitor-model
2. Target Drift Report:
This report allows tracking changes in the target variable over time. You can access it via the following endpoint: http://0.0.0.0:5000/monitor-target
You can select other metrics. Evidently has multiple Metrics and Tests to evaluate data quality, data drift, and model performance. Browse through available presets to choose what might be relevant for your use case.
You can specify the size of the prediction data window used to generate the report. To do this, add a window_size parameter to the URL.
For instance, if you want to generate the reports based on the last 300 predictions, use the following URLs: http://0.0.0.0:5000/monitor-model?window_size=300 and http://0.0.0.0:5000/monitor-target?window_size=300.
If you do not specify the window_size, the system defaults to using the last 3000 predictions.
These reports provide insight into the ML model's performance and the stability of the target variable. You can check them regularly to identify any unexpected behaviors or trends and take appropriate action.
This is an optional step.
Streamlit helps quickly build and deploy shareable web apps. In our case, we can use Streamlit to create a user-friendly interface to run and display our monitoring reports.
To generate the monitoring reports in Streamlit UI, follow these steps:
The application will take care of the rest and render the requested report.
Want to understand the integration with Streamlit better? Here is a code tutorial on building ML and data monitoring dashboards with Evidently and Streamlit.
Now, let's explore the ML serving and monitoring architecture with FastAPI and Evidently in detail.
The integration has several interconnected components: the FastAPI server, the PostgreSQL database, the user interface (UI) or HTTP client, and the file system.
Let's look at each component.
FastAPI. FastAPI acts as the main server in this architecture. It exposes several endpoints to generate predictions and run model monitoring.
PostgreSQL. The PostgreSQL database stores the generated predictions. You can later query these predictions using the ML monitoring endpoints to generate the model performance and target drift reports.
HTTP Client / UI. This component is responsible for interacting with the FastAPI server. It sends requests to the /predict endpoint to generate predictions and fetches monitoring reports from the /monitor-model and /monitor-target endpoints. In this demo, we use the Streamlit app to build a simple UI.
File System. Some artifacts are saved into the file system. The reference data is stored in the /data/reference directory. This is necessary to compare the model's current performance against its historical performance. Additionally, the generated monitoring reports are in the /reports directory.
These components provide a streamlined system to serve ML model predictions, log them, and run and store ML monitoring reports.
We use Docker Compose to manage and orchestrate our services: FastAPI application, Streamlit application, and a PostgreSQL database.
Let's take a look at the docker-compose.yaml file:
All three services are attached to the same monitoring network, so the Streamlit app can access the FastAPI server, and the FastAPI app can access the PostgreSQL database. By running these services with Docker Compose, you ensure that all components work together seamlessly in a controlled and isolated environment.
In the example FastAPI app, we define three key endpoints. Each serves a specific purpose: generate predictions, monitor model quality, and monitor target drift.
The first endpoint is /predict. It is set up to receive a POST request containing input features. The predict function receives the request, computes the prediction, and saves it to the database.
Here's how it works: the endpoint parses the input features from the request body, runs the model to generate a prediction, and writes both the features and the prediction to the PostgreSQL database.
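A minimal version of such an endpoint could look like the sketch below; the request model, feature names, model artifact path, and database settings are assumptions for illustration, not the repository's exact code.

```python
# Illustrative sketch of a /predict endpoint; schema, paths, and names are assumptions.
import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel
from sqlalchemy import create_engine

app = FastAPI()
model = joblib.load("models/model.joblib")  # pre-trained model artifact (assumed path)
engine = create_engine("postgresql://postgres:postgres@db:5432/monitoring_db")


class Features(BaseModel):
    trip_distance: float
    passenger_count: float


@app.post("/predict")
def predict(features: Features):
    # Turn the request into a one-row DataFrame and compute the prediction.
    input_df = pd.DataFrame([features.dict()])
    prediction = float(model.predict(input_df)[0])

    # Log the features and the prediction to PostgreSQL for later monitoring.
    input_df["predicted_duration"] = prediction
    input_df.to_sql("predictions", engine, if_exists="append", index=False)

    return {"prediction": prediction}
```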
The second endpoint /monitor-model is a GET endpoint that, when accessed, invokes the monitor_model_performance() function. This function is responsible for monitoring the performance of the model.
Here's how it works: the endpoint accepts an optional window_size parameter, loads the most recent predictions from the database, compares them against the reference data, and returns the generated report as an HTML file.
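As an illustration, the endpoint could be structured roughly like this; the SQL query, file paths, and the helper's signature are assumptions.

```python
# Illustrative sketch of the /monitor-model endpoint; the query, paths, and the
# helper's signature are assumptions rather than the repository's exact code.
import pandas as pd
from fastapi import FastAPI
from fastapi.responses import FileResponse
from sqlalchemy import create_engine

from src.utils.reports import build_model_performance_report  # assumed signature

app = FastAPI()
engine = create_engine("postgresql://postgres:postgres@db:5432/monitoring_db")


@app.get("/monitor-model")
def monitor_model_performance(window_size: int = 3000):
    # Load the most recent predictions logged by the /predict endpoint.
    current_data = pd.read_sql(
        f"SELECT * FROM predictions ORDER BY created_at DESC LIMIT {window_size}",
        engine,
    )
    reference_data = pd.read_parquet("data/reference/reference_data.parquet")

    # Build the Evidently report and return it as an HTML file.
    report_path = build_model_performance_report(reference_data, current_data)
    return FileResponse(report_path, media_type="text/html")
```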
The third endpoint /monitor-target is another GET endpoint. When accessed, it invokes the monitor_target_drift() function to monitor the drift in the target variable. Similar to the monitor_model_performance endpoint, this function accepts an optional window_size parameter and returns an HTML file.
By setting up these three endpoints, you effectively establish an API to generate model predictions and give visibility into the model quality.
The function build_model_performance_report() from the src/utils/reports.py module is responsible for creating the model performance report using Evidently.
At a high level, the function takes the reference and current data, builds an Evidently Report with regression metrics, and saves the result as an HTML file.
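A simplified sketch of that logic, using Evidently's regression preset (column names and output path are assumptions):

```python
# Simplified sketch of building a model performance report with Evidently.
import pandas as pd
from evidently import ColumnMapping
from evidently.metric_preset import RegressionPreset
from evidently.report import Report


def build_model_performance_report(
    reference_data: pd.DataFrame, current_data: pd.DataFrame
) -> str:
    # Tell Evidently which columns hold the target and the prediction (names assumed).
    column_mapping = ColumnMapping(
        target="target_duration",
        prediction="predicted_duration",
    )

    # The RegressionPreset bundles Evidently's built-in regression metrics.
    report = Report(metrics=[RegressionPreset()])
    report.run(
        reference_data=reference_data,
        current_data=current_data,
        column_mapping=column_mapping,
    )

    report_path = "reports/model_performance.html"
    report.save_html(report_path)
    return report_path
```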
This function produces a comprehensive model performance report using Evidently's built-in metrics for regression analysis. The report provides a detailed look at how the model performs, identifying potential issues and areas for improvement.
Generating Evidently reports. This function follows the Evidently API for creating and customizing Reports. Check out the official docs to explore how Column Mapping works and the additional parameters available for report customization. You might also prefer to include raw data in the visualizations like scatter plots.
The function build_target_drift_report generates a target drift report using Evidently.
It follows a similar structure to the function used to generate the model performance report.
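Assuming the same column names as in the previous sketch, a simplified version might look like this:

```python
# Simplified sketch of building a target drift report with Evidently.
import pandas as pd
from evidently import ColumnMapping
from evidently.metric_preset import TargetDriftPreset
from evidently.report import Report


def build_target_drift_report(
    reference_data: pd.DataFrame, current_data: pd.DataFrame
) -> str:
    column_mapping = ColumnMapping(
        target="target_duration",        # assumed column name
        prediction="predicted_duration", # assumed column name
    )

    # The TargetDriftPreset checks for statistical drift in the target and prediction.
    report = Report(metrics=[TargetDriftPreset()])
    report.run(
        reference_data=reference_data,
        current_data=current_data,
        column_mapping=column_mapping,
    )

    report_path = "reports/target_drift.html"
    report.save_html(report_path)
    return report_path
```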
To sum up, this function helps create a detailed target drift report, providing insights into the changes in the target variable over time. This helps identify potential issues and the need for model updates.
Monitoring without ground truth. This example assumes that actual values are available to compute the true model quality. In practice, they might come with a delay. In this scenario, you can run other checks, such as detecting drift in the model input data and predictions, or monitoring the data quality for any changes. Browse through available Evidently presets and metrics to choose what might be relevant for your use case.
On-demand metrics computation. In this example, metrics computation is integrated into the model serving, and you calculate them on demand. This way, you always get the most up-to-date view of your data and model performance. Since you do not compute metrics continuously, you might also need less storage.
Robust ML monitoring capabilities. Evidently provides comprehensive model monitoring: you can choose from 100+ metrics on data quality, data drift, and model performance, and adjust the setup to the specifics of your model and data pipelines.
Easy to customize. You can build upon this example architecture, extend it, or replace specific components. You are not limited to HTML reports: you can return the metrics computed by Evidently in JSON format, log them, and visualize them in an external system (see the sketch after this list).
Friendly UI. Using a Streamlit app on top can help users browse the reports and simplify access to the metrics.
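To illustrate the customization point above, here is a quick sketch of exporting the computed metrics as JSON instead of rendering HTML; the file paths and column names are assumptions, and both datasets must contain the target and prediction columns.

```python
# Sketch: exporting Evidently metrics as JSON instead of rendering HTML.
import pandas as pd
from evidently import ColumnMapping
from evidently.metric_preset import RegressionPreset
from evidently.report import Report

# Assumed paths; both datasets need the target and prediction columns below.
reference_data = pd.read_parquet("data/reference/reference_data.parquet")
current_data = pd.read_parquet("data/current/current_data.parquet")

report = Report(metrics=[RegressionPreset()])
report.run(
    reference_data=reference_data,
    current_data=current_data,
    column_mapping=ColumnMapping(target="target_duration", prediction="predicted_duration"),
)

metrics_json = report.json()     # JSON string with all computed metrics
metrics_dict = report.as_dict()  # the same metrics as a Python dictionary
# From here, you can log the values, write them to a database, or send them to another system.
```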
Static monitoring. With this architecture, each generated report is a "snapshot" of model quality at a given point in time. However, if you want to log and track model metrics over time, you might need to add more components. For example, you can host the Evidently monitoring interface instead of static Reports.
Track metrics over time. You can add a dynamic visualization layer and host a monitoring dashboard to visualize metrics as they are computed over time. Here is an example of how to do this with Evidently.
Scalability and serving latency. In this example, the metric computation is incorporated within the model-serving application. While convenient, this can potentially affect serving latency, especially in high-load scenarios. To improve performance and scalability, you can decouple model serving and monitoring. For example, you can add a pipeline manager like Airflow or Prefect to schedule regular monitoring tasks.
End-to-end deployment blueprint. Here is a code example that shows how to schedule model and data monitoring jobs and track metrics over time.
The initial monitoring architecture is suitable for a range of applications and is simple to run. However, some of the cons listed above might be important. Consider extending the system design if you care about near real-time monitoring, logging historical model performance, scalability, and improved serving latency.
You can add the following components: a workflow orchestrator (such as Prefect or Airflow) to schedule monitoring jobs, and a dashboarding layer (such as the Evidently UI or Grafana) to track metrics over time.
Follow these general steps to extend the solution:
1. Delegate monitoring tasks to pipeline orchestration tools.
Use a tool like Prefect or Airflow (or any other orchestrator) to manage and schedule monitoring jobs at regular intervals. This separates ML model serving from monitoring: the scheduled jobs read the logged predictions from the database and generate the monitoring reports outside the serving application.
2. Design monitoring dashboards.
You can use the Evidently UI to host a live monitoring dashboard. You can save the Evidently Reports as JSON snapshots, and then launch a live dashboard that will automatically parse data from multiple Reports and help visualize it over time. You can choose which panels to visualize on the dashboard.
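A rough sketch of this workflow with the Evidently monitoring UI might look like the following; the workspace name, project name, file paths, and column names are all illustrative.

```python
# Sketch: logging a report to a local Evidently workspace so the UI can track it over time.
import pandas as pd
from evidently import ColumnMapping
from evidently.metric_preset import RegressionPreset
from evidently.report import Report
from evidently.ui.workspace import Workspace

# Create (or reuse) a local workspace and a project inside it; names are illustrative.
workspace = Workspace.create("workspace")
project = workspace.create_project("FastAPI model monitoring")

reference_data = pd.read_parquet("data/reference/reference_data.parquet")  # assumed path
current_data = pd.read_parquet("data/current/current_data.parquet")        # assumed path

report = Report(metrics=[RegressionPreset()])
report.run(
    reference_data=reference_data,
    current_data=current_data,
    column_mapping=ColumnMapping(target="target_duration", prediction="predicted_duration"),
)

# Each added report becomes a snapshot that the Evidently UI can visualize over time.
workspace.add_report(project.id, report)
```

You can then launch the Evidently UI service against this workspace to browse the accumulated snapshots as a live dashboard.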
You can also design the workflow using other tools, like Grafana. In this case, you need to export the Evidently Metrics and store them in a database like PostgreSQL. You will then connect it as a data source for Grafana and can design dashboards and alerts.
Want a detailed tutorial with code? Here is the batch ML monitoring blueprint with Evidently, PostgreSQL, Prefect, and Grafana.
This tutorial showed how to build an ML monitoring solution for models deployed with FastAPI.
You can further extend this example, for instance, by scheduling monitoring jobs with an orchestrator or by adding a live dashboard to track metrics over time.
References
Thanks to Duarte O. Carmo for the blog post that inspired this integration example: "Monitoring ML models with FastAPI and Evidently AI" by Duarte O. Carmo.