🎓 Free introductory course "LLM evaluations for AI product teams". Save your seat
Product
LLM observability
Evaluate LLM-powered products, from RAGs to AI assistants.
ML observability
Monitor data drift, data quality, and performance for production ML models.
Open-source
Open-source Python library for ML monitoring with 20m+ downloads.
Pricing
Docs
Resources
Blog
Insights on building AI products
LLM benchmarks
100+ LLM benchmarks and datasets
Tutorials
AI observability and MLOps tutorials
ML and LLM system design
500 ML and LLM use cases
Guides
In-depth AI quality and MLOps guides
ML and AI platforms
45+ internal ML and AI platforms
Community
Get support and chat about AI products
All blogs about
#ml-testing
Community
AI regulations: EU AI Act, AI Bill of Rights, and more
In this guide, we’ll discuss key AI regulations, such as the EU AI Act and the Blueprint for AI Bill of Rights, and explain what they mean for teams building AI-powered products.
Community
LLM hallucinations and failures: lessons from 4 examples
Real-world examples of LLM hallucinations and other failures that can occur in LLM-powered products in the wild, such as prompt injection and out-of-scope usage scenarios.
Community
When AI goes wrong: 10 examples of AI mistakes and failures
From being biased to making things up, there are numerous instances where we’ve seen AI go wrong. In this post, we’ll explore ten notable AI failures where the technology didn’t perform as expected.
Tutorials
Wrong but useful: an LLM-as-a-judge tutorial
This tutorial shows how to create, tune, and use LLM judges. We'll make a toy dataset and assess correctness and verbosity. You can apply the same workflow for other criteria.
Evidently
Meet Evidently Cloud for AI Product Teams
We are launching Evidently Cloud, a collaborative AI observability platform built for teams developing products with LLMs. It includes tracing, datasets, evals, and a no-code workflow. Check it out!
Community
45 real-world LLM applications and use cases from top companies
How do companies use LLMs in production? We compiled 45 real-world LLM applications from companies that share their learnings from building LLM systems.
Tutorials
LLM regression testing workflow step by step: code tutorial
In this tutorial, we introduce the end-to-end workflow of LLM regression testing. You will learn how to run regression testing as a process and build a dashboard to monitor the test results.
Tutorials
Watch the language: A tutorial on regression testing for LLMs
In this tutorial, you will learn how to systematically check the quality of LLM outputs. You will work with issues like changes in answer content, length, or tone, and see which methods can detect them.
Community
MLOps Zoomcamp recap: how to monitor ML models in production?
Our CTO Emeli Dral was an instructor for the ML Monitoring module of MLOps Zoomcamp 2024, a free MLOps course. We summarized the ML monitoring course notes and linked to all the practical videos.
Community
AI, Machine Learning, and Data Science conferences to attend in 2025
We put together the most interesting conferences on AI, Machine Learning, and Data Science in 2025. And the best part? Some of them are free to attend or publish the content after the event.
Evidently
Evidently 0.4.25: An open-source tool to evaluate, test and monitor your LLM-powered apps
The Evidently open-source Python library now supports evaluations for LLM-based applications, including RAGs and chatbots. You can compare, test, and monitor your LLM system quality from development to production.
Evidently
7 new features at Evidently: ranking metrics, data drift on Spark, and more
Did you miss some of the latest updates to the Evidently open-source Python library? We summed up a few features we shipped recently in one blog post.
Tutorials
Batch inference and ML monitoring with Evidently and Prefect
In this tutorial, you will learn how to run batch ML model inference and deploy a model monitoring dashboard for production ML models using open-source tools.
Community
MLOps courses to take in 2023
Looking for MLOps courses to attend in 2023? We put together five great online MLOps courses for data scientists and ML engineers. They are either free to join or publish their content for everyone to access without a fee.
MLOps
An MLOps story: how DeepL monitors ML models in production
How do different companies start and scale their MLOps practices? In this blog, we share a story of how DeepL monitors ML models in production using open-source tools.
Tutorials
How to stop worrying and start monitoring your ML models: a step-by-step guide
A beginner-friendly MLOps tutorial on how to evaluate ML data quality, data drift, model performance in production, and track them all over time using open-source tools.
Evidently
Evidently 0.4: an open-source ML monitoring dashboard to track all your models
Evidently 0.4 is here! Meet a new feature: Evidently user interface for ML monitoring. You can now track how your ML models perform over time and bring all your checks to one central dashboard.
Tutorials
Monitoring unstructured data for LLM and NLP with text descriptors
How do you monitor unstructured text data? In this code tutorial, we’ll explore how to track interpretable text descriptors that help assign specific properties to every text.
Tutorials
A simple way to create ML Model Cards in Python
In this code tutorial, you will learn how to create interactive visual ML model cards to document your models and data using Evidently, an open-source Python library.
Tutorials
ML serving and monitoring with FastAPI and Evidently
In this code tutorial, you will learn how to set up an ML monitoring system for models deployed with FastAPI. This is a complete deployment blueprint for ML serving and monitoring using open-source tools.
ML Monitoring
Shift happens: we compared 5 methods to detect drift in ML embeddings
Monitoring embedding drift is relevant for the production use of LLM and NLP models. We ran experiments to compare 5 drift detection methods. Here is what we found.
Community
AMA with Lina Weichbrodt: ML monitoring, LLMs, and freelance ML engineering
In this blog, we recap the Ask-Me-Anything session with Lina Weichbrodt. We chatted about ML monitoring and debugging, adopting LLMs, and the challenges of being a freelance ML engineer.
Tutorials
Batch ML monitoring blueprint: Evidently, Prefect, PostgreSQL, and Grafana
In this code tutorial, you will learn how to run batch ML model inference, collect data and ML model quality monitoring metrics, and visualize them on a live dashboard.
Tutorials
How to set up ML monitoring with email alerts using Evidently and AWS SES
In this tutorial, you will learn how to implement Evidently checks as part of an ML pipeline and send email notifications based on a defined condition.
MLOps
An MLOps story: how Wayflyer creates ML model cards
How do different companies start and scale their MLOps practices? In this blog, we share a story of how Wayflyer creates ML model cards using open-source tools.
Tutorials
A tutorial on building ML and data monitoring dashboards with Evidently and Streamlit
In this tutorial, you will learn how to create a data quality and ML model monitoring dashboard using the two open-source libraries: Evidently and Streamlit.
Community
AMA with Stefan Krawczyk: from building ML platforms at Stitch Fix to an open-source startup on top of the Hamilton framework
In this blog, we recap the Ask-Me-Anything session with Stefan Krawczyk. We chatted about how to build an ML platform and what data science teams do wrong about ML dataflows.
MLOps
How to build an ML platform? Lessons from 10 tech companies
How to approach building an internal ML platform if you’re not Google? We put together stories from 10 companies that shared their platforms’ design and learnings along the way.
Tutorials
Monitoring NLP models in production: a tutorial on detecting drift in text data
In this tutorial, we will explore issues affecting the performance of NLP models in production, imitate them on an example toy dataset, and show how to monitor and debug them.
Community
AMA with Neal Lathia: data science career tracks, shipping ML models to production, and Monzo ML stack
In this blog, we recap the Ask-Me-Anything session with Neal Lathia. We chatted about career paths of an ML Engineer, building and expanding ML teams, Monzo’s ML stack, and 2023 ML trends.
Evidently
Evidently 0.2.2: Data quality monitoring and drift detection for text data
Meet the new feature: data quality monitoring and drift detection for text data! You can now use the Evidently open-source Python library to evaluate, test, and monitor text data.
Community
50 best machine learning blogs from engineering teams
Want to know how companies with top engineering teams do machine learning? We put together a list of the best machine learning blogs from companies that share specific ML use cases, lessons learned from building ML platforms, and insights into the tech they use.
Community
AMA with Ben Wilson: planning ML projects, AutoML, and deploying at scale
In this blog, we recap the Ask-Me-Anything session with Ben Wilson. We chatted about AutoML use cases, deploying ML models to production, and how one can learn about ML engineering.
Evidently
Meet Evidently 0.2, the open-source ML monitoring tool to continuously check on your models and data
We are thrilled to announce our latest and largest release: Evidently 0.2. In this blog, we give an overview of what Evidently is now.
Evidently
Evidently feature spotlight: NoTargetPerformance test preset
In this series of blogs, we are showcasing specific features of the Evidently open-source ML monitoring library. Meet NoTargetPerformance test preset!
Community
AMA with Rick Lamers: the evolution of data orchestration tools and the perks of open source
In this blog, we recap the Ask-Me-Anything session with Rick Lamers, where we chatted about the evolution of orchestration tools, their place within the MLOps landscape, the future of data pipelines, and building an open-source project amidst the economic crisis.
Community
How to contribute to open source as a Data Scientist, and Hacktoberfest 2022 recap
Now that Hacktoberfest 2022 is over, it’s time to celebrate our contributors, look back at what we’ve achieved together, and share what we’ve learned during this month of giving back to the community through contributing to open source.
Evidently
Evidently 0.1.59: Migrating from Dashboards and JSON profiles to Reports
In Evidently v0.1.59, we moved the existing dashboard functionality to the new API. Here is a quick guide on migrating from the old to the new API. In short, it is very, very easy.
ML Monitoring
ML model maintenance. “Should I throw away the drifting features”?
Imagine you have a machine learning model in production, and some features are very volatile. Their distributions are not stable. What should you do with those? Should you just throw them away?
Community
AMA with Jacopo Tagliabue: reasonable scale ML, testing recommendation systems, and hot DataOps
In this blog, we recap the Ask-Me-Anything session with Jacopo Tagliabue, where we chatted about ML at a reasonable scale, testing RecSys, MLOps anti-patterns, what’s hot in DataOps, fundamentals in MLOps, and more.
Community
AMA with Bo Yu and Sean Sheng: why ML deployment is hard
In this blog, we recap the Ask-Me-Anything session with Bozhao Yu and Sean Sheng, where we chatted about why deploying a model is hard, beginner mistakes and how to avoid them, the challenges of building an open-source product, and BentoML’s roadmap.
ML Monitoring
Pragmatic ML monitoring for your first model. How to prioritize metrics?
There is an overwhelming set of potential metrics to monitor. In this blog, we'll try to introduce a reasonable hierarchy.
Community
AMA with Doris Xin: AutoML, modern data stack, and reunifying the tools
In this blog, we recap the Ask-Me-Anything session with Doris Xin, which covered the roles of Data Scientists and Data Engineers in an ML cycle, automation, MLOps tooling, bridging the gap between development and production, and more.
Community
AMA with Fabiana Clemente: synthetic data, data-centric approach, and rookie mistakes to avoid
We recap the Ask-Me-Anything session with Fabiana Clemente, which covered synthetic data, its quality, beginner mistakes in data generation, the data-centric approach, and how well companies are doing in getting there.
ML Monitoring
Monitoring ML systems in production. Which metrics should you track?
When one mentions "ML monitoring," this can mean many things. Are you tracking service latency? Model accuracy? Data quality? This blog organizes everything one can look at in a single framework.
Evidently
Evidently 0.1.52: Test-based ML monitoring with smart defaults
Meet the new feature in the Evidently open-source Python library! You can easily integrate data and model checks into your ML pipeline with a clear success/fail result. It comes with presets and defaults to make the configuration painless.
ML Monitoring
Which test is the best? We compared 5 methods to detect data drift on large datasets
We ran an experiment to help build an intuition on how popular drift detection methods behave. In this blog, we share the key takeaways and the code to run the tests on your data.
Community
AMA with Matt Squire: what makes a good open-source tool great, and the future of MLOps
In this blog, we recap the Ask-Me-Anything session with Matt Squire, which covered MLOps maturity and future, how MLOps fits in data-centric AI, and why open-source wins.
Tutorials
How to set up ML Monitoring with Evidently. A tutorial from CS 329S: Machine Learning Systems Design.
Our CTO Emeli Dral gave a tutorial on how to use Evidently at the Stanford Winter 2022 course CS 329S on Machine Learning System design. Here is the written version of the tutorial and a code example.
Community
AMA with Hamza Tahir: MLOps trends, tools, and building an open-source startup
In this blog, we recap the Ask-Me-Anything session with Hamza Tahir, which covered MLOps trends and tools, the future of real-time ML, and building an open-source startup.
Community
Evidently Community Call #2 Recap: custom text comments, color schemes and a library of statistical tests
In this blog, we recap the second Evidently Community Call, which covered the recent feature updates in our open-source ML monitoring tool.
Community
AMA with Alexey Grigorev: MLOps tools, best practices for ML projects, and tips for community builders
In this blog, we recap the Ask-Me-Anything session with Alexey Grigorev, which covered all things production machine learning, from tools to workflow, and even a bit on community building.
ML Monitoring
Q&A: ML drift that matters. "How to interpret data and prediction drift together?"
Data and prediction drift often need contextual interpretation. In this blog, we walk you through possible scenarios for when you detect these types of drift together or independently.
Evidently
Evidently 0.1.46: Evaluating and monitoring data quality for ML models.
Meet the new Data Quality report in the Evidently open-source Python library! You can use it to explore your dataset and track feature statistics and behavior changes.
Evidently
7 highlights of 2021: A year in review for Evidently AI
We are building an open-source tool to evaluate, monitor, and debug machine learning models in production. Here is a look back at what has happened at Evidently AI in 2021.
Evidently
Evidently 0.1.35: Customize it! Choose the statistical tests, metrics, and plots to evaluate data drift and ML performance.
Now, you can easily customize the pre-built Evidently reports to add your metrics, statistical tests or change the look of the dashboards with a bit of Python code.
ML Monitoring
Q&A: Do I need to monitor data drift if I can measure the ML model quality?
Even if you can calculate the model quality metric, monitoring data and prediction drift can often be useful. Let’s consider a few examples where it makes sense to track the distributions of the model inputs and outputs.
MLOps
"My data drifted. What's next?" How to handle ML model drift in production.
What can you do once you detect data drift for a production ML model? Here is an introductory overview of the possible steps.
Evidently
Evidently 0.1.30: Data drift and model performance evaluation in Google Colab, Kaggle Kernel, and Deepnote
Now, you can use Evidently to display dashboards not only in Jupyter notebook but also in Colab, Kaggle, and Deepnote.
ML Monitoring
Q&A: What is the difference between outlier detection and data drift detection?
When monitoring ML models in production, we can apply different techniques. Data drift and outlier detection are among those. What is the difference? Here is a visual explanation.
Evidently
Real-time ML monitoring: building live dashboards with Evidently and Grafana
You can use Evidently together with Prometheus and Grafana to set up live monitoring dashboards. We created an integration example for Data Drift monitoring. You can easily configure it to use with your existing ML service.
Tutorials
How to detect, evaluate and visualize historical drifts in the data
You can look at historical drift in data to understand how your data changes and choose the monitoring thresholds. Here is an example with Evidently, Plotly, Mlflow, and some Python code.
MLOps
To retrain, or not to retrain? Let's get analytical about ML model updates
Is it time to retrain your machine learning model? Even though data science is all about… data, the answer to this question is surprisingly often based on a gut feeling. Can we do better?
Evidently
Evidently 0.1.17: Meet JSON Profiles, an easy way to integrate Evidently in your prediction pipelines
Now, you can use Evidently to generate JSON profiles. It makes it easy to send metrics and test results elsewhere.
ML Monitoring
Can you build a machine learning model to monitor another model?
Can you train a machine learning model to predict your model’s mistakes? Nothing stops you from trying. But chances are, you are better off without it.
Tutorials
What Is Your Model Hiding? A Tutorial on Evaluating ML Models
There is more to performance than accuracy. In this tutorial, we explore how to evaluate the behavior of a classification model before production use.
Evidently
Evidently 0.1.8: Machine Learning Performance Reports for Classification Models
You can now use Evidently to analyze the performance of classification models in production and explore the errors they make.
Tutorials
How to break a model in 20 days. A tutorial on production model analytics
What can go wrong with an ML model in production? Here is a story of how we trained a model, simulated deployment, and analyzed its gradual decay.
Evidently
Evidently 0.1.6: How To Analyze The Performance of Regression Models in Production?
You can now use Evidently to analyze the performance of production ML models and explore their weak spots.
Evidently
Evidently 0.1.4: Analyze Target and Prediction Drift in Machine Learning Models
Our second report is released! Now, you can use Evidently to explore the changes in your target function and model predictions.
Evidently
Introducing Evidently 0.0.1 Release: Open-Source Tool To Analyze Data Drift
We are excited to announce our first release. You can now use the Evidently open-source Python package to estimate and explore data drift for machine learning models.
ML Monitoring
Machine Learning Monitoring, Part 5: Why You Should Care About Data and Concept Drift
No model lasts forever. While the data quality can be fine, the model itself can start degrading. A few terms are used in this context. Let’s dive in.
ML Monitoring
Machine Learning Monitoring, Part 4: How To Track Data Quality and Data Integrity
A bunch of things can go wrong with the data that goes into a machine learning model. Our goal is to catch them on time.
ML Monitoring
Machine Learning Monitoring, Part 3: What Can Go Wrong With Your Data?
Garbage in is garbage out. Input data is a crucial component of a machine learning system. Whether or not you have immediate feedback, your monitoring starts here.
ML Monitoring
Machine Learning Monitoring, Part 2: Who Should Care, and What We Are Missing
Who should care about machine learning monitoring? The short answer: everyone who cares about the model's impact on business.
ML Monitoring
Machine Learning Monitoring, Part 1: What It Is and How It Differs
Congratulations! Your machine learning model is now live. Many models never make it that far. Some claim that as many as 87% are never deployed.