Learn about AI observability, ML and LLM evaluation, and MLOps with our in-depth guides.
course
LLM evaluations for AI product teams
Building an LLM-powered product? Sign up for our free course on LLM evaluations for AI product teams. A gentle introduction to evaluating LLM-powered apps, no coding knowledge required.
How to evaluate the quality of generative outputs and LLM-based systems? In this guide, we break down different evaluation methods and metrics for AI-powered products.
How to maintain ML models after you deploy them, and what exactly to prepare for? In this guide, we cover the key concepts of production ML model operations.
"ML monitoring" can mean many things. Are you tracking service latency? Model accuracy? Data quality? This guide organizes everything one can look at in a single framework.
We ran an experiment to help build intuition for how popular drift detection methods behave. In this guide, we share the key takeaways and the code to run the tests on your own data.
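To give a flavor of what such a drift test looks like in code, here is a minimal sketch of a per-feature check using the two-sample Kolmogorov-Smirnov test, one of the commonly used methods. The sample data, column roles, and 0.05 threshold are illustrative assumptions, not the exact setup from the guide.

```python
# Minimal sketch: two-sample Kolmogorov-Smirnov drift check on one feature.
# The synthetic data and the 0.05 threshold are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=1_000)  # e.g. training data
current = rng.normal(loc=0.3, scale=1.0, size=1_000)    # e.g. recent production data

statistic, p_value = stats.ks_2samp(reference, current)
drift_detected = p_value < 0.05
print(f"KS statistic={statistic:.3f}, p-value={p_value:.4f}, drift={drift_detected}")
```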
Monitoring embedding drift is relevant for the production use of LLM and NLP models. We ran experiments to compare 5 drift detection methods. Here is what we found.
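As one example of an embedding drift signal (not necessarily the exact methods compared in the experiment), a simple approach is to measure the cosine distance between the mean embedding vectors of a reference window and a current window. The random vectors below stand in for real model embeddings.

```python
# Minimal sketch: cosine distance between mean embeddings of two data windows.
# Random vectors are placeholders for real embeddings; sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
reference_embeddings = rng.normal(size=(500, 384))          # e.g. older texts
current_embeddings = rng.normal(loc=0.2, size=(500, 384))   # e.g. recent texts

ref_mean = reference_embeddings.mean(axis=0)
cur_mean = current_embeddings.mean(axis=0)

cosine_distance = 1 - np.dot(ref_mean, cur_mean) / (
    np.linalg.norm(ref_mean) * np.linalg.norm(cur_mean)
)
print(f"Cosine distance between mean embeddings: {cosine_distance:.4f}")
```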
How to evaluate the quality of a classification model? In this guide, we break down different machine learning metrics for binary and multi-class problems.
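For a quick taste of the metrics covered, here is a minimal sketch that computes accuracy, precision, recall, and F1 for a binary problem with scikit-learn. The labels and predictions are toy values, purely for illustration.

```python
# Minimal sketch: common binary classification metrics with scikit-learn.
# y_true and y_pred are toy values used only to illustrate the calls.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
```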