If you were to implement an ML monitoring system from scratch, you would need to:
- Define and choose the metrics and statistical tests. How exactly do you monitor drift? Is it a Kolmogorov-Smirnov test or an Anderson-Darling test? Can you use the same test for numerical and categorical features? (See the sketch after this list.)
- Implement the metrics. If you do it by hand, you might spend quite some time constructing a giant PromQL query for a statistical test. Such queries are hard to maintain, edit, and share between different models.
- Build the monitoring logic and instrument your service. Traditional software monitoring rarely deals with things like moving windows or comparisons against an external baseline reference. You would have to code this logic yourself to set up the logging.
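To make the last two points concrete, here is a minimal, hand-rolled sketch of that logic in plain Python: a Kolmogorov-Smirnov test for a numerical feature, a chi-squared test for a categorical one, and a moving window compared against a reference sample. All names, window sizes, and thresholds here are hypothetical illustrations, not Evidently's implementation.

```python
from collections import deque

import numpy as np
from scipy import stats

WINDOW_SIZE = 500  # hypothetical moving-window length

# Baseline ("reference") data captured at training time.
reference_num = np.random.normal(0, 1, 5000)       # numerical feature sample
reference_cat = {"a": 3000, "b": 1500, "c": 500}   # categorical value counts

# Moving windows of the most recent production values.
num_window = deque(maxlen=WINDOW_SIZE)
cat_window = deque(maxlen=WINDOW_SIZE)


def numerical_drift(window, reference, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test against the reference sample."""
    _, p_value = stats.ks_2samp(list(window), reference)
    return p_value < alpha  # True means drift was detected


def categorical_drift(window, reference, alpha=0.05):
    """Chi-squared test comparing window frequencies to reference proportions.

    Assumes the window is non-empty; a production version would also need
    to handle unseen categories and sparse counts.
    """
    categories = sorted(reference)
    observed = np.array([list(window).count(c) for c in categories])
    ref_total = sum(reference.values())
    expected = np.array([reference[c] / ref_total for c in categories]) * observed.sum()
    _, p_value = stats.chisquare(observed, expected)
    return p_value < alpha
```

Even this toy version leaves open questions: how to pick the window size and significance level, how to correct for testing many features at once, and how to wire the results into Prometheus.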
In our integration, Evidently abstracts all this monitoring logic away and provides the metrics in a Prometheus-friendly format.
You can think of it as a metrics calculation layer that you can easily modify. It provides sensible defaults through a pre-built set of metrics and offers a convenient route to add custom ones.
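To make "Prometheus-friendly format" concrete, here is a hedged sketch of how such a calculation layer could expose drift values using the standard prometheus_client library. The metric name, labels, and port are illustrative assumptions, not the ones the Evidently integration actually emits.

```python
import time

from prometheus_client import Gauge, start_http_server

# Hypothetical metric name and labels for illustration only.
DRIFT_P_VALUE = Gauge(
    "feature_drift_p_value",
    "P-value of the drift test for a feature",
    ["model", "feature"],
)


def publish_drift(model: str, feature: str, p_value: float) -> None:
    """Set the latest drift p-value so Prometheus can scrape it."""
    DRIFT_P_VALUE.labels(model=model, feature=feature).set(p_value)


if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://localhost:8000/metrics
    while True:
        # In a real service, the p-value would come from the calculation layer.
        publish_drift("demo_model", "age", 0.42)
        time.sleep(10)
```

Once the values are exposed as gauges, the Grafana side reduces to plotting simple Prometheus queries instead of encoding statistics in PromQL.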
As we actively develop the tool, this Grafana integration will inherit the new Evidently reports, metrics, and statistical tests we expect to add in the future.
Using a standard library on the backend makes your monitoring easier to maintain, control, and unify across different models, and it will scale as the number of deployments grows.