Wasserstein distance (WD) is applied only to numerical features. By default, it returns the absolute value of data drift. Roughly speaking, it measures how much effort it takes to turn one distribution into another. Here is the intuition behind WD: if drift happens in one direction (e.g., all values increase), the absolute value of the WD metric often equals the difference of means.
Even if the changes happen in both directions (some values increase, some values decrease), the WD metric will sum them up to reflect the change. Had we used the difference of means, these changes would "cancel" each other. This makes WD a more informative metric.
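To make this concrete, here is a minimal sketch using SciPy's `wasserstein_distance` (the distributions and shift sizes are made up purely for illustration):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=10_000)

# One-directional drift: every value increases by 0.5.
shifted = reference + 0.5
print(wasserstein_distance(reference, shifted))  # ~0.5, the difference of means

# Two-directional drift: the lower half decreases, the upper half increases.
median = np.median(reference)
mixed = np.where(reference > median, reference + 0.5, reference - 0.5)
print(abs(reference.mean() - mixed.mean()))    # ~0.0, the shifts cancel in the mean
print(wasserstein_distance(reference, mixed))  # ~0.5, WD sums up both shifts
```

The difference of means misses the two-directional change entirely, while WD reports the full 0.5 that each value actually moved.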
However, if you have two different features—say "grams" and "years"—you'll need to interpret each distance separately. Imagine having a hundred features instead of two. That doesn't look very practical, right?
One solution is to turn the absolute values into relative ones. Let's do so by dividing the absolute metric by the standard deviation. The normed WD metric shows the number of standard deviations, on average, by which you would need to move each object in the current group to match the reference group.
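As a sketch, the normalization is a single division. Here I assume the standard deviation of the reference data is used as the scale; `normed_wasserstein` is a hypothetical helper, not a specific library's API:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def normed_wasserstein(reference: np.ndarray, current: np.ndarray) -> float:
    """Wasserstein distance expressed in reference standard deviations."""
    std = reference.std()
    if std == 0:
        raise ValueError("reference feature is constant; normed WD is undefined")
    return wasserstein_distance(reference, current) / std
```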
This normed WD metric is pretty interpretable. When setting the drift detection threshold, you can rely on the intuition of what a "standard deviation" is. Put simply, when you set the WD threshold to 0.1, you declare that a change of 0.1 standard deviations is something you want to notice.
Now back to our experiment. The normed WD metric returns a value from 0 to infinity, making the degree of drift comparable between features.
Once again, we use 0.1 as the drift threshold.
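Putting it together, a per-feature drift check with the 0.1 threshold might look like the sketch below. The "grams" and "years" features and their drift are synthetic, chosen only to illustrate one drifting and one stable feature:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
THRESHOLD = 0.1  # normed WD above this flags the feature as drifted

# Synthetic data: "grams" shifts by 0.4 standard deviations, "years" stays put.
features = {
    "grams": (rng.normal(500, 50, 5_000), rng.normal(520, 50, 5_000)),
    "years": (rng.normal(30, 8, 5_000), rng.normal(30, 8, 5_000)),
}
for name, (reference, current) in features.items():
    score = wasserstein_distance(reference, current) / reference.std()
    print(f"{name}: normed WD = {score:.2f} -> drift: {score > THRESHOLD}")
```

Because both scores are expressed in standard deviations, the same 0.1 threshold applies to "grams" and "years" alike, no matter how different their raw units are.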