kdiagram.utils.compute_winkler_score

kdiagram.utils.compute_winkler_score(y_true, y_pred_lower, y_pred_upper, alpha=0.1)

Computes the Winkler score for a given prediction interval.

The Winkler score is a proper scoring rule that evaluates a prediction interval by combining its width (sharpness) with a penalty for observations that fall outside the interval. A lower score indicates a better forecast.

Parameters:
y_true : np.ndarray

1D array of the true observed values.

y_pred_lower : np.ndarray

1D array of the lower bound of the prediction interval.

y_pred_upper : np.ndarray

1D array of the upper bound of the prediction interval.

alpha : float, default=0.1

The significance level for the prediction interval. For example, alpha=0.1 corresponds to a (1-0.1)*100 = 90% prediction interval.

Returns:
float

The average Winkler score over all observations.

See also

compute_coverage_score

A metric that only assesses coverage.

compute_interval_width

A metric that only assesses sharpness.

Notes

The Winkler score [1] is designed to evaluate both the sharpness and calibration of a prediction interval simultaneously. The score for a single observation \(y\) and a \((1-\alpha)\) prediction interval \([l, u]\) is defined as:

\[S_{\alpha}(l, u, y) = (u - l) + \begin{cases} \frac{2}{\alpha}(l - y) & \text{if } y < l \\ 0 & \text{if } l \le y \le u \\ \frac{2}{\alpha}(y - u) & \text{if } y > u \end{cases} \tag{1}\]

The first term, \((u - l)\), is the interval width, so a narrower (sharper) interval lowers the score. The second term is a penalty applied only when the observation falls outside the interval: it grows linearly with the distance between the observation and the violated bound, scaled by \(2/\alpha\). This function returns the mean of this score over all observations.
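As an illustration of the formula above, the piecewise definition can be written as a few vectorized NumPy operations. This is a minimal sketch, not the library's implementation: the name `winkler_score_sketch` is hypothetical, and the code assumes 1D inputs of equal length with each lower bound no greater than its upper bound.

```python
import numpy as np


def winkler_score_sketch(y_true, y_pred_lower, y_pred_upper, alpha=0.1):
    """Hypothetical reference sketch of the mean Winkler score.

    Assumes lower <= upper element-wise; this is not kdiagram's code.
    """
    y = np.asarray(y_true, dtype=float)
    lower = np.asarray(y_pred_lower, dtype=float)
    upper = np.asarray(y_pred_upper, dtype=float)

    # Sharpness term: the interval width (u - l).
    width = upper - lower

    # Distance by which y misses each bound; zero when y is inside.
    below = np.maximum(lower - y, 0.0)
    above = np.maximum(y - upper, 0.0)

    # At most one of `below` and `above` is non-zero per observation,
    # so their sum reproduces the three cases of the definition.
    penalty = (2.0 / alpha) * (below + above)

    return float(np.mean(width + penalty))


score = winkler_score_sketch([1, 5, 12], [2, 4, 8], [8, 6, 10], alpha=0.1)
print(f"{score:.2f}")  # 23.33
```

Using `np.maximum` instead of explicit branching keeps the computation vectorized and makes the zero penalty for covered observations fall out automatically.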

References

Examples

>>> import numpy as np
>>> from kdiagram.utils.mathext import compute_winkler_score
>>>
>>> y_true = np.array([1, 5, 12])
>>> y_lower = np.array([2, 4, 8])
>>> y_upper = np.array([8, 6, 10])
>>>
>>> # For a 90% interval (alpha=0.1)
>>> # Obs 1 (y=1): outside. Width=6. Penalty=(2/0.1)*(2-1)=20. Score=26.
>>> # Obs 2 (y=5): inside. Width=2. Penalty=0. Score=2.
>>> # Obs 3 (y=12): outside. Width=2. Penalty=(2/0.1)*(12-10)=40. Score=42.
>>> # Average = (26 + 2 + 42) / 3 = 23.33
>>>
>>> score = compute_winkler_score(
...     y_true, y_lower, y_upper, alpha=0.1
... )
>>> print(f"Average Winkler Score: {score:.2f}")
Average Winkler Score: 23.33
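The hand calculation in the comments above can be checked term by term. The sketch below recomputes the width, penalty, and per-observation score directly with NumPy rather than through `compute_winkler_score`, so it makes no assumption about the library's internals; the variable names are illustrative only.

```python
import numpy as np

alpha = 0.1
y = np.array([1.0, 5.0, 12.0])
lower = np.array([2.0, 4.0, 8.0])
upper = np.array([8.0, 6.0, 10.0])

# Interval width for each observation.
width = upper - lower

# Penalty: (2/alpha) times the distance to the violated bound, else 0.
penalty = np.where(
    y < lower,
    (2.0 / alpha) * (lower - y),
    np.where(y > upper, (2.0 / alpha) * (y - upper), 0.0),
)

per_obs = width + penalty
covered = (y >= lower) & (y <= upper)

for i in range(len(y)):
    print(
        f"obs {i + 1}: covered={bool(covered[i])}, width={width[i]:.0f}, "
        f"penalty={penalty[i]:.0f}, score={per_obs[i]:.0f}"
    )
print(f"mean: {per_obs.mean():.2f}")
```

Only the second observation is covered, so the two misses contribute 66 of the 70 total score points even though their intervals are no wider than 6 units.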