kdiagram.utils.calculate_probabilistic_scores
- kdiagram.utils.calculate_probabilistic_scores(y_true, y_preds_quantiles, quantiles)
Calculates probabilistic scores for each observation.
Computes the Probability Integral Transform (PIT), sharpness (interval width), and Continuous Ranked Probability Score (CRPS) for each forecast-observation pair. This utility provides a per-observation breakdown of key probabilistic metrics.
- Parameters:
- y_true
np.ndarray 1D array of observed (true) values.
- y_preds_quantiles
np.ndarray 2D array of quantile forecasts. Each row corresponds to an observation in y_true, and each column is a specific quantile forecast.
- quantiles
np.ndarray 1D array of the quantile levels corresponding to the columns of y_preds_quantiles.
- Returns:
pd.DataFrame A DataFrame with columns ‘pit_value’, ‘sharpness’, and ‘crps’, where each row corresponds to an observation.
- Return type:
DataFrame
See also
compute_pit: Calculate only the PIT values.
compute_crps: Calculate only the average CRPS score.
compute_winkler_score: Score a single prediction interval.
Notes
This function calculates three fundamental scores for assessing the quality of a probabilistic forecast, which is judged by the joint properties of calibration and sharpness [1].
Probability Integral Transform (PIT): This score assesses calibration. For each observation \(y_i\), the PIT is approximated as the fraction of the \(M\) forecast quantiles \(q_{i,j}\) that are less than or equal to the observation.
(1) \[\text{PIT}_i = \frac{1}{M} \sum_{j=1}^{M} \mathbf{1}\{q_{i,j} \le y_i\}\]
Sharpness: This score assesses precision. It is the width of the prediction interval between the lowest (\(q_{min}\)) and highest (\(q_{max}\)) provided quantiles for each observation \(i\). Here \(y_{i, q}\) denotes the forecast value for observation \(i\) at quantile level \(q\), not the observed value.
(2) \[\text{Sharpness}_i = y_{i, q_{max}} - y_{i, q_{min}}\]
Continuous Ranked Probability Score (CRPS): This is an overall score that rewards both calibration and sharpness. It is approximated as the average of the Pinball Loss \(\mathcal{L}_{\tau_j}\), evaluated at each quantile level \(\tau_j\), across all \(M\) quantiles for each observation \(i\).
(3) \[\text{CRPS}_i \approx \frac{1}{M} \sum_{j=1}^{M} 2 \mathcal{L}_{\tau_j}(q_{i,j}, y_i)\]
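The three formulas can be sketched directly in NumPy. This is an illustration only, not the library's implementation: the function name `probabilistic_scores_sketch` is hypothetical, and the pinball loss is assumed to take its standard form, \(\mathcal{L}_{\tau}(q, y) = \tau (y - q)\) if \(y \ge q\) and \((1 - \tau)(q - y)\) otherwise, which the text above does not spell out.

```python
import numpy as np


def probabilistic_scores_sketch(y_true, y_preds_quantiles, quantiles):
    """Per-observation PIT, sharpness, and CRPS following Eqs. (1)-(3).

    Hypothetical helper for illustration; not part of kdiagram.
    """
    y = np.asarray(y_true, dtype=float)[:, np.newaxis]  # shape (N, 1)
    q = np.asarray(y_preds_quantiles, dtype=float)      # shape (N, M)
    tau = np.asarray(quantiles, dtype=float)            # shape (M,)

    # Eq. (1): fraction of forecast quantiles at or below the observation.
    pit = np.mean(q <= y, axis=1)

    # Eq. (2): forecast at the highest quantile level minus the forecast
    # at the lowest quantile level.
    sharpness = q[:, np.argmax(tau)] - q[:, np.argmin(tau)]

    # Assumed standard pinball loss: tau * (y - q) when y >= q,
    # (1 - tau) * (q - y) otherwise.
    diff = y - q
    pinball = np.where(diff >= 0, tau * diff, (tau - 1.0) * diff)

    # Eq. (3): average of twice the pinball loss over the M quantiles.
    crps = np.mean(2.0 * pinball, axis=1)
    return pit, sharpness, crps


# Two observations, three quantile levels (hand-checkable values).
y_true = np.array([10.0, 12.0])
quantiles = np.array([0.1, 0.5, 0.9])
y_preds = np.array([[8.0, 9.0, 11.0],
                    [9.0, 10.0, 11.0]])
pit, sharpness, crps = probabilistic_scores_sketch(y_true, y_preds, quantiles)
```

For the first observation, two of the three quantile forecasts lie at or below 10, so the PIT is 2/3 and the interval width is 3. The second observation lies above every forecast quantile, which gives a PIT of 1 and a larger CRPS than the first.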
References
Examples
>>> import numpy as np
>>> from scipy.stats import norm
>>> from kdiagram.utils.forecast_utils import calculate_probabilistic_scores
>>>
>>> # Generate synthetic data
>>> np.random.seed(42)
>>> n_samples = 5
>>> y_true = np.random.normal(loc=10, scale=2, size=n_samples)
>>> quantiles = np.array([0.1, 0.5, 0.9])
>>> y_preds = norm.ppf(
...     quantiles, loc=y_true[:, np.newaxis], scale=1.5
... )
>>>
>>> # Calculate the scores
>>> scores_df = calculate_probabilistic_scores(
...     y_true, y_preds, quantiles
... )
>>> print(scores_df)
   pit_value  sharpness      crps
0   0.666667   3.844655  0.865381
1   0.333333   3.844655  0.892013
2   0.666667   3.844655  1.269438
3   0.666667   3.844655  0.472782
4   0.333333   3.844655  1.171358