kdiagram.utils.build_cdf_interpolator
- kdiagram.utils.build_cdf_interpolator(y_preds_quantiles, quantiles)
Builds an interpolator that acts as a Cumulative Distribution Function.
This function takes a set of quantile forecasts and returns a callable that linearly interpolates between them. The result is a continuous, piecewise-linear approximation of the Cumulative Distribution Function (CDF) of each individual forecast, which can then be evaluated at observed values for probabilistic analysis.
- Parameters:
  - y_preds_quantiles (np.ndarray): 2D array of quantile forecasts, with shape (n_samples, n_quantiles). Each row represents a complete probabilistic forecast for a single observation.
  - quantiles (np.ndarray): 1D array of the quantile levels corresponding to the columns of the prediction array (e.g., [0.05, 0.1, ..., 0.95]).
- Returns:
  Callable[[np.ndarray], np.ndarray]: A function that takes a 1D array of observed values (y_true) and returns the corresponding PIT values, which are the CDF evaluated at each of those points.
- Raises:
  ValueError: If the number of y_true values passed to the returned interpolator does not match the number of forecast distributions it was built with.
See also
compute_pit: A simplified utility that uses this logic directly.
scipy.interpolate.interp1d: The underlying concept for interpolation.
Notes
The Probability Integral Transform (PIT) is a key concept in probabilistic forecast evaluation [1]. For a continuous predictive CDF \(F\), the PIT of an observation \(y\) is \(F(y)\). This utility constructs an empirical approximation of \(F\) for each forecast.
The function works by creating a closure: the returned _interpolator function “remembers” the quantile forecasts it was built with. For each observation \(y_i\), it performs a linear interpolation using the corresponding forecast quantiles \(\mathbf{q}_i = (q_{i,1}, ..., q_{i,M})\) as the x-coordinates and the quantile levels \(\mathbf{\tau} = (\tau_1, ..., \tau_M)\) as the y-coordinates. The interpolated value is the estimated cumulative probability at \(y_i\).
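The closure mechanism described above can be sketched in a few lines. This is an illustrative stand-in, not the library's source: the name sketch_cdf_interpolator is hypothetical, each row of forecasts is assumed to be sorted in ascending order, and np.interp is used with its default behaviour of clamping values outside the forecast range to the lowest and highest quantile levels. How build_cdf_interpolator itself treats the tails is not stated here.

```python
import numpy as np


def sketch_cdf_interpolator(y_preds_quantiles, quantiles):
    """Hypothetical stand-in for build_cdf_interpolator (not the library source)."""
    preds = np.asarray(y_preds_quantiles, dtype=float)  # shape (n_samples, n_quantiles)
    levels = np.asarray(quantiles, dtype=float)         # shape (n_quantiles,)

    def _interpolator(y_true):
        # preds and levels are captured from the enclosing scope (the closure).
        y_true = np.asarray(y_true, dtype=float)
        if y_true.shape[0] != preds.shape[0]:
            raise ValueError(
                "y_true must contain one value per forecast distribution."
            )
        # For each observation: x-coordinates are that row's forecast
        # quantiles, y-coordinates are the quantile levels.
        # Assumption: each row is increasing, as np.interp requires.
        return np.array(
            [np.interp(y, q_row, levels) for y, q_row in zip(y_true, preds)]
        )

    return _interpolator


cdf_func = sketch_cdf_interpolator(
    [[8, 10, 12], [0, 1, 2], [4, 5, 6]], [0.1, 0.5, 0.9]
)
pit = cdf_func([10.0, 0.5, 5.5])
```

With these inputs, which match the Examples section, pit holds approximately 0.5, 0.3 and 0.7. Passing an array of a different length than the number of forecast rows raises ValueError in this sketch.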
Examples
>>> import numpy as np
>>> from kdiagram.utils.mathext import build_cdf_interpolator
>>>
>>> # Forecasts for 3 observations at 3 quantiles (0.1, 0.5, 0.9)
>>> preds_quantiles = np.array([
...     [8, 10, 12],
...     [0, 1, 2],
...     [4, 5, 6]
... ])
>>> quantiles = np.array([0.1, 0.5, 0.9])
>>>
>>> # Build the interpolator
>>> cdf_func = build_cdf_interpolator(preds_quantiles, quantiles)
>>>
>>> # Now, use the interpolator to find the PIT for 3 observations
>>> y_true = np.array([10.0, 0.5, 5.5])
>>> pit_values = cdf_func(y_true)
>>> print(pit_values)
[0.5 0.3 0.7]
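The printed values can be checked by hand. Writing \(\hat{F}_i\) for the interpolated CDF of forecast \(i\) (notation introduced here for illustration), the first observation coincides with the 0.5-level quantile, so \(\hat{F}_1(10) = 0.5\). The second and third observations fall between two forecast quantiles, and linear interpolation between the neighbouring quantile levels gives:

\[
\hat{F}_2(0.5) = 0.1 + \frac{0.5 - 0}{1 - 0}\,(0.5 - 0.1) = 0.3,
\qquad
\hat{F}_3(5.5) = 0.5 + \frac{5.5 - 5}{6 - 5}\,(0.9 - 0.5) = 0.7
\]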