kdiagram.utils.build_cdf_interpolator

kdiagram.utils.build_cdf_interpolator(y_preds_quantiles, quantiles)

Builds an interpolator to act as a Cumulative Distribution Function.

This function takes a set of quantile forecasts and returns a callable that linearly interpolates between them. The result is a continuous, piecewise-linear approximation of the Cumulative Distribution Function (CDF) for each individual forecast, which can be evaluated at observed values to obtain PIT values.

Parameters:

y_preds_quantiles (np.ndarray): 2D array of quantile forecasts, with shape (n_samples, n_quantiles). Each row represents a complete probabilistic forecast for a single observation.
quantiles (np.ndarray): 1D array of the quantile levels corresponding to the columns of the prediction array (e.g., [0.05, 0.1, ..., 0.95]).

Returns:

Callable[[np.ndarray], np.ndarray]: A function that takes a 1D array of observed values (y_true) and returns the corresponding PIT values, which are the CDF evaluated at each of those points.

Raises:

ValueError: If the number of y_true values passed to the returned interpolator does not match the number of forecast distributions it was built with.

Return type:

Callable[[ndarray], ndarray]

See also

compute_pit: A simplified utility that uses this logic directly.
scipy.interpolate.interp1d: The underlying concept for interpolation.

Notes

The Probability Integral Transform (PIT) is a key concept in probabilistic forecast evaluation [1]. For a continuous predictive CDF \(F\), the PIT of an observation \(y\) is \(F(y)\). This utility constructs an empirical approximation of \(F\) for each forecast.

The function works by creating a closure: the returned _interpolator function “remembers” the quantile forecasts it was built with. For each observation \(y_i\), it performs a linear interpolation using the corresponding forecast quantiles \(\mathbf{q}_i = (q_{i,1}, \dots, q_{i,M})\) as the x-coordinates and the quantile levels \(\boldsymbol{\tau} = (\tau_1, \dots, \tau_M)\) as the y-coordinates. The interpolated value is an estimate of the cumulative probability at \(y_i\) under the \(i\)-th forecast, i.e. its PIT value.
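
The closure described above can be sketched in a few lines. This is a minimal illustration and not kdiagram's actual source: the name sketch_cdf_interpolator is hypothetical, and row-wise np.interp is an assumed stand-in for the library's interpolation. In this sketch, np.interp expects each row of forecast quantiles to be increasing and clamps values outside the forecast range to the lowest or highest quantile level; the library may treat those cases differently.

```python
import numpy as np


def sketch_cdf_interpolator(y_preds_quantiles, quantiles):
    """Illustrative stand-in (hypothetical name), not kdiagram's source."""
    preds = np.asarray(y_preds_quantiles, dtype=float)
    levels = np.asarray(quantiles, dtype=float)

    def _interpolator(y_true):
        # The closure keeps access to `preds` and `levels` from the outer call.
        y_true = np.asarray(y_true, dtype=float)
        if y_true.shape[0] != preds.shape[0]:
            raise ValueError(
                "y_true must have one value per forecast distribution."
            )
        # Row-wise interpolation: forecast quantiles are the x-coordinates,
        # quantile levels are the y-coordinates.
        return np.array(
            [np.interp(y, q_row, levels) for y, q_row in zip(y_true, preds)]
        )

    return _interpolator


cdf_sketch = sketch_cdf_interpolator([[8, 10, 12], [0, 1, 2]], [0.1, 0.5, 0.9])
pit_sketch = cdf_sketch([9.0, 1.5])
```

Calling cdf_sketch with an array whose length differs from the number of forecast rows raises the ValueError, mirroring the behaviour listed under Raises.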

References

Examples

>>> import numpy as np
>>> from kdiagram.utils.mathext import build_cdf_interpolator
>>>
>>> # Forecasts for 3 observations at 3 quantiles (0.1, 0.5, 0.9)
>>> preds_quantiles = np.array([
...     [8, 10, 12],
...     [0, 1, 2],
...     [4, 5, 6]
... ])
>>> quantiles = np.array([0.1, 0.5, 0.9])
>>>
>>> # Build the interpolator
>>> cdf_func = build_cdf_interpolator(preds_quantiles, quantiles)
>>>
>>> # Now, use the interpolator to find the PIT for 3 observations
>>> y_true = np.array([10.0, 0.5, 5.5])
>>> pit_values = cdf_func(y_true)
>>> print(pit_values)
[0.5 0.3 0.7]
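
The printed values can be checked by hand. For the second observation, \(y = 0.5\) lies between the forecast quantiles \(0\) (level \(0.1\)) and \(1\) (level \(0.5\)), so linear interpolation gives

\[
0.1 + (0.5 - 0.1)\,\frac{0.5 - 0}{1 - 0} = 0.3 .
\]

The third observation follows the same way, \(0.5 + (0.9 - 0.5)\,\frac{5.5 - 5}{6 - 5} = 0.7\), and the first coincides with the median forecast, giving \(0.5\).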