kdiagram.datasets.make_multi_model_quantile_data

kdiagram.datasets.make_multi_model_quantile_data(n_samples=100, n_models=3, quantiles=None, prefix='pred', model_names=None, true_mean=50.0, true_std=10.0, bias_range=(-2.0, 2.0), width_range=(5.0, 15.0), noise_level=1.0, seed=202, as_frame=False)

Generate multi-model quantile forecast data for a single horizon.

Simulates a target variable \(y_{\text{true}}\) and quantile predictions (e.g., Q10/Q50/Q90) from several models for the same forecast time. Each model can have its own systematic bias and characteristic interval width, enabling reproducible examples for coverage/calibration and cross-model comparisons [1][2].

Parameters:

n_samples (int, default=100): Number of rows (independent samples or locations).
n_models (int, default=3): Number of simulated models providing quantile forecasts.
quantiles (list of float, default=None): Quantile levels in (0, 1) to generate for each model. If None, [0.1, 0.5, 0.9] is used. Must include 0.5 (the median). The list is de-duplicated and sorted internally.
prefix (str, default='pred'): Base prefix for prediction columns. Final names follow {prefix}_{model_name}_q{quantile}.
model_names (list of str, optional): Custom model names; must have length n_models. If None, 'Model_A', 'Model_B', … are generated.
true_mean (float, default=50.0): Mean of the Normal distribution from which y_true is drawn.
true_std (float, default=10.0): Standard deviation of the Normal distribution from which y_true is drawn.
bias_range (tuple of (float, float), default=(-2.0, 2.0)): Uniform range from which each model's systematic bias is sampled; the bias is added to y_true to form that model's Q50.
width_range (tuple of (float, float), default=(5.0, 15.0)): Uniform range from which each model's target overall interval width (e.g., Q90–Q10) is sampled.
noise_level (float, default=1.0): Standard deviation of the independent Gaussian noise added to the generated quantile series.
seed (int or None, default=202): Seed passed to NumPy's default_rng. If None, a fresh, unseeded generator is used.
as_frame (bool, default=False): If False, return a Bunch with arrays and metadata; if True, return only the pandas DataFrame.
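The rules above for quantiles, prefix and model_names can be sketched in plain Python. The helpers below (resolve_quantiles, resolve_model_names, prediction_columns) are hypothetical illustrations of the documented behavior, not kdiagram functions; the 26-model cap on default names is an assumption of this sketch only.

```python
from string import ascii_uppercase


def resolve_quantiles(quantiles=None):
    """Hypothetical helper: apply the default, de-duplicate, sort and validate."""
    levels = [0.1, 0.5, 0.9] if quantiles is None else quantiles
    levels = sorted({float(q) for q in levels})
    if any(not 0.0 < q < 1.0 for q in levels):
        raise ValueError("quantile levels must lie strictly inside (0, 1)")
    if 0.5 not in levels:
        raise ValueError("quantiles must include 0.5 (the median)")
    return levels


def resolve_model_names(n_models, model_names=None):
    """Hypothetical helper: default to 'Model_A', 'Model_B', ... or check length."""
    if model_names is None:
        # Assumption of this sketch: single letters only, so at most 26 models.
        if n_models > len(ascii_uppercase):
            raise ValueError("this sketch only labels up to 26 models")
        return [f"Model_{ascii_uppercase[i]}" for i in range(n_models)]
    if len(model_names) != n_models:
        raise ValueError("model_names must have length n_models")
    return list(model_names)


def prediction_columns(model_names, levels, prefix="pred"):
    """Map each model to its '{prefix}_{model_name}_q{quantile}' column names."""
    return {name: [f"{prefix}_{name}_q{q}" for q in levels] for name in model_names}


levels = resolve_quantiles([0.9, 0.1, 0.5, 0.9])
cols = prediction_columns(resolve_model_names(2), levels)
print(levels)           # [0.1, 0.5, 0.9]
print(cols["Model_B"])  # ['pred_Model_B_q0.1', 'pred_Model_B_q0.5', 'pred_Model_B_q0.9']
```

Duplicates such as the repeated 0.9 collapse before sorting, which is why the resulting column list always has one entry per unique level.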

Returns:

data : Bunch or pandas.DataFrame

If as_frame=False (default), a Bunch with:

frame : pandas DataFrame of shape (n_samples, 3 + n_models * n_quantiles) containing 'y_true', two auxiliary features, and all quantile columns.
data : ndarray of the numeric feature and prediction columns.
feature_names : ['feature_1', 'feature_2'].
target_names : ['y_true'].
target : ndarray of y_true values.
model_names : list of model labels.
quantile_levels : sorted list of unique quantiles.
prediction_columns : dict mapping each model name to its list of quantile column names.
prefix : the column prefix.
DESCR : human-readable description.

If as_frame=True, only the pandas DataFrame is returned.
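With the default three quantile levels and n_models=2, the shape formula above gives 3 + 2 * 3 = 9 columns. The prediction_columns mapping lets you loop over models without parsing column names. The sketch below computes the empirical coverage of each model's outermost interval with pandas. interval_coverage is a hypothetical helper, not part of kdiagram (the library's own diagnostic is kdiagram.plot.uncertainty.plot_coverage). It assumes each model's column list is ordered by increasing quantile level, as in the Examples, and the small frame is hand-made toy data that only mimics the documented layout.

```python
import pandas as pd


def interval_coverage(frame, prediction_columns, target="y_true"):
    """Fraction of rows whose target lies inside each model's outer interval.

    Assumes each model's column list is ordered by increasing quantile level,
    so the first and last names are the lowest and highest quantiles.
    """
    y = frame[target]
    coverage = {}
    for model, cols in prediction_columns.items():
        lower, upper = frame[cols[0]], frame[cols[-1]]
        coverage[model] = float(((y >= lower) & (y <= upper)).mean())
    return coverage


# Hand-made toy frame that mimics the documented column layout.
frame = pd.DataFrame({
    "y_true":            [48.0, 52.0, 61.0, 45.0],
    "pred_Model_A_q0.1": [44.0, 47.0, 50.0, 41.0],
    "pred_Model_A_q0.5": [49.0, 52.5, 55.0, 46.0],
    "pred_Model_A_q0.9": [54.0, 58.0, 60.0, 51.0],
})
cols = {"Model_A": ["pred_Model_A_q0.1", "pred_Model_A_q0.5", "pred_Model_A_q0.9"]}
print(interval_coverage(frame, cols))  # {'Model_A': 0.75}
```

Only the third row (61.0 above its Q90 of 60.0) falls outside, so coverage is 3/4. For a Q10–Q90 interval the nominal target is 0.8.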

Raises:

ValueError: If 0.5 is not in quantiles, if the lengths of model_names or the range tuples are inconsistent, or if the ranges are invalid.
TypeError: If non-numeric inputs prevent computation.

Parameter types:

n_samples (int)
n_models (int)
quantiles (list[float] | None)
prefix (str)
model_names (list[str] | None)
true_mean (float)
true_std (float)
bias_range (tuple[float, float])
width_range (tuple[float, float])
noise_level (float)
seed (int | None)
as_frame (bool)

Return type:

Bunch | DataFrame

See also

make_uncertainty_data: Temporal multi-period quantiles with drift/consistency controls.
make_taylor_data: Synthetic data tailored for Taylor diagram evaluation.
kdiagram.plot.uncertainty.plot_coverage: Aggregate empirical coverage vs nominal.
kdiagram.plot.uncertainty.plot_temporal_uncertainty: General polar visualization for multiple series.

Notes

Generation model. Draw the truth as \(y_{\text{true}} \sim \mathcal{N}(\mu, \sigma^2)\) with \(\mu\) = true_mean and \(\sigma\) = true_std. For model \(m\), let \(b^{(m)}\) be a bias drawn uniformly from bias_range and \(W^{(m)}\) an overall width (e.g., Q90–Q10) drawn uniformly from width_range. The median prediction (Q50) is

\[q_{0.5}^{(m)} \;=\; y_{\text{true}} \;+\; b^{(m)} \;+\; \varepsilon^{(m)}, \qquad \varepsilon^{(m)} \sim \mathcal{N}(0, \sigma_\varepsilon^2), \tag{1}\]

with \(\sigma_\varepsilon\) = noise_level. Every other quantile is obtained by adding to the median an offset proportional to its distance from 0.5, scaled so that the extreme quantiles span approximately \(W^{(m)}\); small independent noise is then added. Finally, each model's quantile values are sorted row by row to enforce \(q_{\alpha} \le q_{0.5} \le q_{\beta}\) for \(\alpha < 0.5 < \beta\) (e.g., Q10 ≤ Q50 ≤ Q90), which is useful for coverage and calibration diagnostics [1][2].
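The generation model above can be sketched with NumPy and pandas. This is an illustrative re-implementation, not kdiagram's code. simulate_quantiles and quantile_noise are hypothetical names. The linear offset scaling (q - 0.5) / (q_max - q_min) * W and the noise scale on non-median quantiles are assumptions. The auxiliary feature_1/feature_2 columns are omitted.

```python
import numpy as np
import pandas as pd


def simulate_quantiles(n_samples=100, levels=(0.1, 0.5, 0.9),
                       model_names=("Model_A", "Model_B"),
                       true_mean=50.0, true_std=10.0,
                       bias_range=(-2.0, 2.0), width_range=(5.0, 15.0),
                       noise_level=1.0, quantile_noise=0.25,
                       prefix="pred", seed=202):
    """Illustrative sketch of the generation model (not kdiagram's code)."""
    rng = np.random.default_rng(seed)
    levels = np.asarray(sorted(set(levels)), dtype=float)
    # Guard against a median-only level list (span would be zero).
    span = (levels.max() - levels.min()) or 1.0
    # Truth: y_true ~ N(true_mean, true_std^2).
    y_true = rng.normal(true_mean, true_std, size=n_samples)
    columns = {"y_true": y_true}
    off_median = levels != 0.5
    for name in model_names:
        bias = rng.uniform(*bias_range)     # b^(m)
        width = rng.uniform(*width_range)   # W^(m)
        # Equation (1): median = truth + bias + N(0, noise_level^2) noise.
        median = y_true + bias + rng.normal(0.0, noise_level, size=n_samples)
        # Assumed scaling: offsets are linear in (q - 0.5) and the extreme
        # quantiles end up `width` apart before noise is added.
        offsets = (levels - 0.5) / span * width
        q = median[:, None] + offsets[None, :]
        # Small independent noise on the non-median quantiles (assumed scale).
        q[:, off_median] += rng.normal(
            0.0, quantile_noise, size=(n_samples, int(off_median.sum())))
        # Sort each row so values are non-decreasing in the quantile level.
        q = np.sort(q, axis=1)
        for j, level in enumerate(levels):
            columns[f"{prefix}_{name}_q{float(level)}"] = q[:, j]
    return pd.DataFrame(columns)


df = simulate_quantiles(n_samples=200, seed=0)
print(df.shape)  # (200, 7): y_true plus 2 models x 3 quantiles
```

For example, with W = 10 and the default levels, the offsets are (0.1 - 0.5) / 0.8 * 10 = -5, 0 and +5, so Q10 and Q90 start 10 units apart around the noisy median. The sketch makes no attempt to match the library's random draws, so the same seed will not reproduce kdiagram's output.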

Two auxiliary columns (feature_1, feature_2) are included for convenience in examples; they do not influence the simulated target or quantiles.

References

Examples

>>> # As a Bunch with metadata:
>>>
>>> from kdiagram.datasets import make_multi_model_quantile_data
>>> ds = make_multi_model_quantile_data(n_samples=50, n_models=2, seed=1)
>>> ds.model_names
['Model_A', 'Model_B']
>>> sorted(ds.quantile_levels)
[0.1, 0.5, 0.9]
>>> ds.prediction_columns['Model_A'][:3]  
['pred_Model_A_q0.1', 'pred_Model_A_q0.5', 'pred_Model_A_q0.9']
>>>
>>> # As a DataFrame:
>>>
>>> df = make_multi_model_quantile_data(as_frame=True, seed=2)
>>> set(['y_true','feature_1','feature_2']).issubset(df.columns)
True