kdiagram.datasets.make_multi_model_quantile_data

kdiagram.datasets.make_multi_model_quantile_data(n_samples=100, n_models=3, quantiles=None, prefix='pred', model_names=None, true_mean=50.0, true_std=10.0, bias_range=(-2.0, 2.0), width_range=(5.0, 15.0), noise_level=1.0, seed=202, as_frame=False)

Generate multi-model quantile forecast data for a single horizon.

Simulates a target variable \(y_{\text{true}}\) and quantile predictions (e.g., Q10/Q50/Q90) from several models for the same forecast time. Each model can have its own systematic bias and characteristic interval width, enabling reproducible examples for coverage/calibration and cross-model comparisons [1][2].

Parameters:
n_samples : int, default=100

Number of rows (independent samples/locations).

n_models : int, default=3

Number of simulated models providing quantile forecasts.

quantiles : list of float, default=[0.1, 0.5, 0.9]

Quantile levels in (0, 1) to generate for each model. Must include 0.5 (the median). The list is de-duplicated and sorted internally.
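The normalization described above can be sketched as follows. This is an illustration only: `normalize_quantiles` is a hypothetical helper, not part of kdiagram, and the error messages are assumptions.

```python
def normalize_quantiles(quantiles=None):
    """Hypothetical helper: de-duplicate, sort, and validate quantile levels."""
    if quantiles is None:
        quantiles = [0.1, 0.5, 0.9]  # documented default
    # De-duplicate via a set, then sort in ascending order.
    levels = sorted(set(float(q) for q in quantiles))
    if any(not 0.0 < q < 1.0 for q in levels):
        raise ValueError("quantile levels must lie strictly between 0 and 1")
    if 0.5 not in levels:
        raise ValueError("quantiles must include 0.5 (the median)")
    return levels


print(normalize_quantiles([0.9, 0.1, 0.5, 0.1]))  # [0.1, 0.5, 0.9]
```

A list without the median, such as `[0.1, 0.9]`, raises `ValueError` in this sketch, matching the documented requirement.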

prefix : str, default='pred'

Base prefix for prediction columns. Final names follow {prefix}_{model_name}_q{quantile}.

model_names : list of str, optional

Custom model names of length n_models. If None, 'Model_A', 'Model_B', … are generated.
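To make the naming pattern concrete, here is a minimal sketch of how the default model labels and the `{prefix}_{model_name}_q{quantile}` column names fit together. `build_prediction_columns` is a hypothetical helper, and generating labels from the alphabet (which covers at most 26 models) is an assumption about how 'Model_A', 'Model_B', … continue.

```python
import string


def build_prediction_columns(n_models=3, quantiles=(0.1, 0.5, 0.9),
                             prefix="pred", model_names=None):
    """Hypothetical helper mirroring the documented naming pattern."""
    if model_names is None:
        # Assumption: letters A, B, C, ... which only covers n_models <= 26.
        model_names = [f"Model_{c}" for c in string.ascii_uppercase[:n_models]]
    if len(model_names) != n_models:
        raise ValueError("model_names must have length n_models")
    # One list of quantile column names per model.
    return {
        name: [f"{prefix}_{name}_q{q}" for q in quantiles]
        for name in model_names
    }


cols = build_prediction_columns(n_models=2)
print(cols["Model_A"])
# ['pred_Model_A_q0.1', 'pred_Model_A_q0.5', 'pred_Model_A_q0.9']
```

With two models and three quantile levels this yields six prediction columns; together with 'y_true' and the two auxiliary features, that gives the 3 + n_models * n_quantiles columns documented for the returned frame.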

true_mean : float, default=50.0

Mean of the Normal distribution used to draw y_true.

true_std : float, default=10.0

Standard deviation of the Normal distribution for y_true.

bias_range : tuple of (float, float), default=(-2.0, 2.0)

Uniform range from which each model's bias is sampled. The bias is added to y_true to form that model's Q50 prediction.

width_range : tuple of (float, float), default=(5.0, 15.0)

Uniform range from which each model's target overall interval width (e.g., Q90–Q10) is sampled.

noise_level : float, default=1.0

Standard deviation of independent Gaussian noise added to each generated quantile series.

seed : int or None, default=202

Seed for the NumPy random generator (numpy.random.default_rng). If None, a fresh, unseeded generator is used, so results are not reproducible.

as_frame : bool, default=False

If False, return a Bunch with arrays/metadata; if True, return only the pandas DataFrame.

Returns:
data : Bunch or pandas.DataFrame

If as_frame=False (default), a Bunch with:

  • frame : pandas DataFrame of shape (n_samples, 3 + n_models * n_quantiles) containing 'y_true', two auxiliary features, and all quantile columns.

  • data : ndarray with numeric feature + prediction columns.

  • feature_names : ['feature_1', 'feature_2'].

  • target_names : ['y_true'].

  • target : ndarray of y_true values.

  • model_names : list of model labels.

  • quantile_levels : sorted list of unique quantiles.

  • prediction_columns : dict mapping each model name to its list of quantile column names.

  • prefix : the column prefix.

  • DESCR : human-readable description.

If as_frame=True, only the pandas DataFrame is returned.

Raises:
ValueError

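If 0.5 is not in quantiles, if the lengths of model_names or the range tuples are inconsistent with what is expected (for example, model_names not of length n_models), or if the ranges are invalid.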

TypeError

If non-numeric inputs prevent computation.

Return type:

Bunch | DataFrame

See also

make_uncertainty_data

Temporal multi-period quantiles with drift/consistency controls.

make_taylor_data

Synthetic data tailored for Taylor diagram evaluation.

kdiagram.plot.uncertainty.plot_coverage

Aggregate empirical coverage vs nominal.

kdiagram.plot.uncertainty.plot_temporal_uncertainty

General polar visualization for multiple series.

Notes

Generation model. Draw the truth as \(y_{\text{true}} \sim \mathcal{N}(\mu, \sigma^2)\) with \(\mu\) = true_mean and \(\sigma\) = true_std. For model \(m\), let \(b^{(m)}\) be a sampled bias and \(W^{(m)}\) a sampled overall width (e.g., Q90–Q10). The median prediction (Q50) is

\[q_{0.5}^{(m)} \;=\; y_{\text{true}} \;+\; b^{(m)} \;+\; \varepsilon^{(m)}, \qquad \varepsilon^{(m)} \sim \mathcal{N}(0, \sigma_\varepsilon^2) \tag{1}\]

with \(\sigma_\varepsilon\) = noise_level. Other quantiles are created by adding offsets proportional to their distance from the median and scaled so that the extreme quantiles span approximately \(W^{(m)}\); small independent noise is then added. Finally, each model’s quantile values are sorted within every row to enforce \(q_{\alpha} \le q_{0.5} \le q_{\beta}\) (e.g., Q10 ≤ Q50 ≤ Q90), which is useful for coverage and calibration diagnostics [1][2].
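The generation steps above can be sketched in plain Python. This is not the library's implementation: `simulate_model_quantiles` is a hypothetical function, the standard-library `random` module stands in for the NumPy generator, and the linear offset rule `(q - 0.5) / (q_max - q_min) * W` is one assumed way to make the extreme quantiles span roughly \(W^{(m)}\).

```python
import random


def simulate_model_quantiles(y_true, quantiles, bias, width, noise_level, rng):
    """Hypothetical sketch of the per-model generation steps described above."""
    span = max(quantiles) - min(quantiles)
    rows = []
    for y in y_true:
        # Eq. (1): median = truth + bias + Gaussian noise.
        median = y + bias + rng.gauss(0.0, noise_level)
        row = []
        for q in quantiles:
            if q == 0.5:
                row.append(median)
                continue
            # Assumed offset rule: linear in (q - 0.5), scaled so that the
            # extreme quantiles are `width` apart before noise is added.
            offset = (q - 0.5) / span * width
            row.append(median + offset + rng.gauss(0.0, noise_level))
        # Sort within the row so lower quantiles never exceed higher ones.
        rows.append(sorted(row))
    return rows


rng = random.Random(202)
y_true = [rng.gauss(50.0, 10.0) for _ in range(5)]  # truth ~ N(50, 10^2)
bias = rng.uniform(-2.0, 2.0)    # sampled from bias_range
width = rng.uniform(5.0, 15.0)   # sampled from width_range
preds = simulate_model_quantiles(y_true, [0.1, 0.5, 0.9], bias, width, 1.0, rng)
```

With `noise_level=0` the sketch reproduces the targets exactly: Q50 equals `y_true + bias` and Q90 minus Q10 equals `width`. With noise, the per-row sort is what keeps the quantiles from crossing.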

Two auxiliary columns (feature_1, feature_2) are included for convenience in examples; they do not influence the simulated target or quantiles.
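For the coverage diagnostics mentioned in these notes, a minimal sketch of empirical coverage is the fraction of observations that fall inside a model's lower and upper quantile bounds. `empirical_coverage` is a hypothetical helper and the numbers below are toy values chosen for illustration, not output of the generator.

```python
def empirical_coverage(y_true, lower, upper):
    """Fraction of observations falling inside [lower, upper], row by row."""
    if not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("inputs must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


# Toy values: two of the four observations lie inside their interval.
y = [10.0, 12.0, 9.0, 15.0]
q10 = [8.0, 11.0, 9.5, 12.0]
q90 = [11.0, 14.0, 11.0, 14.0]
print(empirical_coverage(y, q10, q90))  # 0.5
```

For a Q10 to Q90 interval the nominal coverage is 0.8, so a value well below that suggests intervals that are too narrow or a biased median.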

References

Examples

>>> # As a Bunch with metadata:
>>>
>>> from kdiagram.datasets import make_multi_model_quantile_data
>>> ds = make_multi_model_quantile_data(n_samples=50, n_models=2, seed=1)
>>> ds.model_names
['Model_A', 'Model_B']
>>> sorted(ds.quantile_levels)
[0.1, 0.5, 0.9]
>>> ds.prediction_columns['Model_A'][:3]
['pred_Model_A_q0.1', 'pred_Model_A_q0.5', 'pred_Model_A_q0.9']
>>>
>>> # As a DataFrame:
>>>
>>> df = make_multi_model_quantile_data(as_frame=True, seed=2)
>>> set(['y_true','feature_1','feature_2']).issubset(df.columns)
True