kdiagram.datasets.make_multi_model_quantile_data
- kdiagram.datasets.make_multi_model_quantile_data(n_samples=100, n_models=3, quantiles=None, prefix='pred', model_names=None, true_mean=50.0, true_std=10.0, bias_range=(-2.0, 2.0), width_range=(5.0, 15.0), noise_level=1.0, seed=202, as_frame=False)
Generate multi-model quantile forecast data for a single horizon.
Simulates a target variable \(y_{\text{true}}\) and quantile predictions (e.g., Q10/Q50/Q90) from several models for the same forecast time. Each model can have its own systematic bias and characteristic interval width, enabling reproducible examples for coverage/calibration and cross-model comparisons [1][2].
- Parameters:
- n_samples
int, default=100 Number of rows (independent samples/locations).
- n_models
int, default=3 Number of simulated models providing quantile forecasts.
- quantiles
list of float, default=[0.1, 0.5, 0.9] Quantile levels in (0, 1) to generate for each model. Must include 0.5 (the median). The list is de-duplicated and sorted internally.
- prefix
str, default='pred' Base prefix for prediction columns. Final names follow {prefix}_{model_name}_q{quantile}.
- model_names
list of str, optional Custom model names of length n_models. If None, 'Model_A', 'Model_B', … are generated.
- true_mean
float, default=50.0 Mean of the Normal distribution used to draw y_true.
- true_std
float, default=10.0 Standard deviation of the Normal distribution for y_true.
- bias_range
tuple of (float, float), default=(-2.0, 2.0) Uniform range from which a model-specific bias for Q50 is sampled and added to y_true.
- width_range
tuple of (float, float), default=(5.0, 15.0) Uniform range for the target overall interval width (e.g., Q90–Q10) of each model.
- noise_level
float, default=1.0 Standard deviation of independent Gaussian noise added to each generated quantile series.
- seed
int or None, default=202 NumPy RNG seed (default_rng). If None, a fresh RNG is used.
- as_frame
bool, default=False If False, return a Bunch with arrays/metadata; if True, return only the pandas DataFrame.
- Returns:
- data
Bunch or pandas.DataFrame If as_frame=False (default), a Bunch with:
  - frame: pandas DataFrame of shape (n_samples, 3 + n_models * n_quantiles) containing 'y_true', two auxiliary features, and all quantile columns.
  - data: ndarray with numeric feature + prediction columns.
  - feature_names: ['feature_1', 'feature_2'].
  - target_names: ['y_true'].
  - target: ndarray of y_true values.
  - model_names: list of model labels.
  - quantile_levels: sorted list of unique quantiles.
  - prediction_columns: dict mapping each model name to its list of quantile column names.
  - prefix: the column prefix.
  - DESCR: human-readable description.
If as_frame=True, only the pandas DataFrame is returned.
- Raises:
ValueError If 0.5 is not in quantiles, if name/range lengths are inconsistent, or if ranges are invalid.
TypeError If non-numeric inputs prevent computation.
- Return type:
Bunch | DataFrame
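The column naming scheme and the documented frame shape can be checked by hand. The sketch below is illustrative only: `expected_columns` is a hypothetical helper, not part of kdiagram, and the column order (target first, then the two features, then each model's quantiles) is an assumption rather than something this page states.

```python
def expected_columns(model_names, quantiles, prefix="pred"):
    """Hypothetical helper: list the column names this page describes."""
    # Quantile levels are de-duplicated and sorted, as documented.
    levels = sorted(set(quantiles))
    # 'y_true' plus the two auxiliary features (ordering is an assumption).
    cols = ["y_true", "feature_1", "feature_2"]
    for name in model_names:
        # Documented pattern: {prefix}_{model_name}_q{quantile}
        cols += [f"{prefix}_{name}_q{q}" for q in levels]
    return cols


cols = expected_columns(["Model_A", "Model_B"], [0.9, 0.1, 0.5, 0.1])
print(cols[3:6])
# Column count follows 3 + n_models * n_quantiles = 3 + 2 * 3.
print(len(cols))
```

With two models and three unique quantile levels, the count matches the documented `3 + n_models * n_quantiles` columns.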
See also
make_uncertainty_data: Temporal multi-period quantiles with drift/consistency controls.
make_taylor_data: Synthetic data tailored for Taylor diagram evaluation.
kdiagram.plot.uncertainty.plot_coverage: Aggregate empirical coverage vs nominal.
kdiagram.plot.uncertainty.plot_temporal_uncertainty: General polar visualization for multiple series.
Notes
Generation model. Draw the truth as \(y_{\text{true}} \sim \mathcal{N}(\mu, \sigma^2)\) with mu=true_mean and sigma=true_std. For model \(m\), let \(b^{(m)}\) be a sampled bias and \(W^{(m)}\) a sampled overall width (e.g., Q90–Q10). The median prediction (Q50) is

\[q_{0.5}^{(m)} \;=\; y_{\text{true}} \;+\; b^{(m)} \;+\; \varepsilon^{(m)}, \qquad \varepsilon^{(m)} \sim \mathcal{N}(0, \sigma_\varepsilon^2), \tag{1}\]

with sigma_ε = noise_level. Other quantiles are created by adding offsets proportional to their distance from the median and scaled so that the extreme quantiles span approximately \(W^{(m)}\); small independent noise is then added. Finally, for each row we sort the model’s quantile values to enforce \(q_{\alpha} \le q_{0.5} \le q_{\beta}\) (e.g., Q10 ≤ Q50 ≤ Q90), which is useful for coverage and calibration diagnostics [1][2].

Two auxiliary columns (feature_1, feature_2) are included for convenience in examples; they do not influence the simulated target or quantiles.

References
Examples
>>> # As a Bunch with metadata:
>>>
>>> from kdiagram.datasets import make_multi_model_quantile_data
>>> ds = make_multi_model_quantile_data(n_samples=50, n_models=2, seed=1)
>>> ds.model_names
['Model_A', 'Model_B']
>>> sorted(ds.quantile_levels)
[0.1, 0.5, 0.9]
>>> ds.prediction_columns['Model_A'][:3]
['pred_Model_A_q0.1', 'pred_Model_A_q0.5', 'pred_Model_A_q0.9']
>>>
>>> # As a DataFrame:
>>>
>>> df = make_multi_model_quantile_data(as_frame=True, seed=2)
>>> set(['y_true','feature_1','feature_2']).issubset(df.columns)
True
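For readers who want to see the generation model from the Notes spelled out, here is a minimal NumPy sketch followed by an empirical coverage check. It is not the library's implementation: `sketch_quantile_frame` and `empirical_coverage` are hypothetical names, the jitter scale on the non-median quantiles is an assumption, the two auxiliary feature columns are omitted, and at least two distinct quantile levels are assumed.

```python
import numpy as np
import pandas as pd


def sketch_quantile_frame(n_samples=200, model_names=("Model_A", "Model_B"),
                          quantiles=(0.1, 0.5, 0.9), prefix="pred",
                          true_mean=50.0, true_std=10.0,
                          bias_range=(-2.0, 2.0), width_range=(5.0, 15.0),
                          noise_level=1.0, seed=0):
    """Hypothetical re-creation of the documented generation model."""
    rng = np.random.default_rng(seed)
    levels = sorted(set(quantiles))           # de-duplicated, sorted levels
    lv = np.asarray(levels)
    span = lv[-1] - lv[0]                     # assumes >= 2 distinct levels
    y_true = rng.normal(true_mean, true_std, n_samples)
    columns = {"y_true": y_true}
    for name in model_names:
        bias = rng.uniform(*bias_range)       # model-specific bias b^(m)
        width = rng.uniform(*width_range)     # model-specific width W^(m)
        # Q50 = y_true + bias + Gaussian noise with std noise_level.
        median = y_true + bias + rng.normal(0.0, noise_level, n_samples)
        # Offsets proportional to distance from 0.5, scaled so that the
        # extreme quantiles span approximately `width`.
        offsets = (lv - 0.5) / span * width
        preds = median[:, None] + offsets[None, :]
        # Small independent jitter on non-median quantiles
        # (the 0.1 factor is an assumption of this sketch).
        jitter = rng.normal(0.0, 0.1 * noise_level, preds.shape)
        jitter[:, lv == 0.5] = 0.0
        # Row-wise sort enforces q_low <= q_0.5 <= q_high.
        preds = np.sort(preds + jitter, axis=1)
        for j, q in enumerate(levels):
            columns[f"{prefix}_{name}_q{q}"] = preds[:, j]
    return pd.DataFrame(columns)


def empirical_coverage(df, model, lower=0.1, upper=0.9, prefix="pred"):
    """Fraction of rows where y_true lies inside [q_lower, q_upper]."""
    lo = df[f"{prefix}_{model}_q{lower}"]
    hi = df[f"{prefix}_{model}_q{upper}"]
    return float(((df["y_true"] >= lo) & (df["y_true"] <= hi)).mean())


df = sketch_quantile_frame()
for model in ("Model_A", "Model_B"):
    print(model, round(empirical_coverage(df, model), 3))
```

Because each model's interval is centred on a noisy, biased copy of `y_true`, the empirical coverage in this sketch reflects the sampled bias and width and need not match the nominal 80% level. The same `empirical_coverage` helper should work on a frame returned with `as_frame=True`, provided the column names follow the documented pattern.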