kdiagram.datasets.load_uncertainty_data

kdiagram.datasets.load_uncertainty_data(*, as_frame=False, n_samples=150, n_periods=4, anomaly_frac=0.15, start_year=2022, prefix='value', base_value=10.0, trend_strength=1.5, noise_level=2.0, interval_width_base=4.0, interval_width_noise=1.5, interval_width_trend=0.5, seed=42)

Generate a synthetic dataset for uncertainty diagnostics.

Creates a compact, controllable dataset for demonstrating k-diagram plots: one period of actuals, multi-period predicted quantiles (Q10, Q50, Q90), configurable trends and noise, injected interval failures (anomalies), and optional coordinates. Useful for examples, unit tests, and performance checks [1].

Parameters:
as_frame : bool, default=False

If False, return a Bunch with the generated frame and metadata; if True, return only the pandas DataFrame.

n_samples : int, default=150

Number of rows (locations) to generate.

n_periods : int, default=4

Number of consecutive periods (e.g., years) for which quantiles are generated.

anomaly_frac : float, default=0.15

Approximate fraction of samples, in [0, 1], whose actual value lies outside the first period's Q10–Q90 interval.

start_year : int, default=2022

Starting year used when naming time-dependent columns.

prefix : str, default='value'

Base prefix for value and quantile column names.

base_value : float, default=10.0

Approximate mean of the signal in the first period.

trend_strength : float, default=1.5

Linear trend added to the Q50 trajectory across periods.

noise_level : float, default=2.0

Standard deviation of base random noise added to values.

interval_width_base : float, default=4.0

Base width of the Q10–Q90 interval in the first period.

interval_width_noise : float, default=1.5

Random variability added to the interval width per sample/period.

interval_width_trend : float, default=0.5

Linear trend added to the interval width across periods.

seed : int or None, default=42

Random seed for reproducibility. If None, the generator is not seeded, so the output differs between calls.
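The three interval_width_* parameters are easiest to read together. The sketch below shows one plausible way they could combine into a per-period Q10–Q90 width. The additive formula, the uniform jitter, the floor at zero, and the helper name sketch_interval_widths are all assumptions made for illustration; they are not taken from the library source.

```python
import random


def sketch_interval_widths(n_periods=4, base=4.0, trend=0.5, noise=1.5, seed=42):
    """Illustrative Q10-Q90 widths for a single sample, one per period.

    Assumed model (NOT the library's documented formula):
        width_k = base + trend * k + uniform(-noise, noise), floored at 0
    where k = 0 is the first period.
    """
    rng = random.Random(seed)  # local RNG so the sketch is reproducible
    widths = []
    for k in range(n_periods):
        # Skip the random draw entirely when no variability is requested.
        jitter = rng.uniform(-noise, noise) if noise > 0 else 0.0
        widths.append(max(0.0, base + trend * k + jitter))
    return widths


# With the noise switched off, only the base width and the linear trend remain.
print(sketch_interval_widths(noise=0.0))  # [4.0, 4.5, 5.0, 5.5]
```

Under this reading, interval_width_trend widens the interval steadily over the periods, while interval_width_noise scatters individual widths around that line.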

Returns:
data : Bunch or pandas.DataFrame

If as_frame=False (default), a Bunch with:

  • frame : pandas DataFrame of synthesized values.

  • feature_names : included feature columns (e.g., coords).

  • target_names : names of actual/target columns.

  • target : NumPy array of target values (if present).

  • quantile_cols : dict mapping 'q0.1', 'q0.5', 'q0.9' to lists of columns across periods.

  • q10_cols, q50_cols, q90_cols : convenience lists.

  • n_periods, prefix, start_year, and DESCR.

If as_frame=True, return only the DataFrame.
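The convenience lists can be zipped to walk the quantile columns one period at a time. This is a minimal sketch: columns_by_period is a hypothetical helper, and it assumes q10_cols, q50_cols and q90_cols have equal length and are ordered by period, which the description above suggests but does not state. A stand-in object with made-up column names is used so the snippet runs without kdiagram installed.

```python
from types import SimpleNamespace


def columns_by_period(ds):
    """Return one (q10, q50, q90) column-name triple per period.

    Assumes the three lists are aligned, i.e. position i in each list
    refers to the same period.
    """
    return list(zip(ds.q10_cols, ds.q50_cols, ds.q90_cols))


# Stand-in for the returned Bunch; the column names here are invented
# placeholders, not the names the library actually generates.
fake = SimpleNamespace(
    q10_cols=["lo_a", "lo_b"],
    q50_cols=["mid_a", "mid_b"],
    q90_cols=["hi_a", "hi_b"],
)
print(columns_by_period(fake))
# [('lo_a', 'mid_a', 'hi_a'), ('lo_b', 'mid_b', 'hi_b')]
```

The same call should work on the object returned by load_uncertainty_data() with as_frame=False, since the helper only reads the three documented attributes.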

Return type:

Bunch | DataFrame

Notes

The generator injects a user-controlled fraction of interval failures to create meaningful examples for coverage and anomaly diagnostics. Use this dataset to exercise plots such as coverage rates, point-wise coverage diagnostics, anomaly magnitude, temporal consistency, and drift views [2][3][1].
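One way to check the injected failures is to measure how often the actual value falls outside the first period's Q10–Q90 interval. The sketch below is an illustration only: first_period_miss_rate is a hypothetical helper, and it assumes that target_names[0] is the actual-value column and that q10_cols[0] and q90_cols[0] belong to the first period. It relies only on the Bunch attributes listed under Returns and runs here on a small stand-in object with invented numbers.

```python
from types import SimpleNamespace


def first_period_miss_rate(ds):
    """Fraction of rows whose actual value lies outside [Q10, Q90].

    Assumptions: ds.target_names[0] names the actual column, and the
    first entries of ds.q10_cols / ds.q90_cols are the first period.
    """
    frame = ds.frame
    actual = list(frame[ds.target_names[0]])
    lower = list(frame[ds.q10_cols[0]])
    upper = list(frame[ds.q90_cols[0]])
    # A row is a miss when the actual value is below Q10 or above Q90.
    misses = sum(1 for a, lo, hi in zip(actual, lower, upper) if a < lo or a > hi)
    return misses / len(actual)


# Stand-in data (a plain dict instead of a DataFrame) with invented values.
fake = SimpleNamespace(
    frame={"y": [1, 5, 9, 20], "lo": [0, 6, 8, 10], "hi": [2, 7, 10, 15]},
    target_names=["y"],
    q10_cols=["lo"],
    q90_cols=["hi"],
)
print(first_period_miss_rate(fake))  # 0.5
```

Applied to the Bunch from load_uncertainty_data(anomaly_frac=0.2), the result would be expected to land near 0.2 if these assumptions hold, bearing in mind that the parameter is described as approximate.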

References

Examples

>>> # Create a small dataset and explore quantile columns:
>>>
>>> from kdiagram.datasets import load_uncertainty_data
>>> ds = load_uncertainty_data(n_samples=10, n_periods=3, seed=0)
>>> sorted(ds.quantile_cols.keys())
['q0.1', 'q0.5', 'q0.9']
>>>
>>> # Return a ``DataFrame`` only:
>>>
>>> df = load_uncertainty_data(as_frame=True, n_samples=5, seed=1)
>>> df.shape[0] == 5
True