kdiagram.datasets.load_uncertainty_data

kdiagram.datasets.load_uncertainty_data(*, as_frame=False, n_samples=150, n_periods=4, anomaly_frac=0.15, start_year=2022, prefix='value', base_value=10.0, trend_strength=1.5, noise_level=2.0, interval_width_base=4.0, interval_width_noise=1.5, interval_width_trend=0.5, seed=42)

Generate a synthetic dataset for uncertainty diagnostics.

Creates a compact, controllable dataset for demonstrating k-diagram plots: one period of actuals, multi-period predicted quantiles (Q10, Q50, Q90), configurable trends and noise, injected interval failures (anomalies), and optional coordinates. Useful for examples, unit tests, and performance checks [1].

Parameters:

as_frame (bool, default=False): If False, return a Bunch with the generated frame and metadata; if True, return only the pandas DataFrame.
n_samples (int, default=150): Number of rows (locations) to generate.
n_periods (int, default=4): Number of consecutive periods (e.g., years) for which quantiles are generated.
anomaly_frac (float, default=0.15): Approximate fraction of samples, in [0, 1], whose actual value lies outside the first period's Q10–Q90 interval.
start_year (int, default=2022): Starting year used when naming time-dependent columns.
prefix (str, default='value'): Base prefix for value and quantile column names.
base_value (float, default=10.0): Approximate mean of the signal in the first period.
trend_strength (float, default=1.5): Linear trend added to the Q50 trajectory across periods.
noise_level (float, default=2.0): Standard deviation of base random noise added to values.
interval_width_base (float, default=4.0): Base width of the Q10–Q90 interval in the first period.
interval_width_noise (float, default=1.5): Random variability added to the interval width per sample/period.
interval_width_trend (float, default=0.5): Linear trend added to the interval width across periods.
seed (int or None, default=42): Random seed for reproducibility. If None, the random number generator is left unseeded, so results vary between calls.
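
To make the interaction between the trend, noise, and interval-width parameters concrete, here is a minimal standard-library sketch of one way such quantiles could be generated. It is not the kdiagram implementation: the function name `sketch_quantiles`, the symmetric split of the interval around Q50, and the uniform jitter on the width are all assumptions made for illustration.

```python
import random


def sketch_quantiles(n_samples=5, n_periods=3, base_value=10.0,
                     trend_strength=1.5, noise_level=2.0,
                     interval_width_base=4.0, interval_width_noise=1.5,
                     interval_width_trend=0.5, seed=42):
    """Hypothetical illustration only; not how kdiagram builds its data."""
    rng = random.Random(seed)  # seeded generator gives reproducible output
    rows = []
    for _ in range(n_samples):
        # Per-sample starting level scattered around base_value.
        level = base_value + rng.gauss(0.0, noise_level)
        periods = []
        for t in range(n_periods):
            # The median follows a linear trend across periods.
            q50 = level + trend_strength * t
            # The width grows linearly and is jittered; clamp at zero
            # so the lower bound never exceeds the upper bound.
            jitter = rng.uniform(-interval_width_noise, interval_width_noise)
            width = max(0.0, interval_width_base
                        + interval_width_trend * t + jitter)
            # Assumption: the interval is centered on the median.
            periods.append((q50 - width / 2, q50, q50 + width / 2))
        rows.append(periods)
    return rows


rows = sketch_quantiles()
```

Each entry of `rows` holds one `(q10, q50, q90)` tuple per period, so `rows[0][1]` is the second period of the first sample. Calling the function twice with the same `seed` returns identical values, which mirrors the role the `seed` parameter plays above.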

Returns:

data (Bunch or pandas.DataFrame)

If as_frame=False (default), return a Bunch with:

frame : pandas DataFrame of synthesized values.
feature_names : included feature columns (e.g., coords).
target_names : names of actual/target columns.
target : NumPy array of target values (if present).
quantile_cols : dict mapping 'q0.1', 'q0.5', 'q0.9' to lists of columns across periods.
q10_cols, q50_cols, q90_cols : convenience lists.
n_periods, prefix, start_year, and DESCR.

If as_frame=True, return only the DataFrame.

Return type:

Bunch | DataFrame

See also

load_zhongshan_subsidence: Real-world sample for Zhongshan subsidence with quantiles and coordinates.
kdiagram.plot.uncertainty.plot_coverage
kdiagram.plot.uncertainty.plot_coverage_diagnostic
kdiagram.plot.uncertainty.plot_anomaly_magnitude
kdiagram.plot.uncertainty.plot_interval_consistency
kdiagram.plot.uncertainty.plot_model_drift: Visual diagnostics this dataset was designed to support.

Notes

The generator injects a user-controlled fraction of interval failures to create meaningful examples for coverage and anomaly diagnostics. Use this dataset to exercise plots such as coverage rates, point-wise coverage diagnostics, anomaly magnitude, temporal consistency, and drift views [2][3][1].
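
The coverage idea behind these diagnostics can be sketched in a few lines of plain Python. This is an illustrative helper, not part of kdiagram: `interval_coverage` is a hypothetical name, the sample values are made up for the example, and treating the bounds as inclusive is an assumption.

```python
def interval_coverage(actual, lower, upper):
    """Fraction of actual values inside [lower, upper] (bounds inclusive)."""
    if not (len(actual) == len(lower) == len(upper)):
        raise ValueError("inputs must have equal length")
    if not actual:
        raise ValueError("inputs must be non-empty")
    # Count the rows whose actual value falls within its own interval.
    inside = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    return inside / len(actual)


# Made-up values: the second and fourth rows fall outside their intervals.
actual = [9.0, 12.5, 10.2, 15.0]
q10 = [8.0, 9.0, 9.5, 10.0]
q90 = [11.0, 12.0, 12.5, 13.0]

coverage = interval_coverage(actual, q10, q90)
failure_rate = 1.0 - coverage  # share of interval failures (anomalies)
```

Applied to the actual column and the first period's Q10 and Q90 columns of the generated frame, the failure rate computed this way should land near `anomaly_frac`, since that parameter is described as the approximate fraction of first-period interval failures.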

References

Examples

>>> # Create a small dataset and explore quantile columns:
>>>
>>> from kdiagram.datasets import load_uncertainty_data
>>> ds = load_uncertainty_data(n_samples=10, n_periods=3, seed=0)
>>> sorted(ds.quantile_cols.keys())
['q0.1', 'q0.5', 'q0.9']
>>>
>>> # Return a ``DataFrame`` only:
>>>
>>> df = load_uncertainty_data(as_frame=True, n_samples=5, seed=1)
>>> df.shape[0] == 5
True