kdiagram.datasets.load_uncertainty_data
- kdiagram.datasets.load_uncertainty_data(*, as_frame=False, n_samples=150, n_periods=4, anomaly_frac=0.15, start_year=2022, prefix='value', base_value=10.0, trend_strength=1.5, noise_level=2.0, interval_width_base=4.0, interval_width_noise=1.5, interval_width_trend=0.5, seed=42)
Generate a synthetic dataset for uncertainty diagnostics.
Creates a compact, controllable dataset for demonstrating k-diagram plots: one period of actuals, multi-period predicted quantiles (Q10, Q50, Q90), configurable trends and noise, injected interval failures (anomalies), and optional coordinates. Useful for examples, unit tests, and performance checks [1].
- Parameters:
- as_frame
bool, default=False If False, return a Bunch with the generated frame and metadata; if True, return only the pandas DataFrame.
- n_samples
int, default=150 Number of rows (locations) to generate.
- n_periods
int, default=4 Number of consecutive periods (e.g., years) for which quantiles are generated.
- anomaly_frac
float, default=0.15 Approximate fraction in [0, 1] where the actual value lies outside the first period's Q10–Q90 interval.
- start_year
int, default=2022 Starting year used when naming time-dependent columns.
- prefix
str, default='value' Base prefix for value and quantile column names.
- base_value
float, default=10.0 Approximate mean of the signal in the first period.
- trend_strength
float, default=1.5 Linear trend added to the Q50 trajectory across periods.
- noise_level
float, default=2.0 Standard deviation of base random noise added to values.
- interval_width_base
float, default=4.0 Base width of the Q10–Q90 interval in the first period.
- interval_width_noise
float, default=1.5 Random variability added to the interval width per sample/period.
- interval_width_trend
float, default=0.5 Linear trend added to the interval width across periods.
- seed
int or None, default=42 Random seed for reproducibility. If None, use an unconstrained RNG state.
- Returns:
- data
Bunch or pandas.DataFrame If as_frame=False (default), a Bunch with:
- frame: pandas DataFrame of synthesized values.
- feature_names: included feature columns (e.g., coords).
- target_names: names of actual/target columns.
- target: NumPy array of target values (if present).
- quantile_cols: dict mapping 'q0.1', 'q0.5', 'q0.9' to lists of columns across periods.
- q10_cols, q50_cols, q90_cols: convenience lists.
- n_periods, prefix, start_year, and DESCR.
If as_frame=True, return only the DataFrame.
- Return type:
Bunch | DataFrame
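To make the interplay of the parameters concrete, the following is a minimal sketch of how such a generator could work. It is not kdiagram's implementation: the function name `sketch_uncertainty_rows`, the column names (`q10_2022`, `actual`, ...), the uniform jitter on the interval width, and the exact anomaly count are all assumptions chosen for illustration. It uses only the standard library so it runs without kdiagram installed.

```python
import random


def sketch_uncertainty_rows(n_samples=150, n_periods=4, anomaly_frac=0.15,
                            start_year=2022, base_value=10.0,
                            trend_strength=1.5, noise_level=2.0,
                            interval_width_base=4.0, interval_width_noise=1.5,
                            interval_width_trend=0.5, seed=42):
    """Hypothetical re-creation of the documented behaviour (not kdiagram code)."""
    rng = random.Random(seed)
    # Assumption: exactly round(anomaly_frac * n_samples) rows are anomalous.
    n_anomalies = round(anomaly_frac * n_samples)
    anomalous = set(rng.sample(range(n_samples), n_anomalies))
    rows = []
    for i in range(n_samples):
        row = {}
        # Per-row baseline: base value plus Gaussian noise.
        level = base_value + rng.gauss(0.0, noise_level)
        for p in range(n_periods):
            year = start_year + p
            q50 = level + trend_strength * p  # linear trend on the median
            # Assumption: uniform jitter on the width, floored to stay positive.
            jitter = rng.uniform(-interval_width_noise, interval_width_noise)
            width = max(0.1, interval_width_base + interval_width_trend * p + jitter)
            row[f"q10_{year}"] = q50 - width / 2
            row[f"q50_{year}"] = q50
            row[f"q90_{year}"] = q50 + width / 2
        lo, hi = row[f"q10_{start_year}"], row[f"q90_{start_year}"]
        if i in anomalous:
            # Push the actual value outside the first-period interval.
            offset = rng.uniform(0.1, 1.0) * (hi - lo)
            row["actual"] = hi + offset if rng.random() < 0.5 else lo - offset
        else:
            # Keep the actual value inside the first-period interval.
            row["actual"] = rng.uniform(lo, hi)
        rows.append(row)
    return rows
```

In this sketch the trend parameters act per period, while `anomaly_frac` only affects where the actual value sits relative to the first period's interval, which mirrors the parameter descriptions above.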
See also
load_zhongshan_subsidence
Real-world sample for Zhongshan subsidence with quantiles and coordinates.
kdiagram.plot.uncertainty.plot_coverage
kdiagram.plot.uncertainty.plot_coverage_diagnostic
kdiagram.plot.uncertainty.plot_anomaly_magnitude
kdiagram.plot.uncertainty.plot_interval_consistency
kdiagram.plot.uncertainty.plot_model_drift
Visual diagnostics this dataset was designed to support.
Notes
The generator injects a user-controlled fraction of interval failures to create meaningful examples for coverage and anomaly diagnostics. Use this dataset to exercise plots such as coverage rates, point-wise coverage diagnostics, anomaly magnitude, temporal consistency, and drift views [2][3][1].
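As a rough illustration of what a coverage rate measures, the sketch below computes the fraction of actual values that fall inside a lower/upper interval. The helper `empirical_coverage` and the toy numbers are hypothetical and not part of kdiagram. If injected failures were the only source of misses, first-period coverage would be expected to sit near `1 - anomaly_frac`.

```python
def empirical_coverage(actual, lower, upper):
    """Fraction of actual values inside [lower, upper], bounds inclusive."""
    if not (len(actual) == len(lower) == len(upper)):
        raise ValueError("actual, lower and upper must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one observation")
    # Count the observations whose interval contains the actual value.
    inside = sum(lo <= y <= hi for y, lo, hi in zip(actual, lower, upper))
    return inside / len(actual)


# Toy values made up for illustration: the third actual lies above its upper bound.
actual = [9.0, 11.5, 14.0, 10.2]
q10 = [8.0, 9.0, 9.5, 9.0]
q90 = [12.0, 13.0, 13.5, 13.0]
coverage = empirical_coverage(actual, q10, q90)
print(coverage)  # 0.75
```

The same function could be applied to the actual column and the first-period Q10 and Q90 columns of the generated frame, once their names are looked up from the returned metadata.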
References
Examples
>>> # Create a small dataset and explore quantile columns:
>>>
>>> from kdiagram.datasets import load_uncertainty_data
>>> ds = load_uncertainty_data(n_samples=10, n_periods=3, seed=0)
>>> sorted(ds.quantile_cols.keys())
['q0.1', 'q0.5', 'q0.9']
>>>
>>> # Return a ``DataFrame`` only:
>>>
>>> df = load_uncertainty_data(as_frame=True, n_samples=5, seed=1)
>>> df.shape[0] == 5
True