kdiagram.datasets.load_zhongshan_subsidence
- kdiagram.datasets.load_zhongshan_subsidence(*, as_frame=False, years=None, quantiles=None, include_coords=True, include_target=True, data_home=None, download_if_missing=True, force_download=False)
Load the Zhongshan land subsidence prediction dataset.
This dataset contains sample multi-period quantile predictions (Q10, Q50, Q90 for 2022–2026) and simulated actual subsidence for 2022 and 2023, along with geographic coordinates for 898 locations in Zhongshan, China. It is intended for demonstrating and testing k-diagram’s uncertainty and evaluation plots and for reproducing examples related to spatiotemporal uncertainty diagnostics [1][2].
The function searches a local cache directory, bundled package resources, and optionally a remote repository (in that order). On success it returns either a pandas DataFrame or a Bunch with convenient attributes.
- Parameters:
  - as_frame : bool, default=False
    If False, return a Bunch that includes the filtered DataFrame plus metadata and sliced arrays (e.g., coordinates, target, and quantile columns). If True, return only the filtered DataFrame.
  - years : list of int, optional
    Subset to these calendar years (e.g., [2023, 2025]) when selecting target and quantile columns. If None, load all years found in the file (quantiles typically 2022–2026; targets typically 2022 and 2023).
  - quantiles : list of float, optional
    Subset to these quantile levels in [0, 1] (e.g., [0.1, 0.5, 0.9]). If None (the default), load all detected quantiles for the selected years; for this dataset these are 0.1, 0.5, and 0.9.
  - include_coords : bool, default=True
    If True, include the coordinate columns 'longitude' and 'latitude' when present.
  - include_target : bool, default=True
    If True, include base target columns (e.g., 'subsidence_2022', 'subsidence_2023') when present and consistent with the requested years.
  - data_home : str, optional
    Directory path for caching datasets. If None, the path is resolved by get_data(). You may also configure the root via the KDIAGRAM_DATA environment variable. An example default is ~/kdiagram_data.
  - download_if_missing : bool, default=True
    If True, attempt to download the dataset into the cache when it is found neither locally nor in package resources.
  - force_download : bool, default=False
    If True, attempt to fetch a fresh copy even if a local file exists. Useful for refreshing data during development.
- Returns:
  - data : Bunch or pandas.DataFrame
    If as_frame=False (default), a Bunch with:
    - frame: pandas DataFrame filtered by the request.
    - feature_names: list of included coordinate column names.
    - target_names: list of included target column names.
    - target: NumPy array of target values (or None).
    - longitude, latitude: NumPy arrays when coordinates are included.
    - quantile_cols: dict mapping keys like 'q0.1' to lists of matching column names.
    - q10_cols, q50_cols, q90_cols: convenience lists.
    - years_available, quantiles_available: lists detected in the original file.
    - start_year: smallest year in the loaded subset (if any).
    - n_periods: number of loaded years.
    - DESCR: human-readable dataset description.
    If as_frame=True, only the filtered pandas DataFrame is returned.
- Raises:
  - FileNotFoundError
    When the dataset cannot be resolved from cache or package resources and either downloading is disabled or the download fails.
  - ValueError
    If the requested years or quantiles are invalid or not present in the data file.
- Return type:
Bunch | DataFrame
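The way data_home, download_if_missing, and force_download interact can be sketched in plain Python. This is only an illustration under stated assumptions, not k-diagram's implementation: the helper names resolve_data_home and resolve_dataset_path, the file name, the fetch callable, and the precedence of an explicit data_home over KDIAGRAM_DATA are all hypothetical.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical file name, used only for this illustration.
FILENAME = "zhongshan_sample.csv"


def resolve_data_home(data_home=None):
    """Pick a cache root (assumed precedence: argument, env var, home default)."""
    if data_home is not None:
        return Path(data_home).expanduser()
    env = os.environ.get("KDIAGRAM_DATA")
    if env:
        return Path(env).expanduser()
    return Path.home() / "kdiagram_data"


def resolve_dataset_path(cache_dir, package_dir, fetch=None,
                         download_if_missing=True, force_download=False):
    """Return a dataset path using the order cache -> package -> download.

    `fetch` is a hypothetical callable that writes the file to the path it
    receives; it stands in for whatever remote download the library performs.
    """
    cached = Path(cache_dir) / FILENAME
    packaged = Path(package_dir) / FILENAME

    # force_download goes straight to the remote step, even if a copy exists.
    if force_download and fetch is not None:
        cached.parent.mkdir(parents=True, exist_ok=True)
        fetch(cached)
        return cached
    if cached.exists():
        return cached
    if packaged.exists():
        return packaged
    if download_if_missing and fetch is not None:
        cached.parent.mkdir(parents=True, exist_ok=True)
        fetch(cached)
        if cached.exists():
            return cached
    raise FileNotFoundError(
        f"{FILENAME} not found in cache or package resources"
    )


# Usage: with no cached copy and downloads disabled, the packaged file wins.
with tempfile.TemporaryDirectory() as tmp:
    cache, package = Path(tmp, "cache"), Path(tmp, "package")
    package.mkdir()
    (package / FILENAME).write_text("placeholder")
    print(resolve_dataset_path(cache, package, download_if_missing=False))
```

Passing the download step in as a callable keeps the sketch free of network access, so the three branches can be exercised offline.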
See also
load_uncertainty_data
    Generate a synthetic dataset with controllable anomalies and quantiles for testing visual diagnostics.
kdiagram.plot.uncertainty.plot_model_drift, kdiagram.plot.uncertainty.plot_uncertainty_drift, kdiagram.plot.uncertainty.plot_coverage_diagnostic, kdiagram.plot.uncertainty.plot_anomaly_magnitude
    Example consumers of this dataset in documentation figures.
Notes
Search order. The loader resolves a file path using the following order: (1) local cache under data_home; (2) installed package resources; (3) optional remote download when download_if_missing=True. You can force step (3) with force_download=True.

Column detection. Quantile columns encode a year \(y\) and a quantile level \(q\) in their names:

\[\text{quantile name} \;\equiv\; \texttt{<prefix>}\_{y}\_\texttt{q}q, \qquad y \in \{2022,\dots,2026\},\; q \in (0,1) \tag{1}\]

Target columns encode only the year \(y\):

\[\text{target name} \;\equiv\; \texttt{subsidence}\_{y} \tag{2}\]

In code, the implementation detects these with the following regular expressions (kept as literals, not math): r"_(\d{4})_q([0-9.]+)$" (quantile columns) and r"_(\d{4})$" (target columns). This design enables flexible subsetting by year and quantile without hard-coding headers.
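To make the column-detection rule concrete, the two regular expressions quoted above can be applied to a list of column names with the standard re module. This is a sketch under assumptions rather than the library's code: the helper names detect_columns and select_quantile_columns are hypothetical, and the sample column names (including the 'subsidence' prefix on quantile columns) are made up for illustration.

```python
import re
from collections import defaultdict

# The two patterns quoted in the notes.
QUANTILE_RE = re.compile(r"_(\d{4})_q([0-9.]+)$")
TARGET_RE = re.compile(r"_(\d{4})$")


def detect_columns(columns):
    """Group names into quantile columns (by level) and target columns (by year)."""
    quantile_cols = defaultdict(list)  # e.g. 'q0.1' -> [column names]
    target_cols = {}                   # e.g. 2022 -> column name
    years = set()
    for name in columns:
        m = QUANTILE_RE.search(name)
        if m:
            year, level = int(m.group(1)), float(m.group(2))
            quantile_cols[f"q{level:g}"].append(name)
            years.add(year)
            continue
        m = TARGET_RE.search(name)
        if m:
            target_cols[int(m.group(1))] = name
    return dict(quantile_cols), target_cols, sorted(years)


def select_quantile_columns(columns, years=None, quantiles=None):
    """Keep quantile columns matching the requested years and levels.

    Raises ValueError when a requested year or level is absent, mirroring
    the behaviour documented for the loader (the check itself is assumed).
    """
    found = []
    for name in columns:
        m = QUANTILE_RE.search(name)
        if m:
            found.append((int(m.group(1)), float(m.group(2)), name))
    missing_years = set(years or ()) - {y for y, _, _ in found}
    missing_levels = set(quantiles or ()) - {q for _, q, _ in found}
    if missing_years or missing_levels:
        raise ValueError(
            f"not in data: years={sorted(missing_years)}, "
            f"quantiles={sorted(missing_levels)}"
        )
    return [
        name for y, q, name in found
        if (years is None or y in years)
        and (quantiles is None or q in quantiles)
    ]


# Hypothetical column names, for illustration only.
cols = [
    "longitude", "latitude", "subsidence_2022",
    "subsidence_2022_q0.1", "subsidence_2022_q0.9",
    "subsidence_2023_q0.1", "subsidence_2023_q0.9",
]
quantile_cols, target_cols, years = detect_columns(cols)
start_year = min(years) if years else None  # smallest detected year
n_periods = len(years)                      # number of detected years
subset = select_quantile_columns(cols, years=[2023], quantiles=[0.1])
```

Because both patterns are anchored at the end of the name, a quantile column never matches the target pattern and coordinate columns match neither, so no header list has to be hard-coded.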
Coordinate handling. When present and include_coords=True, the columns 'longitude' and 'latitude' are included and exposed both in the returned frame and as top-level arrays in the Bunch for convenience.

Intended use. The dataset is a compact sample designed for tutorials, documentation figures, and regression tests of k-diagram uncertainty diagnostics [2]. It is not a comprehensive research release.
References
Examples
Basic usage returning a Bunch with metadata:
>>> import pandas as pd
>>> from kdiagram.datasets import load_zhongshan_subsidence
>>> ds = load_zhongshan_subsidence()
>>> isinstance(ds.frame, pd.DataFrame)
True
>>> list(ds.quantile_cols.keys())[:3]
['q0.1', 'q0.5', 'q0.9']
>>>
>>> # Return only the DataFrame and subset to selected years/quantiles:
>>> df = load_zhongshan_subsidence(
...     as_frame=True, years=[2023, 2025], quantiles=[0.1, 0.9]
... )
>>> set(c.split('_')[-1] for c in df.columns if '_q' in c) <= {'q0.1', 'q0.9'}
True
>>>
>>> # Force a fresh download into the cache:
>>> _ = load_zhongshan_subsidence(force_download=True)