kdiagram.utils.reshape_quantile_data

kdiagram.utils.reshape_quantile_data(df, value_prefix, spatial_cols=None, dt_col='year', error='warn', savefile=None, verbose=0)

Reshape a wide-format DataFrame of quantile columns into a long-format DataFrame with one row per location and time step and one column per quantile level.

This function transforms columns that follow the naming pattern {value_prefix}_{dt_value}_q{quantile} into a structured format, preserving the spatial columns and adding a temporal column whose values are extracted from the column names [1].

Parameters:
df : pd.DataFrame

Input DataFrame containing quantile columns. The columns should follow the pattern {value_prefix}_{dt_value}_q{quantile}, where:

  • value_prefix is the base name for the quantile measurement (e.g., 'predicted_subsidence')

  • dt_value is the datetime value (e.g., year or month)

  • quantile is the quantile value (e.g., 0.1, 0.5, 0.9)
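As a minimal sketch of how a column name can be split according to this pattern, the hypothetical helper `parse_quantile_column` below (not part of kdiagram) uses a regular expression. It assumes the time value contains no underscores and the quantile is written as a plain decimal number; kdiagram's own parser may differ.

```python
import re
from typing import Optional, Tuple


def parse_quantile_column(col: str, value_prefix: str) -> Optional[Tuple[str, float]]:
    """Hypothetical helper: split '{value_prefix}_{dt_value}_q{quantile}'.

    Returns (dt_value, quantile) on a match, or None for any other column
    (for example a spatial column such as 'lon').
    """
    # Escape the prefix so characters like '.' are matched literally;
    # the prefix itself may contain underscores ('predicted_subsidence').
    pattern = rf"^{re.escape(value_prefix)}_([^_]+)_q([0-9]*\.?[0-9]+)$"
    match = re.match(pattern, col)
    if match is None:
        return None
    # The time value is kept as text; the quantile is converted to float.
    return match.group(1), float(match.group(2))
```

With this sketch, `parse_quantile_column("subs_2022_q0.1", "subs")` gives `("2022", 0.1)`, while a non-matching name such as `"lon"` gives `None`.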

value_prefix : str

Base name for quantile measurement columns (e.g., 'predicted_subsidence'). This is used to identify the quantile columns in the DataFrame.

spatial_cols : list of str, optional

List of spatial column names (e.g., ['longitude', 'latitude']). These columns are carried through the reshaping unchanged. If None, spatial columns are detected automatically (see Notes); in the Examples section, 'lon' and 'lat' are preserved without being passed explicitly.

dt_col : str, default='year'

Name of the output column that holds the temporal values extracted from the column names (e.g., 'year'). Each output row records its time step in this column.

error : {'raise', 'warn', 'ignore'}, default='warn'

Specifies how to handle errors when expected columns or data patterns are not found:

  • 'raise': raises a ValueError with a descriptive message if columns are missing.

  • 'warn': issues a warning with a descriptive message if columns are missing.

  • 'ignore': silently returns an empty DataFrame when issues are found.
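The three policies can be sketched with a small dispatch helper. `handle_issue` is hypothetical and not part of kdiagram; it only illustrates the documented behaviour and assumes the caller decides what to return afterwards (for 'ignore', the description above specifies an empty DataFrame).

```python
import warnings


def handle_issue(message: str, error: str = "warn") -> None:
    """Hypothetical helper: apply the ``error`` policy to one problem."""
    if error == "raise":
        # Stop immediately with the descriptive message.
        raise ValueError(message)
    if error == "warn":
        # Report the problem but let processing continue.
        warnings.warn(message, UserWarning, stacklevel=2)
    elif error != "ignore":
        # Reject unknown policies instead of silently ignoring them.
        raise ValueError(
            f"error must be 'raise', 'warn' or 'ignore', got {error!r}"
        )
    # 'ignore' falls through without any output.
```

A caller would typically write `handle_issue("no quantile columns found", error)` followed by `return pd.DataFrame()` when no matching columns exist.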

savefile : str, optional

Path to save the reshaped DataFrame. If provided, the DataFrame will be saved to this location.

verbose : int, default=0

Level of verbosity for progress messages. Higher values produce more detailed output during processing:

  • 0: silent

  • 1: basic progress

  • 2: column parsing details

  • 3: metadata extraction

  • 4: reshaping steps

  • 5: full debug

Returns:
pd.DataFrame

A reshaped DataFrame in which each quantile level occupies its own column. The DataFrame has the following columns:

  • Spatial columns (if any)

  • Temporal column (specified by dt_col)

  • {value_prefix}_q{quantile} value columns for each quantile


See also

pandas.melt

For reshaping DataFrames from wide to long format.

kdiagram.utils.q_utils.melt_q_data

Alternative function for reshaping quantile data.

Notes

  • The column names must follow the pattern {value_prefix}_{dt_value}_q{quantile} for proper extraction.

  • The temporal dimension is determined by the dt_col argument.

  • Spatial columns are automatically detected or can be passed explicitly.

  • The quantiles are pivoted and separated into distinct columns based on the unique quantile values found in the DataFrame [2].
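The steps in these notes can be approximated with plain pandas: melt the quantile columns into long form, split each column name into its time step and quantile, then unstack so that each quantile becomes a column. This is a sketch under stated assumptions, not kdiagram's implementation: `reshape_quantiles_sketch` is a hypothetical name, columns that do not match the pattern are assumed to be spatial when `spatial_cols` is None, and the temporal values are kept as strings.

```python
import re
from typing import List, Optional

import pandas as pd


def reshape_quantiles_sketch(
    df: pd.DataFrame,
    value_prefix: str,
    spatial_cols: Optional[List[str]] = None,
    dt_col: str = "year",
) -> pd.DataFrame:
    """Hypothetical sketch of the wide-to-quantile-column reshape."""
    # Pattern from the Notes: {value_prefix}_{dt_value}_q{quantile}
    pat = rf"^{re.escape(value_prefix)}_([^_]+)_q([0-9]*\.?[0-9]+)$"
    q_cols = [c for c in df.columns if re.match(pat, str(c))]
    if spatial_cols is None:
        # Assumption: every non-quantile column is treated as spatial.
        spatial_cols = [c for c in df.columns if c not in q_cols]

    # Wide -> long: one row per (location, original quantile column).
    long = df.melt(
        id_vars=spatial_cols, value_vars=q_cols,
        var_name="_col", value_name="_val",
    )
    # Split each original column name into time step and quantile label.
    parts = long["_col"].str.extract(pat)
    long[dt_col] = parts[0]
    long["_q"] = value_prefix + "_q" + parts[1]

    # Long -> one column per quantile, keyed by location and time step.
    out = (
        long.set_index(spatial_cols + [dt_col, "_q"])["_val"]
        .unstack("_q")
        .reset_index()
    )
    out.columns.name = None
    return out
```

Applied to the `wide_df` from the Examples section, this sketch yields the columns lon, lat, year, subs_q0.1 and subs_q0.5, with NaN under subs_q0.5 for 2023 because that column is absent from the input. `unstack` raises a ValueError if a (location, time, quantile) combination occurs more than once, which is a reasonable way to surface ambiguous input.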

\[\mathbf{W}_{m \times n} \rightarrow \mathbf{L}_{p \times k} \tag{1}\]

where:

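  • \(m\) = number of rows in the wide input

  • \(n\) = number of columns in the wide input (spatial columns plus the quantile columns, whose names encode the time step and quantile)

  • \(p = m \times t\), where \(t\) is the number of unique temporal values

  • \(k = s + 1 + q\), where \(s\) is the number of spatial columns, 1 counts the temporal column, and \(q\) is the number of quantile columns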
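As a worked check of these definitions on the Examples input (two rows; columns lon, lat, subs_2022_q0.1, subs_2022_q0.5, subs_2023_q0.1), take \(m = 2\), \(n = 5\), \(t = 2\) (2022 and 2023), \(s = 2\) spatial columns and \(q = 2\) quantile levels (0.1 and 0.5). This assumes every (location, time) pair is kept in the output, with missing quantiles filled as NaN:

\[p = m \times t = 2 \times 2 = 4, \qquad k = s + 1 + q = 2 + 1 + 2 = 5\]

The resulting \(k = 5\) matches the five columns listed in the Examples output.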

References

[1] McKinney, W. (2010). “Data Structures for Statistical Computing in Python”. Proceedings of the 9th Python in Science Conference.

[2] Wickham, H. (2014). “Tidy Data”. Journal of Statistical Software, 59(10), 1-23.

Examples

>>> from kdiagram.utils.q_utils import reshape_quantile_data
>>> import pandas as pd
>>> wide_df = pd.DataFrame({
...     'lon': [-118.25, -118.30],
...     'lat': [34.05, 34.10],
...     'subs_2022_q0.1': [1.2, 1.3],
...     'subs_2022_q0.5': [1.5, 1.6],
...     'subs_2023_q0.1': [1.7, 1.8]
... })
>>> reshaped_df = reshape_quantile_data(wide_df, 'subs')
>>> reshaped_df.columns
Index(['lon', 'lat', 'year', 'subs_q0.1', 'subs_q0.5'], dtype='object')