kdiagram.utils.reshape_quantile_data

kdiagram.utils.reshape_quantile_data(df, value_prefix, spatial_cols=None, dt_col='year', error='warn', savefile=None, verbose=0)

Reshape a wide-format DataFrame with quantile columns into a long-format DataFrame that has one row per (input row, time value) pair and one column per quantile value.

This function parses columns named {value_prefix}_{dt_value}_q{quantile}, extracts the temporal value and the quantile from each name, and rearranges the data so that the spatial coordinates are preserved and the extracted temporal values populate a new dt_col column [1].
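The naming convention can be checked with a regular expression. The sketch below uses a hypothetical helper, parse_quantile_column (not part of kdiagram), and assumes the quantile is written as a plain decimal; it returns dt_value as a string, whereas the library may convert it (for example to an integer year).

```python
import re
from typing import Optional, Tuple


def parse_quantile_column(column: str, value_prefix: str) -> Optional[Tuple[str, float]]:
    """Split '{value_prefix}_{dt_value}_q{quantile}' into (dt_value, quantile).

    Hypothetical helper for illustration; returns None if the name does not match.
    """
    # re.escape keeps prefixes such as 'predicted_subsidence' literal.
    pattern = rf"^{re.escape(value_prefix)}_(?P<dt>.+?)_q(?P<q>\d*\.?\d+)$"
    match = re.match(pattern, column)
    if match is None:
        return None
    return match.group("dt"), float(match.group("q"))


print(parse_quantile_column("subs_2022_q0.1", "subs"))  # ('2022', 0.1)
print(parse_quantile_column("lon", "subs"))             # None
```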

Parameters:

df : pd.DataFrame

Input DataFrame containing quantile columns. The columns should follow the pattern {value_prefix}_{dt_value}_q{quantile}, where:

- value_prefix is the base name for the quantile measurement (e.g., 'predicted_subsidence')
- dt_value is the temporal value (e.g., a year or a month)
- quantile is the quantile level (e.g., 0.1, 0.5, 0.9)
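A matching set of input column names can be generated from a prefix, the time steps, and the quantile levels. make_quantile_columns is a hypothetical helper for illustration only; it simply applies the naming pattern described above to every (time step, quantile) combination.

```python
from itertools import product


def make_quantile_columns(value_prefix, dt_values, quantiles):
    """Build wide-format names following {value_prefix}_{dt_value}_q{quantile}."""
    # product() yields every (time step, quantile) pair, time-major order.
    return [f"{value_prefix}_{dt}_q{q}" for dt, q in product(dt_values, quantiles)]


cols = make_quantile_columns("predicted_subsidence", [2022, 2023], [0.1, 0.5, 0.9])
print(cols[:2])   # ['predicted_subsidence_2022_q0.1', 'predicted_subsidence_2022_q0.5']
print(len(cols))  # 6
```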

value_prefix : str

Base name for quantile measurement columns (e.g., 'predicted_subsidence'). This is used to identify the quantile columns in the DataFrame.

spatial_cols : list of str, optional

List of spatial column names (e.g., ['longitude', 'latitude']). These columns are carried through the reshaping unchanged. If None, default spatial columns (e.g., ['longitude', 'latitude']) are used.

dt_col : str, default='year'

Name of the output column that stores the temporal values (dt_value) extracted from the column names (e.g., 'year').

error : {'raise', 'warn', 'ignore'}, default='warn'

Specifies how to handle errors when expected columns or column-name patterns are not found. Options:

- 'raise': raise a ValueError with a message if columns are missing.
- 'warn': issue a warning with a message if columns are missing.
- 'ignore': silently return an empty DataFrame when issues are found.
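The three policies amount to a small dispatch. The sketch below uses a hypothetical helper, handle_missing (not part of kdiagram), and assumes that an unrecognized policy value is itself an error; the library's exact messages and what it returns after a warning are not specified here.

```python
import warnings


def handle_missing(message: str, error: str = "warn") -> None:
    """Apply a 'raise' / 'warn' / 'ignore' policy (hypothetical helper)."""
    if error == "raise":
        raise ValueError(message)
    elif error == "warn":
        warnings.warn(message, UserWarning)
    elif error != "ignore":
        # Assumption: an unknown policy is treated as a programming error.
        raise ValueError(f"error must be 'raise', 'warn' or 'ignore', got {error!r}")
    # 'ignore' does nothing here; the caller would then return an empty DataFrame.


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    handle_missing("no columns match 'subs_*_q*'", error="warn")
print(len(caught))  # 1
```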

savefile : str, optional

Output path. If provided, the reshaped DataFrame is saved to this location.

verbose : int, default=0

Level of verbosity for progress messages. Higher values produce more detailed output during processing:

- 0: silent
- 1: basic progress
- 2: column-parsing details
- 3: metadata extraction
- 4: reshaping steps
- 5: full debug

Returns:

pd.DataFrame

A long-format DataFrame with one column per quantile value. It contains the following columns:

- Spatial columns (if any)
- Temporal column (named by dt_col)
- One {value_prefix}_q{quantile} column per unique quantile

Parameters:

df (DataFrame)
value_prefix (str)
spatial_cols (list[str] | None)
dt_col (str)
error (str)
savefile (str | None)
verbose (int)

Return type:

DataFrame

See also

pandas.melt: For reshaping DataFrames from wide to long format.
kdiagram.utils.q_utils.melt_q_data: Alternative method for reshaping quantile data.

Notes

- Column names must follow the pattern {value_prefix}_{dt_value}_q{quantile} for the temporal value and quantile to be extracted.
- The temporal values are parsed from the column names; dt_col only sets the name of the output column that holds them.
- Spatial columns are detected automatically or can be passed explicitly via spatial_cols.
- The quantiles are pivoted into distinct columns, one per unique quantile value found in the DataFrame [2].
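The melt-then-pivot idea can be sketched with plain pandas. reshape_quantile_sketch is a hypothetical simplification, not kdiagram's implementation: it requires spatial_cols to be passed explicitly, assumes numeric temporal values such as years, and performs the pivot with set_index/unstack, which raises a ValueError if a (location, time, quantile) combination appears twice.

```python
import re

import pandas as pd


def reshape_quantile_sketch(df, value_prefix, spatial_cols, dt_col="year"):
    """Hypothetical sketch: wide quantile columns -> one row per (location, time)."""
    pattern = rf"^{re.escape(value_prefix)}_(.+?)_q(\d*\.?\d+)$"
    quantile_cols = [c for c in df.columns if isinstance(c, str) and re.match(pattern, c)]

    # Wide -> long: one row per (input row, original quantile column).
    long_df = df.melt(id_vars=spatial_cols, value_vars=quantile_cols,
                      var_name="column", value_name="value")

    # Parse the temporal value and the quantile out of each column name.
    parts = long_df["column"].str.extract(pattern)
    long_df[dt_col] = pd.to_numeric(parts[0])  # assumes numeric dt values, e.g. years
    long_df["quantile"] = pd.to_numeric(parts[1])

    # Long -> one column per quantile; missing combinations become NaN.
    wide_q = (long_df.set_index(spatial_cols + [dt_col, "quantile"])["value"]
              .unstack("quantile"))
    wide_q.columns = [f"{value_prefix}_q{q}" for q in wide_q.columns]
    return wide_q.reset_index()


wide_df = pd.DataFrame({
    "lon": [-118.25, -118.30],
    "lat": [34.05, 34.10],
    "subs_2022_q0.1": [1.2, 1.3],
    "subs_2022_q0.5": [1.5, 1.6],
    "subs_2023_q0.1": [1.7, 1.8],
})
out = reshape_quantile_sketch(wide_df, "subs", ["lon", "lat"])
print(list(out.columns))  # ['lon', 'lat', 'year', 'subs_q0.1', 'subs_q0.5']
print(out.shape)          # (4, 5)
```

Because subs_2023_q0.5 is absent from the input, the 2023 rows carry NaN in subs_q0.5.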

\[\mathbf{W}_{m \times n} \rightarrow \mathbf{L}_{p \times k} \qquad (1)\]

where:

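\(m\) = number of rows in the wide input
\(n\) = number of columns in the wide input (spatial columns plus quantile columns; the temporal values are encoded in the quantile column names)
\(p = m \times t\), where \(t\) is the number of unique temporal values
\(k = s + 1 + q\): \(s\) spatial columns, one temporal column, and \(q\) quantile columns (one per unique quantile)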
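Applying Eq. (1) to the DataFrame in the Examples section gives a quick sanity check; every count below is read directly from that example (two rows, lon and lat as spatial columns, three quantile columns, years 2022 and 2023, quantiles 0.1 and 0.5).

```latex
\begin{aligned}
m &= 2, \qquad n = 2 + 3 = 5, \qquad t = 2 \;(2022, 2023), \qquad q = 2 \;(0.1, 0.5) \\
p &= m \times t = 2 \times 2 = 4 \\
k &= 2 + 1 + q = 2 + 1 + 2 = 5
\end{aligned}
```

The value \(k = 5\) matches the five columns listed in the example output.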

References

[1]

McKinney, W. (2010). “Data Structures for Statistical Computing in Python”. Proceedings of the 9th Python in Science Conference.

[2]

Wickham, H. (2014). “Tidy Data”. Journal of Statistical Software, 59(10), 1-23.

Examples

>>> from kdiagram.utils.q_utils import reshape_quantile_data
>>> import pandas as pd
>>> wide_df = pd.DataFrame({
...     'lon': [-118.25, -118.30],
...     'lat': [34.05, 34.10],
...     'subs_2022_q0.1': [1.2, 1.3],
...     'subs_2022_q0.5': [1.5, 1.6],
...     'subs_2023_q0.1': [1.7, 1.8]
... })
>>> reshaped_df = reshape_quantile_data(wide_df, 'subs')
>>> reshaped_df.columns
Index(['lon', 'lat', 'year', 'subs_q0.1', 'subs_q0.5'], dtype='object')