kdiagram.utils.melt_q_data

kdiagram.utils.melt_q_data(df, value_prefix=None, dt_name='dt_col', q=None, error='raise', sort_values=None, spatial_cols=None, savefile=None, verbose=0)

Reshape a wide DataFrame with time-embedded quantile columns into a tidy wide table with explicit temporal and quantile dimensions.

This function looks for columns named like {value_prefix}_{dt_value}_q{quantile} (e.g., subs_2022_q0.1) and returns a table with one row per temporal value (and optional spatial coordinates), and one column per quantile (e.g., subs_q0.1, subs_q0.5, …). Internally it melts, extracts metadata (time & quantile), then pivots so quantiles become separate columns.
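The melt, extract, and pivot steps described above can be sketched with plain pandas. This is an illustration under stated assumptions, not kdiagram's implementation: the helper name `melt_q_sketch`, the regular expression, and the use of `aggfunc="mean"` to combine duplicate rows are all choices made for this example.

```python
import re

import pandas as pd


def melt_q_sketch(df, value_prefix, dt_name="dt_col", spatial_cols=()):
    """Hypothetical stand-in for melt_q_data: melt, extract, pivot."""
    spatial_cols = list(spatial_cols)
    # Assumed pattern: {value_prefix}_{dt_value}_q{quantile}
    pattern = rf"^{re.escape(value_prefix)}_(?P<dt>.+)_q(?P<q>\d*\.?\d+)$"
    q_cols = [c for c in df.columns if re.match(pattern, c)]

    # 1. Melt: one row per (source row, quantile column).
    long = df.melt(
        id_vars=spatial_cols,
        value_vars=q_cols,
        var_name="source_col",
        value_name="value",
    )

    # 2. Extract the time token and quantile label from the column name.
    meta = long["source_col"].str.extract(pattern)
    long[dt_name] = meta["dt"]
    long["q_col"] = value_prefix + "_q" + meta["q"]

    # 3. Pivot: quantiles become columns (mean is an assumed aggregation).
    wide = long.pivot_table(
        index=spatial_cols + [dt_name],
        columns="q_col",
        values="value",
        aggfunc="mean",
    )
    wide.columns.name = None
    return wide.reset_index()


wide_df = pd.DataFrame({
    "lon": [-118.25, -118.30],
    "lat": [34.05, 34.10],
    "subs_2022_q0.1": [1.2, 1.3],
    "subs_2022_q0.5": [1.5, 1.6],
    "subs_2023_q0.9": [1.7, 1.8],
})
sketch_out = melt_q_sketch(wide_df, "subs", dt_name="year")
print(sketch_out.columns.tolist())
```

For this small frame the sketch yields the columns year, subs_q0.1, subs_q0.5, and subs_q0.9, with one row per time token. Quantiles that are missing for a given time (here q0.9 for 2022) appear as NaN.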

\[\mathcal{S}=\text{spatial indices},\quad \mathcal{T}=\text{times},\quad \mathcal{Q}=\text{quantiles} \tag{1}\]

\[\mathbf{W}\in\mathbb{R}^{m\times n} \ \xrightarrow{\ \text{melt+pivot}\ }\ \mathbf{L}\in\mathbb{R}^{p\times k} \tag{2}\]

with

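\(p=\lvert\{(s,t): s\in\mathcal{S},\,t\in\mathcal{T}\}\rvert\) rows, one per location-time pair, and
\(k=c + 1 + \lvert\mathcal{Q}\rvert\) columns, where \(c\) is the number of spatial columns (zero when spatial_cols is omitted), the 1 counts the temporal column, and \(\lvert\mathcal{Q}\rvert\) counts the quantile columns. For example, two locations described by two spatial columns, two times, and three quantiles give \(p=4\) and \(k=6\).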

The source columns are named

\[\mathrm{col}(t,\alpha)= \texttt{f"\{value\_prefix\}\_\{t\}\_q\{\alpha\}"} \tag{3}\]

and hold values \(y_{s,t,\alpha}\). The output table contains, for each \(\alpha\in\mathcal{Q}\), the column

\[\texttt{f"\{value\_prefix\}\_q\{\alpha\}"} \quad\text{with entries}\quad \left[\mathbf{L}\right]_{(s,t),\,\alpha} = y_{s,t,\alpha}. \tag{4}\]
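The naming convention in equations (3) and (4) can be made concrete with a small parser. The sketch below is only an illustration: `parse_q_column` and `output_column` are hypothetical helpers, and the regular expression is an assumed reading of the pattern, not the one kdiagram uses.

```python
import re


def parse_q_column(name, value_prefix):
    """Return (t, alpha) for a source column name, or None if it does not match."""
    # Assumed pattern: {value_prefix}_{t}_q{alpha}
    pattern = rf"^{re.escape(value_prefix)}_(?P<t>.+)_q(?P<alpha>\d*\.?\d+)$"
    m = re.match(pattern, name)
    if m is None:
        return None
    return m.group("t"), float(m.group("alpha"))


def output_column(value_prefix, alpha):
    """Build the output column name of equation (4) for quantile alpha."""
    return f"{value_prefix}_q{alpha:g}"


t, alpha = parse_q_column("subs_2022_q0.1", "subs")
print(t, alpha)                      # 2022 0.1
print(output_column("subs", alpha))  # subs_q0.1
```

Columns that do not follow the pattern, such as lon, return None and would be left out of the reshape.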

Parameters:

df (pandas.DataFrame): Input DataFrame containing quantile columns named with the pattern {value_prefix}_{dt_value}_q{quantile}. Here dt_value is a time token (e.g., year) and quantile is a numeric label (e.g., 0.1, 0.5, 0.9).
value_prefix (str): Base measurement name used to identify quantile columns (e.g., 'subs' or 'predicted_subsidence'). Required.
dt_name (str, default='dt_col'): Name of the output column holding the extracted temporal value (e.g., 'year').
q (list of {float, str}, optional): Which quantiles to keep. Floats like 0.1 or strings like "10%" are accepted. If None, all detected quantiles are used.
error ({'raise', 'warn', 'ignore'}, default='raise'): Behavior when no matching columns are found or a filter removes all:
- 'raise': raise ValueError with details
- 'warn': warn and return an empty DataFrame
- 'ignore': silently return an empty DataFrame
sort_values (str, optional): If provided, sort the final DataFrame by this column. If the column is missing, a warning is printed when verbose >= 1 and no sort is applied.
spatial_cols (tuple[str, ...] or list[str], optional): Names of columns that identify spatial coordinates (e.g., ('lon', 'lat')). If provided, they are retained in the index during aggregation and preserved in the output. If omitted, spatial columns are not retained (unless your environment-specific helper auto-detects them).
savefile (str, optional): Path to save the reshaped DataFrame (handled by @SaveFile).
verbose (int, default=0): Verbosity level: 0=silent, 1=progress, 2=column parsing, 3=metadata extraction, 4=reshaping steps, 5=full debug.
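How the q filter and the error policy interact can be sketched as follows. This is a minimal illustration, assuming that percentage strings are divided by 100 and that quantiles are compared with a small tolerance. The helpers `normalize_q` and `select_quantiles` are hypothetical, and the sketch returns an empty list where melt_q_data is documented to return an empty DataFrame.

```python
import warnings


def normalize_q(value):
    """Convert 0.1 or "10%" to the float 0.1 (assumed conventions)."""
    if isinstance(value, str):
        text = value.strip()
        if text.endswith("%"):
            return float(text[:-1]) / 100.0
        return float(text)
    return float(value)


def select_quantiles(detected, q=None, error="raise"):
    """Filter detected quantiles by q, applying the error policy if none remain."""
    if q is None:
        kept = sorted(detected)
    else:
        wanted = [normalize_q(v) for v in q]
        kept = sorted(
            d for d in detected if any(abs(d - w) < 1e-9 for w in wanted)
        )
    if not kept:
        msg = f"No quantiles left: detected={sorted(detected)}, q={q}"
        if error == "raise":
            raise ValueError(msg)
        if error == "warn":
            warnings.warn(msg)
        # error == "ignore": fall through silently
    return kept


print(select_quantiles([0.1, 0.5, 0.9], q=[0.1, "50%"]))  # [0.1, 0.5]
```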

Returns:

pandas.DataFrame

A tidy-wide DataFrame with columns:
- Spatial columns (if provided via spatial_cols)
- The temporal column named dt_name
- One column per quantile: {value_prefix}_q{quantile}

Quantile column names are normalized to compact fixed-point strings, e.g. subs_q0.1, subs_q0.25, subs_q0.9 (trailing zeros are trimmed).
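One way to obtain such compact labels is to format with a fixed number of decimals and then trim trailing zeros. The helper `format_q_label` and its six-decimal precision are assumptions made for illustration. The exact formatting used by kdiagram is not specified here.

```python
def format_q_label(q, max_decimals=6):
    """Format a quantile as fixed-point text with trailing zeros trimmed."""
    text = f"{float(q):.{max_decimals}f}"
    # "0.100000" -> "0.1"; a bare trailing "." is also removed ("1." -> "1").
    return text.rstrip("0").rstrip(".")


for q in (0.1, 0.25, 0.90):
    print(f"subs_q{format_q_label(q)}")
# subs_q0.1
# subs_q0.25
# subs_q0.9
```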

Parameters:

df (DataFrame)
value_prefix (str | None)
dt_name (str)
q (list[float | str] | None)
error (str)
sort_values (str | None)
spatial_cols (tuple[str, str] | None)
savefile (str | None)
verbose (int)

Return type:

DataFrame

See also

pandas.melt: Reshape from wide to long.
pandas.DataFrame.pivot_table: Pivot long to wide.

Notes

Expected input column pattern: {value_prefix}_{dt_value}_q{quantile}. The time token is captured literally (e.g., 2022) and emitted into the dt_name column.
Quantile labels are parsed as floats. They are re-emitted with stable string formatting (e.g., q0.1, q0.25).
If spatial_cols is not provided, spatial coordinates are typically not preserved (unless a custom columns_manager performs automatic detection in your codebase).
The function sorts rows by spatial_cols + [dt_name] (when spatial_cols are present) or by [dt_name] for consistency.
If sort_values is given, a secondary sort is attempted; failures are downgraded to warnings when verbose >= 2.
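The sorting behavior in the last two notes can be sketched like this. It is an illustration only: `sort_output` is a hypothetical helper, the warning is emitted unconditionally rather than being tied to verbose, and a stable sort is assumed so that the default ordering survives among ties.

```python
import warnings

import pandas as pd


def sort_output(out, dt_name, spatial_cols=(), sort_values=None):
    """Apply the default ordering, then an optional secondary sort."""
    # Default ordering: spatial columns (if any) followed by the time column.
    keys = [*spatial_cols, dt_name]
    out = out.sort_values(keys).reset_index(drop=True)
    if sort_values is not None:
        if sort_values in out.columns:
            # mergesort is stable, so ties keep the default ordering.
            out = out.sort_values(sort_values, kind="mergesort")
            out = out.reset_index(drop=True)
        else:
            warnings.warn(f"Column {sort_values!r} not found; sort skipped.")
    return out


frame = pd.DataFrame({"year": ["2023", "2022"], "subs_q0.5": [1.0, 2.0]})
by_default = sort_output(frame, "year")
by_value = sort_output(frame, "year", sort_values="subs_q0.5")
print(by_default["year"].tolist())  # ['2022', '2023']
print(by_value["year"].tolist())    # ['2023', '2022']
```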

References

[1] Wickham, H. (2014). Tidy Data. J. Stat. Software, 59(10).

[2] McKinney, W. (2010). Data Structures for Statistical Computing in Python. Proc. SciPy.

Examples

Basic reshape without spatial coordinates:

>>> import pandas as pd
>>> from kdiagram.utils.q_utils import melt_q_data
>>> wide_df = pd.DataFrame({
...     'lon': [-118.25, -118.30],
...     'lat': [34.05, 34.10],
...     'subs_2022_q0.1': [1.2, 1.3],
...     'subs_2022_q0.5': [1.5, 1.6],
...     'subs_2023_q0.9': [1.7, 1.8]
... })
>>> out = melt_q_data(wide_df, 'subs', dt_name='year')
>>> out.columns.tolist()
['year', 'subs_q0.1', 'subs_q0.5', 'subs_q0.9']

Preserving spatial coordinates:

>>> out2 = melt_q_data(
...     wide_df, 'subs', dt_name='year', spatial_cols=('lon', 'lat')
... )
>>> out2.columns[:3].tolist()
['lon', 'lat', 'year']

Filtering to a subset of quantiles:

>>> out3 = melt_q_data(wide_df, 'subs', q=[0.1, '50%'])
>>> [c for c in out3.columns if c.startswith('subs_q')]
['subs_q0.1', 'subs_q0.5']