kdiagram.utils.melt_q_data
- kdiagram.utils.melt_q_data(df, value_prefix=None, dt_name='dt_col', q=None, error='raise', sort_values=None, spatial_cols=None, savefile=None, verbose=0)
Reshape a wide DataFrame with time-embedded quantile columns into a tidy wide table with explicit temporal and quantile dimensions.
This function looks for columns named like {value_prefix}_{dt_value}_q{quantile} (e.g., subs_2022_q0.1) and returns a table with one row per temporal value (and per set of spatial coordinates, when given) and one column per quantile (e.g., subs_q0.1, subs_q0.5, …). Internally it melts the input, extracts the metadata (time and quantile) from the column names, then pivots so that quantiles become separate columns.

\[\mathcal{S}=\text{spatial indices},\quad \mathcal{T}=\text{times},\quad \mathcal{Q}=\text{quantiles} \tag{1}\]

\[\mathbf{W}\in\mathbb{R}^{m\times n} \ \xrightarrow{\ \text{melt+pivot}\ }\ \mathbf{L}\in\mathbb{R}^{p\times k} \tag{2}\]

with
\(p=\lvert\{(s,t): s\in\mathcal{S},\,t\in\mathcal{T}\}\rvert\)
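\(k=c + 1 + \lvert\mathcal{Q}\rvert\), where \(c\) is the number of spatial columns (0 when spatial_cols is omitted), the 1 accounts for the temporal column, and \(\lvert\mathcal{Q}\rvert\) counts one column per quantile.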
The source columns are named

\[\mathrm{col}(t,\alpha)= \texttt{f"\{value\_prefix\}\_\{t\}\_q\{\alpha\}"} \tag{3}\]

and hold values \(y_{s,t,\alpha}\). The output table contains, for each \(\alpha\in\mathcal{Q}\), the column

\[\texttt{f"\{value\_prefix\}\_q\{\alpha\}"} \quad\text{with entries}\quad \left[\mathbf{L}\right]_{(s,t),\,\alpha} = y_{s,t,\alpha}. \tag{4}\]

- Parameters:
  - df : pandas.DataFrame
    Input DataFrame containing quantile columns named with the pattern {value_prefix}_{dt_value}_q{quantile}. Here dt_value is a time token (e.g., a year) and quantile is a numeric label (e.g., 0.1, 0.5, 0.9).
  - value_prefix : str
    Base measurement name used to identify quantile columns (e.g., 'subs' or 'predicted_subsidence'). Required, even though the signature default is None.
  - dt_name : str, default='dt_col'
    Name of the output column holding the extracted temporal value (e.g., 'year').
  - q : list of {float, str}, optional
    Which quantiles to keep. Floats like 0.1 or strings like "10%" are accepted. If None, all detected quantiles are used.
  - error : {'raise', 'warn', 'ignore'}, default='raise'
    Behavior when no matching columns are found or when a filter removes all of them:
    - 'raise': raise ValueError with details
    - 'warn': warn and return an empty DataFrame
    - 'ignore': silently return an empty DataFrame
  - sort_values : str, optional
    If provided, sort the final DataFrame by this column. If the column is missing, no sort is applied, and a warning is printed when verbose >= 1.
  - spatial_cols : tuple[str, …] or list[str], optional
    Names of columns that identify spatial coordinates (e.g., ('lon', 'lat')). If provided, they are retained in the index during aggregation and preserved in the output. If omitted, spatial columns are not retained (unless your environment-specific helper auto-detects them).
  - savefile : str, optional
    Path to save the reshaped DataFrame (handled by @SaveFile).
  - verbose : int, default=0
    Verbosity level: 0=silent, 1=progress, 2=column parsing, 3=metadata extraction, 4=reshaping steps, 5=full debug.
- Returns:
  pandas.DataFrame
  A tidy-wide DataFrame with columns:
  - Spatial columns (if provided via spatial_cols)
  - The temporal column, named by dt_name
  - One column per quantile: {value_prefix}_q{quantile}
  Quantile column names are normalized to compact fixed-point strings, e.g. subs_q0.1, subs_q0.25, subs_q0.9 (trailing zeros are trimmed).
- Return type:
DataFrame
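The q parameter accepts both floats and percent strings, and the returned column names use trimmed fixed-point labels. A minimal sketch of how such normalization could work is shown here. The helper names parse_quantile and format_quantile_column are hypothetical, and the six-decimal precision is an assumption; the library's own formatting routine may differ.

```python
def parse_quantile(label):
    """Convert 0.1, '0.1', or '10%' into a float (hypothetical helper)."""
    if isinstance(label, str):
        text = label.strip()
        if text.endswith("%"):
            # Percent strings are rescaled to the unit interval.
            return float(text[:-1]) / 100.0
        return float(text)
    return float(label)


def format_quantile_column(value_prefix, label):
    """Build a name such as 'subs_q0.1' with trailing zeros trimmed."""
    q = parse_quantile(label)
    if not 0.0 <= q <= 1.0:
        raise ValueError(f"quantile must lie in [0, 1], got {label!r}")
    # Fixed-point formatting (assumed six decimals) avoids scientific
    # notation; trailing zeros are then stripped.
    text = f"{q:.6f}".rstrip("0")
    if text.endswith("."):
        text += "0"
    return f"{value_prefix}_q{text}"


names = [format_quantile_column("subs", q) for q in (0.1, "50%", 0.250)]
print(names)
```

With this scheme, 0.1 and "10%" map to the same column name, which is what makes a mixed list such as q=[0.1, '50%'] usable as a filter.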
See also
pandas.melt : Reshape from wide to long.
pandas.DataFrame.pivot_table : Pivot long to wide.
Notes
- Expected input column pattern: {value_prefix}_{dt_value}_q{quantile}. The time token is captured literally (e.g., 2022) and emitted into the dt_name column.
- Quantile labels are parsed as floats and re-emitted with stable string formatting (e.g., q0.1, q0.25).
- If spatial_cols is not provided, spatial coordinates are typically not preserved (unless a custom columns_manager performs automatic detection in your codebase).
- For consistency, the function sorts rows by spatial_cols + [dt_name] when spatial_cols is present, or by [dt_name] otherwise.
- If sort_values is given, a secondary sort is attempted; failures are downgraded to warnings when verbose >= 2.
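The melt, extract, and pivot steps described in these notes can be sketched with plain pandas. This is an illustration, not the library's implementation: the function name melt_q_sketch, the temporary column names, the regular expression (which assumes the time token contains no underscore), and the choice of mean aggregation for duplicate keys are all assumptions.

```python
import re

import pandas as pd


def melt_q_sketch(df, value_prefix, dt_name="dt_col", spatial_cols=()):
    """Illustrative melt -> extract -> pivot (hypothetical helper)."""
    spatial_cols = list(spatial_cols)
    # Assumed pattern: prefix, a time token without underscores, then q<number>.
    pattern = rf"^{re.escape(value_prefix)}_(?P<dt>[^_]+)_q(?P<q>\d*\.?\d+)$"
    q_cols = [c for c in df.columns if isinstance(c, str) and re.match(pattern, c)]
    if not q_cols:
        raise ValueError(f"no columns match prefix {value_prefix!r}")

    # 1. Melt: one row per (input row, quantile column).
    long = df.melt(id_vars=spatial_cols, value_vars=q_cols,
                   var_name="_col", value_name="_val")

    # 2. Extract the time token and the quantile from each column name.
    meta = long["_col"].str.extract(pattern)
    long[dt_name] = meta["dt"]
    long["_q"] = meta["q"].astype(float)

    # 3. Pivot so each quantile becomes its own column; rows sharing the
    #    same index key are averaged (an assumption of this sketch).
    index_cols = spatial_cols + [dt_name]
    wide = long.pivot_table(index=index_cols, columns="_q",
                            values="_val", aggfunc="mean")
    wide.columns = [f"{value_prefix}_q{q:g}" for q in wide.columns]

    # 4. Restore index columns and sort by spatial columns, then time.
    return wide.reset_index().sort_values(index_cols).reset_index(drop=True)


wide_df = pd.DataFrame({
    "lon": [-118.25, -118.30],
    "lat": [34.05, 34.10],
    "subs_2022_q0.1": [1.2, 1.3],
    "subs_2022_q0.5": [1.5, 1.6],
    "subs_2023_q0.1": [1.4, 1.5],
    "subs_2023_q0.5": [1.7, 1.8],
})
out = melt_q_sketch(wide_df, "subs", dt_name="year", spatial_cols=("lon", "lat"))
print(out)
```

In this toy input there are two locations and two years, so the result has p = 4 rows, and its k = 5 columns are the two spatial columns, the temporal column, and two quantile columns.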
Examples
Basic reshape without spatial coordinates:
>>> import pandas as pd
>>> from kdiagram.utils.q_utils import melt_q_data
>>> wide_df = pd.DataFrame({
...     'lon': [-118.25, -118.30],
...     'lat': [34.05, 34.10],
...     'subs_2022_q0.1': [1.2, 1.3],
...     'subs_2022_q0.5': [1.5, 1.6],
...     'subs_2023_q0.9': [1.7, 1.8]
... })
>>> out = melt_q_data(wide_df, 'subs', dt_name='year')
>>> out.columns.tolist()
['year', 'subs_q0.1', 'subs_q0.5', 'subs_q0.9']
Preserving spatial coordinates:
>>> out2 = melt_q_data(
...     wide_df, 'subs', dt_name='year', spatial_cols=('lon', 'lat')
... )
>>> out2.columns[:3].tolist()
['lon', 'lat', 'year']
Filtering to a subset of quantiles:
>>> out3 = melt_q_data(wide_df, 'subs', q=[0.1, '50%'])
>>> [c for c in out3.columns if c.startswith('subs_q')]
['subs_q0.1', 'subs_q0.5']
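When a quantile filter matches nothing, the error argument decides what happens. The three documented modes could be dispatched roughly as follows. The helper name resolve_empty_result and the use of UserWarning are assumptions made for illustration; the library may structure this differently.

```python
import warnings

import pandas as pd


def resolve_empty_result(message, error="raise"):
    """Apply a raise/warn/ignore policy (hypothetical helper)."""
    if error not in {"raise", "warn", "ignore"}:
        raise ValueError(
            f"error must be 'raise', 'warn', or 'ignore', got {error!r}"
        )
    if error == "raise":
        raise ValueError(message)
    if error == "warn":
        # The warning category is an assumption of this sketch.
        warnings.warn(message, UserWarning, stacklevel=2)
    # Both 'warn' and 'ignore' return an empty DataFrame.
    return pd.DataFrame()


empty = resolve_empty_result("no columns match the requested quantiles",
                             error="ignore")
print(empty.empty)
```

Returning an empty DataFrame in the non-raising modes lets callers test the result with .empty instead of wrapping every call in a try block.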