kdiagram.utils.melt_q_data

kdiagram.utils.melt_q_data(df, value_prefix=None, dt_name='dt_col', q=None, error='raise', sort_values=None, spatial_cols=None, savefile=None, verbose=0)

Reshape wide-format DataFrame with quantile columns to long format with explicit temporal and quantile dimensions.

This function transforms columns named according to the pattern {value_prefix}_{dt_value}_q{quantile} into a long format with separate temporal and quantile columns. Spatial coordinate columns, if given, are preserved through the reshape.
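To make the naming pattern concrete, the following is a minimal sketch of how a column name could be split into its temporal and quantile parts. The helper `parse_q_column` and its regular expression are illustrative assumptions, not the library's actual parsing code, which may accept other variants.

```python
import re


def parse_q_column(name, value_prefix):
    """Split '{value_prefix}_{dt_value}_q{quantile}' into (dt_value, quantile).

    Hypothetical helper for illustration only; returns None when the
    column name does not follow the pattern.
    """
    pattern = re.compile(
        rf"^{re.escape(value_prefix)}_(?P<dt>[^_]+)_q(?P<q>\d*\.?\d+)$"
    )
    match = pattern.match(name)
    if match is None:
        return None
    return match.group("dt"), float(match.group("q"))


parsed = parse_q_column("subs_2022_q0.1", "subs")
not_matched = parse_q_column("lon", "subs")
print(parsed)       # ('2022', 0.1)
print(not_matched)  # None
```

Columns that do not match, such as spatial coordinates, are simply left out of the quantile set in this sketch.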

Parameters:
  • df (pd.DataFrame) –

    Input DataFrame containing quantile columns. The columns should follow the pattern {value_prefix}_{dt_value}_q{quantile}, where:

    • value_prefix is the base name for the quantile measurement (e.g., 'predicted_subsidence')

    • dt_value is the datetime value (e.g., year or month)

    • quantile is the quantile value (e.g., 0.1, 0.5, 0.9)

  • value_prefix (str) – Base name for quantile measurement columns (e.g., 'predicted_subsidence'). This is used to identify the quantile columns in the DataFrame.

  • dt_name (str, default 'dt_col') – Name of the output column that holds the temporal values extracted from the column names (e.g., 'year').

  • q (list of float/str, optional) –

    Specific quantiles to include. Accepts:

    • Float values (0.1, 0.5, 0.9)

    • Percentage strings ('10%', '90%')

    • None (include all detected quantiles)

  • error ({'raise', 'warn', 'ignore'}, default 'raise') –

    Specifies how to handle errors when certain columns or data patterns are not found. Options include:

    • 'raise': Raises a ValueError with a message if columns are missing.

    • 'warn': Issues a warning with a message if columns are missing.

    • 'ignore': Silently returns an empty DataFrame when issues are found.

  • sort_values (str, optional) – If provided, the final pivoted DataFrame is sorted by this column. If the column does not exist and verbose >= 1, the function warns and does not sort.

  • spatial_cols (tuple of str, optional) – Columns corresponding to spatial coordinates (e.g., ('lon', 'lat')). These are retained as part of the index when the DataFrame is pivoted.

  • savefile (str, optional) – Path to save the reshaped DataFrame. If provided, the DataFrame will be saved to this location.

  • verbose (int, default 0) –

    Level of verbosity for progress messages. Higher values produce more detailed output during processing:

    • 0: Silent

    • 1: Basic progress

    • 2: Column parsing details

    • 3: Metadata extraction

    • 4: Reshaping steps

    • 5: Full debug
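The q parameter accepts both floats and percentage strings. As a rough illustration of what that implies, the sketch below converts either form to a float in [0, 1]. The helper `normalize_quantile` is hypothetical and is not part of kdiagram; the library's own validation may behave differently.

```python
def normalize_quantile(value):
    """Convert 0.1, '0.1', or '10%' to a float in [0, 1].

    Hypothetical helper for illustration only.
    """
    if isinstance(value, str):
        text = value.strip()
        if text.endswith("%"):
            # Percentage string: drop the sign and rescale to [0, 1].
            number = float(text[:-1]) / 100.0
        else:
            number = float(text)
    else:
        number = float(value)
    if not 0.0 <= number <= 1.0:
        raise ValueError(f"quantile out of range: {value!r}")
    return number


quantiles = [normalize_quantile(v) for v in (0.1, "50%", "90%")]
print(quantiles)
```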

Returns:

A long-format DataFrame with one column per quantile value. The DataFrame has the following columns:

  • Spatial columns (if any)

  • Temporal column (named by dt_name)

  • {value_prefix}_q{quantile} value columns, one for each quantile

Return type:

pd.DataFrame

Examples

>>> from kdiagram.utils.q_utils import melt_q_data
>>> import pandas as pd
>>> wide_df = pd.DataFrame({
...     'lon': [-118.25, -118.30],
...     'lat': [34.05, 34.10],
...     'subs_2022_q0.1': [1.2, 1.3],
...     'subs_2022_q0.5': [1.5, 1.6],
...     'subs_2023_q0.9': [1.7, 1.8]
... })
>>> long_df = melt_q_data(wide_df, 'subs', dt_name='year')
>>> long_df
   year  subs_q0.1  subs_q0.5  subs_q0.9
0  2022        1.2        1.5        NaN
1  2023        NaN        NaN        1.7
>>> long_df.columns
Index(['year', 'subs_q0.1', 'subs_q0.5', 'subs_q0.9'], dtype='object')
>>> long_df = melt_q_data(wide_df, 'subs', dt_name='year',
...                      spatial_cols=('lon', 'lat'))
>>> long_df
      lon    lat  year  subs_q0.1  subs_q0.5  subs_q0.9
0 -118.30  34.10  2022        1.3        1.6        NaN
1 -118.30  34.10  2023        NaN        NaN        1.8
2 -118.25  34.05  2022        1.2        1.5        NaN
3 -118.25  34.05  2023        NaN        NaN        1.7
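For comparison, a similar result to the spatial_cols example can be obtained with plain pandas. This is a sketch of the general melt-then-pivot technique under the same column-naming assumption; it is not the implementation used by melt_q_data, and the intermediate names (`column`, `value`, `quantile_col`) are chosen here for illustration.

```python
import pandas as pd

wide_df = pd.DataFrame({
    "lon": [-118.25, -118.30],
    "lat": [34.05, 34.10],
    "subs_2022_q0.1": [1.2, 1.3],
    "subs_2022_q0.5": [1.5, 1.6],
    "subs_2023_q0.9": [1.7, 1.8],
})
spatial_cols = ["lon", "lat"]

# Step 1: stack every quantile column into (column name, value) pairs.
melted = wide_df.melt(id_vars=spatial_cols, var_name="column", value_name="value")

# Step 2: split each column name into its year and quantile parts.
parts = melted["column"].str.extract(r"^subs_(?P<year>[^_]+)_q(?P<q>[\d.]+)$")
melted["year"] = parts["year"].astype(int)
melted["quantile_col"] = "subs_q" + parts["q"]

# Step 3: pivot the quantiles back out into one column each.
long_df = melted.pivot_table(
    index=spatial_cols + ["year"], columns="quantile_col", values="value"
).reset_index()
long_df.columns.name = None

print(long_df)
```

Combinations that have no value for a given quantile, such as q0.9 in 2022, come out as NaN.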

Notes

  • The column names must follow the pattern {value_prefix}_{dt_value}_q{quantile} for proper extraction.

  • The temporal values are extracted from the column names; the dt_name argument sets the name of the output column that holds them.

  • Spatial columns are automatically detected or can be passed explicitly.

  • The quantiles are pivoted and separated into distinct columns based on the unique quantile values found in the DataFrame.

\[\mathbf{W}_{m \times n} \rightarrow \mathbf{L}_{p \times k}\]

Where:

  • \(m\) = Original row count

  • \(n\) = Original columns (quantile + spatial + temporal)

  • \(p\) = \(m \times t\) (t = unique temporal values)

  • \(k\) = Spatial cols + 1 temporal + q quantile cols
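The shape rule can be checked against the spatial_cols example from the Examples section. The sketch below derives t and q from the column names and applies the two formulas; `long_shape` is a hypothetical helper written for this illustration, not a kdiagram function.

```python
import re


def long_shape(n_rows, columns, value_prefix, spatial_cols=()):
    """Return (p, k) for the long frame, following the shape rule above.

    Hypothetical helper for illustration only.
    """
    pattern = re.compile(
        rf"^{re.escape(value_prefix)}_(?P<dt>[^_]+)_q(?P<q>[\d.]+)$"
    )
    dt_values, quantiles = set(), set()
    for name in columns:
        match = pattern.match(name)
        if match:
            dt_values.add(match.group("dt"))
            quantiles.add(match.group("q"))
    p = n_rows * len(dt_values)                  # m x t rows
    k = len(spatial_cols) + 1 + len(quantiles)   # spatial + 1 temporal + q
    return p, k


columns = ["lon", "lat", "subs_2022_q0.1", "subs_2022_q0.5", "subs_2023_q0.9"]
shape = long_shape(2, columns, "subs", spatial_cols=("lon", "lat"))
print(shape)  # (4, 6)
```

With m = 2 rows, t = 2 years, and q = 3 quantiles, this gives the 4 rows and 6 columns shown in that example.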

See also

pandas.melt

For reshaping DataFrames from wide to long format.

kdiagram.utils.q_utils.reshape_quantile_data

Alternative method for reshaping quantile data.
