kdiagram.utils.melt_q_data

kdiagram.utils.melt_q_data(df, value_prefix=None, dt_name='dt_col', q=None, error='raise', sort_values=None, spatial_cols=None, savefile=None, verbose=0)

Reshape a wide-format DataFrame with quantile columns into long format with explicit temporal and quantile dimensions.

This function transforms columns that follow the naming pattern {value_prefix}_{dt_value}_q{quantile} into a long format with a separate temporal column and one column per quantile. Spatial coordinate columns are preserved through the reshaping [1].

Parameters:
df : pd.DataFrame

Input DataFrame containing quantile columns. The columns should follow the pattern {value_prefix}_{dt_value}_q{quantile}, where:

  • value_prefix is the base name of the quantile measurement (e.g., 'predicted_subsidence')

  • dt_value is the temporal value (e.g., a year or month)

  • quantile is the quantile level (e.g., 0.1, 0.5, 0.9)
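The naming convention can be checked with a regular expression. The sketch below is a minimal illustration, assuming the temporal part contains no underscores and the quantile is written as a decimal number; parse_quantile_column is a hypothetical helper, not part of kdiagram, and the library's own parser may accept other forms.

```python
import re


def parse_quantile_column(col, value_prefix):
    """Split '{value_prefix}_{dt_value}_q{quantile}' into (dt_value, quantile).

    Hypothetical helper (not kdiagram's parser); returns None when the
    column does not match. Assumes dt_value contains no underscores.
    """
    # re.escape lets prefixes such as 'predicted_subsidence' match literally.
    pattern = rf"^{re.escape(value_prefix)}_(?P<dt>[^_]+)_q(?P<q>\d*\.?\d+)$"
    match = re.match(pattern, col)
    if match is None:
        return None
    return match.group("dt"), float(match.group("q"))


for name in ["lon", "subs_2022_q0.1", "subs_2023_q0.9"]:
    print(name, "->", parse_quantile_column(name, "subs"))
```

Here lon is rejected (None), while the two quantile columns split into ('2022', 0.1) and ('2023', 0.9).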

value_prefix : str

Base name for quantile measurement columns (e.g., 'predicted_subsidence'). This is used to identify the quantile columns in the DataFrame.

dt_name : str, default='dt_col'

Name of the output column that holds the temporal values extracted from the column names (e.g., 'year').

q : list of float or str, optional

Specific quantiles to include. Accepts:

  • Float values (e.g., 0.1, 0.5, 0.9)

  • Percentage strings (e.g., "10%", "90%")

  • None (include all detected quantiles)
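One way to normalize the accepted q forms to floats is sketched below. normalize_quantiles is a hypothetical helper, not kdiagram's internal code; it assumes that a percentage string such as "10%" means 0.1, that plain numeric strings are already fractions, and (beyond what is documented) that a single scalar value is also acceptable.

```python
def normalize_quantiles(q):
    """Convert floats and percentage strings to a sorted list of floats.

    Hypothetical helper: None means "keep every detected quantile".
    """
    if q is None:
        return None
    if isinstance(q, (str, int, float)):
        q = [q]  # assumption: also accept a single value
    values = []
    for item in q:
        if isinstance(item, str):
            text = item.strip()
            if text.endswith("%"):
                values.append(float(text[:-1]) / 100)  # "10%" -> 0.1
            else:
                values.append(float(text))  # "0.1" -> 0.1
        else:
            values.append(float(item))
    # Deduplicate and sort so the output column order is predictable.
    return sorted(set(values))


print(normalize_quantiles(["10%", 0.5, "90%"]))
```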

error : {'raise', 'warn', 'ignore'}, default='raise'

Specifies how to handle errors when expected columns or naming patterns are not found:

  • 'raise': raises a ValueError describing the missing columns.

  • 'warn': issues a warning describing the missing columns.

  • 'ignore': silently returns an empty DataFrame.
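The three error policies can be expressed as a small dispatch. This is a hypothetical sketch (handle_missing is not a kdiagram function); returning an empty DataFrame after warning under 'warn' is an assumption, since the description above only states that behaviour for 'ignore'.

```python
import warnings

import pandas as pd


def handle_missing(message, error="raise"):
    """Apply a 'raise' / 'warn' / 'ignore' policy to a missing-column problem.

    Hypothetical helper. Returning an empty DataFrame under 'warn' is an
    assumption; the documented behaviour only specifies it for 'ignore'.
    """
    if error == "raise":
        raise ValueError(message)
    if error == "warn":
        warnings.warn(message, UserWarning)
        return pd.DataFrame()
    if error == "ignore":
        return pd.DataFrame()
    raise ValueError(f"error must be 'raise', 'warn' or 'ignore'; got {error!r}")


result = handle_missing("no columns match 'subs_*_q*'", error="ignore")
print(result.empty)
```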

sort_values : str, optional

If provided, the final pivoted DataFrame is sorted by this column. If the column does not exist and verbose >= 1, the function warns and does not sort.

spatial_cols : tuple of str, optional

Columns corresponding to spatial coordinates (e.g., ('lon', 'lat')). These are retained as part of the index when the DataFrame is pivoted.

savefile : str, optional

Path to which the reshaped DataFrame is saved, if provided.

verbose : int, default=0

Verbosity level for progress messages; higher values produce more detailed output:

  • 0: Silent

  • 1: Basic progress

  • 2: Column parsing details

  • 3: Metadata extraction

  • 4: Reshaping steps

  • 5: Full debug

Returns:
pd.DataFrame

A long-format DataFrame with one column per quantile value. It contains the following columns:

  • Spatial columns (if any)

  • Temporal column (named by dt_name)

  • One {value_prefix}_q{quantile} value column per quantile

Return type:

DataFrame

See also

pandas.melt

For reshaping DataFrames from wide to long format.

kdiagram.utils.q_utils.reshape_quantile_data

Alternative method for reshaping quantile data.

Notes

  • The column names must follow the pattern {value_prefix}_{dt_value}_q{quantile} for proper extraction.

  • The temporal values are parsed from the column names; dt_name only sets the name of the output column that holds them.

  • Spatial columns are automatically detected or can be passed explicitly.

  • The quantiles are pivoted and separated into distinct columns based on the unique quantile values found in the DataFrame [2].
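The melt-then-pivot approach described in these notes can be approximated with plain pandas. The sketch below is not kdiagram's implementation: melt_quantiles_sketch is a hypothetical name, it assumes integer temporal labels (such as years), and it uses pivot_table(..., aggfunc='first') so that a duplicated (index, quantile) pair keeps its first value instead of raising.

```python
import re

import pandas as pd


def melt_quantiles_sketch(df, value_prefix, dt_name="dt_col", spatial_cols=None):
    """Reshape '{prefix}_{dt}_q{quantile}' columns from wide to long.

    Hypothetical pandas-only sketch; not kdiagram's implementation.
    """
    spatial_cols = list(spatial_cols or [])
    pattern = rf"^{re.escape(value_prefix)}_(?P<dt>[^_]+)_q(?P<q>\d*\.?\d+)$"
    q_cols = [c for c in df.columns if re.match(pattern, str(c))]

    # 1. Melt: one row per (original row, quantile column).
    long = df[spatial_cols + q_cols].melt(
        id_vars=spatial_cols, var_name="column", value_name="value"
    )

    # 2. Split each column name into its temporal and quantile parts.
    parts = long["column"].str.extract(pattern)
    long[dt_name] = parts["dt"].astype(int)  # assumes integer labels, e.g. years
    long["quantile_col"] = f"{value_prefix}_q" + parts["q"]

    # 3. Pivot quantiles back out to one column each; 'first' keeps the
    #    first value if an (index, quantile) pair occurs more than once.
    out = long.pivot_table(
        index=spatial_cols + [dt_name],
        columns="quantile_col",
        values="value",
        aggfunc="first",
    ).reset_index()
    out.columns.name = None
    return out
```

Applied to the wide_df of the Examples section with dt_name='year' and spatial_cols=('lon', 'lat'), this sketch produces the same 4 × 6 layout and row order as the output shown there.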

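Schematically, the reshaping maps the wide table \(\mathbf{W}\) to the long table \(\mathbf{L}\):

\[\mathbf{W}_{m \times n} \rightarrow \mathbf{L}_{p \times k} \tag{1}\]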

Where:

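  • \(m\) = number of rows in the original DataFrame

  • \(n\) = number of original columns (quantile columns plus any spatial columns)

  • \(p\) = \(m \times t\), where \(t\) is the number of unique temporal values (assuming the spatial columns identify each original row)

  • \(k\) = number of spatial columns + 1 temporal column + number of quantile columns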
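As a worked check of the shapes in (1), take the spatial example in the Examples section, where the wide frame has the columns lon, lat, subs_2022_q0.1, subs_2022_q0.5, and subs_2023_q0.9:

\[
\begin{aligned}
m &= 2, \quad n = 5, \quad t = 2 \ (2022,\ 2023),\\
p &= m \times t = 2 \times 2 = 4,\\
k &= 2 + 1 + 3 = 6,
\end{aligned}
\]

which matches the 4 rows and 6 columns (lon, lat, year, subs_q0.1, subs_q0.5, subs_q0.9) of that output.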

References

[1] McKinney, W. (2010). “Data Structures for Statistical Computing in Python”. Proceedings of the 9th Python in Science Conference.

[2] Wickham, H. (2014). “Tidy Data”. Journal of Statistical Software, 59(10), 1-23.

Examples

>>> from kdiagram.utils.q_utils import melt_q_data
>>> import pandas as pd
>>> wide_df = pd.DataFrame({
...     'lon': [-118.25, -118.30],
...     'lat': [34.05, 34.10],
...     'subs_2022_q0.1': [1.2, 1.3],
...     'subs_2022_q0.5': [1.5, 1.6],
...     'subs_2023_q0.9': [1.7, 1.8]
... })
>>> long_df = melt_q_data(wide_df, 'subs', dt_name='year')
>>> long_df
   year  subs_q0.1  subs_q0.5  subs_q0.9
0  2022        1.2        1.5        NaN
1  2023        NaN        NaN        1.7
>>> long_df.columns
Index(['year', 'subs_q0.1', 'subs_q0.5', 'subs_q0.9'], dtype='object')
>>> long_df = melt_q_data(wide_df, 'subs', dt_name='year',
...                      spatial_cols=('lon', 'lat'))
>>> long_df
      lon    lat  year  subs_q0.1  subs_q0.5  subs_q0.9
0 -118.30  34.10  2022        1.3        1.6        NaN
1 -118.30  34.10  2023        NaN        NaN        1.8
2 -118.25  34.05  2022        1.2        1.5        NaN
3 -118.25  34.05  2023        NaN        NaN        1.7