kdiagram.utils.melt_q_data
kdiagram.utils.melt_q_data(df, value_prefix=None, dt_name='dt_col', q=None, error='raise', sort_values=None, spatial_cols=None, savefile=None, verbose=0)
Reshape a wide-format DataFrame with quantile columns into a long format with explicit temporal and quantile dimensions.
This function transforms columns that follow the naming pattern {value_prefix}_{dt_value}_q{quantile} into a structured long format with separate temporal and quantile columns. Spatial coordinate columns can be preserved through the reshaping operations [1].

Parameters:
- df
  pd.DataFrame
  Input DataFrame containing quantile columns. The columns should follow the pattern {value_prefix}_{dt_value}_q{quantile}, where:
  - value_prefix is the base name for the quantile measurement (e.g., 'predicted_subsidence');
  - dt_value is the datetime value (e.g., a year or month);
  - quantile is the quantile value (e.g., 0.1, 0.5, 0.9).
- value_prefix
  str
  Base name for the quantile measurement columns (e.g., 'predicted_subsidence'). It is used to identify the quantile columns in the DataFrame.
- dt_name
  str, default='dt_col'
  Name of the output column that holds the temporal values parsed from the column names (e.g., 'year'). This column tracks the temporal dimension in the returned DataFrame.
- q
  list of float or str, optional
  Specific quantiles to include. Accepts:
  - float values (0.1, 0.5, 0.9);
  - percentage strings ("10%", "90%");
  - None, to include all detected quantiles.
- error
  {'raise', 'warn', 'ignore'}, default='raise'
  Specifies how to handle errors when expected columns or data patterns are not found:
  - 'raise': raises a ValueError with a descriptive message if columns are missing.
  - 'warn': issues a warning with a message if columns are missing.
  - 'ignore': silently returns an empty DataFrame when issues are found.
- sort_values
  str, optional
  If provided, the final pivoted DataFrame is sorted by this column. If the column does not exist and verbose >= 1, the function warns and does not sort.
- spatial_cols
  tuple of str, optional
  Columns corresponding to spatial coordinates (e.g., ('lon', 'lat')). These are retained as part of the index when the DataFrame is pivoted.
- savefile
  str, optional
  Path to save the reshaped DataFrame. If provided, the DataFrame is saved to this location.
- verbose
  int, default=0
  Level of verbosity for progress messages; higher values produce more detailed output during processing:
  - 0: silent
  - 1: basic progress
  - 2: column parsing details
  - 3: metadata extraction
  - 4: reshaping steps
  - 5: full debug
Returns:
  pd.DataFrame
  A long-format DataFrame with a separate column for each quantile value. It contains:
  - the spatial columns (if any);
  - the temporal column (named by dt_name);
  - one {value_prefix}_q{quantile} value column per quantile.
Return type:
  pd.DataFrame
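The column-name convention and the q and error options can be illustrated with a minimal sketch. It assumes the {value_prefix}_{dt_value}_q{quantile} pattern is matched with a regular expression and that quantile suffixes are plain decimal numbers. The helpers parse_quantile_columns, normalize_quantiles, and select_quantile_columns are hypothetical and are not part of kdiagram, whose actual parsing may differ. Under the 'ignore' policy the sketch returns an empty mapping, which a caller would turn into an empty DataFrame.

```python
import re
import warnings

import pandas as pd


def parse_quantile_columns(columns, value_prefix):
    """Hypothetical helper: map '{value_prefix}_{dt_value}_q{quantile}' names
    to (dt_value, quantile) pairs; non-matching names are skipped."""
    pattern = re.compile(
        rf"^{re.escape(value_prefix)}_(?P<dt>.+)_q(?P<q>\d+(?:\.\d+)?)$"
    )
    parsed = {}
    for col in columns:
        match = pattern.match(str(col))
        if match:
            parsed[col] = (match.group("dt"), float(match.group("q")))
    return parsed


def normalize_quantiles(q):
    """Hypothetical helper: turn floats (0.1) or percentage strings ('10%')
    into a list of floats; None means 'keep every detected quantile'."""
    if q is None:
        return None
    values = [q] if isinstance(q, (int, float, str)) else list(q)
    normalized = []
    for value in values:
        if isinstance(value, str) and value.strip().endswith("%"):
            normalized.append(float(value.strip().rstrip("%")) / 100.0)
        else:
            normalized.append(float(value))
    return normalized


def select_quantile_columns(df, value_prefix, q=None, error="raise"):
    """Hypothetical helper: return {column: (dt_value, quantile)} for the
    requested quantiles, applying the 'raise' / 'warn' / 'ignore' policy."""
    parsed = parse_quantile_columns(df.columns, value_prefix)
    wanted = normalize_quantiles(q)
    if wanted is not None:
        # Compare with a tolerance so that '10%' (0.1) matches a 'q0.1' suffix.
        parsed = {
            col: (dt, qv)
            for col, (dt, qv) in parsed.items()
            if any(abs(qv - w) < 1e-9 for w in wanted)
        }
    if not parsed:
        msg = f"No columns match '{value_prefix}_{{dt_value}}_q{{quantile}}'."
        if error == "raise":
            raise ValueError(msg)
        if error == "warn":
            warnings.warn(msg)
        # 'warn' and 'ignore' fall through with an empty mapping.
    return parsed


# Data taken from the Examples section of this page.
example_df = pd.DataFrame({
    "lon": [-118.25, -118.30],
    "lat": [34.05, 34.10],
    "subs_2022_q0.1": [1.2, 1.3],
    "subs_2022_q0.5": [1.5, 1.6],
    "subs_2023_q0.9": [1.7, 1.8],
})

print(select_quantile_columns(example_df, "subs", q=["10%", 0.9]))
```

Mixing a percentage string and a float in q selects the same columns either form would select alone, because both are normalized to floats before matching.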
See also
pandas.melt
  For reshaping DataFrames from wide to long format.
kdiagram.utils.q_utils.reshape_quantile_data
  Alternative method for reshaping quantile data.
Notes
The column names must follow the pattern {value_prefix}_{dt_value}_q{quantile} for proper extraction.
The temporal column is named by the dt_name argument; its values are parsed from the {dt_value} part of each column name.
Spatial columns are automatically detected or can be passed explicitly.
The quantiles are pivoted and separated into distinct columns based on the unique quantile values found in the DataFrame [2].
\[
\mathbf{W}_{m \times n} \rightarrow \mathbf{L}_{p \times k} \tag{1}
\]

where:
- \(m\) = original row count
- \(n\) = original column count (quantile + spatial + temporal)
- \(p\) = \(m \times t\), with \(t\) the number of unique temporal values
- \(k\) = number of spatial columns + 1 temporal column + number of quantile columns
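Equation (1) can be checked with a minimal pandas sketch of the reshape, assuming it amounts to pandas.melt followed by DataFrame.pivot_table with aggfunc='first'. The function melt_quantiles_sketch is hypothetical, not kdiagram's implementation, and it leaves the parsed temporal values as strings. When spatial_cols uniquely identify each row, the result has \(p = m \times t\) rows and \(k\) = spatial + 1 + quantile columns; on the data from the Examples section that gives a 4 by 6 frame.

```python
import re

import pandas as pd


def melt_quantiles_sketch(df, value_prefix, dt_name="dt_col", spatial_cols=None):
    """Hypothetical sketch of the wide -> long reshape described by Eq. (1)."""
    spatial_cols = list(spatial_cols or [])
    pattern = rf"^{re.escape(value_prefix)}_(?P<dt>.+)_q(?P<q>\d+(?:\.\d+)?)$"
    q_cols = [c for c in df.columns if re.match(pattern, str(c))]

    # Step 1 (pandas.melt): one row per (original row, quantile column).
    long = df.melt(
        id_vars=spatial_cols or None,
        value_vars=q_cols,
        var_name="column",
        value_name="value",
    )

    # Step 2: split each column name into its temporal and quantile parts.
    parts = long["column"].str.extract(pattern)
    long[dt_name] = parts["dt"]  # kept as strings in this sketch
    long["quantile"] = parts["q"]

    # Step 3 (pivot_table): one column per quantile, indexed by space + time.
    # aggfunc="first" keeps the first value when index keys repeat.
    wide = long.pivot_table(
        index=spatial_cols + [dt_name],
        columns="quantile",
        values="value",
        aggfunc="first",
    )
    wide.columns = [f"{value_prefix}_q{q}" for q in wide.columns]
    return wide.reset_index()


# Data taken from the Examples section of this page.
example_df = pd.DataFrame({
    "lon": [-118.25, -118.30],
    "lat": [34.05, 34.10],
    "subs_2022_q0.1": [1.2, 1.3],
    "subs_2022_q0.5": [1.5, 1.6],
    "subs_2023_q0.9": [1.7, 1.8],
})

result = melt_quantiles_sketch(
    example_df, "subs", dt_name="year", spatial_cols=("lon", "lat")
)
m, t = len(example_df), result["year"].nunique()
print(result.shape, (m * t, 2 + 1 + 3))
```

Without spatial_cols, rows that share a temporal value collapse into one row keeping the first value, so \(p = t\) rather than \(m \times t\); in this sketch that reproduces the first output shown in the Examples section.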
References
[1] McKinney, W. (2010). "Data Structures for Statistical Computing in Python". Proceedings of the 9th Python in Science Conference.
[2] Wickham, H. (2014). "Tidy Data". Journal of Statistical Software, 59(10), 1-23.
Examples
>>> from kdiagram.utils.q_utils import melt_q_data
>>> import pandas as pd
>>> wide_df = pd.DataFrame({
...     'lon': [-118.25, -118.30],
...     'lat': [34.05, 34.10],
...     'subs_2022_q0.1': [1.2, 1.3],
...     'subs_2022_q0.5': [1.5, 1.6],
...     'subs_2023_q0.9': [1.7, 1.8]
... })
>>> long_df = melt_q_data(wide_df, 'subs', dt_name='year')
>>> long_df
   year  subs_q0.1  subs_q0.5  subs_q0.9
0  2022        1.2        1.5        NaN
1  2023        NaN        NaN        1.7
>>> long_df.columns
Index(['year', 'subs_q0.1', 'subs_q0.5', 'subs_q0.9'], dtype='object')
>>> long_df = melt_q_data(wide_df, 'subs', dt_name='year',
...                       spatial_cols=('lon', 'lat'))
>>> long_df
      lon    lat  year  subs_q0.1  subs_q0.5  subs_q0.9
0 -118.30  34.10  2022        1.3        1.6        NaN
1 -118.30  34.10  2023        NaN        NaN        1.8
2 -118.25  34.05  2022        1.2        1.5        NaN
3 -118.25  34.05  2023        NaN        NaN        1.7