Utility Function Examples
This section of the gallery demonstrates practical usage of the utility functions provided within k-diagram. These functions are primarily designed to help identify, validate, and reshape quantile data stored in pandas DataFrames, preparing it for analysis or visualization.
Each example includes Python code using sample data and shows the expected output printed to the console.
Detecting Quantile Columns
This example uses detect_quantiles_in() to find columns whose names match
the quantile naming patterns (e.g., prefix_date_qX.X or prefix_qX.X).
It filters by prefix and by date, and shows the different return types.
import kdiagram.utils as kdu  # Assuming utils are exposed here
import pandas as pd

# --- Sample Data ---
df = pd.DataFrame({
    'site': ['A', 'B'],
    'value_2023_q0.1': [10, 11],
    'value_2023_q0.9': [20, 22],
    'temp_2023_q0.5': [15, 16],
    'value_2024_q0.1': [12, 13],
    'value_2024_q0.9': [23, 25],
    'notes': ['x', 'y']
})

# --- Usage ---
print("Detecting 'value' columns for 2023:")
q_cols_2023 = kdu.detect_quantiles_in(
    df, col_prefix='value', dt_value=['2023']
)
print(q_cols_2023)

print("\nDetecting all quantile columns (returning levels):")
q_levels = kdu.detect_quantiles_in(df, return_types='q_val')
print(sorted(q_levels))  # Sort for consistent output

print("\nDetecting 'temp' columns (returning frame):")
temp_frame = kdu.detect_quantiles_in(
    df, col_prefix='temp', return_types='frame'
)
print(temp_frame)
Expected output:

Detecting 'value' columns for 2023:
['value_2023_q0.1', 'value_2023_q0.9']

Detecting all quantile columns (returning levels):
[0.1, 0.5, 0.9]

Detecting 'temp' columns (returning frame):
   temp_2023_q0.5
0              15
1              16
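The naming convention itself can be illustrated without k-diagram. The sketch below is not the library's implementation: it assumes column names follow prefix_date_qX.X or prefix_qX.X with a purely numeric date part, and it uses a hypothetical helper, parse_quantile_column, built on the standard re module.

```python
import re

# Hypothetical pattern: <prefix>[_<date>]_q<level>,
# e.g. 'value_2023_q0.1' or 'value_q0.5'. The date part is assumed numeric.
_Q_PATTERN = re.compile(
    r"^(?P<prefix>.+?)(?:_(?P<date>\d+))?_q(?P<level>\d*\.?\d+)$"
)


def parse_quantile_column(name):
    """Return (prefix, date, level) for a quantile column name, else None."""
    match = _Q_PATTERN.match(name)
    if match is None:
        return None
    return match["prefix"], match["date"], float(match["level"])


columns = [
    "site", "value_2023_q0.1", "value_2023_q0.9",
    "temp_2023_q0.5", "value_q0.5", "notes",
]
parsed = {c: parse_quantile_column(c) for c in columns}

# Keep only the names that matched, and collect the distinct levels.
quantile_cols = [c for c, p in parsed.items() if p is not None]
levels = sorted({p[2] for p in parsed.values() if p is not None})

print(quantile_cols)
# ['value_2023_q0.1', 'value_2023_q0.9', 'temp_2023_q0.5', 'value_q0.5']
print(levels)
# [0.1, 0.5, 0.9]
```

Columns such as site and notes do not match the pattern and are skipped, which mirrors how the detection example above ignores non-quantile columns.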
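Building Quantile Column Names
This example uses build_q_column_names() to construct the expected
quantile column names from a prefix, date values, and quantile levels,
and to check that they exist in a DataFrame. In the second call, the
name of the missing column is not returned.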
import kdiagram.utils as kdu
import pandas as pd

# --- Sample Data ---
df = pd.DataFrame({
    'site': ['A', 'B'],
    'precip_2024_q0.1': [1, 2],
    'precip_2024_q0.9': [5, 6],
    'precip_2025_q0.1': [1.5, 2.5],
    # Missing 'precip_2025_q0.9'
})

# --- Usage ---
print("Building names for 2024, quantiles 0.1, 0.9:")
# Assuming strict_match=True (default)
names_2024 = kdu.build_q_column_names(
    df, quantiles=[0.1, 0.9], value_prefix='precip', dt_value=['2024']
)
print(names_2024)

print("\nBuilding names for 2025, quantiles 0.1, 0.9 (one missing):")
# dt_value can often handle integers as years
names_2025 = kdu.build_q_column_names(
    df, quantiles=[0.1, 0.9], value_prefix='precip', dt_value=[2025]
)
print(names_2025)
Expected output:

Building names for 2024, quantiles 0.1, 0.9:
['precip_2024_q0.1', 'precip_2024_q0.9']

Building names for 2025, quantiles 0.1, 0.9 (one missing):
['precip_2025_q0.1']
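The build-then-validate idea can be sketched in plain Python. This is only an illustration under the naming convention used above; build_names is a hypothetical helper, not part of k-diagram, and it formats levels with str(), so a level written as 0.10 would render as 0.1.

```python
def build_names(columns, quantiles, prefix, dates):
    """Build '<prefix>_<date>_q<level>' names and keep those found in `columns`."""
    available = set(columns)
    candidates = [
        f"{prefix}_{date}_q{q}" for date in dates for q in quantiles
    ]
    # Drop candidates that do not exist among the available columns.
    return [name for name in candidates if name in available]


columns = ["site", "precip_2024_q0.1", "precip_2024_q0.9", "precip_2025_q0.1"]

names_2024 = build_names(columns, [0.1, 0.9], "precip", [2024])
names_2025 = build_names(columns, [0.1, 0.9], "precip", [2025])

print(names_2024)  # ['precip_2024_q0.1', 'precip_2024_q0.9']
print(names_2025)  # ['precip_2025_q0.1']
```

Because the date is interpolated into an f-string, integer and string years produce the same column names in this sketch.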
Reshaping Quantile Data (Wide to Semi-Long)
This example uses reshape_quantile_data() to transform wide-format
quantile data (prefix_date_qX.X columns) into a semi-long format: each
row is one location and time step, and each quantile level becomes its
own column (prefix_qX.X).
import kdiagram.utils as kdu
import pandas as pd

# --- Sample Wide Data ---
wide_df = pd.DataFrame({
    'lon': [-118.25, -118.30],
    'lat': [34.05, 34.10],
    'subs_2022_q0.1': [1.2, 1.3],
    'subs_2022_q0.5': [1.5, 1.6],
    'subs_2023_q0.1': [1.7, 1.8],
    'subs_2023_q0.5': [1.9, 2.0],
})
print("Original Wide DataFrame:")
print(wide_df)

# --- Usage ---
semi_long_df = kdu.reshape_quantile_data(
    wide_df,
    value_prefix='subs',
    spatial_cols=['lon', 'lat'],
    dt_col='year'  # Name for the new time column
)
print("\nReshaped (Semi-Long) DataFrame:")
print(semi_long_df)
Expected output:

Original Wide DataFrame:
       lon    lat  subs_2022_q0.1  subs_2022_q0.5  subs_2023_q0.1  subs_2023_q0.5
0  -118.25  34.05             1.2             1.5             1.7             1.9
1  -118.30  34.10             1.3             1.6             1.8             2.0

Reshaped (Semi-Long) DataFrame:
       lon    lat  year  subs_q0.1  subs_q0.5
0  -118.25  34.05  2022        1.2        1.5
1  -118.30  34.10  2022        1.3        1.6
2  -118.25  34.05  2023        1.7        1.9
3  -118.30  34.10  2023        1.8        2.0
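For comparison, a similar wide to semi-long transformation can be sketched with pandas alone. This is an assumption-laden illustration, not how reshape_quantile_data() is implemented: it hard-codes the subs prefix, assumes a numeric year in the column name, and the intermediate column names (column, value, q_col) are made up for the example. Row order may differ from the library's output.

```python
import pandas as pd

wide_df = pd.DataFrame({
    "lon": [-118.25, -118.30],
    "lat": [34.05, 34.10],
    "subs_2022_q0.1": [1.2, 1.3],
    "subs_2022_q0.5": [1.5, 1.6],
    "subs_2023_q0.1": [1.7, 1.8],
    "subs_2023_q0.5": [1.9, 2.0],
})
spatial_cols = ["lon", "lat"]

# Step 1: stack every quantile column into (column name, value) pairs.
long_df = wide_df.melt(id_vars=spatial_cols, var_name="column", value_name="value")

# Step 2: split 'subs_<year>_q<level>' into its year and level parts.
parts = long_df["column"].str.extract(r"^subs_(?P<year>\d+)_q(?P<q>[\d.]+)$")
long_df["year"] = parts["year"].astype(int)
long_df["q_col"] = "subs_q" + parts["q"]

# Step 3: spread the quantile levels back out as columns, one row per
# (lon, lat, year) combination.
semi_long = (
    long_df.pivot(index=spatial_cols + ["year"], columns="q_col", values="value")
    .reset_index()
)
semi_long.columns.name = None  # Drop the leftover 'q_col' axis label.

print(semi_long)
```

Passing a list to the index argument of DataFrame.pivot requires pandas 1.1 or later.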
Melting Quantile Data (Wide to Long)
This example uses melt_q_data() to convert a wide-format DataFrame into
a fully long (“tidy”) format with separate columns for time, quantile
level, and the measurement value.
(Note: The exact output structure of melt_q_data may depend on its implementation; this example shows a typical “melted” structure.)
import kdiagram.utils as kdu
import pandas as pd

# --- Sample Wide Data ---
wide_df = pd.DataFrame({
    'lon': [-118.25, -118.30],
    'lat': [34.05, 34.10],
    'subs_2022_q0.1': [1.2, 1.3],
    'subs_2022_q0.5': [1.5, 1.6],
    'subs_2023_q0.1': [1.7, 1.8],
})
print("Original Wide DataFrame:")
print(wide_df)

# --- Usage ---
long_df = kdu.melt_q_data(
    wide_df,
    value_prefix='subs',
    spatial_cols=('lon', 'lat'),
    dt_name='year'  # Name for the time column
)
print("\nMelted (Long) DataFrame:")
print(long_df)
Expected output:

Original Wide DataFrame:
       lon    lat  subs_2022_q0.1  subs_2022_q0.5  subs_2023_q0.1
0  -118.25  34.05             1.2             1.5             1.7
1  -118.30  34.10             1.3             1.6             1.8

Melted (Long) DataFrame:
       lon    lat  year  quantile  subs
0  -118.25  34.05  2022       0.1   1.2
1  -118.30  34.10  2022       0.1   1.3
2  -118.25  34.05  2022       0.5   1.5
3  -118.30  34.10  2022       0.5   1.6
4  -118.25  34.05  2023       0.1   1.7
5  -118.30  34.10  2023       0.1   1.8
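A comparable tidy layout can be produced with DataFrame.melt() plus a regular expression. The sketch below is a pandas-only approximation under the same naming assumptions (subs prefix, numeric year); it does not claim to reproduce the internals or exact output of melt_q_data().

```python
import pandas as pd

wide_df = pd.DataFrame({
    "lon": [-118.25, -118.30],
    "lat": [34.05, 34.10],
    "subs_2022_q0.1": [1.2, 1.3],
    "subs_2022_q0.5": [1.5, 1.6],
    "subs_2023_q0.1": [1.7, 1.8],
})

# Stack the quantile columns: one row per (location, original column).
long_df = wide_df.melt(id_vars=["lon", "lat"], var_name="column", value_name="subs")

# Parse year and quantile level out of the original column name,
# removing the helper 'column' field in the process.
parts = long_df.pop("column").str.extract(r"^subs_(\d+)_q([\d.]+)$")
long_df["year"] = parts[0].astype(int)
long_df["quantile"] = parts[1].astype(float)

# Reorder so identifiers come first and the measurement comes last.
long_df = long_df[["lon", "lat", "year", "quantile", "subs"]]
print(long_df)
```

melt() walks the value columns in their original order, so the rows come out grouped by year and quantile level.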
Pivoting Quantile Data (Long to Wide)
This example uses pivot_q_data() to convert a long-format DataFrame back
into a wide format where each time step and quantile combination becomes
a separate column (e.g., prefix_date_qX.X). The input here has one
column per quantile level (prefix_qX.X) plus a time column, like the
output of the reshaping example.
import kdiagram.utils as kdu
import pandas as pd

# --- Sample Long Data (output from reshape or similar) ---
long_df = pd.DataFrame({
    'lon': [-118.25, -118.30, -118.25, -118.30],
    'lat': [34.05, 34.10, 34.05, 34.10],
    'year': [2022, 2022, 2023, 2023],
    'subs_q0.1': [1.2, 1.3, 1.7, 1.8],  # Quantiles are columns
    'subs_q0.5': [1.5, 1.6, 1.9, 2.0]
})
print("Original Long DataFrame:")
print(long_df)

# --- Usage ---
wide_df_reconstructed = kdu.pivot_q_data(
    long_df,
    value_prefix='subs',
    spatial_cols=('lon', 'lat'),
    dt_col='year'  # Column containing time steps
)
print("\nPivoted (Wide) DataFrame:")
# Sort columns for consistent output display
print(wide_df_reconstructed.reindex(
    sorted(wide_df_reconstructed.columns), axis=1)
)
Expected output:

Original Long DataFrame:
       lon    lat  year  subs_q0.1  subs_q0.5
0  -118.25  34.05  2022        1.2        1.5
1  -118.30  34.10  2022        1.3        1.6
2  -118.25  34.05  2023        1.7        1.9
3  -118.30  34.10  2023        1.8        2.0

Pivoted (Wide) DataFrame:
     lat       lon  subs_2022_q0.1  subs_2022_q0.5  subs_2023_q0.1  subs_2023_q0.5
0  34.10  -118.300             1.3             1.6             1.8             2.0
1  34.05  -118.250             1.2             1.5             1.7             1.9
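The same long to wide step can be approximated with set_index() and unstack(). This sketch assumes quantile columns are named prefix_qX.X and rebuilds prefix_date_qX.X names by hand; it is an illustration of the idea rather than the logic used inside pivot_q_data().

```python
import pandas as pd

semi_long = pd.DataFrame({
    "lon": [-118.25, -118.30, -118.25, -118.30],
    "lat": [34.05, 34.10, 34.05, 34.10],
    "year": [2022, 2022, 2023, 2023],
    "subs_q0.1": [1.2, 1.3, 1.7, 1.8],
    "subs_q0.5": [1.5, 1.6, 1.9, 2.0],
})
spatial_cols = ["lon", "lat"]

# Move 'year' from the row index into the columns. The result has a
# two-level column index of (quantile column, year) pairs.
wide = semi_long.set_index(spatial_cols + ["year"]).unstack("year")

# Flatten ('subs_q0.1', 2022) into 'subs_2022_q0.1'.
new_cols = []
for name, year in wide.columns:
    prefix, level = name.rsplit("_q", 1)
    new_cols.append(f"{prefix}_{year}_q{level}")
wide.columns = new_cols

wide = wide.reset_index()
print(wide.reindex(sorted(wide.columns), axis=1))
```

unstack() sorts by the index, so rows come back ordered by lon and lat rather than in their original order.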