kdiagram.plot.uncertainty.plot_interval_width
- kdiagram.plot.uncertainty.plot_interval_width(df, q_cols, theta_col=None, z_col=None, acov='default', figsize=(8.0, 8.0), title=None, cmap='viridis', s=30, alpha=0.8, show_grid=True, grid_props=None, cbar=True, mask_angle=True, savefig=None)
Polar scatter plot visualizing prediction interval width.
This function generates a polar scatter plot to visualize the magnitude of prediction uncertainty, represented by the width of the prediction interval (Upper Quantile - Lower Quantile), across different locations or samples.
Angular Position (`theta`): Represents each location or data point. Currently derived from the DataFrame index, mapped linearly onto the specified angular coverage (acov). The optional theta_col parameter is intended for future use in ordering but is currently ignored for positioning.
Radial Distance (`r`): Directly represents the width of the prediction interval (\(Q_{upper} - Q_{lower}\)). A larger radius indicates greater predicted uncertainty for that point.
Color (`z`): Optionally represents a third variable, specified by z_col (e.g., the median prediction Q50, or the actual value). This allows for correlating the uncertainty width with another metric. If z_col is not provided, the color defaults to representing the interval width (r) itself.
This plot helps to:
- Identify locations with high or low prediction uncertainty.
- Visualize the spatial or sample distribution of uncertainty magnitude.
- Explore potential correlations between uncertainty width and other variables (like the central prediction) when using z_col.
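Because theta_col is currently ignored for positioning, one possible workaround is to order the rows yourself before plotting. The sketch below uses only pandas and assumes that angular position follows row order, as described above. The column names (`latitude`, `q10`, `q90`) and the toy values are illustrative assumptions, not part of kdiagram.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "latitude": [41.2, 40.1, 41.9, 40.6],
    "q10": [1.0, 2.0, 0.5, 1.5],
    "q90": [4.0, 3.5, 5.0, 2.5],
})

# Sort by the variable that should drive the angular order, then rebuild a
# 0..N-1 index so that row order and index order agree.
df_sorted = df.sort_values("latitude").reset_index(drop=True)

print(df_sorted["latitude"].tolist())
```

The sorted frame could then be passed as df to plot_interval_width with q_cols=['q10', 'q90'], so that points sweep around the plot in order of increasing latitude.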
Parameters:
- df : pd.DataFrame
  Input DataFrame containing the quantile prediction columns and optionally columns for theta ordering and color (z_col). Decorators ensure it’s a valid, non-empty pandas DataFrame.
- q_cols : list or tuple of str
  A sequence containing exactly two string elements: the column name for the lower quantile bound (e.g., 'prediction_q10') and the column name for the upper quantile bound (e.g., 'prediction_q90'). The order must be [lower_col, upper_col].
- theta_col : str, optional
  Intended column name for ordering points angularly based on its values (e.g., 'latitude'). Note: This parameter is currently ignored for positioning/ordering in this implementation; points use DataFrame index order. A warning is issued if provided. Default is None.
- z_col : str, optional
  Name of the column whose values will be used to color the scatter points. Common choices include the median prediction column (e.g., 'prediction_q50') or an actual value column to see if high/low uncertainty correlates with high/low values. If None, the color will represent the interval width (r) itself. Default is None.
- acov : {'default', 'half_circle', 'quarter_circle', 'eighth_circle'}, default='default'
  Specifies the angular coverage (span) of the polar plot: 'default' (360°), 'half_circle' (180°), 'quarter_circle' (90°), 'eighth_circle' (45°).
- figsize : tuple of (float, float), default=(8.0, 8.0)
  Width and height of the figure in inches.
- title : str, optional
  Custom title for the plot. If None, a default title like “Prediction Interval Width (Q_upper - Q_lower)” is used. Default is None.
- cmap : str, default='viridis'
  Name of the Matplotlib colormap used to color the points based on the z value (from z_col or defaulting to interval width r).
- s : int, default=30
  Marker size for the scatter points.
- alpha : float, default=0.8
  Transparency level for the scatter points (0=transparent, 1=opaque).
- show_grid : bool, default=True
  If True, display the polar grid lines.
- cbar : bool, default=True
  If True, display a color bar indicating the mapping between colors and the z values.
- mask_angle : bool, default=True
  If True, hide the angular tick labels (degrees). Recommended if the angle is based on index.
- savefig : str, optional
  File path to save the plot image. If None, displays the plot interactively. Default is None.
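As a quick reference for acov, the degree values listed above translate into the following spans in radians. The dictionary name `ACOV_SPAN` is hypothetical and is not a kdiagram API; the sketch only restates the documented options.

```python
import math

# Hypothetical lookup table (not part of kdiagram): angular span in radians
# for each documented acov option.
ACOV_SPAN = {
    "default": 2 * math.pi,         # 360 degrees
    "half_circle": math.pi,         # 180 degrees
    "quarter_circle": math.pi / 2,  # 90 degrees
    "eighth_circle": math.pi / 4,   # 45 degrees
}

for name, span in ACOV_SPAN.items():
    print(f"{name}: {math.degrees(span):.0f} degrees")
```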
Returns:
- ax : matplotlib.axes._axes.Axes
  The Matplotlib Axes object containing the polar scatter plot, specifically a PolarAxesSubplot.
Raises:
- TypeError
  If q_cols does not contain exactly two elements.
- ValueError
  If specified columns in q_cols, theta_col (if provided), or z_col (if provided) are not found in the DataFrame df, or if data in the required columns is not numeric.
See also:
- plot_interval_consistency : Visualize the temporal consistency of interval widths.
- plot_uncertainty_drift : Visualize the temporal drift of interval widths as rings.
- matplotlib.pyplot.scatter : Function used for plotting points.
- matplotlib.colors.Normalize : Used for scaling color values.
Notes:
Rows with NaN values in any of the required columns (q_cols, theta_col if used, z_col if used) are dropped before plotting.
The interval width r is calculated as Upper Quantile - Lower Quantile. If the lower quantile is greater than the upper quantile for any point, this will result in a negative width, which might lead to unexpected plotting behavior as radius is typically non-negative. A warning is issued if negative widths are detected.
The angular coordinate theta is derived from the DataFrame index after NaN removal, not influenced by theta_col currently.
Color is determined by the z_col values if provided, otherwise it defaults to representing the interval width r.
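To anticipate the NaN handling and the negative-width warning described in these notes, the input can be inspected before plotting. The following is a minimal sketch using pandas only; the column names and toy values are assumptions, and the check itself is not part of kdiagram.

```python
import numpy as np
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "q10": [1.0, 2.0, np.nan, 4.0],
    "q90": [3.0, 1.5, 5.0, 6.0],   # second row has upper < lower
    "q50": [2.0, 1.8, 4.0, np.nan],
})
required = ["q10", "q90", "q50"]  # q_cols plus z_col

# Drop rows with NaN in any required column, as the notes describe.
clean = df.dropna(subset=required)

# Interval width; negative values signal crossed quantiles.
width = clean["q90"] - clean["q10"]

n_dropped = len(df) - len(clean)
n_negative = int((width < 0).sum())
print(n_dropped, n_negative)
```

Rows flagged with a negative width usually point to crossed quantile predictions, which are worth fixing upstream rather than plotting as-is.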
Let \(L_j\) and \(U_j\) be the lower and upper quantile bound values for data point (location) \(j\) (\(j=0, \dots, N-1\) after NaN removal). Let \(z'_j\) be the value from z_col for point \(j\), if z_col is provided.
Radial Coordinate (`r`): Interval width.
\[ r_j = U_j - L_j \]
Color Value (`z`):
\[ z_j = \begin{cases} z'_j & \text{if } z\_col \text{ is provided} \\ r_j & \text{if } z\_col \text{ is None} \end{cases} \]
Angular Coordinate (`theta`): Let \(S\) be the angular span and \(\theta_{min}\) the start angle from acov.
\[ \theta_j = \left( \frac{j}{N} \times S \right) + \theta_{min} \]
Plotting: Plot points \((r_j, \theta_j)\) using scatter, where the color of each point is determined by applying cmap to the normalized value of \(z_j\).
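The formulation above can be sketched with NumPy as follows. The helper name `polar_coordinates`, its signature, and the default start angle of 0 are assumptions made for illustration; this is not the library's implementation.

```python
import numpy as np

def polar_coordinates(lower, upper, z=None, span=2 * np.pi, theta_min=0.0):
    """Hypothetical helper mirroring the formulas above (not a kdiagram API)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n = lower.size
    r = upper - lower                             # r_j = U_j - L_j
    theta = np.arange(n) / n * span + theta_min   # theta_j = (j / N) * S + theta_min
    color = r if z is None else np.asarray(z, dtype=float)  # z_j
    return r, theta, color

# Four points spread over a quarter circle (S = pi / 2), no z values given.
r, theta, color = polar_coordinates(
    [1.0, 2.0, 0.5, 3.0], [4.0, 3.0, 2.5, 7.0], span=np.pi / 2
)
print(r, theta)
```

With Matplotlib, these arrays could be drawn on a polar axes via ax.scatter(theta, r, c=color, cmap='viridis'); note that polar scatter takes the angle as its first argument.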
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_interval_width
1. Random Example:
>>> np.random.seed(1)
>>> N = 120
>>> df_iw_rand = pd.DataFrame({
...     'sample_id': range(N),
...     'latitude': np.linspace(40, 42, N) + np.random.randn(N)*0.05,
...     'q10_pred': np.random.rand(N) * 10,
...     'q50_pred': np.random.rand(N) * 10 + 5,
...     'q90_pred': np.random.rand(N) * 10 + 10,  # Width varies
... })
>>> # Ensure Q90 > Q10
>>> df_iw_rand['q90_pred'] = df_iw_rand['q10_pred'] + np.abs(
...     df_iw_rand['q90_pred'] - df_iw_rand['q10_pred'])
>>>
>>> ax_iw_rand = plot_interval_width(
...     df=df_iw_rand,
...     q_cols=['q10_pred', 'q90_pred'],  # Pass as list [lower, upper]
...     z_col='q50_pred',                 # Color by median prediction
...     theta_col='latitude',             # Ignored for positioning
...     acov='default',
...     title='Interval Width vs. Median Prediction',
...     cmap='plasma',
...     s=40,
...     cbar=True
... )
>>> # plt.show() called internally
2. Concrete Example (Subsidence Data):
>>> # Assume zhongshan_pred_2023_2026 is a loaded DataFrame
>>> # Create dummy data if it doesn't exist
>>> try:
...     zhongshan_pred_2023_2026
... except NameError:
...     print("Creating dummy subsidence data for example...")
...     N_sub = 150
...     zhongshan_pred_2023_2026 = pd.DataFrame({
...         'latitude': np.linspace(22.2, 22.8, N_sub),
...         'subsidence_2023_q10': np.random.rand(N_sub)*5 + 1,
...         'subsidence_2023_q50': np.random.rand(N_sub)*10 + 3,
...         'subsidence_2023_q90': np.random.rand(N_sub)*5 + 6 + np.linspace(0, 10, N_sub),
...         # Ensure q90 > q10
...         **{f'subsidence_{yr}_q10': np.random.rand(N_sub)*(yr-2022)*2 + 1
...            for yr in range(2024, 2027)},  # Add other cols if needed
...         **{f'subsidence_{yr}_q90': np.random.rand(N_sub)*(yr-2022)*2 + 5
...            + np.linspace(0, (yr-2022)*3, N_sub)
...            for yr in range(2024, 2027)},
...     })
>>> # Ensure Q90 > Q10 for the primary year
>>> zhongshan_pred_2023_2026['subsidence_2023_q90'] = (
...     zhongshan_pred_2023_2026['subsidence_2023_q10'] +
...     np.abs(zhongshan_pred_2023_2026['subsidence_2023_q90'] -
...            zhongshan_pred_2023_2026['subsidence_2023_q10']) + 0.1
... )
>>> ax_iw_sub = plot_interval_width(
...     df=zhongshan_pred_2023_2026.head(100),  # Use subset
...     q_cols=['subsidence_2023_q10', 'subsidence_2023_q90'],  # Use list
...     z_col='subsidence_2023_q50',   # Color by Q50
...     theta_col='latitude',          # Ignored for positioning
...     acov='quarter_circle',         # Use 90 degrees
...     title='Spatial Spread of Uncertainty (2023)',
...     cmap='YlGnBu',
...     s=25,
...     cbar=True,                     # Show colorbar for Q50
...     mask_angle=True
... )
>>> # plt.show() called internally