kdiagram.plot.uncertainty.plot_interval_consistency

kdiagram.plot.uncertainty.plot_interval_consistency(df, qlow_cols, qup_cols, q50_cols=None, theta_col=None, use_cv=True, cmap='coolwarm', acov='default', title=None, figsize=(9, 9), s=30, alpha=0.85, show_grid=True, mask_angle=False, savefig=None, dpi=300, ax=None)

Polar plot showing consistency of prediction interval widths.

This function generates a polar scatter plot to visualize the temporal consistency (or variability) of prediction interval widths (e.g., Q90 - Q10) across different locations over multiple time steps or forecast horizons:

  • The angular position (`theta`) represents each location, currently derived from the DataFrame index and mapped onto the specified angular coverage (acov).

  • The radial distance (`r`) quantifies the inconsistency or variability of the interval width over time for each location. It is calculated as either the standard deviation (absolute variability) or the coefficient of variation (CV, relative variability) of the interval widths (Upper Quantile - Lower Quantile) across the specified time steps. Higher r values indicate locations where the predicted uncertainty range fluctuates more significantly over time.

  • The color of each point typically represents the average median prediction (Q50) across the time steps (if q50_cols are provided). This adds context, helping to identify if interval inconsistency occurs in regions of high or low average predictions. If q50_cols are not provided, color defaults to representing the inconsistency measure r.

This plot is useful for diagnosing model reliability, identifying locations or conditions where the model’s uncertainty estimates are unstable or vary considerably across different forecast horizons.

Parameters:
df : pd.DataFrame

Input DataFrame containing the data. Must include columns specified in qlow_cols and qup_cols. Decorator @isdf ensures this is a pandas DataFrame. Decorator @check_non_emptiness ensures it’s not empty.

qlow_cols : list of str

List of column names representing the lower quantile (e.g., Q10) predictions for consecutive time steps (e.g., years). Order should correspond to the time steps. Example: ['subsidence_2023_q10', 'subsidence_2024_q10', ...].

qup_cols : list of str

List of column names representing the upper quantile (e.g., Q90) predictions for the same consecutive time steps as qlow_cols. Must be the same length as qlow_cols. Example: ['subsidence_2023_q90', 'subsidence_2024_q90', ...].

q50_cols : list of str, optional

List of column names representing the median quantile (Q50) predictions for the same time steps. If provided, the average Q50 value across these columns will be used to color the points. Must be the same length as qlow_cols if provided. If None, the color will represent the radial value r (the inconsistency measure). Default is None.

theta_col : str, optional

Intended to name the column that determines the angular position (theta) of each location (e.g., 'latitude', 'longitude', or a spatial index). Note: the current implementation always maps the DataFrame row index onto the angular range specified by `acov`, whether or not `theta_col` is provided. Passing `theta_col` currently triggers a warning but does not affect the plot's angular axis. Default is None.

use_cv : bool, default=True

Determines the measure of interval width variability used for the radial coordinate r:

  • If True, r is the Coefficient of Variation (CV) of the interval widths (Std Dev / Mean). CV measures relative variability, useful when mean widths differ substantially.

  • If False, r is the Standard Deviation (Std Dev) of the interval widths. Std Dev measures absolute variability.

cmap : str, default='coolwarm'

The name of the Matplotlib colormap used to color the scatter points based on the average Q50 value (or r if q50_cols is None).

acov : str, default='default'

Angular coverage defining the span of the polar plot’s theta axis. Options: 'default' (360°), 'half_circle' (180°), 'quarter_circle' (90°), 'eighth_circle' (45°). Invalid options default to 'default'.

title : str, optional

The title displayed above the polar plot. If None, a default title like “Prediction Interval Consistency (Q90–Q10)” is used. Default is None.

figsize : tuple of (float, float), default=(9, 9)

The width and height of the figure in inches.

s : float or int, default=30

The marker size for the scatter points.

alpha : float, default=0.85

The transparency level of the scatter points (0=transparent, 1=opaque).

show_grid : bool, default=True

If True, display the polar grid lines.

mask_angle : bool, default=False

If True, hide the angular tick labels. Useful if the index-based angle is not directly interpretable.

savefig : str, optional

File path to save the plot image. If None, displays the plot interactively. Default is None.

Returns:
ax : matplotlib.axes.Axes

The Matplotlib Axes object containing the polar scatter plot.

Raises:
AssertionError

If qlow_cols and qup_cols have different lengths.

ValueError

If specified columns in qlow_cols, qup_cols, or q50_cols are not found in the DataFrame.

See also

plot_velocity

Plot average velocity in polar coordinates.

numpy.std

Compute the standard deviation.

numpy.mean

Compute the arithmetic mean.

matplotlib.pyplot.scatter

Create scatter plots.

Notes

Interval-width consistency is assessed from paired lower/upper quantiles for multiple time steps. For each location and time step, the width is computed as upper minus lower. The radial value encodes either the standard deviation of these widths (absolute variability) or their coefficient of variation (relative variability), with safe handling of zero means by setting the CV to zero when the average width is numerically indistinguishable from zero. Angles are derived from the row index and mapped linearly across the angular span determined by acov; the current implementation does not use theta_col for positioning. Rows containing missing values in any required column are dropped prior to computation. These diagnostics relate to standard notions of predictive-interval calibration and stability; see Gneiting et al.[1], Jolliffe and Stephenson[2].

Interval widths. Let \(\mathbf L\) and \(\mathbf U\) be matrices extracted from df using qlow_cols and qup_cols, respectively, both of shape \((N,M)\), with \(N\) locations and \(M\) time steps. Define the width matrix \(\mathbf W\) by

(1)\[W_{j,i} = U_{j,i} - L_{j,i}\]

where \(j\) indexes locations (\(0\) to \(N-1\)) and \(i\) indexes time steps (\(0\) to \(M-1\)).
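The width computation in (1), together with the row dropping mentioned above, can be sketched with pandas and NumPy. This is a minimal illustration under stated assumptions, not the library's internal code: the toy DataFrame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 locations, 2 time steps (column names are illustrative).
df = pd.DataFrame({
    "y1_q10": [1.0, 2.0, np.nan],
    "y2_q10": [1.5, 2.5, 3.0],
    "y1_q90": [3.0, 6.0, 5.0],
    "y2_q90": [4.5, 6.5, 7.0],
})
qlow_cols = ["y1_q10", "y2_q10"]
qup_cols = ["y1_q90", "y2_q90"]

# Drop rows with a missing value in any required column.
clean = df.dropna(subset=qlow_cols + qup_cols)

L = clean[qlow_cols].to_numpy()  # lower quantiles, shape (N, M)
U = clean[qup_cols].to_numpy()   # upper quantiles, shape (N, M)
W = U - L                        # W[j, i] = U[j, i] - L[j, i]
```

Here the third row is discarded because of its missing lower quantile, so `W` has two rows, one per remaining location.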

Radial Coordinate Calculation (`r`): Let \(\mathbf{w}_j = (W_{j,0}, \dots, W_{j,M-1})\) be the vector of widths over time for location \(j\). Let \(\bar{w}_j = \text{mean}(\mathbf{w}_j)\) and \(\sigma_{w_j} = \text{std}(\mathbf{w}_j)\).

  • If use_cv=False (Standard Deviation):

    (2)\[r_j = \sigma_{w_j}\]
  • If use_cv=True (Coefficient of Variation):

    (3)\[r_j = \begin{cases} \dfrac{\sigma_{w_j}}{\bar{w}_j} & \text{if } |\bar{w}_j| > \epsilon \\ 0 & \text{if } |\bar{w}_j| \le \epsilon \end{cases}\]

    where \(\epsilon\) is a small threshold to prevent division by zero.
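Equations (2) and (3) can be sketched in NumPy as follows. The helper name `radial_inconsistency`, the threshold value `eps=1e-9`, and the use of the population standard deviation (NumPy's default `ddof=0`) are assumptions made for illustration; the source does not state which values the library uses.

```python
import numpy as np

def radial_inconsistency(W, use_cv=True, eps=1e-9):
    """Per-location variability of interval widths.

    W is an (N, M) array of widths. `eps` is an assumed threshold and
    the std uses ddof=0; neither is confirmed by the documentation.
    """
    W = np.asarray(W, dtype=float)
    sigma = W.std(axis=1)            # std over time steps, eq. (2)
    if not use_cv:
        return sigma
    mean = W.mean(axis=1)
    r = np.zeros_like(sigma)         # CV defaults to 0 where the mean is ~0
    ok = np.abs(mean) > eps
    r[ok] = sigma[ok] / mean[ok]     # eq. (3), first case
    return r

W_demo = np.array([[2.0, 3.0], [4.0, 4.0], [0.0, 0.0]])
r_cv = radial_inconsistency(W_demo, use_cv=True)
r_std = radial_inconsistency(W_demo, use_cv=False)
```

The last row of `W_demo` has a mean width of zero, so its CV is set to zero rather than dividing by zero.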

Color Value Calculation (`c`): Let \(\mathbf{Q50}\) be the data matrix (shape \((N, M)\)) from q50_cols.

  • If q50_cols is provided: Let \(\mathbf{q50}_j = (Q50_{j,0}, \dots, Q50_{j,M-1})\).

    (4)\[c_j = \text{mean}(\mathbf{q50}_j) = \frac{1}{M} \sum_{i=0}^{M-1} Q50_{j,i}\]
  • If q50_cols is None:

    \(c_j = r_j\)
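The two color cases above amount to a row-wise mean with a fallback. A minimal sketch, where the helper name `color_values` is hypothetical:

```python
import numpy as np

def color_values(r, Q50=None):
    """Return c_j: row-wise mean of Q50 if given, otherwise r itself."""
    if Q50 is None:
        return np.asarray(r, dtype=float)
    # Mean over the M time steps for each location, eq. (4).
    return np.asarray(Q50, dtype=float).mean(axis=1)

r_demo = np.array([0.2, 0.0])
c_from_q50 = color_values(r_demo, [[5.0, 7.0], [6.0, 6.0]])
c_fallback = color_values(r_demo)
```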

Angular Coordinate Calculation (`theta`): Same index-based calculation as plot_velocity. Let \(S\) be the angular span from acov.

(5)\[\theta_j = \frac{j}{N} \times S\]
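Equation (5) can be sketched as below. The dictionary `ACOV_SPAN` and the helper `index_to_theta` are hypothetical names; the spans are expressed in radians because Matplotlib polar axes take angles in radians, and unknown `acov` values fall back to the full circle as described for the `acov` parameter.

```python
import numpy as np

# Angular span per acov option, in radians (360°, 180°, 90°, 45°).
ACOV_SPAN = {
    "default": 2 * np.pi,
    "half_circle": np.pi,
    "quarter_circle": np.pi / 2,
    "eighth_circle": np.pi / 4,
}

def index_to_theta(n, acov="default"):
    """Map row positions 0..n-1 linearly onto the span, eq. (5)."""
    span = ACOV_SPAN.get(acov, ACOV_SPAN["default"])  # invalid -> full circle
    return np.arange(n) / n * span

theta_demo = index_to_theta(4, "half_circle")
theta_bad = index_to_theta(2, "not_an_option")
```

Because the division is by N rather than N-1, the last point stops one step short of the full span, which avoids overlapping the first point on a 360° plot.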

References

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_interval_consistency

1. Random Example:

>>> np.random.seed(1)
>>> N_points = 120
>>> df_rand_interval = pd.DataFrame({
...     'id': range(N_points),
...     'lat': np.linspace(30, 31, N_points),
...     'val_2021_q10': np.random.rand(N_points) * 5,
...     'val_2021_q50': np.random.rand(N_points) * 5 + 5,
...     'val_2021_q90': np.random.rand(N_points) * 5 + 10,
...     'val_2022_q10': np.random.rand(N_points) * 6, # Slightly wider
...     'val_2022_q50': np.random.rand(N_points) * 6 + 6,
...     'val_2022_q90': np.random.rand(N_points) * 6 + 12,
...     'val_2023_q10': np.random.rand(N_points) * 4, # Narrower
...     'val_2023_q50': np.random.rand(N_points) * 4 + 7,
...     'val_2023_q90': np.random.rand(N_points) * 4 + 11,
... })
>>> q10_cols_rand = ['val_2021_q10', 'val_2022_q10', 'val_2023_q10']
>>> q90_cols_rand = ['val_2021_q90', 'val_2022_q90', 'val_2023_q90']
>>> q50_cols_rand = ['val_2021_q50', 'val_2022_q50', 'val_2023_q50']
>>> ax_rand_ic = plot_interval_consistency(
...     df=df_rand_interval,
...     qlow_cols=q10_cols_rand,
...     qup_cols=q90_cols_rand,
...     q50_cols=q50_cols_rand,
...     theta_col='lat',      # Note: Ignored for positioning
...     use_cv=True,          # Use CV for radial axis
...     cmap='viridis',
...     acov='half_circle',
...     title='Random Interval Width Consistency (CV)',
...     s=35
... )
>>> # plt.show() called internally

2. Concrete Example (Subsidence Data - adapted from docstring):

>>> # Assume zhongshan_pred_2023_2026 is an already-loaded DataFrame;
>>> # create dummy data if it does not exist.
>>> try:
...     zhongshan_pred_2023_2026
... except NameError:
...     print("Creating dummy subsidence data for example...")
...     N_sub = 150
...     zhongshan_pred_2023_2026 = pd.DataFrame({
...         'latitude': np.linspace(22.2, 22.8, N_sub),
...         **{f'subsidence_{yr}_q10': np.random.rand(N_sub)*(yr-2020)+1
...            for yr in range(2023, 2027)},
...         **{f'subsidence_{yr}_q50': np.random.rand(N_sub)*(yr-2019)+5
...            + np.linspace(0, (yr-2022)*2, N_sub)
...            for yr in range(2023, 2027)},
...         **{f'subsidence_{yr}_q90': np.random.rand(N_sub)*(yr-2018)+10
...            + np.linspace(0, (yr-2022)*4, N_sub)
...            for yr in range(2023, 2027)},
...     })
>>> qlow_sub = [f'subsidence_{yr}_q10' for yr in range(2023, 2027)]
>>> qup_sub = [f'subsidence_{yr}_q90' for yr in range(2023, 2027)]
>>> q50_sub = [f'subsidence_{yr}_q50' for yr in range(2023, 2027)]
>>> ax_sub_ic = plot_interval_consistency(
...     df=zhongshan_pred_2023_2026,
...     qlow_cols=qlow_sub,
...     qup_cols=qup_sub,
...     q50_cols=q50_sub,
...     theta_col='latitude',    # Ignored for pos, triggers warning
...     acov='default',
...     title='Subsidence Uncertainty Consistency (2023–2026)',
...     use_cv=False,            # Use Std Dev for radius
...     cmap='coolwarm',
...     s=28,
...     alpha=0.8,
...     mask_angle=True
... )