kdiagram.plot.uncertainty.plot_interval_consistency

kdiagram.plot.uncertainty.plot_interval_consistency(df, qlow_cols, qup_cols, q50_cols=None, theta_col=None, use_cv=True, cmap='coolwarm', acov='default', title=None, figsize=(9, 9), s=30, alpha=0.85, show_grid=True, mask_angle=False, savefig=None)

Polar plot showing consistency of prediction interval widths.

This function generates a polar scatter plot to visualize the temporal consistency (or variability) of prediction interval widths (e.g., Q90 - Q10) across different locations over multiple time steps or forecast horizons.

  • The angular position (`theta`) represents each location, currently derived from the DataFrame index and mapped onto the specified angular coverage (acov).

  • The radial distance (`r`) quantifies the inconsistency or variability of the interval width over time for each location. It is calculated as either the standard deviation (absolute variability) or the coefficient of variation (CV, relative variability) of the interval widths (Upper Quantile - Lower Quantile) across the specified time steps. Higher r values indicate locations where the predicted uncertainty range fluctuates more significantly over time.

  • The color of each point typically represents the average median prediction (Q50) across the time steps (if q50_cols are provided). This adds context, helping to identify if interval inconsistency occurs in regions of high or low average predictions. If q50_cols are not provided, color defaults to representing the inconsistency measure r.

This plot is useful for diagnosing model reliability, identifying locations or conditions where the model’s uncertainty estimates are unstable or vary considerably across different forecast horizons.

df : pd.DataFrame

Input DataFrame containing the data. Must include columns specified in qlow_cols and qup_cols. Decorator @isdf ensures this is a pandas DataFrame. Decorator @check_non_emptiness ensures it’s not empty.

qlow_cols : list of str

List of column names representing the lower quantile (e.g., Q10) predictions for consecutive time steps (e.g., years). Order should correspond to the time steps. Example: ['subsidence_2023_q10', 'subsidence_2024_q10', ...].

qup_cols : list of str

List of column names representing the upper quantile (e.g., Q90) predictions for the same consecutive time steps as qlow_cols. Must be the same length as qlow_cols. Example: ['subsidence_2023_q90', 'subsidence_2024_q90', ...].

q50_cols : list of str, optional

List of column names representing the median quantile (Q50) predictions for the same time steps. If provided, the average Q50 value across these columns will be used to color the points. Must be the same length as qlow_cols if provided. If None, the color will represent the radial value r (the inconsistency measure). Default is None.

theta_col : str, optional

Name of the column intended to determine the angular position (theta) for each location (e.g., 'latitude', 'longitude', or a spatial index). Note: the current implementation does not use this column. It always maps the DataFrame row index onto the angular range specified by `acov`. Passing `theta_col` triggers a warning but does not change the plot's angular axis. Default is None.

use_cv : bool, default=True

Determines the measure of interval width variability used for the radial coordinate r:

  • If True, r is the Coefficient of Variation (CV) of the interval widths (Std Dev / Mean). CV measures relative variability, useful when mean widths differ substantially.

  • If False, r is the Standard Deviation (Std Dev) of the interval widths. Std Dev measures absolute variability.

cmap : str, default='coolwarm'

The name of the Matplotlib colormap used to color the scatter points based on the average Q50 value (or r if q50_cols is None).

acov : str, default='default'

Angular coverage defining the span of the polar plot’s theta axis. Options: 'default' (360°), 'half_circle' (180°), 'quarter_circle' (90°), 'eighth_circle' (45°). Invalid options default to 'default'.

title : str, optional

The title displayed above the polar plot. If None, a default title like “Prediction Interval Consistency (Q90–Q10)” is used. Default is None.

figsize : tuple of (float, float), default=(9, 9)

The width and height of the figure in inches.

s : float or int, default=30

The marker size for the scatter points.

alpha : float, default=0.85

The transparency level of the scatter points (0=transparent, 1=opaque).

show_grid : bool, default=True

If True, display the polar grid lines.

mask_angle : bool, default=False

If True, hide the angular tick labels. Useful if the index-based angle is not directly interpretable.

savefig : str, optional

File path to save the plot image. If None, displays the plot interactively. Default is None.

ax : matplotlib.axes.Axes

The Matplotlib Axes object containing the polar scatter plot.

AssertionError

If qlow_cols and qup_cols have different lengths.

ValueError

If specified columns in qlow_cols, qup_cols, or q50_cols are not found in the DataFrame.
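The two documented error conditions can be illustrated with a small stand-alone check. This is only a sketch of the described behavior, not the library's validation code: the helper name `check_quantile_columns` and the error messages are hypothetical.

```python
import pandas as pd

def check_quantile_columns(df, qlow_cols, qup_cols, q50_cols=None):
    """Hypothetical helper mirroring the documented checks."""
    # Lower and upper quantile lists must describe the same time steps.
    assert len(qlow_cols) == len(qup_cols), (
        "qlow_cols and qup_cols must have the same length"
    )
    # Every requested column must exist in the DataFrame.
    requested = list(qlow_cols) + list(qup_cols) + list(q50_cols or [])
    missing = [c for c in requested if c not in df.columns]
    if missing:
        raise ValueError(f"Columns not found in DataFrame: {missing}")

# Toy frame with one time step (column names are illustrative only).
demo = pd.DataFrame({"v_q10": [1.0], "v_q90": [3.0]})
check_quantile_columns(demo, ["v_q10"], ["v_q90"])  # passes silently
```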

plot_velocity : Plot average velocity in polar coordinates.
plot_polar_uncertainty_spread : Plot uncertainty ranges year-wise.
numpy.std : Compute the standard deviation.
numpy.mean : Compute the arithmetic mean.
matplotlib.pyplot.scatter : Create scatter plots.

  • The function requires corresponding lower and upper quantile columns for multiple time steps. At least two steps are needed for the std dev or CV to be meaningful; with a single step the computed variability is 0.

  • The interval width is calculated as Upper Quantile - Lower Quantile for each location and time step.

  • The Coefficient of Variation (CV) calculation handles potential division by zero (when mean width is zero) by setting CV to 0.

  • The angular coordinate theta is derived from the DataFrame index, not the theta_col parameter in the current implementation.

  • Input validation relies on decorators @isdf and @check_non_emptiness.

Let \(\mathbf{L}\) and \(\mathbf{U}\) be data matrices extracted from df using columns qlow_cols and qup_cols respectively, both of shape \((N, M)\), where \(N\) is the number of locations and \(M\) is the number of time steps.

  1. Interval Width Calculation: The matrix of interval widths \(\mathbf{W}\) (shape \((N, M)\)) is calculated as:

    \[W_{j,i} = U_{j,i} - L_{j,i}\]
    

    where \(j\) indexes locations (\(0\) to \(N-1\)) and \(i\) indexes time steps (\(0\) to \(M-1\)).

  2. Radial Coordinate Calculation (`r`): Let \(\mathbf{w}_j = (W_{j,0}, \dots, W_{j,M-1})\) be the vector of widths over time for location \(j\). Let \(\bar{w}_j = \text{mean}(\mathbf{w}_j)\) and \(\sigma_{w_j} = \text{std}(\mathbf{w}_j)\).

    • If use_cv=False (Standard Deviation):

      \[r_j = \sigma_{w_j}\]

    • If use_cv=True (Coefficient of Variation):

      \[r_j = \begin{cases} \dfrac{\sigma_{w_j}}{\bar{w}_j} & \text{if } |\bar{w}_j| > \epsilon \\ 0 & \text{if } |\bar{w}_j| \le \epsilon \end{cases}\]

    where \(\epsilon\) is a small threshold to prevent division by zero.
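The width and radial calculations can be sketched with NumPy. This is a minimal illustration rather than the library's code: the helper name `radial_inconsistency`, the threshold `eps=1e-9`, and the use of the population standard deviation (the `numpy.std` default, `ddof=0`) are assumptions.

```python
import numpy as np

def radial_inconsistency(lower, upper, use_cv=True, eps=1e-9):
    """Hypothetical helper: per-location variability of interval widths."""
    # W[j, i] = U[j, i] - L[j, i], shape (N, M)
    widths = np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)
    std = widths.std(axis=1)        # sigma_{w_j}, population std (ddof=0)
    if not use_cv:
        return std
    mean = widths.mean(axis=1)      # mean width per location
    safe = np.abs(mean) > eps       # locations where division is safe
    cv = np.zeros_like(std)         # CV stays 0 where the mean is ~0
    cv[safe] = std[safe] / mean[safe]
    return cv

# Three locations, three time steps (toy values).
L = np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]])
U = np.array([[3.0, 5.0, 7.0], [4.0, 4.0, 4.0], [3.0, 3.0, 3.0]])

r_std = radial_inconsistency(L, U, use_cv=False)
r_cv = radial_inconsistency(L, U, use_cv=True)
```

In this toy data the first location has widths 2, 4, 6 and therefore a nonzero r, the second has a constant width (r = 0), and the third has zero-width intervals, so the CV branch falls back to 0 instead of dividing by zero.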

  3. Color Value Calculation (`c`): Let \(\mathbf{Q50}\) be the data matrix (shape \((N, M)\)) from q50_cols.

    • If q50_cols is provided: Let \(\mathbf{q50}_j = (Q50_{j,0}, \dots, Q50_{j,M-1})\). Then

      \[c_j = \text{mean}(\mathbf{q50}_j) = \frac{1}{M} \sum_{i=0}^{M-1} Q50_{j,i}\]

    • If q50_cols is None: \(c_j = r_j\)
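The color rule amounts to a row-wise mean with a fallback to r. The sketch below assumes a hypothetical helper `color_values` and illustrative column names; it is not taken from the library source.

```python
import numpy as np
import pandas as pd

def color_values(df, r, q50_cols=None):
    """Hypothetical helper: mean Q50 per location, or r if no Q50 columns."""
    if q50_cols is None:
        return np.asarray(r, dtype=float)
    # Row-wise mean over the M median columns.
    return df[q50_cols].mean(axis=1).to_numpy()

# Two locations, three time steps (toy values).
q50_df = pd.DataFrame({
    "v_2021_q50": [5.0, 6.0],
    "v_2022_q50": [7.0, 8.0],
    "v_2023_q50": [9.0, 10.0],
})
r_demo = np.array([0.4, 0.1])  # placeholder radial values

c_with_q50 = color_values(q50_df, r_demo, list(q50_df.columns))
c_without_q50 = color_values(q50_df, r_demo)
```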

  4. Angular Coordinate Calculation (`theta`): Same index-based calculation as plot_velocity. Let \(S\) be the angular span from acov.

    \[\theta_j = \frac{j}{N} \times S\]
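The index-based angle mapping can be sketched as follows. The dictionary `ACOV_SPANS` and the helper `index_angles` are hypothetical names. The spans are the documented `acov` options converted to radians, and unknown options fall back to the full circle as described for `acov`.

```python
import numpy as np

# Documented acov options expressed in radians (assumed representation).
ACOV_SPANS = {
    "default": 2 * np.pi,          # 360 degrees
    "half_circle": np.pi,          # 180 degrees
    "quarter_circle": np.pi / 2,   # 90 degrees
    "eighth_circle": np.pi / 4,    # 45 degrees
}

def index_angles(n_points, acov="default"):
    """Hypothetical helper: theta_j = (j / N) * S for j = 0..N-1."""
    span = ACOV_SPANS.get(acov, ACOV_SPANS["default"])
    return np.arange(n_points) / n_points * span

theta_half = index_angles(4, "half_circle")
theta_fallback = index_angles(4, "not_an_option")
```

Because the angle depends only on the row position, reordering the DataFrame changes where each location lands on the plot.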

>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_interval_consistency

1. Random Example:

>>> np.random.seed(1)
>>> N_points = 120
>>> df_rand_interval = pd.DataFrame({
...     'id': range(N_points),
...     'lat': np.linspace(30, 31, N_points),
...     'val_2021_q10': np.random.rand(N_points) * 5,
...     'val_2021_q50': np.random.rand(N_points) * 5 + 5,
...     'val_2021_q90': np.random.rand(N_points) * 5 + 10,
...     'val_2022_q10': np.random.rand(N_points) * 6, # Slightly wider
...     'val_2022_q50': np.random.rand(N_points) * 6 + 6,
...     'val_2022_q90': np.random.rand(N_points) * 6 + 12,
...     'val_2023_q10': np.random.rand(N_points) * 4, # Narrower
...     'val_2023_q50': np.random.rand(N_points) * 4 + 7,
...     'val_2023_q90': np.random.rand(N_points) * 4 + 11,
... })
>>> q10_cols_rand = ['val_2021_q10', 'val_2022_q10', 'val_2023_q10']
>>> q90_cols_rand = ['val_2021_q90', 'val_2022_q90', 'val_2023_q90']
>>> q50_cols_rand = ['val_2021_q50', 'val_2022_q50', 'val_2023_q50']
>>> ax_rand_ic = plot_interval_consistency(
...     df=df_rand_interval,
...     qlow_cols=q10_cols_rand,
...     qup_cols=q90_cols_rand,
...     q50_cols=q50_cols_rand,
...     theta_col='lat',      # Note: Ignored for positioning
...     use_cv=True,          # Use CV for radial axis
...     cmap='viridis',
...     acov='half_circle',
...     title='Random Interval Width Consistency (CV)',
...     s=35
... )
>>> # plt.show() called internally

2. Concrete Example (Subsidence Data - adapted from docstring):

>>> # Assume zhongshan_pred_2023_2026 is an already-loaded DataFrame;
>>> # create dummy data if it does not exist.
>>> try:
...     zhongshan_pred_2023_2026
... except NameError:
...     print("Creating dummy subsidence data for example...")
...     N_sub = 150
...     zhongshan_pred_2023_2026 = pd.DataFrame({
...         'latitude': np.linspace(22.2, 22.8, N_sub),
...         **{f'subsidence_{yr}_q10': np.random.rand(N_sub)*(yr-2020)+1
...            for yr in range(2023, 2027)},
...         **{f'subsidence_{yr}_q50': np.random.rand(N_sub)*(yr-2019)+5
...            + np.linspace(0, (yr-2022)*2, N_sub)
...            for yr in range(2023, 2027)},
...         **{f'subsidence_{yr}_q90': np.random.rand(N_sub)*(yr-2018)+10
...            + np.linspace(0, (yr-2022)*4, N_sub)
...            for yr in range(2023, 2027)},
...     })
>>> qlow_sub = [f'subsidence_{yr}_q10' for yr in range(2023, 2027)]
>>> qup_sub = [f'subsidence_{yr}_q90' for yr in range(2023, 2027)]
>>> q50_sub = [f'subsidence_{yr}_q50' for yr in range(2023, 2027)]
>>> ax_sub_ic = plot_interval_consistency(
...     df=zhongshan_pred_2023_2026,
...     qlow_cols=qlow_sub,
...     qup_cols=qup_sub,
...     q50_cols=q50_sub,
...     theta_col='latitude',    # Ignored for pos, triggers warning
...     acov='default',
...     title='Subsidence Uncertainty Consistency (2023–2026)',
...     use_cv=False,            # Use Std Dev for radius
...     cmap='coolwarm',
...     s=28,
...     alpha=0.8,
...     mask_angle=True
... )
>>> # plt.show() called internally