kdiagram.plot.uncertainty.plot_coverage_diagnostic

kdiagram.plot.uncertainty.plot_coverage_diagnostic(df, actual_col, q_cols, theta_col=None, acov='default', figsize=(8.0, 8.0), title=None, show_grid=True, grid_props=None, cmap='RdYlGn', alpha=0.85, s=35, as_bars=False, coverage_line_color='r', buffer_pts=500, fill_gradient=True, gradient_size=300, gradient_cmap='Greens', gradient_levels=None, gradient_props=None, mask_angle=True, savefig=None, verbose=0)

Diagnose prediction interval coverage using a polar plot.

This function generates a polar plot to visually assess whether actual observed values fall within their corresponding prediction intervals (defined by a lower and upper quantile). It helps diagnose the calibration of uncertainty estimates.

  • Angular Position (`theta`): Represents each data point or location, ordered by DataFrame index and mapped linearly onto the specified angular coverage (acov). theta_col is currently ignored.

  • Radial Position (`r`): Binary indicator of coverage. Points are plotted at radius 1 if the actual value is within the interval (\(Q_{lower} \le y_{actual} \le Q_{upper}\)), and at radius 0 otherwise.

  • Color (Points/Bars): Indicates coverage status using cmap (default 'RdYlGn'), typically green for covered (1) and red for uncovered (0).

  • Reference Lines: Concentric dashed lines can be drawn at specified gradient_levels (e.g., 0.2, 0.4, …) for reference.

  • Average Coverage Line: A prominent solid line is drawn at a radius equal to the overall coverage rate (proportion of points covered), providing a benchmark against the expected coverage level (e.g., for a 90% interval [Q5-Q95], the line should ideally be near 0.9).

  • Background Gradient (Optional): A radial gradient fills the background from the center up to the average coverage rate, using gradient_cmap. This visually emphasizes the overall coverage level.

This plot is essential for evaluating if the model’s uncertainty quantification is reliable (i.e., if a 90% prediction interval truly covers about 90% of the actual outcomes).

df : pd.DataFrame

Input DataFrame with actual values and quantile bounds. Decorators ensure it’s a valid, non-empty pandas DataFrame.

actual_col : str

Name of the column containing the true observed (actual) values.

q_cols : list or tuple of str

Sequence of exactly two column names: [lower_quantile_col, upper_quantile_col] defining the prediction interval.

theta_col : str, optional

Intended column for ordering points angularly. Note: Currently ignored; uses DataFrame index order. A warning is issued if provided. Default is None.

acov : {'default', 'half_circle', 'quarter_circle', 'eighth_circle'}, default='default'

Specifies the angular coverage (span) of the plot: 'default' (360°), 'half_circle' (180°), 'quarter_circle' (90°), 'eighth_circle' (45°).
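The option-to-span mapping described above can be sketched as a small lookup. This is an illustration only: `ACOV_SPAN_RAD` and `resolve_span` are hypothetical names, not part of kdiagram, and the library's internal handling may differ.

```python
import math

# Hypothetical lookup table: keys mirror the documented acov options,
# values are the corresponding angular spans in radians.
ACOV_SPAN_RAD = {
    "default": 2 * math.pi,         # 360 degrees
    "half_circle": math.pi,         # 180 degrees
    "quarter_circle": math.pi / 2,  # 90 degrees
    "eighth_circle": math.pi / 4,   # 45 degrees
}


def resolve_span(acov="default"):
    """Return the angular span in radians for an acov option name."""
    try:
        return ACOV_SPAN_RAD[acov]
    except KeyError:
        valid = sorted(ACOV_SPAN_RAD)
        raise ValueError(f"Invalid acov {acov!r}; expected one of {valid}") from None


span = resolve_span("half_circle")
```

Raising `ValueError` for an unknown name matches the documented behavior for an invalid acov value; the message text is an assumption.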

figsize : tuple of (float, float), default=(8.0, 8.0)

Width and height of the figure in inches.

title : str, optional

Custom plot title. If None, a default title is used.

show_grid : bool, default=True

If True, display polar grid lines.

cmap : str, default='RdYlGn'

Colormap for coloring the coverage points or bars. Should ideally be a diverging map where one end represents covered (1) and the other represents uncovered (0).

alpha : float, default=0.85

Transparency level for the scatter points or bars.

s : int, default=35

Marker size for scatter points (used if as_bars is False).

as_bars : bool, default=False

If True, plot coverage status as bars radiating from the center (height 0 or 1). If False, plot as scatter points at radius 0 or 1.

coverage_line_color : str, default='r'

Color of the solid line indicating the average coverage rate.

buffer_pts : int, default=500

Number of points used to draw smooth circular lines for average coverage and gradient levels.

fill_gradient : bool, default=True

If True, fill the background with a radial gradient up to the average coverage rate using gradient_cmap.

gradient_size : int, default=300

Resolution (number of steps) for the background gradient meshgrid.

gradient_cmap : str, default='Greens'

Colormap used for the optional background gradient fill.

gradient_levels : list of float, optional

List of radial values (between 0 and 1) at which to draw dashed concentric reference lines. Defaults to [0.2, 0.4, 0.6, 0.8, 1.0].

gradient_props : dict, optional

Dictionary of keyword arguments to customize the appearance of the dashed gradient_levels reference lines (e.g., {'linestyle': '--', 'color': 'blue'}). Defaults to gray dotted lines.

mask_angle : bool, default=True

If True, hide the angular tick labels (degrees).

savefig : str, optional

File path to save the plot image. If None, displays interactively.

verbose : int, default=0

Controls whether the calculated overall coverage rate is printed. The rate is printed if verbose is greater than 0.

ax : matplotlib.axes._axes.Axes

The Matplotlib Axes object (PolarAxesSubplot) containing the plot.

TypeError

If q_cols does not contain exactly two elements.

ValueError

If required columns are missing, the data is non-numeric, or the acov value is invalid.

plot_anomaly_magnitude : Visualize magnitude of interval failures.

calibration_curve : (If available) Plot reliability diagrams for probabilistic forecasts.

  • Coverage is defined as \(L_j \le y_j \le U_j\).

  • The radial axis is fixed between 0 and 1, representing the binary coverage outcome for individual points/bars.

  • The average coverage line provides a single summary statistic, which should be compared to the nominal coverage level of the interval (e.g., 80% for a Q10-Q90 interval, 90% for Q5-Q95).

  • The theta_col parameter is currently ignored for positioning.

  • NaN values in essential columns are dropped before analysis.
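One possible way to judge whether the average coverage is "near" the nominal level is to express the gap in binomial standard errors. The sketch below is not part of kdiagram: `coverage_z_score` is a hypothetical helper, and it assumes each point is covered independently with probability equal to the nominal rate, which spatially or temporally correlated data may violate.

```python
import math


def coverage_z_score(empirical_rate, n_points, nominal_rate):
    """Deviation of empirical coverage from nominal, in binomial standard errors."""
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    if not 0.0 < nominal_rate < 1.0:
        raise ValueError("nominal_rate must lie strictly between 0 and 1")
    # Standard error of a proportion, assuming each point is covered
    # independently with probability nominal_rate (an assumption).
    std_err = math.sqrt(nominal_rate * (1.0 - nominal_rate) / n_points)
    return (empirical_rate - nominal_rate) / std_err


# Illustrative numbers only: 100 points after NaN removal,
# 90% observed coverage, nominal 80% (Q10-Q90) interval.
z = coverage_z_score(empirical_rate=0.9, n_points=100, nominal_rate=0.8)
```

With these illustrative inputs the observed rate sits about 2.5 standard errors above the nominal level, which would suggest intervals that are wider than necessary rather than sampling noise alone.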

Let \(y_j\), \(L_j\), and \(U_j\) denote the actual value, lower bound, and upper bound for data point \(j\) (\(j=0, \dots, N-1\), indexed after NaN removal).

  1. Coverage Indicator (Radial Coordinate `r`):

    \[ r_j = \begin{cases} 1 & \text{if } L_j \le y_j \le U_j \\ 0 & \text{otherwise} \end{cases} \]

  2. Overall Coverage Rate:

    \[ \bar{C} = \frac{1}{N} \sum_{j=0}^{N-1} r_j \]

  3. Angular Coordinate (`theta`): Let \(S\) be the angular span and \(\theta_{min}\) the start angle from acov.

    \[ \theta_j = \left( \frac{j}{N} \times S \right) + \theta_{min} \]

  4. Plotting:

    • Plot points/bars at \((r_j, \theta_j)\), colored based on \(r_j\) using cmap.

    • Plot a solid line at constant radius \(\bar{C}\).

    • Optionally, plot dashed lines at constant radii specified by gradient_levels.

    • Optionally, fill background up to radius \(\bar{C}\) using gradient_cmap.
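The coordinate formulas above can be traced with a dependency-free sketch. `polar_coverage_coords` is a hypothetical helper written for illustration, not the library's implementation; it assumes the three sequences are aligned and already free of NaN values, and that the start angle defaults to 0.

```python
import math


def polar_coverage_coords(actual, lower, upper, span=2 * math.pi, theta_min=0.0):
    """Return (r, theta, mean_coverage) for aligned, NaN-free sequences."""
    n = len(actual)
    if n == 0 or not (len(lower) == len(upper) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    # r_j = 1 if L_j <= y_j <= U_j, else 0
    r = [1 if lo <= y <= hi else 0 for y, lo, hi in zip(actual, lower, upper)]
    # theta_j = (j / N) * S + theta_min
    theta = [(j / n) * span + theta_min for j in range(n)]
    # C-bar = (1 / N) * sum of r_j
    mean_coverage = sum(r) / n
    return r, theta, mean_coverage


# Toy data: points 0 and 2 fall inside their intervals, 1 and 3 do not.
r, theta, c_bar = polar_coverage_coords(
    actual=[1.0, 5.0, 2.5, 9.0],
    lower=[0.0, 0.0, 2.0, 3.0],
    upper=[2.0, 4.0, 3.0, 8.0],
)
```

Because the angle uses \(j/N\) rather than \(j/(N-1)\), the last point stops one step short of the full span, so the first and last points do not overlap on a 360° plot.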

>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_coverage_diagnostic

1. Random Example (Well-calibrated 80% interval):

>>> np.random.seed(0)
>>> N = 200
>>> df_cov_rand = pd.DataFrame({'id': range(N)})
>>> df_cov_rand['actual'] = np.random.normal(loc=10, scale=2, size=N)
>>> # Simulate an ~80% interval (e.g., +/- 1.28 std devs for Normal)
>>> std_dev_pred = 2.0
>>> df_cov_rand['q10_pred'] = 10 - 1.28 * std_dev_pred
>>> df_cov_rand['q90_pred'] = 10 + 1.28 * std_dev_pred
>>> # Add some noise to interval bounds
>>> df_cov_rand['q10_pred'] += np.random.randn(N) * 0.2
>>> df_cov_rand['q90_pred'] += np.random.randn(N) * 0.2
>>> ax_cov_rand = plot_coverage_diagnostic(
...     df=df_cov_rand,
...     actual_col='actual',
...     q_cols=['q10_pred', 'q90_pred'], # [lower, upper]
...     theta_col='id',           # Ignored for positioning
...     acov='default',
...     title='Coverage Diagnostic (Simulated 80% Interval)',
...     as_bars=False,           # Use scatter points
...     coverage_line_color='blue', # Color for avg coverage line
...     gradient_cmap='Blues',    # Background gradient color
...     verbose=1                # Print coverage rate
... )
>>> # Expected coverage rate near 80%
>>> # plt.show() called internally

2. Concrete Example (Subsidence Data):

>>> # Assume small_sample_pred is a loaded DataFrame
>>> # Create dummy data if it doesn't exist
>>> try:
...     small_sample_pred
... except NameError:
...     print("Creating dummy small sample prediction data...")
...     N_small = 200
...     small_sample_pred = pd.DataFrame({
...         'subsidence_2023': np.random.rand(N_small)*15 + np.linspace(0, 5, N_small),
...         'subsidence_2023_q10': np.random.rand(N_small)*10,
...         'subsidence_2023_q90': np.random.rand(N_small)*10 + 10,
...         'latitude': np.linspace(22.3, 22.7, N_small) + np.random.randn(N_small)*0.01
...     })
>>> # Ensure Q90 > Q10
>>> small_sample_pred['subsidence_2023_q90'] = (
...     small_sample_pred['subsidence_2023_q10'] +
...     np.abs(small_sample_pred['subsidence_2023_q90'] -
...            small_sample_pred['subsidence_2023_q10']) + 0.1
... )
>>> ax_cov_sub = plot_coverage_diagnostic(
...     df=small_sample_pred,
...     actual_col='subsidence_2023',
...     q_cols=['subsidence_2023_q10', 'subsidence_2023_q90'],
...     theta_col=None,            # Use index order
...     acov='half_circle',      # Use 180 degrees
...     as_bars=True,            # Use bars instead of scatter
...     coverage_line_color='darkgreen',
...     title='Coverage Evaluation for 2023 (Q10–Q90)',
...     mask_angle=False,         # Show angle labels if meaningful
...     fill_gradient=False,     # Turn off background gradient
...     gradient_levels=[0.5, 0.8, 0.9], # Custom reference lines
...     verbose=1
... )
>>> # plt.show() called internally