kdiagram.plot.uncertainty.plot_coverage_diagnostic

kdiagram.plot.uncertainty.plot_coverage_diagnostic(df, actual_col, q_cols, theta_col=None, acov='default', figsize=(8.0, 8.0), title=None, show_grid=True, grid_props=None, cmap='RdYlGn', alpha=0.85, s=35, as_bars=False, coverage_line_color='r', buffer_pts=500, fill_gradient=True, gradient_size=300, gradient_cmap='Greens', gradient_levels=None, gradient_props=None, mask_angle=True, savefig=None, dpi=300, verbose=0, ax=None)

Diagnose prediction-interval coverage on a polar plot.

This visualization checks whether each observed value falls within its predicted interval and compares the empirical coverage rate with the nominal level. It is a compact calibration diagnostic for quantile and interval forecasts. Background on forecast verification and calibration appears in [1][2]; see Gneiting et al. [1] for a discussion of calibration versus sharpness.

The plot places samples around a circle (angular coordinate) and encodes on the radial axis whether each sample is covered (radius 1) or not covered (radius 0). A solid reference ring marks the overall coverage rate, and optional concentric guides and a background gradient aid interpretation.

Parameters:
df : pandas.DataFrame

Input table containing the observed target and the two quantile bounds. The input is validated as a non-empty DataFrame.

actual_col : str

Column name of the observed (ground-truth) values.

q_cols : list of str or tuple of str

Exactly two names [lower_quantile_col, upper_quantile_col] that define the prediction interval. The order must be [lower, upper].

theta_col : str, optional

Intended column for angular ordering. Currently ignored; the plot uses the DataFrame index order. A warning is issued if provided. Default is None.

acov : {'default', 'half_circle', 'quarter_circle', 'eighth_circle'}, default='default'

Angular coverage (span) of the plot:

  • 'default': full circle \(2\pi\)

  • 'half_circle': \(\pi\)

  • 'quarter_circle': \(\pi/2\)

  • 'eighth_circle': \(\pi/4\)

figsize : tuple of (float, float), default=(8.0, 8.0)

Figure size in inches.

title : str, optional

Custom title. If None, a default title is used.

show_grid : bool, default=True

If True, show polar grid lines.

grid_props : dict, optional

Keyword arguments used to customize the grid lines (e.g., {'linestyle': '--', 'alpha': 0.6}).

cmap : str, default='RdYlGn'

Colormap for per-point coverage (0 or 1). Diverging maps work well (e.g., red for uncovered, green for covered).

alpha : float, default=0.85

Transparency for scatter points or bars.

s : int, default=35

Marker size for scatter points (ignored when as_bars=True).

as_bars : bool, default=False

If True, draw radial bars (height 0/1) instead of points.

coverage_line_color : str, default='r'

Color of the solid ring at the average coverage rate.

buffer_pts : int, default=500

Number of samples used to draw smooth circular lines (average rate and guide rings).

fill_gradient : bool, default=True

If True, fill the background radially up to the average coverage with a subtle gradient.

gradient_size : int, default=300

Resolution of the background gradient mesh.

gradient_cmap : str, default='Greens'

Colormap used for the optional background gradient.

gradient_levels : list of float, optional

Radii in \([0,1]\) for dashed concentric reference rings. Defaults to [0.2, 0.4, 0.6, 0.8, 1.0].

gradient_props : dict, optional

Style for the concentric guide rings (e.g., {'linestyle': ':', 'color': 'gray', 'linewidth': 0.8}).

mask_angle : bool, default=True

If True, hide angular tick labels.

savefig : str, optional

File path at which to save the figure. If None, the figure is displayed instead.

verbose : int, default=0

If > 0, print the computed overall coverage rate.

Returns:
ax : matplotlib.axes.Axes

The polar Axes containing the coverage diagnostic.

Raises:
TypeError

If q_cols does not contain exactly two names.

ValueError

If required columns are missing or cannot be coerced to numeric, or if acov is invalid.

See also

plot_anomaly_magnitude

Polar diagnostic for the magnitude and type of interval failures (under/over).

plot_reliability_diagram

Calibration (reliability) curves for probabilistic classifiers.

Notes

Coverage for row \(j\) is defined by the closed interval test \(L_j \le y_j \le U_j\). Rows with missing values in essential columns are removed prior to computation, so all symbols below refer to the filtered data. The radial axis is fixed to \([0,1]\) and encodes a binary outcome per sample, while a solid reference ring at radius \(\bar{C}\) summarizes the empirical coverage rate. Compare \(\bar{C}\) to the nominal level implied by your interval (e.g., \(0.8\) for Q10–Q90, \(0.9\) for Q5–Q95) to assess calibration. Angular positions follow index order over the chosen span; theta_col is currently ignored for positioning.
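The NaN filtering described above can be approximated with pandas. This is a minimal sketch under stated assumptions, not the library's internal code: the column names `actual`, `q10`, `q90` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "actual": [1.0, 2.0, np.nan, 4.0],
    "q10":    [0.5, 2.5, 2.0, np.nan],
    "q90":    [1.5, 3.5, 4.0, 5.0],
})

essential = ["actual", "q10", "q90"]

# Keep only rows where all three essential columns are present.
clean = df.dropna(subset=essential)

# N is the sample count that the formulas in this section refer to.
N = len(clean)
```

Here rows 2 and 3 each contain a NaN in an essential column, so only the first two rows remain. Calling `clean.reset_index(drop=True)` afterwards yields positions \(j = 0, \dots, N-1\).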

Let \(y_j\) denote the actual value, \(L_j\) the lower bound, and \(U_j\) the upper bound for sample \(j\), with \(j=0,\dots,N-1\) after NaN removal. The per-sample coverage indicator (radial coordinate \(r_j\)) is

\[r_j \;=\; \begin{cases} 1, & L_j \le y_j \le U_j, \\ 0, & \text{otherwise.} \end{cases} \tag{1}\]

The overall coverage rate drawn as a ring is

\[\bar{C} \;=\; \frac{1}{N} \sum_{j=0}^{N-1} r_j. \tag{2}\]
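Equations (1) and (2) can be reproduced in a few lines of NumPy. This is an illustrative sketch with made-up values; the array names and the 0.8 nominal level (a Q10 to Q90 interval) are assumptions for the example.

```python
import numpy as np

# Hypothetical observations and interval bounds.
y     = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lower = np.array([0.5, 2.5, 3.0, 3.0, 4.0])
upper = np.array([1.5, 3.5, 4.0, 3.5, 6.0])

# Equation (1): closed-interval test, so values on a bound count as covered.
r = ((lower <= y) & (y <= upper)).astype(float)

# Equation (2): empirical coverage rate, the radius of the reference ring.
c_bar = r.mean()

# Compare with the nominal level of the interval (assumed Q10 to Q90 here).
nominal = 0.8
gap = c_bar - nominal  # negative means under-coverage
```

With these values the third sample sits exactly on its lower bound and is counted as covered, giving three covered samples out of five.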

Let \(S \in \{2\pi,\;\pi,\;\pi/2,\;\pi/4\}\) be the angular span set by acov and let \(\theta_{\min}\) be the start angle. The angular coordinate for sample \(j\) is

\[\theta_j \;=\; \frac{j}{N}\,S \;+\; \theta_{\min}. \tag{3}\]

Plotting semantics: each sample is placed at \((\theta_j, r_j)\) and colored via cmap according to \(r_j\); a solid ring at \(\bar{C}\) is overlaid as a global summary; optional concentric guides at user-specified radii and an optional radial background gradient up to \(\bar{C}\) provide additional visual context.
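The plotting semantics can be imitated directly with matplotlib. This is a rough sketch of the idea rather than the function's actual implementation: the simulated data, the variable names, and the use of the `Agg` backend are assumptions, and the background gradient is omitted.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Simulated coverage indicators (roughly 80% covered); purely illustrative.
rng = np.random.default_rng(0)
n = 50
r = (rng.random(n) < 0.8).astype(float)
theta = np.arange(n) / n * 2 * np.pi  # full-circle span, start angle 0
c_bar = r.mean()

fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={"projection": "polar"})

# Each sample at (theta_j, r_j), colored by its 0/1 coverage value.
ax.scatter(theta, r, c=r, cmap="RdYlGn", vmin=0, vmax=1, s=35, alpha=0.85)

# Solid ring at the average coverage rate.
ring_theta = np.linspace(0, 2 * np.pi, 500)
ax.plot(ring_theta, np.full_like(ring_theta, c_bar), color="r", linewidth=2)

# Concentric guide rings at fixed radii.
for level in (0.2, 0.4, 0.6, 0.8, 1.0):
    ax.plot(ring_theta, np.full_like(ring_theta, level),
            linestyle=":", color="gray", linewidth=0.8)

ax.set_ylim(0, 1)  # radial axis fixed to [0, 1]
```

The 500 points used for the rings play the role of `buffer_pts`: more points give smoother circles at the cost of a slightly heavier figure.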

References

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_coverage_diagnostic

1. Random Example (Well-calibrated 80% interval):

>>> np.random.seed(0)
>>> N = 200
>>> df_cov_rand = pd.DataFrame({'id': range(N)})
>>> df_cov_rand['actual'] = np.random.normal(loc=10, scale=2, size=N)
>>> # Simulate an ~80% interval (e.g., +/- 1.28 std devs for Normal)
>>> std_dev_pred = 2.0
>>> df_cov_rand['q10_pred'] = 10 - 1.28 * std_dev_pred
>>> df_cov_rand['q90_pred'] = 10 + 1.28 * std_dev_pred
>>> # Add some noise to interval bounds
>>> df_cov_rand['q10_pred'] += np.random.randn(N) * 0.2
>>> df_cov_rand['q90_pred'] += np.random.randn(N) * 0.2
>>> ax_cov_rand = plot_coverage_diagnostic(
...     df=df_cov_rand,
...     actual_col='actual',
...     q_cols=['q10_pred', 'q90_pred'], # [lower, upper]
...     theta_col='id',           # Ignored for positioning
...     acov='default',
...     title='Coverage Diagnostic (Simulated 80% Interval)',
...     as_bars=False,           # Use scatter points
...     coverage_line_color='blue', # Color for avg coverage line
...     gradient_cmap='Blues',    # Background gradient color
...     verbose=1                # Print coverage rate
... )
>>> # Expected coverage rate near 80%
>>> # plt.show() called internally

2. Concrete Example (Subsidence Data):

>>> # Assume small_sample_pred is a loaded DataFrame;
>>> # create dummy data if it does not exist.
>>> try:
...     small_sample_pred
... except NameError:
...     print("Creating dummy small sample prediction data...")
...     N_small = 200
...     small_sample_pred = pd.DataFrame({
...         'subsidence_2023': np.random.rand(N_small)*15 + np.linspace(0, 5, N_small),
...         'subsidence_2023_q10': np.random.rand(N_small)*10,
...         'subsidence_2023_q90': np.random.rand(N_small)*10 + 10,
...         'latitude': np.linspace(22.3, 22.7, N_small) + np.random.randn(N_small)*0.01
...     })
>>> # Ensure Q90 > Q10
>>> small_sample_pred['subsidence_2023_q90'] = (
...     small_sample_pred['subsidence_2023_q10'] +
...     np.abs(small_sample_pred['subsidence_2023_q90'] -
...            small_sample_pred['subsidence_2023_q10']) + 0.1
... )
>>> ax_cov_sub = plot_coverage_diagnostic(
...     df=small_sample_pred,
...     actual_col='subsidence_2023',
...     q_cols=['subsidence_2023_q10', 'subsidence_2023_q90'],
...     theta_col=None,            # Use index order
...     acov='half_circle',      # Use 180 degrees
...     as_bars=True,            # Use bars instead of scatter
...     coverage_line_color='darkgreen',
...     title='Coverage Evaluation for 2023 (Q10–Q90)',
...     mask_angle=False,         # Show angle labels if meaningful
...     fill_gradient=False,     # Turn off background gradient
...     gradient_levels=[0.5, 0.8, 0.9], # Custom reference lines
...     verbose=1
... )