kdiagram.plot.uncertainty.plot_coverage_diagnostic
- kdiagram.plot.uncertainty.plot_coverage_diagnostic(df, actual_col, q_cols, theta_col=None, acov='default', figsize=(8.0, 8.0), title=None, show_grid=True, grid_props=None, cmap='RdYlGn', alpha=0.85, s=35, as_bars=False, coverage_line_color='r', buffer_pts=500, fill_gradient=True, gradient_size=300, gradient_cmap='Greens', gradient_levels=None, gradient_props=None, mask_angle=True, savefig=None, dpi=300, verbose=0, ax=None)
Diagnose prediction-interval coverage on a polar plot.
This visualization checks whether observed values fall within their predicted intervals and compares the empirical coverage rate with the nominal level. It is a compact diagnostic for the calibration of quantile or interval forecasts. Foundational background on forecast verification and calibration appears in [1] and [2]; see Gneiting et al. [1] for a discussion of calibration versus sharpness.
The plot places samples around a circle (angular coordinate) and encodes covered versus not covered on the radial axis (1 or 0). A solid reference ring marks the overall coverage rate, and optional concentric guides and a background gradient aid interpretation.
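The coverage check described above can be sketched in a few lines of pandas. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name `empirical_coverage` and the demo column names are hypothetical.

```python
import numpy as np
import pandas as pd


def empirical_coverage(df, actual_col, lower_col, upper_col):
    """Return (per-row 0/1 coverage indicator, overall coverage rate).

    Hypothetical helper for illustration; it is not part of kdiagram.
    """
    cols = [actual_col, lower_col, upper_col]
    # Coerce to numeric and drop rows with missing essential values.
    data = df[cols].apply(pd.to_numeric, errors="coerce").dropna()
    # Closed-interval test: lower <= actual <= upper.
    covered = data[actual_col].between(data[lower_col], data[upper_col])
    indicator = covered.astype(int)
    return indicator, float(indicator.mean())


# Tiny made-up table: the last row has a missing actual value and is dropped.
demo = pd.DataFrame({
    "actual": [1.0, 5.0, 9.0, np.nan],
    "q10": [0.0, 6.0, 9.0, 1.0],
    "q90": [2.0, 8.0, 10.0, 3.0],
})
indicator, rate = empirical_coverage(demo, "actual", "q10", "q90")
```

Here two of the three remaining rows fall inside their intervals, so `rate` is 2/3. A value on the boundary (the third row) counts as covered because the test is closed on both sides.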
- Parameters:
- df
pandas.DataFrame Input table containing the observed target and the two quantile bounds. Decorators ensure a valid, non-empty DataFrame.
- actual_col
str Column name of the observed (ground-truth) values.
- q_cols
list of str or tuple of str Exactly two names [lower_quantile_col, upper_quantile_col] that define the prediction interval. The order must be [lower, upper].
- theta_col
str, optional Intended column for angular ordering. Currently ignored; the plot uses the DataFrame index order. A warning is issued if provided. Default is None.
- acov
{'default', 'half_circle', 'quarter_circle', 'eighth_circle'}, default='default' Angular coverage (span) of the plot:
'default': full circle \(2\pi\)
'half_circle': \(\pi\)
'quarter_circle': \(\pi/2\)
'eighth_circle': \(\pi/4\)
- figsize
tuple of (float, float), default=(8.0, 8.0) Figure size in inches.
- title
str, optional Custom title. If None, a default title is used.
- show_grid
bool, default=True If True, show polar grid lines.
- grid_props
dict, optional Keyword arguments passed to the grid helper for customizing the grid (e.g., {'linestyle': '--', 'alpha': 0.6}).
- cmap
str, default='RdYlGn' Colormap for per-point coverage (0 or 1). Diverging maps work well (e.g., red for uncovered, green for covered).
- alpha
float, default=0.85 Transparency for scatter points or bars.
- s
int, default=35 Marker size for scatter points (ignored when as_bars=True).
- as_bars
bool, default=False If True, draw radial bars (height 0/1) instead of points.
- coverage_line_color
str, default='r' Color of the solid ring at the average coverage rate.
- buffer_pts
int, default=500 Number of samples used to draw smooth circular lines (average rate and guide rings).
- fill_gradient
bool, default=True If True, fill the background radially up to the average coverage with a subtle gradient.
- gradient_size
int, default=300 Resolution of the background gradient mesh.
- gradient_cmap
str, default='Greens' Colormap used for the optional background gradient.
- gradient_levels
list of float, optional Radii in \([0,1]\) for dashed concentric reference rings. Defaults to [0.2, 0.4, 0.6, 0.8, 1.0].
- gradient_props
dict, optional Style for the concentric guide rings (e.g., {'linestyle': ':', 'color': 'gray', 'linewidth': 0.8}).
- mask_angle
bool, default=True If True, hide angular tick labels.
- savefig
str, optional File path to save the figure. If None, the figure is shown.
- verbose
int, default=0 If > 0, print the computed overall coverage rate.
- Returns:
- ax
matplotlib.axes.Axes The polar Axes containing the coverage diagnostic.
- Raises:
TypeError If q_cols does not contain exactly two names.
ValueError If required columns are missing or cannot be coerced to numeric, or if acov is invalid.
- Parameters:
df (DataFrame)
actual_col (str)
theta_col (str | None)
acov (str)
title (str | None)
show_grid (bool)
grid_props (dict | None)
cmap (str)
alpha (float)
s (int)
as_bars (bool)
coverage_line_color (str)
buffer_pts (int)
fill_gradient (bool)
gradient_size (int)
gradient_cmap (str)
mask_angle (bool)
savefig (str | None)
dpi (int)
verbose (int)
ax (Axes | None)
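The `acov` options and the index-order angular placement can be sketched as follows. This is an illustrative sketch, not the library's code: the helper name `sample_angles`, the `ACOV_SPANS` table, and the start angle `theta_min=0.0` are assumptions. The mapping itself follows the formula \(\theta_j = (j/N)\,S + \theta_{\min}\) given in the Notes.

```python
import numpy as np

# Angular span S for each `acov` option, as listed in the parameter table.
ACOV_SPANS = {
    "default": 2 * np.pi,
    "half_circle": np.pi,
    "quarter_circle": np.pi / 2,
    "eighth_circle": np.pi / 4,
}


def sample_angles(n_samples, acov="default", theta_min=0.0):
    """Return theta_j = (j / N) * S + theta_min for j = 0, ..., N - 1.

    Hypothetical helper; theta_min=0.0 is an assumed start angle.
    """
    if acov not in ACOV_SPANS:
        raise ValueError(f"Invalid acov: {acov!r}")
    span = ACOV_SPANS[acov]
    # Samples are spread evenly in index order over the chosen span.
    return np.arange(n_samples) / n_samples * span + theta_min


angles = sample_angles(4, acov="half_circle")
```

With four samples on a half circle, the angles are spaced \(\pi/4\) apart starting at the assumed start angle. Because the divisor is \(N\) rather than \(N-1\), the last sample stops short of the end of the span, so the first and last points do not overlap on a full circle.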
See also
plot_anomaly_magnitude: Polar diagnostic for the magnitude and type of interval failures (under/over).
plot_reliability_diagram: Calibration (reliability) curves for probabilistic classifiers.
Notes
Coverage for row \(j\) is defined by the closed interval test \(L_j \le y_j \le U_j\). Rows with missing values in essential columns are removed prior to computation, so all symbols below refer to the filtered data. The radial axis is fixed to \([0,1]\) and encodes a binary outcome per sample, while a solid reference ring at radius \(\bar{C}\) summarizes the empirical coverage rate. Compare \(\bar{C}\) to the nominal level implied by your interval (e.g., \(0.8\) for Q10–Q90, \(0.9\) for Q5–Q95) to assess calibration. Angular positions follow index order over the chosen span; theta_col is currently ignored for positioning.
Let \(y_j\) denote the actual value, \(L_j\) the lower bound, and \(U_j\) the upper bound for sample \(j\), with \(j=0,\dots,N-1\) after NaN removal. The per-sample coverage indicator (radial coordinate \(r_j\)) is
(1) \[r_j \;=\; \begin{cases} 1, & L_j \le y_j \le U_j, \\ 0, & \text{otherwise.} \end{cases}\]
The overall coverage rate drawn as a ring is
(2) \[\bar{C} \;=\; \frac{1}{N} \sum_{j=0}^{N-1} r_j.\]
Let \(S \in \{2\pi,\;\pi,\;\pi/2,\;\pi/4\}\) be the angular span set by acov and let \(\theta_{\min}\) be the start angle. The angular coordinate for sample \(j\) is
(3) \[\theta_j \;=\; \frac{j}{N}\,S \;+\; \theta_{\min}.\]
Plotting semantics: each sample is placed at \((\theta_j, r_j)\) and colored via cmap according to \(r_j\); a solid ring at \(\bar{C}\) is overlaid as a global summary; optional concentric guides at user-specified radii and an optional radial background gradient up to \(\bar{C}\) provide additional visual context.
References
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_coverage_diagnostic
1. Random Example (Well-calibrated 80% interval):
>>> np.random.seed(0)
>>> N = 200
>>> df_cov_rand = pd.DataFrame({'id': range(N)})
>>> df_cov_rand['actual'] = np.random.normal(loc=10, scale=2, size=N)
>>> # Simulate an ~80% interval (e.g., +/- 1.28 std devs for Normal)
>>> std_dev_pred = 2.0
>>> df_cov_rand['q10_pred'] = 10 - 1.28 * std_dev_pred
>>> df_cov_rand['q90_pred'] = 10 + 1.28 * std_dev_pred
>>> # Add some noise to interval bounds
>>> df_cov_rand['q10_pred'] += np.random.randn(N) * 0.2
>>> df_cov_rand['q90_pred'] += np.random.randn(N) * 0.2
>>> ax_cov_rand = plot_coverage_diagnostic(
...     df=df_cov_rand,
...     actual_col='actual',
...     q_cols=['q10_pred', 'q90_pred'],  # [lower, upper]
...     theta_col='id',                   # Ignored for positioning
...     acov='default',
...     title='Coverage Diagnostic (Simulated 80% Interval)',
...     as_bars=False,                    # Use scatter points
...     coverage_line_color='blue',       # Color for avg coverage line
...     gradient_cmap='Blues',            # Background gradient color
...     verbose=1                         # Print coverage rate
... )
>>> # Expected coverage rate near 80%
>>> # plt.show() called internally
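The choice of +/- 1.28 standard deviations in the example above can be checked with the standard library alone. The sketch below computes the probability mass of a normal distribution within +/- z standard deviations of its mean using the identity \(P(|Z| \le z) = \operatorname{erf}(z/\sqrt{2})\). The function name `central_normal_coverage` is hypothetical.

```python
import math


def central_normal_coverage(z):
    """Probability that a normal variable lies within +/- z standard deviations.

    Uses P(|Z| <= z) = erf(z / sqrt(2)) for a standard normal Z.
    """
    return math.erf(z / math.sqrt(2.0))


# Nominal coverage of the simulated interval before noise is added to the bounds.
nominal = central_normal_coverage(1.28)
```

The result is close to 0.80, which is why the simulated bounds approximate a Q10–Q90 interval. The empirical rate reported with `verbose=1` will differ somewhat because of sampling variation and the noise added to the bounds.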
2. Concrete Example (Subsidence Data):
>>> # Assume small_sample_pred is a loaded DataFrame
>>> # Create dummy data if it doesn't exist
>>> try:
...     small_sample_pred
... except NameError:
...     print("Creating dummy small sample prediction data...")
...     N_small = 200
...     small_sample_pred = pd.DataFrame({
...         'subsidence_2023': np.random.rand(N_small)*15 + np.linspace(0, 5, N_small),
...         'subsidence_2023_q10': np.random.rand(N_small)*10,
...         'subsidence_2023_q90': np.random.rand(N_small)*10 + 10,
...         'latitude': np.linspace(22.3, 22.7, N_small) + np.random.randn(N_small)*0.01
...     })
>>> # Ensure Q90 > Q10
>>> small_sample_pred['subsidence_2023_q90'] = (
...     small_sample_pred['subsidence_2023_q10'] +
...     np.abs(small_sample_pred['subsidence_2023_q90'] -
...            small_sample_pred['subsidence_2023_q10']) + 0.1
... )
>>> ax_cov_sub = plot_coverage_diagnostic(
...     df=small_sample_pred,
...     actual_col='subsidence_2023',
...     q_cols=['subsidence_2023_q10', 'subsidence_2023_q90'],
...     theta_col=None,                    # Use index order
...     acov='half_circle',                # Use 180 degrees
...     as_bars=True,                      # Use bars instead of scatter
...     coverage_line_color='darkgreen',
...     title='Coverage Evaluation for 2023 (Q10-Q90)',
...     mask_angle=False,                  # Show angle labels if meaningful
...     fill_gradient=False,               # Turn off background gradient
...     gradient_levels=[0.5, 0.8, 0.9],   # Custom reference lines
...     verbose=1
... )
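To interpret the printed coverage rate, compare it with the nominal level implied by the quantile pair, as the Notes suggest. The sketch below is one simple way to do that. The helper names `nominal_level` and `coverage_gap` are hypothetical, and the empirical rate of 0.74 is a made-up value used only for illustration.

```python
def nominal_level(lower_q, upper_q):
    """Nominal coverage implied by a quantile pair, e.g. 0.10 and 0.90 -> 0.80."""
    if not 0.0 <= lower_q < upper_q <= 1.0:
        raise ValueError("Expected 0 <= lower_q < upper_q <= 1")
    return upper_q - lower_q


def coverage_gap(empirical_rate, lower_q, upper_q):
    """Empirical minus nominal coverage.

    Negative values indicate under-coverage (intervals too narrow);
    positive values indicate over-coverage (intervals too wide).
    """
    return empirical_rate - nominal_level(lower_q, upper_q)


# Made-up empirical rate for a Q10-Q90 interval.
gap = coverage_gap(0.74, 0.10, 0.90)
```

A gap near zero is consistent with good calibration. How large a deviation matters depends on the sample size, which this sketch does not account for.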