kdiagram.plot.uncertainty.plot_coverage_diagnostic
- kdiagram.plot.uncertainty.plot_coverage_diagnostic(df, actual_col, q_cols, theta_col=None, acov='default', figsize=(8.0, 8.0), title=None, show_grid=True, grid_props=None, cmap='RdYlGn', alpha=0.85, s=35, as_bars=False, coverage_line_color='r', buffer_pts=500, fill_gradient=True, gradient_size=300, gradient_cmap='Greens', gradient_levels=None, gradient_props=None, mask_angle=True, savefig=None, verbose=0)
Diagnose prediction interval coverage using a polar plot.
This function generates a polar plot to visually assess whether actual observed values fall within their corresponding prediction intervals (defined by a lower and upper quantile). It helps diagnose the calibration of uncertainty estimates.
- Angular Position (`theta`): Represents each data point or location, ordered by DataFrame index and mapped linearly onto the specified angular coverage (acov). theta_col is currently ignored.
- Radial Position (`r`): Binary indicator of coverage. Points are plotted at radius 1 if the actual value is within the interval (\(Q_{lower} \le y_{actual} \le Q_{upper}\)), and at radius 0 otherwise.
- Color (Points/Bars): Indicates coverage status using cmap (default 'RdYlGn'), typically green for covered (1) and red for uncovered (0).
- Reference Lines: Concentric dashed lines can be drawn at specified gradient_levels (e.g., 0.2, 0.4, …) for reference.
- Average Coverage Line: A prominent solid line is drawn at a radius equal to the overall coverage rate (proportion of points covered), providing a benchmark against the expected coverage level (e.g., for a 90% interval [Q5-Q95], the line should ideally be near 0.9).
- Background Gradient (Optional): A radial gradient fills the background from the center up to the average coverage rate, using gradient_cmap. This visually emphasizes the overall coverage level.
This plot is essential for evaluating if the model’s uncertainty quantification is reliable (i.e., if a 90% prediction interval truly covers about 90% of the actual outcomes).
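As a rough illustration of the calibration check described above, the sketch below compares the empirical coverage of a simulated interval with its nominal level, using NumPy only. The helper `empirical_coverage` and the simulated data are assumptions made for this example; they are not part of kdiagram.

```python
import numpy as np

def empirical_coverage(actual, lower, upper):
    """Fraction of points with lower <= actual <= upper (hypothetical helper)."""
    actual, lower, upper = map(np.asarray, (actual, lower, upper))
    covered = (lower <= actual) & (actual <= upper)
    return float(covered.mean())

rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=2.0, size=10_000)

# Nominal 80% interval for N(10, 2): mean +/- 1.2816 standard deviations.
z80 = 1.2816
well_calibrated = empirical_coverage(y, 10 - z80 * 2.0, 10 + z80 * 2.0)

# Interval built with half the true spread: too narrow, so it under-covers.
too_narrow = empirical_coverage(y, 10 - z80 * 1.0, 10 + z80 * 1.0)

print(f"well calibrated: {well_calibrated:.3f}")
print(f"too narrow:      {too_narrow:.3f}")
```

In the plot, the first case would put the average coverage line close to 0.8, while the second would pull it well below the nominal level.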
- df : pd.DataFrame
Input DataFrame with actual values and quantile bounds. Decorators ensure it’s a valid, non-empty pandas DataFrame.
- actual_col : str
Name of the column containing the true observed (actual) values.
- q_cols : list or tuple of str
Sequence of exactly two column names: [lower_quantile_col, upper_quantile_col] defining the prediction interval.
- theta_col : str, optional
Intended column for ordering points angularly. Note: Currently ignored; uses DataFrame index order. A warning is issued if provided. Default is None.
- acov : {'default', 'half_circle', 'quarter_circle', 'eighth_circle'}, default='default'
Specifies the angular coverage (span) of the plot: 'default' (360°), 'half_circle' (180°), 'quarter_circle' (90°), 'eighth_circle' (45°).
- figsize : tuple of (float, float), default=(8.0, 8.0)
Width and height of the figure in inches.
- title : str, optional
Custom plot title. If None, a default title is used.
- show_grid : bool, default=True
If True, display polar grid lines.
- cmap : str, default='RdYlGn'
Colormap for coloring the coverage points or bars. Should ideally be a diverging map where one end represents covered (1) and the other represents uncovered (0).
- alpha : float, default=0.85
Transparency level for the scatter points or bars.
- s : int, default=35
Marker size for scatter points (used if as_bars is False).
- as_bars : bool, default=False
If True, plot coverage status as bars radiating from the center (height 0 or 1). If False, plot as scatter points at radius 0 or 1.
- coverage_line_color : str, default='r'
Color of the solid line indicating the average coverage rate.
- buffer_pts : int, default=500
Number of points used to draw smooth circular lines for average coverage and gradient levels.
- fill_gradient : bool, default=True
If True, fill the background with a radial gradient up to the average coverage rate using gradient_cmap.
- gradient_size : int, default=300
Resolution (number of steps) for the background gradient meshgrid.
- gradient_cmap : str, default='Greens'
Colormap used for the optional background gradient fill.
- gradient_levels : list of float, optional
List of radial values (between 0 and 1) at which to draw dashed concentric reference lines. Defaults to [0.2, 0.4, 0.6, 0.8, 1.0].
- gradient_props : dict, optional
Dictionary of keyword arguments to customize the appearance of the dashed gradient_levels reference lines (e.g., {'linestyle': '--', 'color': 'blue'}). Defaults to gray dotted lines.
- mask_angle : bool, default=True
If True, hide the angular tick labels (degrees).
- savefig : str, optional
File path to save the plot image. If None, displays interactively.
- verbose : int, default=0
Controls printing the calculated overall coverage rate. If > 0, the rate is printed.
- ax : matplotlib.axes._axes.Axes
The Matplotlib Axes object (PolarAxesSubplot) containing the plot.
- TypeError
If q_cols does not contain exactly two elements.
- ValueError
If required columns are missing or data is non-numeric. If acov value is invalid.
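The documented error conditions can be checked before calling the function. The following is a minimal sketch of such a pre-flight check; `check_coverage_inputs` is a hypothetical helper written for this example and is not the validation code kdiagram itself runs.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# The acov options listed in the parameter description.
VALID_ACOV = {"default", "half_circle", "quarter_circle", "eighth_circle"}

def check_coverage_inputs(df, actual_col, q_cols, acov="default"):
    """Hypothetical pre-flight check mirroring the documented exceptions."""
    if not isinstance(q_cols, (list, tuple)) or len(q_cols) != 2:
        raise TypeError("q_cols must be a list or tuple of exactly two column names")
    required = [actual_col, *q_cols]
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    non_numeric = [c for c in required if not is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f"non-numeric columns: {non_numeric}")
    if acov not in VALID_ACOV:
        raise ValueError(f"invalid acov: {acov!r}")

df = pd.DataFrame({"actual": [1.0, 2.0], "q10": [0.5, 1.0], "q90": [1.5, 3.0]})
check_coverage_inputs(df, "actual", ["q10", "q90"])  # valid input, no exception

try:
    check_coverage_inputs(df, "actual", ["q10"])  # only one bound supplied
except TypeError as exc:
    print(f"TypeError: {exc}")
```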
plot_anomaly_magnitude : Visualize magnitude of interval failures.
calibration_curve : (If available) Plot reliability diagrams for probabilistic forecasts.
Coverage is defined as \(L_j \le y_j \le U_j\).
The radial axis is fixed between 0 and 1, representing the binary coverage outcome for individual points/bars.
The average coverage line provides a single summary statistic, which should be compared to the nominal coverage level of the interval (e.g., 80% for a Q10-Q90 interval, 90% for Q5-Q95).
The theta_col parameter is currently ignored for positioning.
NaN values in essential columns are dropped before analysis.
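Why the NaN removal matters can be seen in a small pandas sketch. The column names and values below are made up for illustration; the point is that comparisons against NaN evaluate to False, so rows with missing values would otherwise be counted as uncovered.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "actual": [1.0, 5.0, np.nan, 3.0, 9.0],
    "q10":    [0.0, 6.0, 1.0, np.nan, 8.0],
    "q90":    [2.0, 7.0, 4.0, 5.0, 10.0],
})

# Drop rows with a missing value in any essential column.
essential = ["actual", "q10", "q90"]
clean = df.dropna(subset=essential)

covered = (clean["q10"] <= clean["actual"]) & (clean["actual"] <= clean["q90"])
coverage_rate = covered.mean()

# Without the dropna step, the two incomplete rows count as misses.
naive_rate = ((df["q10"] <= df["actual"]) & (df["actual"] <= df["q90"])).mean()

print(f"rows kept: {len(clean)} of {len(df)}")
print(f"coverage after dropping NaN: {coverage_rate:.3f}")
print(f"coverage with NaN rows kept: {naive_rate:.3f}")
```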
Let \(y_j\) (actual), \(L_j\) (lower bound), \(U_j\) (upper bound) for data point \(j\) (\(j=0, \dots, N-1\) after NaN removal).
Coverage Indicator (Radial Coordinate `r`):
\[ r_j = \begin{cases} 1 & \text{if } L_j \le y_j \le U_j \\ 0 & \text{otherwise} \end{cases} \]
Overall Coverage Rate:
\[ \bar{C} = \frac{1}{N} \sum_{j=0}^{N-1} r_j \]
Angular Coordinate (`theta`): Let \(S\) be the angular span and \(\theta_{min}\) the start angle from acov.
\[ \theta_j = \left( \frac{j}{N} \times S \right) + \theta_{min} \]
Plotting:
- Plot points/bars at \((r_j, \theta_j)\), colored based on \(r_j\) using cmap.
- Plot a solid line at constant radius \(\bar{C}\).
- Optionally, plot dashed lines at constant radii specified by gradient_levels.
- Optionally, fill background up to radius \(\bar{C}\) using gradient_cmap.
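The coordinate definitions above can be sketched in a few lines of NumPy. This is an illustration of the formulas, not the library's internal code: the helper `polar_coverage_coords`, the `ACOV_SPAN` mapping to radians, and the start angle of 0 are assumptions made for the example.

```python
import numpy as np

# Angular span S in radians for each documented acov option.
ACOV_SPAN = {
    "default": 2 * np.pi,        # 360 degrees
    "half_circle": np.pi,        # 180 degrees
    "quarter_circle": np.pi / 2, # 90 degrees
    "eighth_circle": np.pi / 4,  # 45 degrees
}

def polar_coverage_coords(actual, lower, upper, acov="default", theta_min=0.0):
    """Return (r, theta, c_bar) following the formulas above (hypothetical helper)."""
    actual, lower, upper = map(np.asarray, (actual, lower, upper))
    # r_j = 1 if L_j <= y_j <= U_j, else 0
    r = ((lower <= actual) & (actual <= upper)).astype(float)
    n = r.size
    # theta_j = (j / N) * S + theta_min; theta_min = 0 is assumed here
    theta = np.arange(n) / n * ACOV_SPAN[acov] + theta_min
    # C_bar = (1 / N) * sum_j r_j
    return r, theta, float(r.mean())

r, theta, c_bar = polar_coverage_coords(
    actual=[1.0, 5.0, 3.0, 9.0],
    lower=[0.0, 6.0, 2.0, 8.0],
    upper=[2.0, 7.0, 4.0, 8.5],
    acov="half_circle",
)
print(r, np.degrees(theta), c_bar)
```

With four points on a half circle, the angles fall at 0°, 45°, 90°, and 135°, and two of the four points are covered, so the average coverage line would sit at radius 0.5.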
>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_coverage_diagnostic
1. Random Example (Well-calibrated 80% interval):
>>> np.random.seed(0)
>>> N = 200
>>> df_cov_rand = pd.DataFrame({'id': range(N)})
>>> df_cov_rand['actual'] = np.random.normal(loc=10, scale=2, size=N)
>>> # Simulate an ~80% interval (e.g., +/- 1.28 std devs for Normal)
>>> std_dev_pred = 2.0
>>> df_cov_rand['q10_pred'] = 10 - 1.28 * std_dev_pred
>>> df_cov_rand['q90_pred'] = 10 + 1.28 * std_dev_pred
>>> # Add some noise to interval bounds
>>> df_cov_rand['q10_pred'] += np.random.randn(N) * 0.2
>>> df_cov_rand['q90_pred'] += np.random.randn(N) * 0.2

>>> ax_cov_rand = plot_coverage_diagnostic(
...     df=df_cov_rand,
...     actual_col='actual',
...     q_cols=['q10_pred', 'q90_pred'],  # [lower, upper]
...     theta_col='id',                   # Ignored for positioning
...     acov='default',
...     title='Coverage Diagnostic (Simulated 80% Interval)',
...     as_bars=False,                    # Use scatter points
...     coverage_line_color='blue',       # Color for avg coverage line
...     gradient_cmap='Blues',            # Background gradient color
...     verbose=1                         # Print coverage rate
... )
>>> # Expected coverage rate near 80%
>>> # plt.show() called internally
2. Concrete Example (Subsidence Data):
>>> # Assume small_sample_pred is a loaded DataFrame
>>> # Create dummy data if it doesn't exist
>>> try:
...     small_sample_pred
... except NameError:
...     print("Creating dummy small sample prediction data...")
...     N_small = 200
...     small_sample_pred = pd.DataFrame({
...         'subsidence_2023': np.random.rand(N_small)*15 + np.linspace(0, 5, N_small),
...         'subsidence_2023_q10': np.random.rand(N_small)*10,
...         'subsidence_2023_q90': np.random.rand(N_small)*10 + 10,
...         'latitude': np.linspace(22.3, 22.7, N_small) + np.random.randn(N_small)*0.01
...     })
>>> # Ensure Q90 > Q10
>>> small_sample_pred['subsidence_2023_q90'] = (
...     small_sample_pred['subsidence_2023_q10'] +
...     np.abs(small_sample_pred['subsidence_2023_q90'] -
...            small_sample_pred['subsidence_2023_q10']) + 0.1
... )

>>> ax_cov_sub = plot_coverage_diagnostic(
...     df=small_sample_pred,
...     actual_col='subsidence_2023',
...     q_cols=['subsidence_2023_q10', 'subsidence_2023_q90'],
...     theta_col=None,                    # Use index order
...     acov='half_circle',                # Use 180 degrees
...     as_bars=True,                      # Use bars instead of scatter
...     coverage_line_color='darkgreen',
...     title='Coverage Evaluation for 2023 (Q10Q90)',
...     mask_angle=False,                  # Show angle labels if meaningful
...     fill_gradient=False,               # Turn off background gradient
...     gradient_levels=[0.5, 0.8, 0.9],   # Custom reference lines
...     verbose=1
... )
>>> # plt.show() called internally
- Parameters:
df (DataFrame)
actual_col (str)
theta_col (str | None)
acov (str)
title (str | None)
show_grid (bool)
grid_props (dict | None)
cmap (str)
alpha (float)
s (int)
as_bars (bool)
coverage_line_color (str)
buffer_pts (int)
fill_gradient (bool)
gradient_size (int)
gradient_cmap (str)
mask_angle (bool)
savefig (str | None)
verbose (int)