kdiagram.plot.uncertainty.plot_anomaly_magnitude
- kdiagram.plot.uncertainty.plot_anomaly_magnitude(df, actual_col, q_cols, theta_col=None, acov='default', title='Anomaly Magnitude Polar Plot', figsize=(8.0, 8.0), cmap_under='Blues', cmap_over='Reds', s=30, alpha=0.8, show_grid=True, verbose=1, cbar=False, dpi=300, savefig=None, mask_angle=False, ax=None)
Visualize magnitude and type of prediction-interval anomalies.
This function draws a polar scatter plot to highlight prediction-interval failures, that is, cases where the observed value lies outside a user-specified interval (e.g., \([Q_{0.10}, Q_{0.90}]\)). It encodes the anomaly's order/location (angle), magnitude (radius), and type (color). See Kouadio [1]; background on coverage and calibration is given in [2][3].
- Angular position (\(\theta\)): maps each row to an angle over the chosen span (acov). If theta_col is provided, rows are ordered by that column before angles are assigned; otherwise the original row order is used. Angles are spaced linearly across the selected coverage.
- Radial distance (\(r\)): the anomaly magnitude for interval violations, computed as \(r=\lvert y - B\rvert\), where \(B\in\{L,U\}\) is the violated bound (lower \(L\) or upper \(U\)). Points satisfying \(L\le y\le U\) are omitted.
- Color: indicates anomaly type and scales with magnitude on a shared normalization. cmap_under colors under-predictions (\(y<L\)); cmap_over colors over-predictions (\(y>U\)).
This diagnostic helps to (i) localize clusters of failures along the ordering induced by theta_col (or row index), (ii) assess how severe misses are via larger radii, and (iii) support calibration checks and targeted model refinement alongside coverage and reliability analyses; see [2][3].
- Parameters:
- df
pandas.DataFrame Input table containing the actual values and the two quantile (interval) columns. Must be non-empty after NaN removal in the required columns (see below).
- actual_col
str Name of the column with the observed/ground-truth values used to check interval coverage.
- q_cols
list[str] or tuple[str, str] Two-element sequence with the lower and upper quantile column names, in that order: [q_low, q_up]. Each referenced column must exist in df and be numeric. Semantically, rows are expected to satisfy \(q_\text{low} \le q_\text{up}\).
- theta_col
str, optional Column used only to order points angularly before mapping them into the selected coverage span. Useful for spatial or categorical ordering (e.g., 'latitude', 'station_id'). If omitted or non-numeric after NaN filtering, the original row order is used.
- acov
{'default', 'half_circle', 'quarter_circle', 'eighth_circle'}, default='default' Angular coverage of the plot:
  - 'default': full circle, \(2\pi\)
  - 'half_circle': \(\pi\)
  - 'quarter_circle': \(\pi/2\)
  - 'eighth_circle': \(\pi/4\)
(Value is case-insensitive; invalid values fall back to full circle with a warning.)
- title
str, default='Anomaly Magnitude Polar Plot' Figure title.
- figsize
tuple[float,float], default=(8.0, 8.0) Figure size in inches; each dimension must be positive.
- cmap_under
str, default='Blues' Matplotlib colormap name used for under-predictions (\(y<q_\text{low}\)). If invalid, a warning is issued and a default is used.
- cmap_over
str, default='Reds' Matplotlib colormap name used for over-predictions (\(y>q_\text{up}\)). If invalid, a warning is issued and a default is used.
- s
int, default=30 Marker size for anomaly points (points with interval failure); must be positive.
- alpha
float, default=0.8 Point transparency in \([0,1]\).
- show_grid
bool, default=True Whether to draw polar grid lines.
- verbose
int, default=1 Verbosity level. If \(>0\), prints a short anomaly summary (counts of under-/over-predictions).
- cbar
bool, default=False If True, draw a colorbar for the shared magnitude normalization. The bar uses the cmap_over colormap for display, but its scale matches both under/over magnitudes.
- savefig
str, optional Path to save the figure (e.g., 'anomaly_plot.png'). If omitted, the figure is shown interactively. Errors during saving are reported via a printed message.
- mask_angle
bool, default=False If True, hide angular tick labels (useful when the angle order is arbitrary or index-based).
- Returns:
- ax
matplotlib.axes.Axes or None The polar Axes containing the scatter plot. Returns None if the DataFrame becomes empty after dropping NaNs in the required columns and no plot can be produced. If no anomalies are found, an empty polar frame with a notice text is returned, not None.
- Raises:
- ValueError
If q_cols is not a 2-item sequence, or if any of actual_col, the two q_cols, or the provided theta_col (when used) is missing from df.
- TypeError
If required columns exist but cannot be coerced to numeric dtype after NaN handling.
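The error conditions listed under Raises can be checked up front before calling the plotting function. The following is a minimal sketch, not kdiagram's own validation: the helper name check_plot_inputs and its messages are hypothetical, and it only mirrors the documented conditions (two-item q_cols, required columns present, numeric actual and quantile columns).

```python
import pandas as pd


def check_plot_inputs(df, actual_col, q_cols, theta_col=None):
    """Hypothetical pre-flight check; not part of kdiagram."""
    # q_cols must be exactly [q_low, q_up]; a bare string is rejected.
    if isinstance(q_cols, str) or len(q_cols) != 2:
        raise ValueError("q_cols must be a 2-item sequence: [q_low, q_up]")

    # Every referenced column must exist in the DataFrame.
    needed = [actual_col, *q_cols]
    if theta_col is not None:
        needed.append(theta_col)
    missing = [c for c in needed if c not in df.columns]
    if missing:
        raise ValueError(f"columns missing from df: {missing}")

    # The actual and quantile columns must be coercible to numbers.
    for col in [actual_col, *q_cols]:
        try:
            pd.to_numeric(df[col], errors="raise")
        except (ValueError, TypeError) as exc:
            raise TypeError(f"column {col!r} is not numeric") from exc
    return needed


demo = pd.DataFrame({"actual": [1.0, 2.0], "q10": [0.5, 1.0], "q90": [1.5, 3.0]})
checked = check_plot_inputs(demo, "actual", ["q10", "q90"])
```

theta_col is deliberately not tested for numeric dtype here, since the parameter description says a non-numeric ordering column falls back to the original row order.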
See also
- kdiagram.plot.uncertainty.plot_coverage: Aggregate empirical coverage comparison.
- kdiagram.plot.uncertainty.plot_coverage_diagnostic: Point-wise coverage successes/failures.
- kdiagram.plot.uncertainty.plot_interval_width: Magnitude of prediction-interval widths.
- kdiagram.plot.uncertainty.plot_interval_consistency: Stability of interval widths over time/steps.
- validate_qcols: Helper for validating the lower/upper quantile columns.
Notes
- Anomalies are detected by comparing actual_col to the bounds in q_cols for each row.
- Rows with NaNs in any required column (actual_col, the two quantile columns, and theta_col if provided) are dropped before analysis.
- The anomaly magnitude \(r\) is non-negative and measures the distance to the exceeded bound.
- theta_col controls ordering only, not spacing. Angles are always linearly spaced within the selected coverage (acov) and then assigned according to the sort order of theta_col.
- Color intensity for both under- and over-predictions reflects the shared normalization of magnitudes, based on the maximum observed \(r\).
- The helper validate_qcols (from your utilities, e.g. gofast) performs initial structure checks on q_cols.
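The NaN handling described in the notes can be reproduced with pandas to preview which rows would survive. This is a sketch under assumptions: drop_incomplete_rows is a hypothetical helper and the column names are invented for illustration; the library's internal filtering may differ in detail.

```python
import numpy as np
import pandas as pd


def drop_incomplete_rows(df, actual_col, q_cols, theta_col=None):
    """Keep only rows with no NaN in the required columns (hypothetical helper)."""
    required = [actual_col, *q_cols]
    if theta_col is not None:
        required.append(theta_col)
    # dropna(subset=...) ignores NaNs in columns that are not required.
    return df.dropna(subset=required)


df = pd.DataFrame({
    "actual": [1.0, np.nan, 3.0, 4.0],
    "q10": [0.5, 0.5, np.nan, 3.0],
    "q90": [1.5, 1.5, 3.5, 5.0],
    "lat": [10.0, 11.0, 12.0, np.nan],
})

kept = drop_incomplete_rows(df, "actual", ["q10", "q90"])
kept_with_theta = drop_incomplete_rows(df, "actual", ["q10", "q90"], theta_col="lat")
```

Passing theta_col makes that column required as well, so a row with a missing ordering value is also dropped.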
Let \(y_j\) be the actual value, \(L_j\) the lower bound, and \(U_j\) the upper bound for location \(j\).
Anomaly masks
Under-prediction: \(M_{\text{under},j} = (y_j < L_j)\)
Over-prediction: \(M_{\text{over},j} = (y_j > U_j)\)
Anomaly magnitude (radial coordinate \(r\))
\[r_j = \begin{cases} L_j - y_j, & \text{if } y_j < L_j,\\ y_j - U_j, & \text{if } y_j > U_j,\\ 0, & \text{otherwise.} \end{cases} \tag{1}\]
Only points with \(r_j > 0\) are plotted.
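The masks and the piecewise magnitude in equation (1) can be sketched with NumPy as follows. The function name anomaly_magnitude and the sample values are illustrative assumptions, not the library's internals.

```python
import numpy as np


def anomaly_magnitude(y, lower, upper):
    """Return (r, under_mask, over_mask) following equation (1)."""
    y = np.asarray(y, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    under = y < lower  # observed value below the lower bound
    over = y > upper   # observed value above the upper bound

    r = np.zeros_like(y)  # rows inside the interval keep r = 0
    r[under] = lower[under] - y[under]
    r[over] = y[over] - upper[over]
    return r, under, over


y = np.array([5.0, 12.0, 20.0])
lower = np.array([7.0, 7.0, 7.0])
upper = np.array([13.0, 13.0, 13.0])
r, under, over = anomaly_magnitude(y, lower, upper)
```

With these inputs the first row misses the lower bound by 2, the second lies inside the interval, and the third exceeds the upper bound by 7, so only the first and third rows would be plotted.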
Angular coordinate (\(\theta\))
For \(N\) retained rows, define base angles
\[\theta'_k \;=\; \frac{k}{N}\,S, \qquad k=0,\ldots,N-1, \tag{2}\]
where \(S \in \{2\pi, \pi, \pi/2, \pi/4\}\) is the span implied by acov.
If theta_col exists, sort by that column to obtain a permutation \(\pi\), where \(\pi(k)\) is the original row at sorted position \(k\). The plotted angle for original row \(j\) is \(\theta_j=\theta'_{\pi^{-1}(j)}\). If theta_col is absent, use the original row order: \(\theta_j=\theta'_j\).
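The angle assignment can be illustrated with a short NumPy sketch. It assumes the rank-based reading of the text above (each row receives the base angle of its sorted position); assign_angles and SPANS are hypothetical names, and the fallback to a full circle for unknown acov values follows the parameter description.

```python
import numpy as np

# Span S implied by each acov value.
SPANS = {
    "default": 2 * np.pi,
    "half_circle": np.pi,
    "quarter_circle": np.pi / 2,
    "eighth_circle": np.pi / 4,
}


def assign_angles(n_rows, acov="default", order_values=None):
    """Return one angle per row, linearly spaced over the acov span."""
    span = SPANS.get(str(acov).lower(), 2 * np.pi)  # unknown value: full circle
    base = np.arange(n_rows) / n_rows * span        # theta'_k = (k / N) * S
    if order_values is None:
        return base                                 # original row order
    # ranks[j] is the sorted position of row j, i.e. the inverse permutation.
    ranks = np.empty(n_rows, dtype=int)
    ranks[np.argsort(order_values, kind="stable")] = np.arange(n_rows)
    return base[ranks]


theta = assign_angles(4, "half_circle", order_values=[30.0, 10.0, 40.0, 20.0])
```

Here the row with the smallest ordering value (10.0) gets angle 0 and the row with the largest (40.0) gets \(3\pi/4\); the spacing between consecutive ranks is always \(S/N\), regardless of how far apart the ordering values are.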
Color mapping
Normalize magnitudes
\[\text{norm}(r_j) \;=\; \frac{r_j}{\max(r_1,\ldots,r_N) + \varepsilon}, \tag{3}\]
with a small \(\varepsilon>0\) for numerical safety.
Apply colormaps: cmap_under(norm(r_j)) for under-predictions and cmap_over(norm(r_j)) for over-predictions.
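The shared normalization in equation (3) and the two-colormap lookup can be sketched with Matplotlib as below. The value of EPS and the helper name anomaly_colors are assumptions made for illustration; the source does not state the actual \(\varepsilon\) used.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

EPS = 1e-9  # assumed small constant; the library's value is not documented here


def anomaly_colors(r, under, over, cmap_under="Blues", cmap_over="Reds"):
    """Map magnitudes to RGBA colors using one shared normalization."""
    r = np.asarray(r, dtype=float)
    norm = r / (r.max() + EPS)                   # equation (3)
    colors = np.zeros((r.size, 4))               # non-anomalies stay transparent
    colors[under] = plt.get_cmap(cmap_under)(norm[under])
    colors[over] = plt.get_cmap(cmap_over)(norm[over])
    return norm, colors


r = np.array([2.0, 0.0, 7.0])
under = np.array([True, False, False])
over = np.array([False, False, True])
norm, colors = anomaly_colors(r, under, over)
```

Because both colormaps are indexed by the same normalized value, a pale blue point and a pale red point represent misses of comparable size.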
References
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_anomaly_magnitude
1. Random Example:
>>> np.random.seed(42)
>>> N_points = 150
>>> df_anomaly_rand = pd.DataFrame({
...     'id': range(N_points),
...     'actual': np.random.randn(N_points) * 5 + 10,
...     'pred_q10': np.random.randn(N_points) * 1 + 7,   # Interval around 10
...     'pred_q90': np.random.randn(N_points) * 1 + 13,
...     'feature_order': np.random.rand(N_points) * 100  # For ordering
... })
>>> # Introduce some anomalies
>>> df_anomaly_rand.loc[5:15, 'actual'] = 0      # Under-predictions
>>> df_anomaly_rand.loc[100:110, 'actual'] = 25  # Over-predictions
>>>
>>> ax_rand_anomaly = plot_anomaly_magnitude(
...     df=df_anomaly_rand,
...     actual_col='actual',
...     q_cols=['pred_q10', 'pred_q90'],
...     theta_col='feature_order',  # Order by this feature
...     acov='default',
...     title='Random Anomaly Distribution',
...     cmap_under='GnBu',
...     cmap_over='OrRd',
...     s=40,
...     cbar=True,
...     verbose=1
... )
>>> # Output will show anomaly counts...
>>> # plt.show() called internally
2. Concrete Example (Subsidence Data - adapted from docstring):
>>> # Assume small_sample_pred is an already-loaded DataFrame;
>>> # create dummy data if it doesn't exist
>>> try:
...     small_sample_pred
... except NameError:
...     print("Creating dummy small sample prediction data...")
...     N_small = 200
...     small_sample_pred = pd.DataFrame({
...         'subsidence_2023': np.random.rand(N_small)*15 + np.linspace(0, 5, N_small),
...         'subsidence_2023_q10': np.random.rand(N_small)*10,
...         'subsidence_2023_q90': np.random.rand(N_small)*10 + 10,
...         'latitude': np.linspace(22.3, 22.7, N_small) + np.random.randn(N_small)*0.01
...     })
...     # Ensure some anomalies exist in dummy data
...     anom_indices_under = np.random.choice(N_small, 15, replace=False)
...     anom_indices_over = np.random.choice(
...         list(set(range(N_small)) - set(anom_indices_under)), 20, replace=False
...     )
...     small_sample_pred.loc[anom_indices_under, 'subsidence_2023'] = (
...         small_sample_pred.loc[anom_indices_under, 'subsidence_2023_q10']
...         - np.random.rand(15)*5 - 1
...     )
...     small_sample_pred.loc[anom_indices_over, 'subsidence_2023'] = (
...         small_sample_pred.loc[anom_indices_over, 'subsidence_2023_q90']
...         + np.random.rand(20)*5 + 1
...     )
>>> ax_sub_anomaly = plot_anomaly_magnitude(
...     df=small_sample_pred,
...     actual_col='subsidence_2023',
...     q_cols=['subsidence_2023_q10', 'subsidence_2023_q90'],
...     theta_col='latitude',      # Order points by latitude
...     acov='quarter_circle',     # Use only 90 degrees
...     title='Anomaly Magnitude (2023) Zhongshan',
...     figsize=(9, 9),
...     s=35,
...     cbar=True,                 # Show colorbar
...     mask_angle=True,           # Hide angle labels
...     verbose=1                  # Print anomaly counts
... )
>>> # Output will show anomaly counts...
>>> # plt.show() called internally