kdiagram.plot.uncertainty.plot_anomaly_magnitude

kdiagram.plot.uncertainty.plot_anomaly_magnitude(df, actual_col, q_cols, theta_col=None, acov='default', title='Anomaly Magnitude Polar Plot', figsize=(8.0, 8.0), cmap_under='Blues', cmap_over='Reds', s=30, alpha=0.8, show_grid=True, verbose=1, cbar=False, dpi=300, savefig=None, mask_angle=False, ax=None)

Visualize magnitude and type of prediction-interval anomalies.

This function draws a polar scatter plot that highlights prediction-interval failures, that is, cases where the observed value lies outside a user-specified interval (e.g., \([Q_{0.10}, Q_{0.90}]\)). It encodes each anomaly's position in the ordering (angle), its magnitude (radius), and its type (color). See Kouadio [1]; for background on coverage and calibration, see [2][3].

Angular position (\(\theta\)): maps each row to an angle over the chosen span (acov). If theta_col is provided, rows are ordered by that column before angles are assigned; otherwise the original row order is used. Angles are spaced linearly across the selected coverage.
Radial distance (\(r\)): the anomaly magnitude for interval violations, computed as \(r=\lvert y - B\rvert\), where \(B\in\{L,U\}\) is the violated bound (lower \(L\) or upper \(U\)). Points satisfying \(L\le y\le U\) are omitted.
Color: indicates anomaly type and scales with magnitude on a shared normalization. cmap_under colors under-predictions (\(y<L\)); cmap_over colors over-predictions (\(y>U\)).

This diagnostic helps to (i) localize clusters of failures along the ordering induced by theta_col (or the row index), (ii) assess how severe the misses are (larger radii mean larger misses), and (iii) support calibration checks and targeted model refinement alongside coverage and reliability analyses (see [2][3]).
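The radius and type encodings described above can be sketched with plain NumPy. This is a minimal illustration and not the library's implementation; the helper name `anomaly_magnitude` and the toy values are assumptions made for this example.

```python
import numpy as np

def anomaly_magnitude(y, lower, upper):
    """Hypothetical helper: return (r, under_mask, over_mask)."""
    y = np.asarray(y, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    under = y < lower   # observed value falls below the lower bound
    over = y > upper    # observed value exceeds the upper bound

    # Distance to the violated bound; rows inside the interval keep r = 0.
    r = np.zeros_like(y)
    r[under] = lower[under] - y[under]
    r[over] = y[over] - upper[over]
    return r, under, over

# Toy data (illustrative values only).
y = [5.0, 12.0, 1.0, 20.0]
lower = [4.0, 10.0, 3.0, 8.0]
upper = [6.0, 11.0, 9.0, 15.0]
r, under, over = anomaly_magnitude(y, lower, upper)
```

In this sketch only the rows with `r > 0` would be drawn, which matches the statement that points inside the interval are omitted.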

Parameters:

df : pandas.DataFrame

Input table containing the actual values and the two quantile (interval) columns. Must be non-empty after NaN removal in the required columns (see below).

actual_col : str

Name of the column with the observed/ground-truth values used to check interval coverage.

q_cols : list[str] or tuple[str, str]

Two-element sequence with the lower and upper quantile column names, in that order: [q_low, q_up]. Each referenced column must exist in df and be numeric. Semantically, rows are expected to satisfy \(q_\text{low} \le q_\text{up}\).

theta_col : str, optional

Column used only to order points angularly before mapping them into the selected coverage span. Useful for spatial or categorical ordering (e.g., 'latitude', 'station_id'). If omitted or non-numeric after NaN filtering, the original row order is used.

acov : {'default', 'half_circle', 'quarter_circle', 'eighth_circle'}, default='default'

Angular coverage of the plot:

'default': full circle, \(2\pi\)
'half_circle': \(\pi\)
'quarter_circle': \(\pi/2\)
'eighth_circle': \(\pi/4\)

(Value is case-insensitive; invalid values fall back to full circle with a warning.)
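As a rough sketch of the mapping just listed, the span lookup could behave as follows. The names `_SPANS` and `resolve_span` are hypothetical and the warning text is invented for the example; only the option names, spans, case-insensitivity, and the fallback to a full circle come from the description above.

```python
import math
import warnings

# Span in radians for each documented acov option.
_SPANS = {
    "default": 2 * math.pi,
    "half_circle": math.pi,
    "quarter_circle": math.pi / 2,
    "eighth_circle": math.pi / 4,
}

def resolve_span(acov="default"):
    """Hypothetical helper: map an acov string to its angular span."""
    key = str(acov).lower()  # the option is case-insensitive
    if key not in _SPANS:
        # Invalid values fall back to the full circle with a warning.
        warnings.warn(f"Invalid acov {acov!r}; using full circle.", UserWarning)
        return _SPANS["default"]
    return _SPANS[key]
```

For instance, `resolve_span("Half_Circle")` returns `math.pi`, while an unknown value returns `2 * math.pi` and emits a `UserWarning`.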

title : str, default='Anomaly Magnitude Polar Plot'

Figure title.

figsize : tuple[float, float], default=(8.0, 8.0)

Figure size in inches; each dimension must be positive.

cmap_under : str, default='Blues'

Matplotlib colormap name used for under-predictions (\(y<q_\text{low}\)). If invalid, a warning is issued and a default is used.

cmap_over : str, default='Reds'

Matplotlib colormap name used for over-predictions (\(y>q_\text{up}\)). If invalid, a warning is issued and a default is used.

s : int, default=30

Marker size for anomaly points (points with interval failure); must be positive.

alpha : float, default=0.8

Point transparency in \([0,1]\).

show_grid : bool, default=True

Whether to draw polar grid lines.

verbose : int, default=1

Verbosity level. If \(>0\), prints a short anomaly summary (counts of under-/over-predictions).

cbar : bool, default=False

If True, draw a colorbar for the shared magnitude normalization. The bar uses the cmap_over colormap for display, but its scale matches both under/over magnitudes.

savefig : str, optional

Path to save the figure (e.g., 'anomaly_plot.png'). If omitted, the figure is shown interactively. Errors during saving are reported via a printed message.

mask_angle : bool, default=False

If True, hide angular tick labels (useful when the angle order is arbitrary or index-based).

Returns:

ax : matplotlib.axes.Axes or None

The polar Axes containing the scatter plot. Returns None if the DataFrame becomes empty after dropping NaNs in the required columns, so that no plot can be produced. If no anomalies are found, an empty polar frame with a notice text is returned instead of None.

Raises:

ValueError

q_cols is not a 2-item sequence.
Any of actual_col, the two q_cols, or the provided theta_col (when used) is missing from df.

TypeError

Required columns exist but cannot be coerced to numeric dtype after NaN handling.
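The documented error conditions can be mirrored in a small pre-flight check before plotting. This is only a sketch under stated assumptions: `check_inputs` is a hypothetical helper, the messages are invented, and it tests the dtype directly with `is_numeric_dtype` rather than attempting any coercion, so it may be stricter than the library's own validation.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def check_inputs(df, actual_col, q_cols, theta_col=None):
    """Hypothetical pre-check; returns df without NaN rows in required columns."""
    if not isinstance(q_cols, (list, tuple)) or len(q_cols) != 2:
        raise ValueError("q_cols must contain exactly two column names")

    required = [actual_col, *q_cols]
    if theta_col is not None:
        required.append(theta_col)

    missing = [c for c in required if c not in df.columns]
    if missing:
        raise ValueError(f"columns missing from df: {missing}")

    # Drop rows with NaNs in any required column before the dtype check.
    clean = df.dropna(subset=required)

    # Assumption: the actual and quantile columns must already be numeric.
    non_numeric = [c for c in (actual_col, *q_cols)
                   if not is_numeric_dtype(clean[c])]
    if non_numeric:
        raise TypeError(f"non-numeric required columns: {non_numeric}")
    return clean
```

Running such a check first makes it easier to tell a validation problem apart from the None return value that signals an empty table after NaN removal.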

Parameters:

df (DataFrame)
actual_col (str)
q_cols (list[str] | tuple[str, str])
theta_col (str | None)
acov (str)
title (str)
figsize (tuple[float, float])
cmap_under (str)
cmap_over (str)
s (int)
alpha (float)
show_grid (bool)
verbose (int)
cbar (bool)
dpi (int)
savefig (str | None)
mask_angle (bool)
ax (Axes | None)

See also

kdiagram.plot.uncertainty.plot_coverage: Aggregate empirical coverage comparison.
kdiagram.plot.uncertainty.plot_coverage_diagnostic: Point-wise coverage successes/failures.
kdiagram.plot.uncertainty.plot_interval_width: Magnitude of prediction-interval widths.
kdiagram.plot.uncertainty.plot_interval_consistency: Stability of interval widths over time/steps.
validate_qcols: Helper for validating the lower/upper quantile columns.

Notes

Anomalies are detected by comparing actual_col to the bounds in q_cols for each row.
Rows with NaNs in any required column (actual_col, the two quantile columns, and theta_col if provided) are dropped before analysis.
The anomaly magnitude \(r\) is non-negative and measures the distance to the exceeded bound.
theta_col controls ordering only, not spacing. Angles are always linearly spaced within the selected coverage (acov) and then assigned according to the sort order of theta_col.
Color intensity for both under- and over-predictions reflects the shared normalization of magnitudes, based on the maximum observed \(r\).
The helper validate_qcols (from your utilities, e.g. gofast) performs initial structure checks on q_cols.

Let \(y_j\) be the actual value, \(L_j\) the lower bound, and \(U_j\) the upper bound for location \(j\).

Anomaly masks
- Under-prediction: \(M_{\text{under},j} = (y_j < L_j)\)
- Over-prediction: \(M_{\text{over},j} = (y_j > U_j)\)
Anomaly magnitude (radial coordinate \(r\))

\[r_j = \begin{cases} L_j - y_j, & \text{if } y_j < L_j,\\ y_j - U_j, & \text{if } y_j > U_j,\\ 0, & \text{otherwise.} \end{cases} \tag{1}\]

Only points with \(r_j > 0\) are plotted.
Angular coordinate (\(\theta\))
- For \(N\) retained rows, define base angles
  
  \[\theta'_k \;=\; \frac{k}{N}\,S, \qquad k=0,\ldots,N-1, \tag{2}\]
  
  where \(S \in \{2\pi, \pi, \pi/2, \pi/4\}\) is the span implied by acov.
- If theta_col is provided, sort by that column to obtain a permutation \(\pi\), where \(\pi(k)\) is the original row placed at rank \(k\). The plotted angle for original row \(j\) is then \(\theta_j=\theta'_{\pi^{-1}(j)}\). If theta_col is absent, use the original row order: \(\theta_j=\theta'_j\).
Color mapping
- Normalize magnitudes
  
  \[\text{norm}(r_j) \;=\; \frac{r_j}{\max(r_1,\ldots,r_N) + \varepsilon}, \tag{3}\]
  
  with a small \(\varepsilon>0\) for numerical safety.
- Apply colormaps: cmap_under(norm(r_j)) for under-predictions and cmap_over(norm(r_j)) for over-predictions.
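The angle assignment and the shared normalization above can be sketched in NumPy as follows. The helper names `assign_angles` and `normalize_magnitudes`, the stable sort, and the value `eps=1e-9` are assumptions for illustration; the library may differ in these details.

```python
import numpy as np

def assign_angles(n_rows, span=2 * np.pi, order_values=None):
    """Hypothetical helper: linearly spaced angles, optionally reordered."""
    # Base angles theta'_k = (k / N) * S for k = 0, ..., N - 1.
    base = np.linspace(0.0, span, n_rows, endpoint=False)
    if order_values is None:
        return base  # original row order
    # order[k] is the original row that receives rank k after sorting.
    order = np.argsort(np.asarray(order_values), kind="stable")
    theta = np.empty(n_rows)
    theta[order] = base  # row j gets the base angle of its rank
    return theta

def normalize_magnitudes(r, eps=1e-9):
    """Hypothetical helper: scale magnitudes by max(r) + eps (needs N >= 1)."""
    r = np.asarray(r, dtype=float)
    return r / (r.max() + eps)

# Four rows ordered by an illustrative feature, on a half circle.
theta = assign_angles(4, span=np.pi, order_values=[30.0, 10.0, 40.0, 20.0])
norm = normalize_magnitudes([0.0, 1.0, 2.0, 4.0])
```

Here the row with the smallest ordering value receives angle 0 and the spacing stays uniform regardless of the gaps between ordering values, which illustrates that theta_col controls order but not spacing. The normalized values would then be passed to the two colormaps, selected by anomaly type.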

References

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_anomaly_magnitude

1. Random Example:

>>> np.random.seed(42)
>>> N_points = 150
>>> df_anomaly_rand = pd.DataFrame({
...     'id': range(N_points),
...     'actual': np.random.randn(N_points) * 5 + 10,
...     'pred_q10': np.random.randn(N_points) * 1 + 7, # Interval around 10
...     'pred_q90': np.random.randn(N_points) * 1 + 13,
...     'feature_order': np.random.rand(N_points) * 100 # For ordering
... })
>>> # Introduce some anomalies
>>> df_anomaly_rand.loc[5:15, 'actual'] = 0 # Under-predictions
>>> df_anomaly_rand.loc[100:110, 'actual'] = 25 # Over-predictions
>>>
>>> ax_rand_anomaly = plot_anomaly_magnitude(
...     df=df_anomaly_rand,
...     actual_col='actual',
...     q_cols=['pred_q10', 'pred_q90'],
...     theta_col='feature_order', # Order by this feature
...     acov='default',
...     title='Random Anomaly Distribution',
...     cmap_under='GnBu',
...     cmap_over='OrRd',
...     s=40,
...     cbar=True,
...     verbose=1
... )
>>> # Output will show anomaly counts...
>>> # plt.show() called internally

2. Concrete Example (Subsidence Data - adapted from docstring):

>>> # Assume small_sample_pred is a loaded DataFrame like:
>>> # Create dummy data if it doesn't exist
>>> try:
...     small_sample_pred
... except NameError:
...     print("Creating dummy small sample prediction data...")
...     N_small = 200
...     small_sample_pred = pd.DataFrame({
...         'subsidence_2023': np.random.rand(N_small)*15 + np.linspace(0, 5, N_small),
...         'subsidence_2023_q10': np.random.rand(N_small)*10,
...         'subsidence_2023_q90': np.random.rand(N_small)*10 + 10,
...         'latitude': np.linspace(22.3, 22.7, N_small) + np.random.randn(N_small)*0.01
...     })
...     # Ensure some anomalies exist in dummy data
...     anom_indices_under = np.random.choice(N_small, 15, replace=False)
...     anom_indices_over = np.random.choice(
...         list(set(range(N_small)) - set(anom_indices_under)), 20, replace=False
...     )
...     # Push selected actual values below the lower bound
...     small_sample_pred.loc[anom_indices_under, 'subsidence_2023'] = (
...         small_sample_pred.loc[anom_indices_under, 'subsidence_2023_q10']
...         - np.random.rand(15)*5 - 1
...     )
...     # Push selected actual values above the upper bound
...     small_sample_pred.loc[anom_indices_over, 'subsidence_2023'] = (
...         small_sample_pred.loc[anom_indices_over, 'subsidence_2023_q90']
...         + np.random.rand(20)*5 + 1
...     )

>>> ax_sub_anomaly = plot_anomaly_magnitude(
...     df=small_sample_pred,
...     actual_col='subsidence_2023',
...     q_cols=['subsidence_2023_q10', 'subsidence_2023_q90'],
...     theta_col='latitude',      # Order points by latitude
...     acov='quarter_circle',   # Use only 90 degrees
...     title='Anomaly Magnitude (2023) - Zhongshan',
...     figsize=(9, 9),
...     s=35,
...     cbar=True,               # Show colorbar
...     mask_angle=True,         # Hide angle labels
...     verbose=1                # Print anomaly counts
... )
>>> # Output will show anomaly counts...
>>> # plt.show() called internally