kdiagram.plot.uncertainty.plot_anomaly_magnitude

kdiagram.plot.uncertainty.plot_anomaly_magnitude(df, actual_col, q_cols, theta_col=None, acov='default', title='Anomaly Magnitude Polar Plot', figsize=(8.0, 8.0), cmap_under='Blues', cmap_over='Reds', s=30, alpha=0.8, show_grid=True, verbose=1, cbar=False, savefig=None, mask_angle=False)

Visualize the magnitude and type of prediction anomalies on a polar plot.

This function generates a polar scatter plot designed to highlight prediction anomalies – instances where the actual ground truth value falls outside a specified prediction interval (defined by a lower and an upper quantile, e.g., Q10 and Q90). It visually maps the location, magnitude, and type of these anomalies.

  • Angular Position (`theta`): Represents each data point (location). If theta_col is provided and valid, points are ordered angularly based on the values in that column (e.g., latitude, longitude, station index). Otherwise, points are plotted in their original DataFrame order. The angles are mapped linearly onto the specified angular coverage (acov).

  • Radial Distance (`r`): Represents the magnitude of the anomaly for points falling outside the prediction interval. It’s calculated as the absolute difference between the actual value and the nearest violated interval bound (\(|y_{actual} - y_{bound}|\)). Points within the interval are not plotted.

  • Color: Distinguishes the type of anomaly and indicates its magnitude. Separate colormaps are used:

    • cmap_under (default: Blues) for under-predictions (\(y_{actual} < y_{lower\_bound}\)).

    • cmap_over (default: Reds) for over-predictions (\(y_{actual} > y_{upper\_bound}\)).

    The color intensity within each map corresponds to the anomaly magnitude r, based on a shared normalization scale.

This plot serves as a diagnostic tool for evaluating prediction models, especially those providing uncertainty estimates. It helps to:

  • Identify specific locations or regions where the model significantly misestimates outcomes (under or over).

  • Assess the severity (magnitude) of these prediction errors.

  • Guide post-hoc analysis, model calibration checks, or targeted field validation efforts.

df : pd.DataFrame

Input DataFrame containing the actual values and the quantile prediction columns. Decorators ensure it’s a valid, non-empty pandas DataFrame.

actual_col : str

The name of the column containing the true observed or actual values (ground truth) against which predictions are compared.

q_cols : list or tuple of str

A sequence containing exactly two string elements: the column name for the lower quantile bound (e.g., ‘prediction_q10’) and the column name for the upper quantile bound (e.g., ‘prediction_q90’). The order must be [lower, upper].

theta_col : str, optional

The name of a column in df whose values determine the angular ordering of the points. Useful for arranging points spatially (e.g., using ‘latitude’ or ‘longitude’) or by some other meaningful index. If None or the column is not found, points are plotted in their original DataFrame row order. Default is None.

acov : {‘default’, ‘half_circle’, ‘quarter_circle’, ‘eighth_circle’}, default=‘default’

Specifies the angular coverage (span) of the polar plot. Maps the ordered points onto the specified fraction of a circle:

  • 'default': Full circle (360° or 2π radians).

  • 'half_circle': 180° or π radians.

  • 'quarter_circle': 90° or π/2 radians.

  • 'eighth_circle': 45° or π/4 radians.

title : str, default=‘Anomaly Magnitude Polar Plot’

The title displayed above the polar plot.

figsize : tuple of (float, float), default=(8.0, 8.0)

The width and height of the figure in inches.

cmap_under : str, default=‘Blues’

The name of the Matplotlib colormap used for points representing under-predictions (actual < lower bound).

cmap_over : str, default=‘Reds’

The name of the Matplotlib colormap used for points representing over-predictions (actual > upper bound).

s : int, default=30

The marker size for the scatter points representing anomalies.

alpha : float, default=0.8

The transparency level for the scatter points (0=transparent, 1=opaque). Useful if points overlap.

show_grid : bool, default=True

If True, display the polar grid lines (radial and angular).

verbose : int, default=1

Controls the level of printed output. If > 0, prints summary statistics about the detected anomalies (total count, count of under- and over-predictions).

cbar : bool, default=False

If True, adds a color bar to the plot representing the anomaly magnitude scale. Note: While the normalization scale is consistent for both under- and over-predictions, the colorbar itself visually uses the `cmap_over` colormap in the current implementation.

savefig : str, optional

The file path (e.g., ‘anomaly_plot.png’) where the plot image should be saved. If None, the plot is displayed interactively. Default is None.

mask_angle : bool, default=False

If True, hides the angular tick labels (degrees/radians). Useful if the angular order is based on index or is not easily interpretable.

ax : matplotlib.axes.Axes

The Matplotlib Axes object containing the polar scatter plot.

ValueError

If q_cols does not contain exactly two column names. If any columns specified in actual_col, q_cols, or theta_col are not found in the DataFrame df.

TypeError

If data in the required columns is not numeric after handling NaNs.

kdiagram.plot.uncertainty.plot_velocity : Visualize average velocity in polar coordinates.

plot_interval_consistency : Visualize consistency of interval widths.

validate_qcols : Helper function for validating quantile columns.

matplotlib.pyplot.scatter : Function used for plotting points.

matplotlib.colors.Normalize : Used for scaling magnitude to color.

  • The function first identifies anomalies by comparing actual_col values against the bounds defined by q_cols.

  • Rows containing NaN values in any of the essential columns (actual_col, q_cols, theta_col if provided) are dropped before analysis.

  • The anomaly magnitude r is always non-negative, representing the distance to the exceeded bound.

  • The theta_col parameter provides ordering, not direct angle mapping. The angles are still linearly spaced within the acov range but assigned according to the sorted order of theta_col.

  • The color intensity for both under- and over-predictions reflects the anomaly magnitude based on a shared normalization scale derived from the maximum observed anomaly magnitude.

  • Helper function validate_qcols (assumed from gofast) is used for initial validation of q_cols.
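The first three notes (NaN removal, anomaly detection, non-negative magnitude) can be sketched with pandas and NumPy. This is a minimal illustration under stated assumptions, not kdiagram's own implementation: the helper name `anomaly_magnitude` and the output columns `magnitude` and `kind` are hypothetical and exist only for this example.

```python
import numpy as np
import pandas as pd


def anomaly_magnitude(df, actual_col, q_cols, theta_col=None):
    """Hypothetical helper: return only the rows outside [lower, upper]."""
    lower_col, upper_col = q_cols

    # Drop rows with NaN in any essential column (theta_col only if given).
    needed = [actual_col, lower_col, upper_col]
    if theta_col is not None:
        needed.append(theta_col)
    data = df.dropna(subset=needed).copy()

    y = data[actual_col].to_numpy(dtype=float)
    lo = data[lower_col].to_numpy(dtype=float)
    hi = data[upper_col].to_numpy(dtype=float)

    # Anomaly masks, using the document's naming convention.
    under = y < lo   # actual below the lower bound
    over = y > hi    # actual above the upper bound

    # Distance to the violated bound; 0 for points inside the interval.
    r = np.where(under, lo - y, np.where(over, y - hi, 0.0))

    data["magnitude"] = r
    data["kind"] = np.where(under, "under", np.where(over, "over", "inside"))

    # Only anomalies (magnitude > 0) would be plotted.
    return data[data["magnitude"] > 0]


demo = pd.DataFrame({
    "actual": [5.0, 0.0, 25.0, np.nan],
    "q10": [3.0, 7.0, 7.0, 7.0],
    "q90": [8.0, 13.0, 13.0, 13.0],
})
print(anomaly_magnitude(demo, "actual", ["q10", "q90"]))
```

In this toy frame the first row lies inside its interval and the last row is dropped for its NaN, so only the under-prediction (magnitude 7) and the over-prediction (magnitude 12) remain.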

Let \(y_j\) be the actual value, \(L_j\) the lower quantile bound, and \(U_j\) the upper quantile bound for data point (location) \(j\).

  1. Anomaly Masks:

    • Under-prediction: \(M_{under, j} = (y_j < L_j)\)

    • Over-prediction: \(M_{over, j} = (y_j > U_j)\)

  2. Anomaly Magnitude (Radial Coordinate `r`):

    \[ r_j = \begin{cases} L_j - y_j & \text{if } M_{under, j} \\ y_j - U_j & \text{if } M_{over, j} \\ 0 & \text{otherwise} \end{cases} \]

    Only points where \(r_j > 0\) are plotted.

  3. Angular Coordinate (`theta`):

    • Generate base angles for \(N\) points (after NaN removal): \(\theta'_k = \frac{k}{N} \times S\), where \(S\) is the angular span from acov and \(k = 0, \dots, N-1\).

    • If theta_col (\(\mathbf{t}\)) is provided: Find the permutation \(\pi\) such that \(t_{\pi(0)} \le t_{\pi(1)} \le \dots \le t_{\pi(N-1)}\).

    • The final angle for the original point \(j\) (which is now at sorted position \(k = \pi^{-1}(j)\)) is \(\theta_k = \theta'_k\). The data (\(r_j\), masks) is reordered using \(\pi\) before plotting against \(\theta_k\).

    • If no theta_col is given, \(\theta_j = \theta'_j\).

  4. Color Mapping:

    • Normalize magnitudes: \(\text{norm}(r_j) = \frac{r_j}{\max(r_1, \dots, r_N)}\).

    • Apply colormaps: cmap_under(norm(r_j)) for under-predictions, cmap_over(norm(r_j)) for over-predictions.
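The angular and color steps can be sketched as follows. This is an illustrative reading of the formulas above, not the library's source: the names `ACOV_SPAN`, `polar_coords`, and `anomaly_colors` are hypothetical, and setting `vmin=0` in `Normalize` is an assumption chosen so that the normalization matches \(r_j / \max(r)\).

```python
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.colors import Normalize

# Angular span S (in radians) for each documented acov option.
ACOV_SPAN = {
    "default": 2 * np.pi,
    "half_circle": np.pi,
    "quarter_circle": np.pi / 2,
    "eighth_circle": np.pi / 4,
}


def polar_coords(r, order_values=None, acov="default"):
    """Hypothetical helper: base angles plus data reordered by order_values."""
    r = np.asarray(r, dtype=float)
    n = r.size
    theta = np.arange(n) / n * ACOV_SPAN[acov]   # theta'_k = (k / N) * S
    if order_values is None:
        perm = np.arange(n)                      # keep original row order
    else:
        perm = np.argsort(order_values, kind="stable")  # permutation pi
    # Position k on the angular axis holds the original point perm[k].
    return theta, r[perm], perm


def anomaly_colors(r, under, over, cmap_under="Blues", cmap_over="Reds"):
    """Hypothetical helper: RGBA colors on one shared magnitude scale."""
    norm = Normalize(vmin=0.0, vmax=r.max())     # assumed: norm(r) = r / max(r)
    colors = np.zeros((r.size, 4))               # in-interval points stay transparent
    colors[under] = plt.get_cmap(cmap_under)(norm(r[under]))
    colors[over] = plt.get_cmap(cmap_over)(norm(r[over]))
    return colors


r = np.array([7.0, 0.0, 12.0, 3.0])
order = np.array([30.0, 10.0, 20.0, 40.0])       # e.g. values of theta_col
theta, r_sorted, perm = polar_coords(r, order, acov="half_circle")
```

Note that the values in `order` only decide which point receives which angle; the angles themselves stay evenly spaced across the chosen span, which is the behaviour the notes describe for theta_col.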

>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_anomaly_magnitude

1. Random Example:

>>> np.random.seed(42)
>>> N_points = 150
>>> df_anomaly_rand = pd.DataFrame({
...     'id': range(N_points),
...     'actual': np.random.randn(N_points) * 5 + 10,
...     'pred_q10': np.random.randn(N_points) * 1 + 7, # Interval around 10
...     'pred_q90': np.random.randn(N_points) * 1 + 13,
...     'feature_order': np.random.rand(N_points) * 100 # For ordering
... })
>>> # Introduce some anomalies
>>> df_anomaly_rand.loc[5:15, 'actual'] = 0 # Under-predictions
>>> df_anomaly_rand.loc[100:110, 'actual'] = 25 # Over-predictions
>>>
>>> ax_rand_anomaly = plot_anomaly_magnitude(
...     df=df_anomaly_rand,
...     actual_col='actual',
...     q_cols=['pred_q10', 'pred_q90'],
...     theta_col='feature_order', # Order by this feature
...     acov='default',
...     title='Random Anomaly Distribution',
...     cmap_under='GnBu',
...     cmap_over='OrRd',
...     s=40,
...     cbar=True,
...     verbose=1
... )
>>> # Output will show anomaly counts...
>>> # plt.show() called internally

2. Concrete Example (Subsidence Data - adapted from docstring):

>>> # Assume small_sample_pred is an already-loaded DataFrame.
>>> # If it is not defined, create dummy data with the same columns.
>>> try:
...     small_sample_pred
... except NameError:
...     print("Creating dummy small sample prediction data...")
...     N_small = 200
...     small_sample_pred = pd.DataFrame({
...         'subsidence_2023': np.random.rand(N_small)*15 + np.linspace(0, 5, N_small),
...         'subsidence_2023_q10': np.random.rand(N_small)*10,
...         'subsidence_2023_q90': np.random.rand(N_small)*10 + 10,
...         'latitude': np.linspace(22.3, 22.7, N_small) + np.random.randn(N_small)*0.01
...     })
...     # Ensure some anomalies exist in dummy data
...     anom_indices_under = np.random.choice(N_small, 15, replace=False)
...     anom_indices_over = np.random.choice(
...         list(set(range(N_small)) - set(anom_indices_under)), 20, replace=False
...     )
...     # Push the chosen actual values below the lower bound
...     small_sample_pred.loc[anom_indices_under, 'subsidence_2023'] = (
...         small_sample_pred.loc[anom_indices_under, 'subsidence_2023_q10']
...         - np.random.rand(15)*5 - 1
...     )
...     # Push the chosen actual values above the upper bound
...     small_sample_pred.loc[anom_indices_over, 'subsidence_2023'] = (
...         small_sample_pred.loc[anom_indices_over, 'subsidence_2023_q90']
...         + np.random.rand(20)*5 + 1
...     )
>>> ax_sub_anomaly = plot_anomaly_magnitude(
...     df=small_sample_pred,
...     actual_col='subsidence_2023',
...     q_cols=['subsidence_2023_q10', 'subsidence_2023_q90'],
...     theta_col='latitude',      # Order points by latitude
...     acov='quarter_circle',   # Use only 90 degrees
...     title='Anomaly Magnitude (2023) – Zhongshan',
...     figsize=(9, 9),
...     s=35,
...     cbar=True,               # Show colorbar
...     mask_angle=True,         # Hide angle labels
...     verbose=1                # Print anomaly counts
... )
>>> # Output will show anomaly counts...
>>> # plt.show() called internally