kdiagram.plot.uncertainty.plot_actual_vs_predicted

kdiagram.plot.uncertainty.plot_actual_vs_predicted(df, actual_col, pred_col, theta_col=None, acov='default', figsize=(8.0, 8.0), title=None, line=True, r_label=None, cmap=None, alpha=0.3, actual_props=None, pred_props=None, show_grid=True, grid_props=None, show_legend=True, mask_angle=False, dpi=300, savefig=None, ax=None)

Polar plot comparing actual observed vs predicted values.

This function generates a polar plot to visually compare actual ground truth values against model predictions (typically a central estimate like the median, Q50) for multiple data points or locations arranged circularly [1].

  • Angular Position (`theta`): Represents each data point or location. Points are currently plotted in their DataFrame index order, mapped linearly onto the specified angular coverage (acov). The theta_col parameter is intended for future use in ordering points based on a specific feature (like latitude) but is currently ignored for positioning.

  • Radial Distance (`r`): Represents the magnitude of the values. Both the actual value (actual_col) and the predicted value (pred_col) are plotted at the corresponding angle theta.

  • Visual Comparison:

    • Actual and predicted values are shown as either continuous lines or individual dots based on the line parameter [2].

    • Gray radial line segments connect the actual and predicted values at each angle, highlighting the magnitude and direction (over- or under-prediction) of the difference at each point.

This plot facilitates:

  • Quick visual assessment of prediction accuracy and bias across samples.

  • Identification of regions or conditions (if angle relates to a feature) where the model performs well or poorly.

  • Communication of model performance to stakeholders.
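The visual impression of accuracy and bias can be backed by a few summary numbers. The sketch below is not part of kdiagram: the helper name `summarize_errors` is hypothetical, and it assumes the signed error is defined as predicted minus actual, so a positive value means over-prediction.

```python
from statistics import fmean


def summarize_errors(actual, predicted):
    """Return (mean signed error, mean absolute error, over-prediction count)."""
    # Signed error per point; positive means the model over-predicted.
    residuals = [p - a for a, p in zip(actual, predicted)]
    bias = fmean(residuals)
    mae = fmean([abs(r) for r in residuals])
    n_over = sum(r > 0 for r in residuals)
    return bias, mae, n_over


# Small made-up sample, for illustration only.
actual = [10.0, 12.0, 15.0, 11.0]
predicted = [11.0, 11.0, 17.0, 11.0]
bias, mae, n_over = summarize_errors(actual, predicted)
print(f"bias={bias:.2f}, MAE={mae:.2f}, over-predicted at {n_over} of {len(actual)} points")
```

A bias near zero with a large MAE suggests errors that cancel out on average, which in the plot appears as difference segments of similar length on both sides of the actual series.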

Parameters:
df : pd.DataFrame

Input DataFrame containing actual and predicted value columns. Decorators ensure it’s a valid, non-empty pandas DataFrame.

actual_col : str

Name of the column holding the actual observed (ground truth) values.

pred_col : str

Name of the column holding the corresponding predicted values (e.g., the Q50 median prediction).

theta_col : str, optional

Intended column name for ordering points angularly based on its values (e.g., ‘latitude’). Note: This parameter is currently ignored for positioning/ordering in this implementation; points use DataFrame index order. A warning is issued if provided. Default is None.

acov : {'default', 'half_circle', 'quarter_circle', 'eighth_circle'}, default='default'

Specifies the angular coverage (span) of the polar plot: 'default' (360°), 'half_circle' (180°), 'quarter_circle' (90°), 'eighth_circle' (45°).

figsize : tuple of (float, float), default=(8.0, 8.0)

Width and height of the figure in inches.

title : str, optional

Custom title for the plot. If None, a default title is used. Default is None.

line : bool, default=True

Determines the plotting style:

  • If True, actual and predicted values are plotted as lines connecting consecutive points.

  • If False, values are plotted as individual scatter dots.

r_label : str, optional

Custom label for the radial axis (representing value magnitude). If None, no label is set. Default is None.

cmap : str, optional

Note: This parameter is currently unused in the function. It might be intended for future use, perhaps coloring the difference lines. Default is None.

alpha : float, default=0.3

Transparency level applied to the gray difference lines drawn between actual and predicted values, and also to the predicted dots if line=False.

actual_props : dict, optional

Dictionary of keyword arguments passed directly to the Matplotlib plot or scatter function for the ‘Actual’ data series. Allows customization (e.g., {'color': 'blue', 'linestyle': '--'}). Defaults to basic black line/dots if None.

pred_props : dict, optional

Dictionary of keyword arguments passed directly to the Matplotlib plot or scatter function for the ‘Predicted’ data series. Allows customization (e.g., {'color': 'orange', 'marker': 'x'}). Defaults to basic red line/dots if None.

show_grid : bool, default=True

If True, display the polar grid lines.

show_legend : bool, default=True

If True, display a legend labeling the ‘Actual’ and ‘Predicted’ series.

mask_angle : bool, default=False

If True, hide the angular tick labels (degrees).

savefig : str, optional

File path to save the plot image. If None, displays the plot interactively. Default is None.

Returns:
ax : matplotlib.axes._axes.Axes

The Matplotlib Axes object containing the polar plot. Note that due to subplot_kw, it’s specifically a PolarAxesSubplot.

Raises:
ValueError

If actual_col or pred_col are not found in df.

TypeError

If data in actual or predicted columns is not numeric.
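The checks behind these two exceptions, together with the NaN handling described in the Notes, can be illustrated with plain pandas. This is a minimal sketch under stated assumptions, not the library's own validation code (which, per the `df` description, relies on decorators); the helper name `check_and_clean` is hypothetical.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_and_clean(df, actual_col, pred_col):
    """Hypothetical helper: validate the two columns, then drop NaN rows."""
    missing = [c for c in (actual_col, pred_col) if c not in df.columns]
    if missing:
        raise ValueError(f"Columns not found in df: {missing}")
    for col in (actual_col, pred_col):
        if not is_numeric_dtype(df[col]):
            raise TypeError(f"Column {col!r} must be numeric, got {df[col].dtype}")
    # Keep only rows where both values are present.
    return df.dropna(subset=[actual_col, pred_col])


# Made-up data: only the first row has both values.
df = pd.DataFrame({"obs": [1.0, np.nan, 3.0], "q50": [1.2, 2.1, np.nan]})
clean = check_and_clean(df, "obs", "q50")
print(len(clean))
```

Because angles are assigned by row order after NaN removal, dropping rows changes the angular position of every row that follows.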

See also

plot_anomaly_magnitude

Visualize only points outside prediction intervals.

matplotlib.pyplot.plot

Function for line plots.

matplotlib.pyplot.scatter

Function for scatter plots.

Notes

  • Rows with NaN values in actual_col or pred_col (or theta_col if specified, though currently unused for position) are dropped.

  • The gray lines indicating the difference are drawn individually for each point using a loop. Warning: This approach can be very slow for large datasets (many thousands of points). An alternative like fill_between might be more efficient for showing shaded areas but would require sorting by theta.

  • The theta_col parameter is currently ignored for positioning; angles are always based on the DataFrame index order after NaN removal.

  • The cmap parameter is currently unused. The difference lines are hardcoded to ‘gray’.

  • Default plotting styles are black for actual and red for predicted, but can be overridden using actual_props and pred_props [1].
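For large inputs, one possible way around the per-point loop is Matplotlib's `Axes.vlines`, which places all segments in a single `LineCollection`. The sketch below is an alternative drawn directly with Matplotlib, not what `plot_actual_vs_predicted` does internally; the data are random and the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n = 5000
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
actual = 10.0 + rng.random(n)
pred = 10.0 + rng.random(n)

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
# On a polar axes, x is the angle and y is the radius, so each "vertical"
# line is a radial segment. All n segments live in one LineCollection
# instead of n separate line artists.
segments = ax.vlines(
    theta,
    np.minimum(actual, pred),
    np.maximum(actual, pred),
    colors="gray",
    alpha=0.3,
)
ax.plot(theta, actual, color="black", label="Actual")
ax.plot(theta, pred, color="red", label="Predicted")
ax.legend()
plt.close(fig)
```

Unlike the `fill_between` idea mentioned above, this keeps one segment per point and does not require the angles to be sorted.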

Let \(y_j\) be the actual value and \(\hat{y}_j\) the predicted value for data point (location) \(j\) (\(j=0, \dots, N-1\) after NaN removal).

  1. Angular Coordinate (`theta`): Let \(S\) be the angular span and \(\theta_{min}\) the start angle from acov.

    \[\theta_j = \left( \frac{j}{N} \times S \right) + \theta_{min} \tag{1}\]

  2. Radial Coordinates: The radial coordinates are directly the values: \(r_{actual, j} = y_j\) and \(r_{pred, j} = \hat{y}_j\).

  3. Plotting:

    • Plot points/lines connecting \((r_{actual, j}, \theta_j)\) and \((r_{pred, j}, \theta_j)\).

    • For each \(j\), draw a gray line segment connecting the points \((\min(y_j, \hat{y}_j), \theta_j)\) and \((\max(y_j, \hat{y}_j), \theta_j)\).
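The angular mapping in Eq. (1) can be checked with a few lines of standard-library Python. This is a sketch, not the library's code: the names `ACOV_SPANS` and `angular_positions` are hypothetical, the spans are taken from the documented `acov` options, and \(\theta_{min} = 0\) is an assumption because the start angle is not stated here.

```python
import math

# Spans implied by the documented acov options, in radians.
ACOV_SPANS = {
    "default": 2.0 * math.pi,
    "half_circle": math.pi,
    "quarter_circle": math.pi / 2.0,
    "eighth_circle": math.pi / 4.0,
}


def angular_positions(n, acov="default", theta_min=0.0):
    """Evaluate theta_j = (j / N) * S + theta_min for j = 0..N-1."""
    span = ACOV_SPANS[acov]
    return [(j / n) * span + theta_min for j in range(n)]


thetas = angular_positions(4, acov="half_circle")
print([round(math.degrees(t), 1) for t in thetas])  # [0.0, 45.0, 90.0, 135.0]
```

Dividing by \(N\) rather than \(N-1\) means the last point stops one step short of the end of the span, so with full coverage the first and last points do not overlap.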

References

[1] Kouadio, K. L., Liu, R., Loukou, K. G. H., Liu, J., & Liu, W. (2025). Analytics Framework for Interpreting Spatiotemporal Probabilistic Forecasts. International Journal of Forecasting. Manuscript submitted.

[2] Matplotlib documentation: https://matplotlib.org/stable/

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_actual_vs_predicted

1. Random Example:

>>> np.random.seed(0)
>>> N = 100
>>> df_avp_rand = pd.DataFrame({
...     'Time': pd.date_range('2023-01-01', periods=N, freq='D'),
...     'ActualTemp': 15 + 10 * np.sin(np.linspace(0, 4 * np.pi, N)) + np.random.randn(N) * 2,
...     'PredictedTemp': 16 + 9 * np.sin(np.linspace(0, 4 * np.pi, N) + 0.1) + np.random.randn(N) * 1.5
... })
>>> ax_avp_rand = plot_actual_vs_predicted(
...     df=df_avp_rand,
...     actual_col='ActualTemp',
...     pred_col='PredictedTemp',
...     theta_col='Time', # Note: Ignored for positioning
...     acov='default',
...     title='Temperature: Actual vs. Predicted',
...     line=True, # Use lines
...     r_label='Temperature (°C)',
...     actual_props={'color': 'navy', 'linestyle': '-'},
...     pred_props={'color': 'crimson', 'linestyle': '--'}
... )
>>> # plt.show() called internally

2. Concrete Example (Subsidence Data - using dots):

>>> # Assume zhongshan_pred_2023_2026 is a loaded DataFrame
>>> # Create dummy data if it doesn't exist
>>> try:
...     zhongshan_pred_2023_2026
... except NameError:
...     print("Creating dummy subsidence data for example...")
...     N_sub = 150
...     zhongshan_pred_2023_2026 = pd.DataFrame({
...         'latitude': np.linspace(22.2, 22.8, N_sub),
...         'subsidence_2023': np.random.rand(N_sub)*15 + np.linspace(0, 5, N_sub),
...         'subsidence_2023_q50': np.random.rand(N_sub)*14 + np.linspace(0.5, 5.5, N_sub),
...         # Add other columns if needed by other examples
...         **{f'subsidence_{yr}_q10': np.random.rand(N_sub)*(yr-2022)*2 + 1
...            for yr in range(2023, 2027)},
...         **{f'subsidence_{yr}_q90': np.random.rand(N_sub)*(yr-2022)*2 + 5
...            + np.linspace(0, (yr-2022)*3, N_sub)
...            for yr in range(2023, 2027)},
...     })
>>> ax_avp_sub = plot_actual_vs_predicted(
...     df=zhongshan_pred_2023_2026.head(100), # Use subset for speed
...     actual_col='subsidence_2023',
...     pred_col='subsidence_2023_q50',
...     theta_col='latitude',      # Note: Ignored for positioning
...     acov='half_circle',      # Use 180 degrees
...     title='Actual vs Predicted Subsidence (2023)',
...     line=False,              # Use dots instead of lines
...     r_label="Subsidence (mm)",
...     mask_angle=True,         # Hide angular tick labels
...     pred_props={'marker': 'x', 'color': 'purple'} # Customize predicted dots
... )
>>> # plt.show() called internally