kdiagram.plot.uncertainty.plot_actual_vs_predicted
- kdiagram.plot.uncertainty.plot_actual_vs_predicted(df, actual_col, pred_col, theta_col=None, acov='default', figsize=(8.0, 8.0), title=None, line=True, r_label=None, cmap=None, alpha=0.3, actual_props=None, pred_props=None, show_grid=True, grid_props=None, show_legend=True, mask_angle=False, dpi=300, savefig=None, ax=None)
Polar plot comparing actual observed vs predicted values.
This function generates a polar plot to visually compare actual ground truth values against model predictions (typically a central estimate like the median, Q50) for multiple data points or locations arranged circularly [1].
Angular Position (`theta`): Represents each data point or location. Points are currently plotted in their DataFrame index order, mapped linearly onto the specified angular coverage (acov). The theta_col parameter is intended for future use in ordering points based on a specific feature (like latitude) but is currently ignored for positioning.
Radial Distance (`r`): Represents the magnitude of the values. Both the actual value (actual_col) and the predicted value (pred_col) are plotted at the corresponding angle theta.
Visual Comparison:
Actual and predicted values are shown as either continuous lines or individual dots based on the line parameter [2].
Gray radial line segments connect the actual and predicted values at each angle, highlighting the magnitude and direction (over- or under-prediction) of the difference at each point.
This plot facilitates:
Quick visual assessment of prediction accuracy and bias across samples.
Identification of regions or conditions (if angle relates to a feature) where the model performs well or poorly.
Communication of model performance to stakeholders.
- Parameters:
- df
pd.DataFrame Input DataFrame containing actual and predicted value columns. Decorators ensure it’s a valid, non-empty pandas DataFrame.
- actual_col
str Name of the column holding the actual observed (ground truth) values.
- pred_col
str Name of the column holding the corresponding predicted values (e.g., the Q50 median prediction).
- theta_col
str, optional Intended column name for ordering points angularly based on its values (e.g., 'latitude'). Note: This parameter is currently ignored for positioning/ordering in this implementation; points use DataFrame index order. A warning is issued if provided. Default is None.
- acov
{'default', 'half_circle', 'quarter_circle', 'eighth_circle'}, default='default' Specifies the angular coverage (span) of the polar plot: 'default' (360°), 'half_circle' (180°), 'quarter_circle' (90°), 'eighth_circle' (45°).
- figsize
tuple of (float, float), default=(8.0, 8.0) Width and height of the figure in inches.
- title
str, optional Custom title for the plot. If None, a default title is used. Default is None.
- line
bool, default=True Determines the plotting style:
If True, actual and predicted values are plotted as lines connecting consecutive points.
If False, values are plotted as individual scatter dots.
- r_label
str, optional Custom label for the radial axis (representing value magnitude). If None, no label is set. Default is None.
- cmap
str, optional Note: This parameter is currently unused in the function. It might be intended for future use, perhaps coloring the difference lines. Default is None.
- alpha
float, default=0.3 Transparency level applied to the gray difference lines drawn between actual and predicted values, and also to the predicted dots if line=False.
- actual_props
dict, optional Dictionary of keyword arguments passed directly to the Matplotlib plot or scatter function for the 'Actual' data series. Allows customization (e.g., {'color': 'blue', 'linestyle': '--'}). Defaults to basic black line/dots if None.
- pred_props
dict, optional Dictionary of keyword arguments passed directly to the Matplotlib plot or scatter function for the 'Predicted' data series. Allows customization (e.g., {'color': 'orange', 'marker': 'x'}). Defaults to basic red line/dots if None.
- show_grid
bool, default=True If True, display the polar grid lines.
- show_legend
bool, default=True If True, display a legend labeling the 'Actual' and 'Predicted' series.
- mute_degree
bool, default=False If True, hide the angular tick labels (degrees).
- savefig
str, optional File path to save the plot image. If None, displays the plot interactively. Default is None.
- Returns:
- ax
matplotlib.axes._axes.Axes The Matplotlib Axes object containing the polar plot. Note that due to subplot_kw, it’s specifically a PolarAxesSubplot.
- Raises:
ValueError: If actual_col or pred_col is not found in df.
TypeError: If data in the actual or predicted columns is not numeric.
See also
plot_anomaly_magnitude: Visualize only points outside prediction intervals.
matplotlib.pyplot.plot: Function for line plots.
matplotlib.pyplot.scatter: Function for scatter plots.
Notes
Rows with NaN values in actual_col or pred_col (or theta_col if specified, though currently unused for position) are dropped.
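The NaN handling described above can be approximated with pandas as follows. This is a sketch of the described behaviour, not the library's own code; the column names and the toy data are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical column names).
df = pd.DataFrame({
    "actual": [1.0, np.nan, 3.0, 4.0],
    "pred": [1.2, 2.1, np.nan, 3.8],
    "other": [np.nan, 0.0, 0.0, 0.0],  # NaN here does not drop the row
})

# Keep only rows where both plotted columns are present; the surviving
# rows keep their original index order, which sets the angular order.
clean = df.dropna(subset=["actual", "pred"])
n_points = len(clean)
```

In this toy case two of the four rows are dropped, so only two points would be placed around the plot.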
The gray lines indicating the difference are drawn individually for each point using a loop. Warning: This approach can be very slow for large datasets (many thousands of points). An alternative like fill_between might be more efficient for showing shaded areas but would require sorting by theta.
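One possible way to avoid the per-point loop, sketched here under assumptions rather than taken from kdiagram, is to draw all radial segments in a single call to Matplotlib's `Axes.vlines`, which returns one `LineCollection`. The data, variable names, and styling below are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration.
rng = np.random.default_rng(0)
n = 5000
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
actual = 10.0 + rng.random(n)
pred = 10.0 + rng.random(n)

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
# On a polar Axes, x is the angle and y is the radius, so vlines draws
# one radial segment per point from min(actual, pred) to max(actual, pred).
diff_lines = ax.vlines(
    theta,
    np.minimum(actual, pred),
    np.maximum(actual, pred),
    colors="gray",
    alpha=0.3,
)
plt.close(fig)
```

Unlike `fill_between`, this keeps one segment per point and does not require the data to be sorted by angle.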
The theta_col parameter is currently ignored for positioning; angles are always based on the DataFrame index order after NaN removal.
The cmap parameter is currently unused. The difference lines are hardcoded to ‘gray’.
Default plotting styles are black for actual and red for predicted, but can be overridden using actual_props and pred_props [1].
Let \(y_j\) be the actual value and \(\hat{y}_j\) the predicted value for data point (location) \(j\) (\(j=0, \dots, N-1\) after NaN removal).
Angular Coordinate (`theta`): Let \(S\) be the angular span and \(\theta_{min}\) the start angle from acov.
\[\theta_j = \left( \frac{j}{N} \times S \right) + \theta_{min} \tag{1}\]
Radial Coordinates: The radial coordinates are directly the values: \(r_{actual, j} = y_j\) and \(r_{pred, j} = \hat{y}_j\).
Plotting:
Plot points/lines connecting \((r_{actual, j}, \theta_j)\) and \((r_{pred, j}, \theta_j)\).
For each \(j\), draw a gray line segment connecting the points \((\min(y_j, \hat{y}_j), \theta_j)\) and \((\max(y_j, \hat{y}_j), \theta_j)\).
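As a rough illustration of Eq. (1) and the plotting steps above, the sketch below computes the angles and the endpoints of each difference segment in plain Python. The helper name `polar_layout`, the `ACOV_SPAN_DEG` table, and the choice of \(\theta_{min} = 0\) are assumptions made for this example and are not part of kdiagram's API.

```python
import math

# Angular span in degrees for each `acov` option, as listed in the
# parameter description.
ACOV_SPAN_DEG = {
    "default": 360.0,
    "half_circle": 180.0,
    "quarter_circle": 90.0,
    "eighth_circle": 45.0,
}


def polar_layout(actual, pred, acov="default", theta_min=0.0):
    """Return (theta, segments) for paired actual/predicted values.

    Hypothetical helper; theta_min=0.0 is an assumed start angle.
    """
    if len(actual) != len(pred):
        raise ValueError("actual and pred must have the same length")
    n = len(actual)
    span = math.radians(ACOV_SPAN_DEG[acov])
    # Eq. (1): theta_j = (j / N) * S + theta_min
    theta = [(j / n) * span + theta_min for j in range(n)]
    # One radial segment per point, from the smaller to the larger value.
    segments = [(min(a, p), max(a, p)) for a, p in zip(actual, pred)]
    return theta, segments


theta, segments = polar_layout(
    [2.0, 5.0, 3.0, 4.0], [3.0, 4.0, 3.5, 1.0], acov="half_circle"
)
```

With `acov="half_circle"` and four points, the angles are 0, π/4, π/2 and 3π/4: the span is divided by N, so the last point stops one step short of the end of the span.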
References
[1] Kouadio, K. L., Liu, R., Loukou, K. G. H., Liu, J., & Liu, W. (2025). Analytics Framework for Interpreting Spatiotemporal Probabilistic Forecasts. International Journal of Forecasting. Manuscript submitted.
[2] Matplotlib documentation: https://matplotlib.org/stable/
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_actual_vs_predicted
1. Random Example:
>>> np.random.seed(0)
>>> N = 100
>>> df_avp_rand = pd.DataFrame({
...     'Time': pd.date_range('2023-01-01', periods=N, freq='D'),
...     'ActualTemp': 15 + 10 * np.sin(np.linspace(0, 4 * np.pi, N)) + np.random.randn(N) * 2,
...     'PredictedTemp': 16 + 9 * np.sin(np.linspace(0, 4 * np.pi, N) + 0.1) + np.random.randn(N) * 1.5
... })
>>> ax_avp_rand = plot_actual_vs_predicted(
...     df=df_avp_rand,
...     actual_col='ActualTemp',
...     pred_col='PredictedTemp',
...     theta_col='Time',  # Note: Ignored for positioning
...     acov='default',
...     title='Temperature: Actual vs. Predicted',
...     line=True,  # Use lines
...     r_label='Temperature (°C)',
...     actual_props={'color': 'navy', 'linestyle': '-'},
...     pred_props={'color': 'crimson', 'linestyle': '--'}
... )
>>> # plt.show() called internally
2. Concrete Example (Subsidence Data - using dots):
>>> # Assume zhongshan_pred_2023_2026 is a loaded DataFrame
>>> # Create dummy data if it doesn't exist
>>> try:
...     zhongshan_pred_2023_2026
... except NameError:
...     print("Creating dummy subsidence data for example...")
...     N_sub = 150
...     zhongshan_pred_2023_2026 = pd.DataFrame({
...         'latitude': np.linspace(22.2, 22.8, N_sub),
...         'subsidence_2023': np.random.rand(N_sub)*15 + np.linspace(0, 5, N_sub),
...         'subsidence_2023_q50': np.random.rand(N_sub)*14 + np.linspace(0.5, 5.5, N_sub),
...         # Add other columns if needed by other examples
...         **{f'subsidence_{yr}_q10': np.random.rand(N_sub)*(yr-2022)*2 + 1
...            for yr in range(2023, 2027)},
...         **{f'subsidence_{yr}_q90': np.random.rand(N_sub)*(yr-2022)*2 + 5
...            + np.linspace(0, (yr-2022)*3, N_sub)
...            for yr in range(2023, 2027)},
...     })
>>> ax_avp_sub = plot_actual_vs_predicted(
...     df=zhongshan_pred_2023_2026.head(100),  # Use subset for speed
...     actual_col='subsidence_2023',
...     pred_col='subsidence_2023_q50',
...     theta_col='latitude',  # Note: Ignored for positioning
...     acov='half_circle',  # Use 180 degrees
...     title='Actual vs Predicted Subsidence (2023)',
...     line=False,  # Use dots instead of lines
...     r_label="Subsidence (mm)",
...     mute_degree=True,
...     pred_props={'marker': 'x', 'color': 'purple'}  # Customize predicted dots
... )
>>> # plt.show() called internally