kdiagram.plot.uncertainty.plot_actual_vs_predicted
- kdiagram.plot.uncertainty.plot_actual_vs_predicted(df, actual_col, pred_col, theta_col=None, acov='default', figsize=(8.0, 8.0), title=None, line=True, r_label=None, cmap=None, alpha=0.3, actual_props=None, pred_props=None, show_grid=True, grid_props=None, show_legend=True, mask_angle=False, savefig=None)
Polar plot comparing actual observed vs. predicted values.
This function generates a polar plot to visually compare actual ground truth values against model predictions (typically a central estimate like the median, Q50) for multiple data points or locations arranged circularly.
Angular Position (`theta`): Represents each data point or location. Points are currently plotted in their DataFrame index order, mapped linearly onto the specified angular coverage (acov). The theta_col parameter is intended for future use in ordering points based on a specific feature (like latitude) but is currently ignored for positioning.
Radial Distance (`r`): Represents the magnitude of the values. Both the actual value (actual_col) and the predicted value (pred_col) are plotted at the corresponding angle theta.
- Visual Comparison:
Actual and predicted values are shown as either continuous lines or individual dots based on the line parameter.
Gray radial line segments connect the actual and predicted values at each angle, highlighting the magnitude and direction (over- or under-prediction) of the difference at each point.
This plot facilitates:
- Quick visual assessment of prediction accuracy and bias across samples.
- Identification of regions or conditions (if angle relates to a feature) where the model performs well or poorly.
- Communication of model performance to stakeholders.
- df : pd.DataFrame
Input DataFrame containing actual and predicted value columns. Decorators ensure it’s a valid, non-empty pandas DataFrame.
- actual_col : str
Name of the column holding the actual observed (ground truth) values.
- pred_col : str
Name of the column holding the corresponding predicted values (e.g., the Q50 median prediction).
- theta_col : str, optional
Intended column name for ordering points angularly based on its values (e.g., 'latitude'). Note: This parameter is currently ignored for positioning/ordering in this implementation; points use DataFrame index order. A warning is issued if provided. Default is None.
- acov : {'default', 'half_circle', 'quarter_circle', 'eighth_circle'}, default='default'
Specifies the angular coverage (span) of the polar plot: 'default' (360°), 'half_circle' (180°), 'quarter_circle' (90°), 'eighth_circle' (45°).
- figsize : tuple of (float, float), default=(8.0, 8.0)
Width and height of the figure in inches.
- title : str, optional
Custom title for the plot. If None, a default title is used. Default is None.
- line : bool, default=True
Determines the plotting style:
  - If True, actual and predicted values are plotted as lines connecting consecutive points.
  - If False, values are plotted as individual scatter dots.
- r_label : str, optional
Custom label for the radial axis (representing value magnitude). If None, no label is set. Default is None.
- cmap : str, optional
Note: This parameter is currently unused in the function. It might be intended for future use, perhaps coloring the difference lines. Default is None.
- alpha : float, default=0.3
Transparency level applied to the gray difference lines drawn between actual and predicted values, and also to the predicted dots if line=False.
- actual_props : dict, optional
Dictionary of keyword arguments passed directly to the Matplotlib plot or scatter function for the ‘Actual’ data series. Allows customization (e.g., {'color': 'blue', 'linestyle': '--'}). Defaults to basic black line/dots if None.
- pred_props : dict, optional
Dictionary of keyword arguments passed directly to the Matplotlib plot or scatter function for the ‘Predicted’ data series. Allows customization (e.g., {'color': 'orange', 'marker': 'x'}). Defaults to basic red line/dots if None.
- show_grid : bool, default=True
If True, display the polar grid lines.
- show_legend : bool, default=True
If True, display a legend labeling the ‘Actual’ and ‘Predicted’ series.
- mask_angle : bool, default=False
If True, hide the angular tick labels (degrees).
- savefig : str, optional
File path to save the plot image. If None, displays the plot interactively. Default is None.
- ax : matplotlib.axes._axes.Axes
The Matplotlib Axes object containing the polar plot. Note that due to subplot_kw, it’s specifically a PolarAxesSubplot.
- ValueError
If actual_col or pred_col are not found in df.
- TypeError
If data in actual or predicted columns is not numeric.
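The two documented exceptions can be anticipated on the caller's side before plotting. The following is a minimal sketch of such a pre-check; `check_columns` is a hypothetical helper written for illustration, not part of kdiagram, and the library's own validation may differ in wording and order.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_columns(df, actual_col, pred_col):
    """Hypothetical caller-side check; not part of kdiagram."""
    cols = (actual_col, pred_col)
    # Mirror the documented ValueError: both columns must exist in df.
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise ValueError(f"Columns not found in df: {missing}")
    # Mirror the documented TypeError: both columns must hold numeric data.
    non_numeric = [c for c in cols if not is_numeric_dtype(df[c])]
    if non_numeric:
        raise TypeError(f"Columns must be numeric: {non_numeric}")


df = pd.DataFrame(
    {"actual": [1.0, 2.0], "pred": [1.1, 1.9], "site": ["a", "b"]}
)
check_columns(df, "actual", "pred")  # passes silently
```

Running a check like this first keeps a failure close to the data-loading step rather than inside the plotting call.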
- plot_anomaly_magnitude : Visualize only points outside prediction intervals.
- matplotlib.pyplot.plot : Function for line plots.
- matplotlib.pyplot.scatter : Function for scatter plots.
Rows with NaN values in actual_col or pred_col (or theta_col if specified, though currently unused for position) are dropped.
The gray lines indicating the difference are drawn individually for each point using a loop. Warning: This approach can be very slow for large datasets (many thousands of points). An alternative like fill_between might be more efficient for showing shaded areas but would require sorting by theta.
The theta_col parameter is currently ignored for positioning; angles are always based on the DataFrame index order after NaN removal.
The cmap parameter is currently unused. The difference lines are hardcoded to ‘gray’.
Default plotting styles are black for actual and red for predicted, but can be overridden using actual_props and pred_props.
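For large datasets, the per-point loop mentioned above can be avoided when drawing a similar figure by hand. The sketch below is not the library's implementation: it uses Matplotlib's `Axes.vlines`, which on a polar axes draws one radial segment per angle and stores them all in a single `LineCollection`. The synthetic data and variable names are assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n = 5000
theta = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
# Synthetic, strictly positive series (illustrative only).
actual = 25 + 10 * np.sin(2 * theta) + rng.normal(0, 2, n)
pred = 26 + 9 * np.sin(2 * theta + 0.1) + rng.normal(0, 1.5, n)

fig, ax = plt.subplots(figsize=(8.0, 8.0), subplot_kw={"projection": "polar"})

# One LineCollection holds every radial segment; no per-point Python loop.
diff_lines = ax.vlines(
    theta,
    np.minimum(actual, pred),
    np.maximum(actual, pred),
    colors="gray",
    alpha=0.3,
    linewidth=0.8,
)
ax.plot(theta, actual, color="black", label="Actual")
ax.plot(theta, pred, color="red", label="Predicted")
ax.legend()
```

Unlike the `fill_between` alternative, this approach does not require the points to be sorted by angle, because each segment is drawn independently.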
Let \(y_j\) be the actual value and \(\hat{y}_j\) the predicted value for data point (location) \(j\) (\(j=0, \dots, N-1\) after NaN removal).
Angular Coordinate (`theta`): Let \(S\) be the angular span and \(\theta_{min}\) the start angle from acov.
\[\theta_j = \left( \frac{j}{N} \times S \right) + \theta_{min}\]
Radial Coordinates: The radial coordinates are directly the values: \(r_{actual, j} = y_j\) and \(r_{pred, j} = \hat{y}_j\).
Plotting:
- Plot points/lines connecting \((r_{actual, j}, \theta_j)\) and \((r_{pred, j}, \theta_j)\).
- For each \(j\), draw a gray line segment connecting the points \((\min(y_j, \hat{y}_j), \theta_j)\) and \((\max(y_j, \hat{y}_j), \theta_j)\).
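The formulation above can be traced in a few lines of plain Python. This is an illustrative sketch, not the library's code: `polar_coordinates` and `ACOV_SPAN` are hypothetical names, and the start angle of 0 is an assumption, since the documentation does not state the value of \(\theta_{min}\).

```python
import math

# Angular spans implied by the documented acov options (in radians).
ACOV_SPAN = {
    "default": 2 * math.pi,
    "half_circle": math.pi,
    "quarter_circle": math.pi / 2,
    "eighth_circle": math.pi / 4,
}


def polar_coordinates(actual, predicted, acov="default", theta_min=0.0):
    """Return (theta, r_actual, r_pred, segments) for paired values.

    theta_min=0.0 is an assumption; the documentation does not state it.
    """
    # Drop pairs containing NaN, mirroring the documented NaN removal.
    pairs = [
        (y, y_hat)
        for y, y_hat in zip(actual, predicted)
        if not (math.isnan(y) or math.isnan(y_hat))
    ]
    n = len(pairs)
    span = ACOV_SPAN[acov]
    # theta_j = (j / N) * S + theta_min
    theta = [j / n * span + theta_min for j in range(n)]
    r_actual = [y for y, _ in pairs]
    r_pred = [y_hat for _, y_hat in pairs]
    # Each difference segment runs radially from min to max at a fixed angle.
    segments = [(min(y, y_hat), max(y, y_hat)) for y, y_hat in pairs]
    return theta, r_actual, r_pred, segments


theta, r_actual, r_pred, segments = polar_coordinates(
    [10.0, float("nan"), 12.0, 8.0, 9.0],
    [11.0, 10.0, 11.5, 8.5, 7.0],
    acov="half_circle",
)
```

Because the divisor is \(N\) rather than \(N-1\), the last point stops one step short of the end of the span, so on a full circle it does not coincide with the first point.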
>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_actual_vs_predicted
1. Random Example:
>>> np.random.seed(0)
>>> N = 100
>>> df_avp_rand = pd.DataFrame({
...     'Time': pd.date_range('2023-01-01', periods=N, freq='D'),
...     'ActualTemp': 15 + 10 * np.sin(np.linspace(0, 4 * np.pi, N)) + np.random.randn(N) * 2,
...     'PredictedTemp': 16 + 9 * np.sin(np.linspace(0, 4 * np.pi, N) + 0.1) + np.random.randn(N) * 1.5
... })
>>> ax_avp_rand = plot_actual_vs_predicted(
...     df=df_avp_rand,
...     actual_col='ActualTemp',
...     pred_col='PredictedTemp',
...     theta_col='Time',  # Note: Ignored for positioning
...     acov='default',
...     title='Temperature: Actual vs. Predicted',
...     line=True,  # Use lines
...     r_label='Temperature (°C)',
...     actual_props={'color': 'navy', 'linestyle': '-'},
...     pred_props={'color': 'crimson', 'linestyle': '--'}
... )
>>> # plt.show() called internally
2. Concrete Example (Subsidence Data - using dots):
>>> # Assume zhongshan_pred_2023_2026 is a loaded DataFrame
>>> # Create dummy data if it doesn't exist
>>> try:
...     zhongshan_pred_2023_2026
... except NameError:
...     print("Creating dummy subsidence data for example...")
...     N_sub = 150
...     zhongshan_pred_2023_2026 = pd.DataFrame({
...         'latitude': np.linspace(22.2, 22.8, N_sub),
...         'subsidence_2023': np.random.rand(N_sub)*15 + np.linspace(0, 5, N_sub),
...         'subsidence_2023_q50': np.random.rand(N_sub)*14 + np.linspace(0.5, 5.5, N_sub),
...         # Add other columns if needed by other examples
...         **{f'subsidence_{yr}_q10': np.random.rand(N_sub)*(yr-2022)*2 + 1
...            for yr in range(2023, 2027)},
...         **{f'subsidence_{yr}_q90': np.random.rand(N_sub)*(yr-2022)*2 + 5
...            + np.linspace(0, (yr-2022)*3, N_sub)
...            for yr in range(2023, 2027)},
...     })
>>> ax_avp_sub = plot_actual_vs_predicted(
...     df=zhongshan_pred_2023_2026.head(100),  # Use subset for speed
...     actual_col='subsidence_2023',
...     pred_col='subsidence_2023_q50',
...     theta_col='latitude',  # Note: Ignored for positioning
...     acov='half_circle',  # Use 180 degrees
...     title='Actual vs Predicted Subsidence (2023)',
...     line=False,  # Use dots instead of lines
...     r_label="Subsidence (mm)",
...     mask_angle=True,  # Hide angular tick labels
...     pred_props={'marker': 'x', 'color': 'purple'}  # Customize predicted dots
... )
>>> # plt.show() called internally