kdiagram.plot.errors.plot_error_bands

kdiagram.plot.errors.plot_error_bands(df, error_col, theta_col, *, theta_period=None, theta_bins=24, n_std=1.0, title=None, figsize=(8.0, 8.0), cmap='viridis', show_grid=True, grid_props=None, mask_angle=False, savefig=None, dpi=300, acov='default', ax=None, **fill_kws)

Plot polar error bands to visualize systematic vs. random error.

This function aggregates forecast errors across bins of a cyclical or ordered feature (like month or hour) and plots the mean error and its standard deviation for each bin. It is a diagnostic tool for identifying systematic biases and variations in model performance.

Parameters:

df (pd.DataFrame): The input DataFrame containing the error and feature data.
error_col (str): Name of the column containing the forecast error values, typically calculated as actual - predicted.
theta_col (str): Name of the column representing the feature to bin against, which will be mapped to the angular axis.
theta_period (float, optional): The period of the cyclical data in theta_col. For example, if theta_col is the month of the year, the period is 12. This ensures the data wraps around the circle correctly.
theta_bins (int, default=24): The number of angular bins to group the data into for calculating statistics.
n_std (float, default=1.0): The number of standard deviations to display in the shaded error band around the mean error line.
title (str, optional): The title for the plot. If None, a default is generated.
figsize (tuple of (float, float), default=(8, 8)): Figure size in inches.
cmap (str, default='viridis'): Currently unused by this function; colors are fixed for clarity (black, red, and a fill color).
show_grid (bool, default=True): Toggle gridlines via the package helper set_axis_grid.
grid_props (dict, optional): Keyword arguments passed to set_axis_grid for grid customization.
mask_angle (bool, default=False): If True, hide the angular tick labels.
savefig (str, optional): If provided, save the figure to this path; otherwise the plot is shown interactively.
dpi (int, default=300): Resolution for the saved figure.
**fill_kws (dict, optional): Additional keyword arguments passed to the ax.fill_between call for the shaded error band (e.g., color, alpha).

Returns:

ax (matplotlib.axes.Axes or None): The Matplotlib Axes object containing the plot, or None if the plot could not be generated.

Parameters:

df (DataFrame)
error_col (str)
theta_col (str)
theta_period (float | None)
theta_bins (int)
n_std (float)
title (str | None)
figsize (tuple[float, float])
cmap (str)
show_grid (bool)
grid_props (dict[str, Any] | None)
mask_angle (bool)
savefig (str | None)
dpi (int)
acov (Literal['default', 'half_circle', 'quarter_circle', 'eighth_circle'])
ax (Axes | None)

Notes

The plot visualizes the first two moments (mean and standard deviation) of the error distribution conditioned on the angular variable \(\theta\).

Binning: The data is first partitioned into \(K\) bins based on the values in theta_col. Let \(B_k\) be the set of indices of data points belonging to the \(k\)-th bin.
Mean Error Calculation: For each bin \(B_k\), the mean error \(\mu_{e,k}\) is calculated. This value is plotted as a point on the central black line.

\[\mu_{e,k} = \frac{1}{|B_k|} \sum_{i \in B_k} e_i \tag{1}\]

where \(e_i\) is the error for data point \(i\). A consistent deviation of this line from the zero-error circle indicates a systematic bias.
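Error Standard Deviation Calculation: For each bin, the sample standard deviation of the error, \(\sigma_{e,k}\), is also calculated. Because the formula divides by \(|B_k|-1\), it is undefined for bins containing fewer than two points.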

\[\sigma_{e,k} = \sqrt{\frac{1}{|B_k|-1} \sum_{i \in B_k} (e_i - \mu_{e,k})^2} \tag{2}\]
Band Construction: A shaded band is drawn between the lower and upper bounds, defined by the mean plus or minus a multiple of the standard deviation.

\[\begin{split}\text{Upper Bound}_k &= \mu_{e,k} + n_{std} \cdot \sigma_{e,k} \\ \text{Lower Bound}_k &= \mu_{e,k} - n_{std} \cdot \sigma_{e,k}\end{split} \tag{3}\]

The width of this band indicates the random error or inconsistency of the model within that bin.
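
The steps above can be reproduced outside the plotting function to inspect the numbers behind a figure. The following is a minimal sketch, assuming equal-width bins over one period and bin-center angles. The helper name `binned_error_stats` and its binning scheme are illustrative assumptions, not taken from the kdiagram source, which may handle bin edges differently.

```python
import numpy as np
import pandas as pd


def binned_error_stats(df, error_col, theta_col, theta_period,
                       theta_bins=24, n_std=1.0):
    """Hypothetical helper: per-bin mean, sample std, and band bounds."""
    # Wrap the feature into [0, theta_period) so cyclical values line up.
    wrapped = df[theta_col].to_numpy(dtype=float) % theta_period
    # Equal-width bins (an assumption); k is the bin index 0..theta_bins-1.
    k = np.floor(wrapped * theta_bins / theta_period).astype(int)
    k = np.clip(k, 0, theta_bins - 1)

    grouped = df[error_col].groupby(k)
    stats = pd.DataFrame({
        "mean": grouped.mean(),       # mean error per bin
        "std": grouped.std(ddof=1),   # sample std; NaN when a bin has one point
    }).reindex(range(theta_bins))     # empty bins appear as NaN rows

    # Bin-center angle in radians, then the bounds of the shaded band.
    stats["theta"] = (np.arange(theta_bins) + 0.5) * 2 * np.pi / theta_bins
    stats["upper"] = stats["mean"] + n_std * stats["std"]
    stats["lower"] = stats["mean"] - n_std * stats["std"]
    return stats
```

Bins that are empty or hold a single observation yield NaN bounds in this sketch, which is worth checking before reading a narrow band as low random error.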

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from kdiagram.plot.errors import plot_error_bands
>>>
>>> # Simulate a model with seasonal error patterns
>>> np.random.seed(42)
>>> n_points = 2000
>>> day_of_year = np.arange(n_points) % 365
>>> month = (day_of_year * 12 // 365) + 1  # maps days 0..364 to months 1..12
>>>
>>> # Create a bias (positive error) in summer and more noise in winter
>>> seasonal_bias = np.sin((day_of_year - 90) * np.pi / 180) * 5
>>> seasonal_noise = 2 + 2 * np.cos(day_of_year * np.pi / 365)**2
>>> errors = seasonal_bias + np.random.normal(0, seasonal_noise, n_points)
>>>
>>> df_seasonal = pd.DataFrame({'month': month, 'forecast_error': errors})
>>>
>>> # Generate the plot
>>> ax = plot_error_bands(
...     df=df_seasonal,
...     error_col='forecast_error',
...     theta_col='month',
...     theta_period=12,
...     theta_bins=12,
...     n_std=1.5,
...     title='Seasonal Forecast Error Analysis',
...     color='#2980B9',
...     alpha=0.3
... )
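
To show how a figure of this kind is assembled, here is a rough sketch that uses Matplotlib directly. It assumes the per-bin angles, means, and standard deviations are already available as arrays. It is not the kdiagram implementation: the function name `draw_error_band`, the assignment of red to the zero-error circle, the default alpha, and the choice to close the curve by repeating the first point are all illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def draw_error_band(theta, mean, std, n_std=1.0, **fill_kws):
    """Hypothetical helper: shaded band, mean line, and zero-error circle."""
    # Repeat the first point one full turn later so the shapes close up.
    theta = np.append(theta, theta[0] + 2 * np.pi)
    mean = np.append(mean, mean[0])
    std = np.append(std, std[0])

    fig, ax = plt.subplots(figsize=(8, 8), subplot_kw={"projection": "polar"})

    # Shaded band: mean +/- n_std * std; extra keywords go to fill_between.
    fill_kws.setdefault("alpha", 0.3)
    ax.fill_between(theta, mean - n_std * std, mean + n_std * std, **fill_kws)

    # Mean error line (black) and the zero-error reference circle (red).
    ax.plot(theta, mean, color="black", label="Mean error")
    circle = np.linspace(0, 2 * np.pi, 200)
    ax.plot(circle, np.zeros_like(circle), color="red", linestyle="--",
            label="Zero error")
    ax.legend(loc="upper right")
    return ax
```

Keyword arguments such as `color='#2980B9'` are forwarded to `fill_between` here in the same way the `**fill_kws` parameter is described above.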