kdiagram.plot.errors.plot_error_violins

kdiagram.plot.errors.plot_error_violins(df, *error_cols, names=None, title=None, figsize=(9.0, 9.0), cmap='viridis', colors=None, show_grid=True, grid_props=None, savefig=None, dpi=300, acov='default', ax=None, mode='optimized', bw_method=None, overlay=False, overlay_angle=None, show_stats=False, **violin_kws)

Plot polar violin plots to compare multiple error distributions.

This function creates a polar plot where each angular sector contains a violin plot representing the error distribution of a different model or dataset. It supports side-by-side visual comparison of the bias, variance, and overall shape of the error distributions [1].

Parameters:
df : pd.DataFrame

The input DataFrame containing the error data.

*error_cols : str

One or more column names from df, each containing the error values (e.g., actual - predicted) for a model to be plotted.

names : list of str, optional

Display names for each of the models corresponding to error_cols. If not provided, generic names like 'Model 1' will be generated. The list length must match the number of error columns.

title : str, optional

The title for the plot. If None, a default is generated.

figsize : tuple of (float, float), default=(9, 9)

Figure size in inches.

cmap : str, default='viridis'

Matplotlib colormap used to assign a unique color to each violin plot.

colors : list of str, optional

An explicit list of colors to use for the violins. If provided, this overrides cmap. The list will cycle if it is shorter than the number of error columns.
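The cycling behavior described for colors can be illustrated with a minimal sketch. The helper name assign_colors is hypothetical and is not part of kdiagram; it only shows what "the list will cycle" means for a list shorter than the number of error columns.

```python
from itertools import cycle, islice

def assign_colors(colors, n_cols):
    """Hypothetical helper: repeat `colors` until there is one per column."""
    return list(islice(cycle(colors), n_cols))

# Two colors for three error columns: the first color is reused.
palette = assign_colors(["tab:blue", "tab:orange"], 3)
```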

acov : {'default', 'half_circle', 'quarter_circle', 'eighth_circle'}, default='default'

Angular coverage (span) of the plot:

  • 'default': \(2\pi\) (full circle)

  • 'half_circle': \(\pi\)

  • 'quarter_circle': \(\tfrac{\pi}{2}\)

  • 'eighth_circle': \(\tfrac{\pi}{4}\)
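As an illustration of these spans, the sketch below maps each acov value to its angle in radians and spaces a set of spokes evenly inside it. The names ACOV_SPANS and spoke_angles are hypothetical, and the even spacing is an assumption; the library may place its spokes differently.

```python
import math

# Hypothetical lookup table mirroring the documented acov options.
ACOV_SPANS = {
    "default": 2 * math.pi,
    "half_circle": math.pi,
    "quarter_circle": math.pi / 2,
    "eighth_circle": math.pi / 4,
}

def spoke_angles(n_models, acov="default"):
    """Evenly spaced spoke angles (radians) inside the chosen span (assumed layout)."""
    step = ACOV_SPANS[acov] / n_models
    return [i * step for i in range(n_models)]

angles = spoke_angles(3, acov="half_circle")
```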

mode : {'optimized', 'cbueth', 'basic'}, default='optimized'

The plotting mode to use.

  • 'optimized' or 'cbueth' (default): A mode inspired by reviewer feedback (see Notes). It maps error magnitude to the radius, splits each violin into positive and negative lobes, and marks the zero reference with a central dot. This mode is designed to make bias and skew easy to detect.

  • 'basic': The original implementation where the radial axis directly represents the error value (positive and negative), and violins are centered on their assigned angle. The zero reference is a dashed circle.

bw_method : float, str, or None, default=None

The method used to calculate the estimator bandwidth for the KDE. This is passed directly to scipy.stats.gaussian_kde. If None, it uses the default “scott”.
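Since bw_method is forwarded to scipy.stats.gaussian_kde, its effect can be explored directly with SciPy. The sketch below uses synthetic errors (an assumption for illustration) and compares the default with an explicit 'scott' and a scalar factor.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
errors = rng.normal(loc=0.5, scale=1.5, size=500)  # synthetic error sample

kde_default = gaussian_kde(errors)                   # bw_method=None -> Scott's rule
kde_scott = gaussian_kde(errors, bw_method="scott")  # same rule, stated explicitly
kde_narrow = gaussian_kde(errors, bw_method=0.1)     # scalar is used as the factor

grid = np.linspace(errors.min(), errors.max(), 200)
dens_default = kde_default(grid)
dens_narrow = kde_narrow(grid)  # less smoothing, so a bumpier curve
```

A smaller factor follows the sample more closely and shows more local structure; a larger one produces a smoother, wider violin.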

overlay : bool or 'auto', default='auto'

Applies to ‘cbueth’ mode only. If True, all violins are overlaid on a single shared spoke for direct comparison (best for k=1 or k=2). If False, each violin gets its own spoke. If 'auto' (default), overlay is enabled if k <= 2 and disabled otherwise.

overlay_angle : float, optional

Applies to 'cbueth' mode when overlay=True. The angle (in radians) of the shared spoke. If None, defaults to \(\pi/2\) (vertical, pointing north).

show_stats : bool, default=False

Applies to ‘cbueth’ mode only. If True, appends the median and skew of each distribution to its legend entry (e.g., “Model A (med=0.45; skew=-0.12)”).
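The legend format above can be reproduced with a short sketch. The helper legend_label is hypothetical, and the use of scipy.stats.skew with its default settings is an assumption; the library may compute skewness with a different estimator.

```python
import numpy as np
from scipy.stats import skew

def legend_label(name, errors):
    """Hypothetical helper: build a 'name (med=...; skew=...)' legend entry."""
    med = np.median(errors)
    sk = skew(errors)  # assumed estimator: SciPy's default (biased) sample skewness
    return f"{name} (med={med:.2f}; skew={sk:.2f})"

rng = np.random.default_rng(0)
label = legend_label("Model A", rng.normal(loc=0.5, scale=1.5, size=1000))
```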

show_grid : bool, default=True

Toggle gridlines via the package helper set_axis_grid.

grid_props : dict, optional

Keyword arguments passed to set_axis_grid for grid customization.

savefig : str, optional

If provided, save the figure to this path; otherwise the plot is shown interactively.

dpi : int, default=300

Resolution for the saved figure.

**violin_kws : dict, optional

Additional keyword arguments passed to the ax.fill call for each violin (e.g., alpha, edgecolor).

Returns:
ax : matplotlib.axes.Axes or None

The Matplotlib Axes object containing the plot, or None if the plot could not be generated.

Notes

The plot visualizes and compares several one-dimensional error distributions. It adapts the standard violin plot [1] to a polar coordinate system for multi-model comparison.

  1. Kernel Density Estimation (KDE): For each model’s error data \(\mathbf{x} = \{x_1, x_2, ..., x_n\}\), the probability density function (PDF), \(\hat{f}_h(x)\), is estimated using a Gaussian kernel. This creates a smooth curve representing the distribution’s shape.

    (1)\[\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)\]

    where \(K\) is the Gaussian kernel and \(h\) is the bandwidth, a smoothing parameter.

  2. Violin Construction: The violin shape is created by plotting the density curve \(\hat{f}_h(x)\) symmetrically around a central axis. The width of the violin at any given error value \(x\) is proportional to its estimated density.

  3. Polar Arrangement: Each model’s violin is assigned a unique angular sector on the polar plot. In 'basic' mode, the radial axis represents the signed error value, with a reference circle at \(r=0\) indicating a perfect forecast. The violin is drawn radially within its assigned sector.
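The three steps above can be sketched with NumPy and SciPy. This is an illustration under stated assumptions, not the library's implementation: gaussian_kde_manual, the spoke angle theta0, and the maximum half-width of 0.3 rad are all hypothetical choices. The manual estimate implements equation (1) and is checked against scipy.stats.gaussian_kde, whose effective bandwidth is its factor times the sample standard deviation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def gaussian_kde_manual(x_eval, samples, h):
    """Equation (1): average of Gaussian kernels with bandwidth h."""
    u = (x_eval[:, None] - samples[None, :]) / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (samples.size * h)

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.5, scale=1.5, size=400)  # synthetic errors
grid = np.linspace(samples.min(), samples.max(), 200)

# Step 1: KDE. SciPy's bandwidth is h = factor * std (ddof=1) in one dimension.
kde = gaussian_kde(samples)
h = kde.factor * samples.std(ddof=1)
density = gaussian_kde_manual(grid, samples, h)

# Step 2: mirror the density around a central axis to get the violin width.
half_width = 0.3 * density / density.max()  # angular half-width in radians (arbitrary scale)

# Step 3: place the outline around a spoke; the radius is the error value.
theta0 = np.pi / 4  # hypothetical spoke angle for this model
theta = np.concatenate([theta0 - half_width, (theta0 + half_width)[::-1]])
radius = np.concatenate([grid, grid[::-1]])
# On a Matplotlib polar Axes, ax.fill(theta, radius) would draw this closed outline.
```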

The ‘cbueth’ Mode (Default)

The default mode, 'optimized' (also accepted as 'cbueth'), was developed in response to feedback received during the JOSS paper review process. A reviewer noted that the original 'basic' mode could make skewed distributions difficult to interpret, especially when comparing only two models.

“…wouldn’t it be easier to just plot them on top of each other with transparency? … Maybe this is just a problem when just comparing two and not more models; with three it is already prettier.”

To honor this contribution, the new mode was named after the reviewer’s GitHub handle. It addresses this feedback with key design changes:

  1. Radial Axis: The radius maps to absolute error \(|E|\), so all data starts from the center. The zero-error reference is a single point at the origin.

  2. Two-Lobe Design: Each violin is split into two lobes around its central spoke:

    • Right Lobe: Distribution of positive errors (\(E > 0\)).

    • Left Lobe: Distribution of negative errors (\(E < 0\)).

  3. Interpretation: This design makes key metrics instantly visible:

    • Bias: An imbalance in the size of the two lobes (e.g., a larger right lobe means a positive bias).

    • Skew: Asymmetry within a single lobe.

    • Variance: The overall radial extent of the lobes.

  4. Auto-Overlay: As the reviewer suggested, when at most two models are plotted and overlay='auto', the violins are drawn on top of each other with transparency for a direct, “face-off” style comparison. With three or more models, each violin gets its own spoke.
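The design points above can be sketched as follows. Both helpers are hypothetical and only approximate the described behavior: split_lobes estimates one density per lobe over the absolute error, and resolve_overlay applies the documented 'auto' rule. Weighting each lobe by its share of the samples is an assumption made here so that lobe size reflects bias; the library may scale the lobes differently.

```python
import numpy as np
from scipy.stats import gaussian_kde

def split_lobes(errors, n_grid=200):
    """Hypothetical helper: densities of positive and negative errors over |E|."""
    errors = np.asarray(errors, dtype=float)
    pos = errors[errors > 0]    # right lobe: E > 0
    neg = -errors[errors < 0]   # left lobe: magnitudes of E < 0
    r = np.linspace(0.0, np.abs(errors).max(), n_grid)  # radius = |E|
    # Assumed scaling: weight each lobe by its fraction of all samples.
    pos_density = gaussian_kde(pos)(r) * pos.size / errors.size
    neg_density = gaussian_kde(neg)(r) * neg.size / errors.size
    return r, pos_density, neg_density

def resolve_overlay(overlay, k):
    """Documented 'auto' rule: overlay when k <= 2, separate spokes otherwise."""
    if overlay == "auto":
        return k <= 2
    return bool(overlay)

rng = np.random.default_rng(0)
biased = rng.normal(loc=2.0, scale=1.5, size=1000)  # synthetic, positively biased
r, pos_density, neg_density = split_lobes(biased)
```

For this positively biased sample the right lobe carries most of the mass, which is the imbalance the two-lobe design is meant to expose.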

References

[1] Hintze, J. L., & Nelson, R. D. (1998). Violin Plots: A Box Plot-Density Trace Synergism. The American Statistician, 52(2), 181-184.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from kdiagram.plot.errors import plot_error_violins
>>>
>>> # Simulate errors from three different models
>>> np.random.seed(0)
>>> n_points = 1000
>>> df_errors = pd.DataFrame({
...     'Model A (Good)': np.random.normal(
...           loc=0.5, scale=1.5, size=n_points),
...     'Model B (Biased)': np.random.normal(
...           loc=-4.0, scale=1.5, size=n_points),
...     'Model C (Inconsistent)': np.random.normal(
...           loc=0, scale=4.0, size=n_points),
... })
>>>
>>> # Generate the polar violin plot
>>> ax = plot_error_violins(
...     df_errors,
...     'Model A (Good)',
...     'Model B (Biased)',
...     'Model C (Inconsistent)',
...     title='Comparison of Model Error Distributions',
...     cmap='plasma',
...     alpha=0.7
... )