kdiagram.plot.context.plot_error_distribution

kdiagram.plot.context.plot_error_distribution(df, actual_col, pred_col, title=None, xlabel=None, **hist_kwargs)

Plots a histogram and KDE of the forecast errors.

This function creates a distribution plot of the forecast errors (residuals), combining a histogram with a smooth Kernel Density Estimate (KDE) curve. It is a fundamental diagnostic for checking whether a model's errors are unbiased (centered at zero) and approximately normally distributed.

For more details, refer to Error Autocorrelation (ACF) Plot User Guide

Parameters:
df : pd.DataFrame

The input DataFrame containing the actual and predicted values.

actual_col : str

The name of the column containing the true observed values.

pred_col : str

The name of the column containing the point forecast values.

title : str, optional

The title for the plot. If None, a default is generated.

xlabel : str, optional

The label for the x-axis. If None, a default is generated.

**hist_kwargs

Additional keyword arguments passed directly to the underlying plot_hist_kde() function (e.g., bins, kde_color, figsize).

Returns:
ax : matplotlib.axes.Axes

The Matplotlib Axes object containing the plot.

See also

plot_qq

A complementary plot for checking error normality.

plot_hist_kde

The general-purpose histogram utility this function wraps.

Contextual Diagnostic Plots

The user guide for contextual plots.

Notes

This function first calculates the forecast errors (or residuals), \(e_i = y_{true,i} - y_{pred,i}\). It then visualizes the distribution of these errors using two standard non-parametric methods:

  1. Histogram: The range of errors is divided into bins, and the height of each bar represents the frequency (or density) of errors in that bin.

  2. Kernel Density Estimate (KDE): This provides a smooth, continuous estimate of the error’s probability density function, based on foundational work in density estimation [1].
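The two estimators above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not kdiagram's implementation: the helper names `residuals`, `histogram_density`, and `gaussian_kde_numpy` are hypothetical, and the KDE assumes a Gaussian kernel \(K\) with Scott's rule bandwidth \(h = \hat{\sigma}\, n^{-1/5}\). It evaluates \(\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - e_i}{h}\right)\) on a grid, and normalizes the histogram so its bars integrate to 1, which puts both curves on the same density scale.

```python
import numpy as np

def residuals(y_true, y_pred):
    """Forecast errors e_i = y_true_i - y_pred_i (hypothetical helper)."""
    return np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)

def histogram_density(errors, bins=30):
    """Histogram bar heights normalized so the bars integrate to 1."""
    heights, edges = np.histogram(errors, bins=bins, density=True)
    return heights, edges

def gaussian_kde_numpy(errors, grid):
    """Gaussian KDE on `grid`; Scott's rule bandwidth is an assumed default."""
    errors = np.asarray(errors, dtype=float)
    n = errors.size
    h = errors.std(ddof=1) * n ** (-1 / 5)           # Scott's rule in 1-D
    z = (grid[:, None] - errors[None, :]) / h         # shape (len(grid), n)
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * h)               # average of scaled kernels

# Synthetic data with N(0, 5) noise added to the predictions
rng = np.random.default_rng(0)
y_true = np.linspace(0, 50, 500)
y_pred = y_true + rng.normal(0, 5, y_true.size)

e = residuals(y_true, y_pred)
heights, edges = histogram_density(e, bins=40)
grid = np.linspace(e.min() - 10, e.max() + 10, 2000)
density = gaussian_kde_numpy(e, grid)
```

Plotting `heights` as bars over `edges` and `density` as a line over `grid` gives the same kind of overlay this function draws; the bandwidth choice controls how smooth the KDE curve looks.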

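A well-behaved model should produce errors that are centered at zero (no systematic bias) and approximately normally distributed, which shows up as a symmetric, bell-shaped histogram and KDE curve peaking near zero.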
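The visual check can be complemented by a few moment-based summary statistics. The sketch below uses a hypothetical helper, `error_summary`, that reports the mean error (bias), the standard deviation, the sample skewness, and the excess kurtosis; for unbiased, normally distributed errors the mean, skewness, and excess kurtosis should all be close to zero. These are simple moment estimates, not formal normality tests.

```python
import numpy as np

def error_summary(errors):
    """Moment-based summary of forecast errors (hypothetical helper)."""
    e = np.asarray(errors, dtype=float)
    mean = e.mean()                        # bias: ~0 for an unbiased model
    std = e.std()                          # population std (ddof=0)
    z = (e - mean) / std
    skew = np.mean(z ** 3)                 # ~0 for a symmetric distribution
    excess_kurt = np.mean(z ** 4) - 3.0    # ~0 for a normal distribution
    return {"mean": mean, "std": std, "skew": skew,
            "excess_kurtosis": excess_kurt}

rng = np.random.default_rng(42)
unbiased = error_summary(rng.normal(0.0, 5.0, 5000))
# Positive mean error: y_true > y_pred on average, i.e. the model under-forecasts
biased = error_summary(rng.normal(3.0, 5.0, 5000))
# Centered but right-skewed errors: mean ~0, yet clearly non-normal
skewed = error_summary(rng.exponential(5.0, 5000) - 5.0)
```

The `skewed` case shows why the shape of the plot matters as well as its center: its mean is close to zero, but its long right tail would be visible in the histogram and KDE.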

References

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.context import plot_error_distribution
>>>
>>> # Generate synthetic data with normally distributed errors
>>> np.random.seed(0)
>>> n_samples = 500
>>> y_true = np.linspace(0, 50, n_samples)
>>> errors = np.random.normal(0, 5, n_samples) # Normal errors
>>> y_pred = y_true + errors
>>>
>>> df = pd.DataFrame({'actual': y_true, 'predicted': y_pred})
>>>
>>> # Generate the plot
>>> ax = plot_error_distribution(
...     df,
...     actual_col='actual',
...     pred_col='predicted',
...     title="Distribution of Normally-Distributed Errors",
...     bins=40
... )
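Note the sign convention in the example above: because it builds `y_pred = y_true + errors`, the plotted residuals are \(e_i = y_{true,i} - y_{pred,i} = -\text{errors}_i\). The injected noise is symmetric, so the histogram should still be centered near zero with a spread of roughly 5. The NumPy-only check below rebuilds the example's data and verifies this; it does not call kdiagram.

```python
import numpy as np

# Rebuild the data from the example (same seed and sizes)
np.random.seed(0)
n_samples = 500
y_true = np.linspace(0, 50, n_samples)
errors = np.random.normal(0, 5, n_samples)
y_pred = y_true + errors

# Residuals as defined in the Notes: actual minus predicted
residuals = y_true - y_pred

print(f"mean = {residuals.mean():.3f}")        # close to 0: no systematic bias
print(f"std  = {residuals.std(ddof=1):.3f}")   # close to the injected scale of 5
```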