kdiagram.plot.context.plot_error_distribution

kdiagram.plot.context.plot_error_distribution(df, actual_col, pred_col, title=None, xlabel=None, **hist_kwargs)

Plots a histogram and KDE of the forecast errors.

This function creates a distribution plot of the forecast errors (residuals), combining a histogram with a smooth Kernel Density Estimate (KDE) curve. It is a fundamental diagnostic for checking whether a model’s errors are unbiased (centered at zero) and normally distributed.

For more details, refer to the Error Autocorrelation (ACF) Plot User Guide.

Parameters:

df (pd.DataFrame): The input DataFrame containing the actual and predicted values.
actual_col (str): The name of the column containing the true observed values.
pred_col (str): The name of the column containing the point forecast values.
title (str, optional): The title for the plot. If None, a default is generated.
xlabel (str, optional): The label for the x-axis. If None, a default is generated.
**hist_kwargs: Additional keyword arguments passed directly to the underlying plot_hist_kde() function (e.g., bins, kde_color, figsize).

Returns:

ax (matplotlib.axes.Axes): The Matplotlib Axes object containing the plot.

See also

plot_qq: A complementary plot for checking error normality.
plot_hist_kde: The general-purpose histogram utility this function wraps.
Contextual Diagnostic Plots: The user guide for contextual plots.

Notes

This function first calculates the forecast errors (or residuals), \(e_i = y_{true,i} - y_{pred,i}\). It then visualizes the distribution of these errors using two standard non-parametric methods:

Histogram: The range of errors is divided into bins, and the height of each bar represents the frequency (or density) of errors in that bin.
Kernel Density Estimate (KDE): This provides a smooth, continuous estimate of the error’s probability density function, based on foundational work in density estimation [1].
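
The two estimates described above can be sketched with NumPy alone. This is an illustration of the general technique, not the library's implementation: the Gaussian kernel, the Silverman rule-of-thumb bandwidth, the grid size, and all variable names are assumptions made for this example.

```python
import numpy as np

# Synthetic stand-in data (assumption: any two aligned numeric arrays work)
rng = np.random.default_rng(0)
y_true = np.linspace(0, 50, 500)
y_pred = y_true + rng.normal(0, 5, size=y_true.size)

# Forecast errors, e_i = y_true_i - y_pred_i
errors = y_true - y_pred

# Histogram normalized to a density: the bar areas sum to 1
density, edges = np.histogram(errors, bins=40, density=True)
widths = np.diff(edges)

# Gaussian KDE with Silverman's rule-of-thumb bandwidth
# (an assumption here; the library may choose its bandwidth differently)
n = errors.size
sigma = errors.std(ddof=1)
iqr = np.subtract(*np.percentile(errors, [75, 25]))
bandwidth = 0.9 * min(sigma, iqr / 1.34) * n ** (-1 / 5)

# Evaluate the KDE on a grid that extends slightly beyond the data range
grid = np.linspace(errors.min() - 3 * bandwidth,
                   errors.max() + 3 * bandwidth, 400)
z = (grid[:, None] - errors[None, :]) / bandwidth
kde = np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
```

Both `density` and `kde` are on the same scale (probability density), which is why the two can be overlaid on a single axis.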

A well-behaved model should ideally produce errors that are normally distributed and centered around zero.
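
As a numeric companion to the visual check, the same properties can be summarized with a few sample statistics. The sketch below is a hedged illustration using NumPy only; the synthetic residuals and the variable names are assumptions, and none of these quantities are computed or returned by plot_error_distribution itself.

```python
import numpy as np

# Stand-in residuals (assumption: drawn from a zero-mean normal distribution)
rng = np.random.default_rng(0)
errors = rng.normal(0, 5, size=500)

# Bias: the mean error should be close to zero for an unbiased model
mean_error = errors.mean()
standard_error = errors.std(ddof=1) / np.sqrt(errors.size)

# Shape: standardized third and fourth moments
z = (errors - mean_error) / errors.std()
skewness = (z**3).mean()               # near 0 for a symmetric distribution
excess_kurtosis = (z**4).mean() - 3.0  # near 0 for normal-like tails
```

A mean error that is large relative to its standard error suggests bias, while skewness or excess kurtosis far from zero suggests a departure from normality that the histogram and KDE should also make visible.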

References

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.context import plot_error_distribution
>>>
>>> # Generate synthetic data with normally distributed errors
>>> np.random.seed(0)
>>> n_samples = 500
>>> y_true = np.linspace(0, 50, n_samples)
>>> errors = np.random.normal(0, 5, n_samples) # Normal errors
>>> y_pred = y_true + errors
>>>
>>> df = pd.DataFrame({'actual': y_true, 'predicted': y_pred})
>>>
>>> # Generate the plot
>>> ax = plot_error_distribution(
...     df,
...     actual_col='actual',
...     pred_col='predicted',
...     title="Distribution of Normally-Distributed Errors",
...     bins=40
... )