kdiagram.plot.context.plot_error_distribution
- kdiagram.plot.context.plot_error_distribution(df, actual_col, pred_col, title=None, xlabel=None, **hist_kwargs)
Plots a histogram and KDE of the forecast errors.
This function creates a distribution plot of the forecast errors (residuals), combining a histogram with a smooth Kernel Density Estimate (KDE) curve. It is a fundamental diagnostic for checking whether a model's errors are unbiased (centered at zero) and approximately normally distributed.
For more details, refer to the Contextual Diagnostic Plots user guide.
- Parameters:
  - df : pd.DataFrame
    The input DataFrame containing the actual and predicted values.
  - actual_col : str
    The name of the column containing the true observed values.
  - pred_col : str
    The name of the column containing the point forecast values.
  - title : str, optional
    The title for the plot. If None, a default is generated.
  - xlabel : str, optional
    The label for the x-axis. If None, a default is generated.
  - **hist_kwargs
    Additional keyword arguments passed directly to the underlying plot_hist_kde() function (e.g., bins, kde_color, figsize).
- Returns:
  - ax : matplotlib.axes.Axes
    The Matplotlib Axes object containing the plot.
See also
- plot_qq: A complementary plot for checking error normality.
- plot_hist_kde: The general-purpose histogram utility this function wraps.
- Contextual Diagnostic Plots: The user guide for contextual plots.
Notes
This function first calculates the forecast errors (or residuals), \(e_i = y_{true,i} - y_{pred,i}\). It then visualizes the distribution of these errors using two standard non-parametric methods:
- Histogram: The range of errors is divided into bins, and the height of each bar represents the frequency (or density) of errors in that bin.
- Kernel Density Estimate (KDE): This provides a smooth, continuous estimate of the error's probability density function, based on foundational work in density estimation [1].
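The two estimators listed above can be sketched directly in NumPy. This is a minimal illustration under stated assumptions, not kdiagram's implementation: the helper names residuals, histogram_density and gaussian_kde_1d are hypothetical, and the default bandwidth uses the normal-reference rule of thumb \(h = 1.06\,\hat\sigma\,n^{-1/5}\), which is an assumption (the library may delegate to a different KDE routine or bandwidth rule). The KDE evaluated here is \(\hat f(x) = \frac{1}{nh}\sum_{i=1}^{n} \varphi\left(\frac{x - e_i}{h}\right)\), with \(\varphi\) the standard normal density.

```python
import numpy as np

def residuals(y_true, y_pred):
    # e_i = y_true_i - y_pred_i, matching the sign convention in the Notes.
    return np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)

def histogram_density(errors, bins=40):
    # density=True rescales bar heights so that sum(height * width) == 1.
    return np.histogram(errors, bins=bins, density=True)

def gaussian_kde_1d(errors, grid, bandwidth=None):
    # Gaussian KDE: f(x) = 1/(n*h) * sum_i phi((x - e_i) / h).
    e = np.asarray(errors, dtype=float)
    n = e.size
    if bandwidth is None:
        # Assumed normal-reference rule of thumb; kdiagram may use another rule.
        bandwidth = 1.06 * e.std(ddof=1) * n ** (-1 / 5)
    # Pairwise standardized distances between grid points and observed errors.
    z = (np.asarray(grid, dtype=float)[:, None] - e[None, :]) / bandwidth
    phi = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)  # standard normal kernel
    return phi.sum(axis=1) / (n * bandwidth)

# Synthetic data in the spirit of the Examples section.
rng = np.random.default_rng(0)
y_true = np.linspace(0, 50, 500)
y_pred = y_true + rng.normal(0, 5, 500)

errors = residuals(y_true, y_pred)
heights, edges = histogram_density(errors, bins=40)
grid = np.linspace(errors.min() - 10, errors.max() + 10, 1001)
density = gaussian_kde_1d(errors, grid)

# Both estimates integrate to (approximately) 1, so they can share a y-axis.
print("histogram area:", float(np.sum(heights * np.diff(edges))))
print("KDE area (Riemann sum):", float(np.sum(density) * (grid[1] - grid[0])))
```

Normalizing the histogram as a density (rather than raw counts) is what makes it directly comparable with the KDE curve drawn on top of it.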
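A well-behaved model should ideally produce errors that are approximately normally distributed and centered around zero; a peak shifted away from zero signals systematic bias (consistent over- or under-prediction).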
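The two properties read off the plot (centering and normal shape) can also be summarized numerically. The sketch below assumes plain moment estimators: the mean measures bias, while sample skewness and excess kurtosis are both close to 0 for normally distributed errors. The helper error_summary is hypothetical and not part of kdiagram, and the three synthetic error sets are illustrative only.

```python
import numpy as np

def error_summary(errors):
    # Hypothetical helper: moment-based summary of forecast errors.
    e = np.asarray(errors, dtype=float)
    mean = e.mean()
    std = e.std()  # population (ddof=0) standard deviation
    z = (e - mean) / std
    return {
        "mean": float(mean),                            # bias; ideally near 0
        "std": float(std),                              # typical error size
        "skewness": float(np.mean(z**3)),               # 0 for a normal distribution
        "excess_kurtosis": float(np.mean(z**4) - 3.0),  # 0 for a normal distribution
    }

rng = np.random.default_rng(0)
unbiased = rng.normal(0.0, 5.0, 500)      # centered, normal errors
biased = rng.normal(3.0, 5.0, 500)        # y_true - y_pred > 0 on average: under-prediction
skewed = rng.exponential(5.0, 500) - 5.0  # mean near 0, but a long right tail

for name, e in [("unbiased", unbiased), ("biased", biased), ("skewed", skewed)]:
    summary = error_summary(e)
    print(name, {k: round(v, 2) for k, v in summary.items()})
```

On the distribution plot, a nonzero mean appears as a peak displaced from zero, and a large skewness appears as a lopsided histogram with a long tail on one side.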
References
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.context import plot_error_distribution
>>>
>>> # Generate synthetic data with normally distributed errors
>>> np.random.seed(0)
>>> n_samples = 500
>>> y_true = np.linspace(0, 50, n_samples)
>>> errors = np.random.normal(0, 5, n_samples)  # Normal errors
>>> y_pred = y_true + errors
>>>
>>> df = pd.DataFrame({'actual': y_true, 'predicted': y_pred})
>>>
>>> # Generate the plot
>>> ax = plot_error_distribution(
...     df,
...     actual_col='actual',
...     pred_col='predicted',
...     title="Distribution of Normally-Distributed Errors",
...     bins=40
... )