kdiagram.utils.plot_hist_kde

kdiagram.utils.plot_hist_kde(data, column=None, *, bins=50, x_label=None, title='Distribution (Histogram + KDE)', bandwidth=None, show_kde=True, savefig=None, dpi=300, figsize=(8, 6), kde_color='orange', hist_color='skyblue', hist_edge_color='white', kde_line_width=2, hist_alpha=0.7, normalize_kde=False, show_grid=True, grid_props=None, return_ax=False, **hist_kws)

Plot histogram and Kernel Density Estimate (KDE) for uncertainty evaluation.

This function overlays a Kernel Density Estimate (KDE) on a histogram to visualize the distribution of the provided data. The histogram shows the empirical distribution, and the optional KDE gives a smooth estimate of the probability density function, which helps when evaluating the uncertainty in predictions.

Parameters:
data : Union[np.ndarray, pd.Series, pd.DataFrame]

The data to be plotted. This can be a numpy array, a pandas Series, or a pandas DataFrame. If a DataFrame is provided, the ‘column’ parameter must be specified to select the column to plot.

column : Optional[str], default=None

The name of the column to plot if the input data is a DataFrame. If data is a Series, this parameter is ignored.

bins : int, default=50

The number of bins to use in the histogram.

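x_label : Optional[str], default=None

The label for the x-axis. The signature default is None; the docstring lists 'Value', which appears to be the label applied when no value is given.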

title : str, default='Distribution (Histogram + KDE)'

The title of the plot.

bandwidth : Optional[float], default=None

The bandwidth for the Kernel Density Estimate. If None, the bandwidth will be estimated using Silverman’s rule of thumb.

show_kde : bool, default=True

Whether or not to display the KDE on the plot. If False, only the histogram will be plotted.

savefig : Optional[str], default=None

The file path where the plot will be saved. If None, the plot will be displayed on the screen.

dpi : int, default=300

The resolution of the saved plot (dots per inch) when savefig is specified.

figsize : Tuple[float, float], default=(8, 6)

The size of the plot in inches.

kde_color : str, default='orange'

The color of the KDE line.

hist_color : str, default='skyblue'

The color of the histogram bars.

hist_edge_color : str, default='white'

The color of the edges of the histogram bars.

kde_line_width : float, default=2

The line width of the KDE line.

hist_alpha : float, default=0.7

The transparency level of the histogram bars. A value between 0 and 1.

hist_edge_alpha : float, default=1.0

The transparency level of the histogram edges. A value between 0 and 1.

normalize_kde : bool, default=False

If True, the KDE will be normalized so that the maximum value is 1.

show_grid : bool, default=True

Whether or not to display a grid on the plot.

grid_props : Optional[dict], default=None

A dictionary of grid properties. If provided, these will be applied to customize the grid appearance. By default, a dotted grid with 0.7 alpha is used.

**hist_kws : additional keyword arguments

Additional keyword arguments that can be passed to customize the plot, such as adjusting the axis properties or applying specific formatting.
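The bandwidth parameter above falls back to Silverman's rule of thumb when left as None. For orientation, here is a minimal NumPy sketch of one common form of that rule, together with a plain Gaussian KDE evaluated on a grid. The helper names silverman_bandwidth and gaussian_kde_1d are hypothetical, and this page does not state which variant of the rule or which evaluation grid plot_hist_kde uses internally, so treat this as an illustration rather than the library's implementation.

```python
import numpy as np


def silverman_bandwidth(x):
    """One common form of Silverman's rule of thumb for 1-D data (assumed variant)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    std = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    # Use the smaller spread estimate so heavy tails do not inflate the bandwidth.
    spread = min(std, iqr / 1.34) if iqr > 0 else std
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde_1d(x, grid, h):
    """Evaluate a Gaussian KDE with bandwidth h at every point of grid."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Pairwise standardized distances, shape (len(grid), len(x)).
    z = (grid[:, None] - x[None, :]) / h
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the kernels over the samples and rescale by the bandwidth.
    return kernel.mean(axis=1) / h


rng = np.random.default_rng(0)
data = rng.normal(0, 1, 1000)

h = silverman_bandwidth(data)
# Extend the grid a few bandwidths past the data so the tails are covered.
grid = np.linspace(data.min() - 3 * h, data.max() + 3 * h, 512)
pdf = gaussian_kde_1d(data, grid, h)
```

A smaller bandwidth follows the histogram bars more closely but produces a noisier curve; a larger one smooths over real structure. Passing an explicit bandwidth is the way to override the automatic choice.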

Returns:
grid : np.ndarray

The x-values grid for the KDE evaluation.

pdf : np.ndarray

The estimated probability density function (PDF) values computed from the KDE.

Return type:

tuple[ndarray, ndarray]
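The two returned arrays can be post-processed like any sampled density. The sketch below is an illustration only: summarize_density is a hypothetical helper, and the grid and pdf arrays are stand-ins built from a standard normal density rather than actual output of plot_hist_kde. It checks that the density integrates to roughly 1 and locates its mode.

```python
import numpy as np


def summarize_density(grid, pdf):
    """Return (area under the curve, location of the peak) for a sampled density."""
    grid = np.asarray(grid, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    # Trapezoidal rule written out explicitly to avoid version-specific NumPy helpers.
    area = float(np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid)))
    mode = float(grid[np.argmax(pdf)])
    return area, mode


# Stand-in arrays with the same shape contract as the documented return value.
grid = np.linspace(-5, 5, 1001)
pdf = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)

area, mode = summarize_density(grid, pdf)
```

An area noticeably different from 1 usually means the evaluation grid does not cover the tails of the data or the curve has been rescaled.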

See also

scipy.stats.gaussian_kde

For the Kernel Density Estimate implementation.

matplotlib.pyplot.hist

For plotting histograms in matplotlib.

pandas.Series.hist

For creating histograms from pandas Series.

Notes

  • The function estimates the KDE using a Gaussian kernel with a specified or automatically calculated bandwidth.

  • With normalize_kde=True, the KDE curve is rescaled so that its maximum is 1, which places it in the range [0, 1]. This is useful for comparing curves, but the rescaled curve is no longer a probability density.

  • The function automatically handles different input data types, such as pandas DataFrames, Series, or numpy arrays.
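The peak normalization described in these notes amounts to dividing the density by its maximum. A minimal sketch follows, with peak_normalize as a hypothetical helper name and a standard normal density as stand-in input; how plot_hist_kde applies the rescaling internally is not shown on this page.

```python
import numpy as np


def peak_normalize(pdf):
    """Rescale a sampled density so that its largest value equals 1."""
    pdf = np.asarray(pdf, dtype=float)
    peak = pdf.max()
    if peak <= 0:
        # Nothing to rescale for an all-zero curve; return a copy unchanged.
        return pdf.copy()
    return pdf / peak


# Stand-in density sampled on a regular grid.
grid = np.linspace(-4, 4, 401)
pdf = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)

scaled = peak_normalize(pdf)
```

The rescaling keeps the shape and the peak location of the curve and changes only its height, so the area under the scaled curve generally differs from 1.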

References

[1] Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. CRC Press.

[2] Scott, D. W. (2015). Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley-Interscience.

Examples

>>> import numpy as np
>>> from kdiagram.utils import plot_hist_kde
>>> data = np.random.normal(0, 1, 1000)
>>> plot_hist_kde(data, bins=30, kde_color='blue')
>>> import pandas as pd
>>> df = pd.DataFrame({'values': np.random.normal(0, 1, 1000)})
>>> plot_hist_kde(df, column='values', bins=30, show_kde=True)
>>> plot_hist_kde(data, bins=30, title="Histogram with KDE",
...               savefig="output.png")