kdiagram.utils.plot_hist_kde
- kdiagram.utils.plot_hist_kde(data, column=None, *, bins=50, x_label=None, title='Distribution (Histogram + KDE)', bandwidth=None, show_kde=True, savefig=None, dpi=300, figsize=(8, 6), kde_color='orange', hist_color='skyblue', hist_edge_color='white', kde_line_width=2, hist_alpha=0.7, normalize_kde=False, show_grid=True, grid_props=None, return_ax=False, **hist_kws)
Plot histogram and Kernel Density Estimate (KDE) for uncertainty evaluation.
This function overlays a histogram and an optional Kernel Density Estimate (KDE) to visualize the distribution of the provided data. The KDE estimates the probability density function, which helps when evaluating the uncertainty in predictions.
- Parameters:
- data
Union[np.ndarray, pd.Series, pd.DataFrame] The data to be plotted: a NumPy array, a pandas Series, or a pandas DataFrame. If a DataFrame is provided, the ‘column’ parameter must be specified to select the column to plot.
- column
Optional[str], default=None The name of the column to plot if the input data is a DataFrame. If data is a Series, this parameter is ignored.
- bins
int, default=50 The number of bins to use in the histogram.
- x_label
Optional[str], default=None The label for the x-axis. If None, a default label (’Value’ according to the docstring) is used.
- title
str, default=’Distribution (Histogram + KDE)’ The title of the plot.
- bandwidth
Optional[float], default=None The bandwidth for the Kernel Density Estimate. If None, the bandwidth will be estimated using Silverman’s rule of thumb.
- show_kde
bool, default=True Whether to display the KDE on the plot. If False, only the histogram is plotted.
- savefig
Optional[str], default=None The file path where the plot will be saved. If None, the plot will be displayed on the screen.
- dpi
int, default=300 The resolution of the saved plot (dots per inch) when savefig is specified.
- figsize
Tuple[float,float], default=(8, 6) The size of the plot in inches.
- kde_color
str, default=’orange’ The color of the KDE line.
- hist_color
str, default=’skyblue’ The color of the histogram bars.
- hist_edge_color
str, default=’white’ The color of the edges of the histogram bars.
- kde_line_width
float, default=2 The line width of the KDE line.
- hist_alpha
float, default=0.7 The transparency level of the histogram bars. A value between 0 and 1.
- hist_edge_alpha
float, default=1.0 The transparency level of the histogram edges. A value between 0 and 1.
- normalize_kde
bool, default=False If True, the KDE is rescaled so that its maximum value is 1.
- show_grid
bool, default=True Whether to display a grid on the plot.
- grid_props
Optional[dict], default=None A dictionary of grid properties. If provided, these will be applied to customize the grid appearance. By default, a dotted grid with 0.7 alpha is used.
- **hist_kws
additional keyword arguments Additional keyword arguments that can be passed to customize the plot, such as adjusting the axis properties or applying specific formatting.
- Returns:
- grid
np.ndarray The x-values grid for the KDE evaluation.
- pdf
np.ndarray The estimated probability density function (PDF) values computed from the KDE.
- Parameters:
data (ndarray | Series | DataFrame)
column (str | None)
bins (int)
x_label (str | None)
title (str)
bandwidth (float | None)
show_kde (bool)
savefig (str | None)
dpi (int)
kde_color (str)
hist_color (str)
hist_edge_color (str)
kde_line_width (float)
hist_alpha (float)
normalize_kde (bool)
show_grid (bool)
grid_props (dict | None)
return_ax (bool)
- Return type:
See also
scipy.stats.gaussian_kde: For the Kernel Density Estimate implementation.
matplotlib.pyplot.hist: For plotting histograms in matplotlib.
pandas.Series.hist: For creating histograms from pandas Series.
Notes
The function estimates the KDE using a Gaussian kernel with a specified or automatically calculated bandwidth.
The KDE can be normalized to fit the range [0, 1], which is useful for comparison purposes, especially when overlaid with histograms.
The function automatically handles different input data types, such as pandas DataFrames, Series, or numpy arrays.
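The notes above can be illustrated with a small NumPy-only sketch of a one-dimensional Gaussian KDE. It is not the library's implementation: the helper names `silverman_bandwidth` and `gaussian_kde_1d` are hypothetical, and the bandwidth formula is the common textbook form of Silverman's rule of thumb, which may differ from the variant `plot_hist_kde` uses internally. The `normalize` flag mimics the documented behavior of `normalize_kde` by rescaling the curve so its maximum is 1.

```python
import numpy as np


def silverman_bandwidth(x):
    """Textbook Silverman rule: h = 0.9 * min(std, IQR / 1.34) * n**(-1/5).

    Assumption: this is one common form of the rule, not necessarily
    the exact formula used by kdiagram.
    """
    x = np.asarray(x, dtype=float)
    std = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    # Fall back to the standard deviation if the IQR is degenerate.
    spread = min(std, iqr / 1.34) if iqr > 0 else std
    return 0.9 * spread * x.size ** (-0.2)


def gaussian_kde_1d(x, grid, bandwidth=None, normalize=False):
    """Evaluate a Gaussian KDE of the sample `x` on the points in `grid`."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    h = silverman_bandwidth(x) if bandwidth is None else float(bandwidth)
    # Standardized distance from every grid point to every sample point.
    z = (grid[:, None] - x[None, :]) / h
    # Average of Gaussian kernels centered on the sample points.
    pdf = np.exp(-0.5 * z**2).sum(axis=1) / (x.size * h * np.sqrt(2.0 * np.pi))
    if normalize:
        # Rescale so the peak equals 1 (the curve no longer integrates to 1).
        pdf = pdf / pdf.max()
    return pdf


rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, 1000)
grid = np.linspace(sample.min() - 1.0, sample.max() + 1.0, 400)
pdf = gaussian_kde_1d(sample, grid)
pdf_scaled = gaussian_kde_1d(sample, grid, normalize=True)
```

Without normalization the curve integrates to approximately 1, so it is comparable to a density-scaled histogram; with normalization only its shape is preserved, which suits visual comparison between curves.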
References
[1] Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. CRC Press.
[2] Scott, D. W. (2015). Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley-Interscience.
Examples
>>> import numpy as np
>>> from kdiagram.utils import plot_hist_kde
>>> data = np.random.normal(0, 1, 1000)
>>> plot_hist_kde(data, bins=30, kde_color='blue')

>>> import pandas as pd
>>> df = pd.DataFrame({'values': np.random.normal(0, 1, 1000)})
>>> plot_hist_kde(df, column='values', bins=30, show_kde=True)

>>> plot_hist_kde(data, bins=30, title="Histogram with KDE",
...               savefig="output.png")