kdiagram.plot.comparison.plot_reliability_diagram
- kdiagram.plot.comparison.plot_reliability_diagram(y_true, *y_preds, names=None, sample_weight=None, n_bins=10, strategy='uniform', positive_label=1, class_index=None, clip_probs=(0.0, 1.0), normalize_probs=True, error_bars='wilson', conf_level=0.95, show_diagonal=True, diagonal_kwargs=None, show_ece=True, show_brier=True, counts_panel='bottom', counts_norm='fraction', counts_alpha=0.35, figsize=(9, 7), title=None, xlabel='Predicted probability', ylabel='Observed frequency', cmap='tab10', color_palette=None, marker='o', s=40, linewidth=2.0, alpha=0.9, connect=True, legend=True, legend_loc='best', show_grid=True, grid_props=None, xlim=(0.0, 1.0), ylim=(0.0, 1.0), savefig=None, return_data=False, ax=None, **kw)
Plot a reliability diagram (calibration plot) for one or more classification models.
This compares predicted probabilities to observed frequencies across bins of predicted probability. Perfect calibration lies on the diagonal \(y=x\).
- Parameters:
- y_true : array_like of shape (n_samples,)
Ground truth labels. For binary calibration, values are compared to positive_label after validation and flattening.
- *y_preds : array_like(s)
One or more model predictions. Each item may be:
  - 1D array of positive-class probabilities in [0, 1].
  - 2D array of shape (n_samples, n_classes); use class_index to select a column. If omitted, the last column is used.
- names : list of str, optional
Labels for each model curve. If fewer names are provided than models, placeholders like 'Model_1' are appended.
- sample_weight : array_like of shape (n_samples,), optional
Per-sample weights used for observed frequencies, ECE, and Brier score. If None, equal weights are used.
- n_bins : int, default=10
Number of probability bins.
- strategy : {'uniform', 'quantile'}, default='uniform'
Binning strategy.
  - 'uniform': equally spaced edges in [0, 1].
  - 'quantile': edges are empirical quantiles of the pooled predictions. If edges are not unique, the method falls back to uniform binning with a warning.
- positive_label : int or float or str, default=1
Label in y_true treated as the positive class when constructing the binary target.
- class_index : int, optional
Column index to pick from 2D probability arrays. If omitted, the last column is used.
- clip_probs : tuple of (float, float), default=(0.0, 1.0)
Inclusive clipping range applied to predictions. A warning is issued if clipping occurs.
- normalize_probs : bool, default=True
If True, attempts to linearly rescale predictions into [0, 1] when minor out-of-range values are detected, then applies clipping.
- error_bars : {'wilson', 'normal', 'none'}, default='wilson'
Per-bin uncertainty for observed frequencies.
  - 'wilson': Wilson interval using conf_level.
  - 'normal': normal approximation.
  - 'none': no error bars.
- conf_level : float, default=0.95
Confidence level used for error bars when applicable.
- show_diagonal : bool, default=True
Draw the reference diagonal \(y=x\).
- diagonal_kwargs : dict, optional
Matplotlib keyword arguments for the diagonal reference line (e.g., linestyle, color).
- show_ece : bool, default=True
Compute Expected Calibration Error (ECE) and append a summary to each model label.
- show_brier : bool, default=True
Compute (weighted) Brier score and append a summary to each model label.
- counts_panel : {'none', 'bottom'}, default='bottom'
If not 'none', draw a compact histogram below the main panel that shows per-bin totals for each model.
- counts_norm : {'fraction', 'count'}, default='fraction'
Normalization for the counts panel. 'fraction' divides by the total weight; 'count' shows raw weighted sums.
- counts_alpha : float, default=0.35
Alpha for bars in the counts panel.
- figsize : tuple of (float, float), default=(9, 7)
Figure size for the layout. When counts_panel='bottom', a two-row gridspec is used.
- title : str, optional
Title for the plot. If None, no title is set.
- xlabel : str, optional
Label for the x-axis. Defaults to 'Predicted probability'.
- ylabel : str, optional
Label for the y-axis. Defaults to 'Observed frequency'.
- cmap : str, default='tab10'
Matplotlib colormap name used to generate model colors.
- color_palette : list, optional
Explicit list of colors. When provided, colors are cycled from this list instead of the colormap.
- marker : str, default='o'
Marker used for the bin points.
- s : int, default=40
Marker size for the bin points.
- linewidth : float, default=2.0
Line width used when connecting bin points.
- alpha : float, default=0.9
Alpha for points and lines in the main panel.
- connect : bool, default=True
Connect bin points with a line for each model.
- legend : bool, default=True
Display a legend. Summary metrics (ECE, Brier) are shown next to model names when enabled.
- legend_loc : str, default='best'
Legend location passed to Matplotlib.
- show_grid : bool, default=True
Toggle gridlines via the package helper set_axis_grid.
- grid_props : dict, optional
Keyword arguments passed to set_axis_grid for grid customization (e.g., linestyle, alpha).
- xlim : tuple of (float, float), default=(0.0, 1.0)
X-axis limits.
- ylim : tuple of (float, float), default=(0.0, 1.0)
Y-axis limits.
- savefig : str, optional
If provided, save the figure to this path; otherwise the plot is shown interactively.
- return_data : bool, default=False
If True, return (ax, data_dict), where the values of data_dict are per-model pandas.DataFrame objects with per-bin stats: ['bin_left', 'bin_right', 'bin_center', 'n', 'w_sum', 'p_mean', 'y_rate', 'y_low', 'y_high', 'ece_contrib']. Otherwise, return only the Matplotlib axes.
- Returns:
- ax : matplotlib.axes.Axes
Axes of the main calibration plot. When counts_panel='bottom', the second axes (counts panel) is not returned.
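The input handling described for y_true, *y_preds, positive_label, class_index and clip_probs can be illustrated with a short NumPy sketch. The helper name prepare_binary_inputs is hypothetical and is not part of kdiagram; the rescaling step controlled by normalize_probs is left out because its exact rule is not specified here.

```python
import numpy as np


def prepare_binary_inputs(y_true, y_pred, positive_label=1,
                          class_index=None, clip_probs=(0.0, 1.0)):
    """Hypothetical helper (not kdiagram API) mirroring the documented steps."""
    # Flatten the labels and build a 0/1 target by comparing to positive_label.
    y = (np.asarray(y_true).ravel() == positive_label).astype(int)

    p = np.asarray(y_pred, dtype=float)
    if p.ndim == 2:
        # Select the requested column; use the last column when none is given.
        col = -1 if class_index is None else class_index
        p = p[:, col]

    # Clip probabilities into the inclusive range.
    lo, hi = clip_probs
    return y, np.clip(p, lo, hi)


y_true = np.array(["yes", "no", "yes", "no"])
proba = np.array([[0.2, 0.8], [0.7, 0.3], [0.4, 0.6], [1.05, -0.05]])
y, p = prepare_binary_inputs(y_true, proba, positive_label="yes")
```

Here the last column is taken as the positive-class probability, and the slightly negative value in the final row is clipped to the lower bound.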
- Parameters:
n_bins (int)
strategy (str)
class_index (int | None)
normalize_probs (bool)
error_bars (str)
conf_level (float)
show_diagonal (bool)
show_ece (bool)
show_brier (bool)
counts_panel (str)
counts_norm (Literal['fraction', 'count'])
counts_alpha (float)
title (str | None)
xlabel (str | None)
ylabel (str | None)
cmap (str)
marker (str)
s (int)
linewidth (float)
alpha (float)
connect (bool)
legend (bool)
legend_loc (str)
show_grid (bool)
grid_props (dict | None)
savefig (str | None)
return_data (bool)
ax (Axes | None)
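The two values of strategy, including the documented fallback from 'quantile' to 'uniform' when edges are not unique, might be sketched as follows. This is an illustration under stated assumptions, not the library's code: make_bin_edges is a hypothetical name, and the warning text is made up.

```python
import warnings

import numpy as np


def make_bin_edges(pooled_probs, n_bins=10, strategy="uniform"):
    """Hypothetical helper (not kdiagram API) returning n_bins + 1 edges."""
    uniform = np.linspace(0.0, 1.0, n_bins + 1)
    if strategy == "uniform":
        return uniform
    if strategy != "quantile":
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Empirical quantiles of the pooled predictions.
    edges = np.quantile(pooled_probs, np.linspace(0.0, 1.0, n_bins + 1))
    if np.unique(edges).size < edges.size:
        # Many identical predictions make edges collapse; fall back to uniform.
        warnings.warn("Quantile edges are not unique; using uniform bins.")
        return uniform
    return edges


rng = np.random.default_rng(0)
edges_q = make_bin_edges(rng.random(500), n_bins=5, strategy="quantile")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # A constant prediction vector collapses every quantile to the same value.
    edges_fallback = make_bin_edges(np.full(500, 0.4), n_bins=5,
                                    strategy="quantile")
```

Quantile edges put roughly equal numbers of predictions in each bin, which helps when predictions cluster in a narrow range; uniform edges keep bin widths comparable across models.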
Notes
Calibration compares confidence to accuracy within bins. For bin \(b\), let \(\hat{p}_i\) be the predictions and \(y_i\in\{0,1\}\) the binary targets, with weights \(w_i\ge 0\). Define the weighted bin mean probability and accuracy as

\[\bar{p}_b \;=\; \frac{\sum_{i\in b} w_i \hat{p}_i} {\sum_{i\in b} w_i}, \qquad \bar{y}_b \;=\; \frac{\sum_{i\in b} w_i y_i} {\sum_{i\in b} w_i}. \tag{1}\]

The Expected Calibration Error (ECE) is

\[\mathrm{ECE} \;=\; \sum_b \left( \frac{\sum_{i\in b} w_i}{\sum_i w_i} \right) \left| \bar{y}_b - \bar{p}_b \right|. \tag{2}\]

The (weighted) Brier score is

\[\mathrm{Brier} \;=\; \frac{\sum_i w_i \left(\hat{p}_i - y_i\right)^2} {\sum_i w_i}. \tag{3}\]

Wilson confidence intervals for \(\bar{y}_b\) use \(z = \Phi^{-1}\!\left(\tfrac{1+\alpha}{2}\right)\), where \(\alpha\) is the confidence level (conf_level), and the effective count \(n_b=\sum_{i\in b} w_i\):

\[\mathrm{center} \;=\; \frac{\bar{y}_b + \frac{z^2}{2 n_b}} {1 + \frac{z^2}{n_b}}, \qquad \mathrm{radius} \;=\; \frac{z}{1 + \frac{z^2}{n_b}} \sqrt{\frac{\bar{y}_b(1-\bar{y}_b)}{n_b} + \frac{z^2}{4 n_b^2}}. \tag{4}\]

The interval is \([\mathrm{center}-\mathrm{radius}, \mathrm{center}+\mathrm{radius}]\), clipped to [0, 1]. The normal interval instead uses the usual standard error \(\sqrt{\bar{y}_b(1-\bar{y}_b)/n_b}\).

When strategy='quantile', bin edges are the empirical quantiles of the pooled predictions. If many identical values exist, edges can collapse; in that case, the function falls back to uniform edges with a warning.

Examples
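As a sketch of the quantities defined in the Notes, the following computes the per-bin weighted means, ECE, Brier score, and Wilson bounds directly from the formulas. It assumes uniform bins only; calibration_summary is a hypothetical helper and is not part of kdiagram, so its output layout should not be read as the library's.

```python
from statistics import NormalDist

import numpy as np


def calibration_summary(y, p, w=None, n_bins=10, conf_level=0.95):
    """Hypothetical helper (not kdiagram API) following equations (1)-(4)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    w = np.ones_like(p) if w is None else np.asarray(w, dtype=float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields bin indices 0..n_bins-1.
    idx = np.digitize(p, edges[1:-1])
    z = NormalDist().inv_cdf((1.0 + conf_level) / 2.0)

    ece, rows = 0.0, []
    for b in range(n_bins):
        mask = idx == b
        n_b = w[mask].sum()          # effective count of the bin
        if n_b == 0:
            continue                 # empty bins contribute nothing
        p_bar = np.sum(w[mask] * p[mask]) / n_b   # eq. (1), mean probability
        y_bar = np.sum(w[mask] * y[mask]) / n_b   # eq. (1), observed rate
        ece += (n_b / w.sum()) * abs(y_bar - p_bar)  # eq. (2)

        denom = 1.0 + z**2 / n_b                     # eq. (4)
        center = (y_bar + z**2 / (2.0 * n_b)) / denom
        radius = (z / denom) * np.sqrt(
            y_bar * (1.0 - y_bar) / n_b + z**2 / (4.0 * n_b**2)
        )
        low = float(max(center - radius, 0.0))
        high = float(min(center + radius, 1.0))
        rows.append((b, float(p_bar), float(y_bar), low, high))

    brier = float(np.sum(w * (p - y) ** 2) / w.sum())  # eq. (3)
    return float(ece), brier, rows


ece, brier, rows = calibration_summary(
    y=[0, 0, 1, 1], p=[0.1, 0.1, 0.9, 0.9], n_bins=2
)
```

With two bins, each bin's mean prediction differs from its observed rate by 0.1, so both bins contribute equally to the ECE.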
Binary example with quantile bins and Wilson intervals.
>>> import numpy as np
>>> from kdiagram.plot.comparison import plot_reliability_diagram
>>> rng = np.random.default_rng(0)
>>> y = (rng.random(1000) < 0.4).astype(int)
>>> p1 = 0.4 * np.ones_like(y) + 0.15 * rng.random(len(y))
>>> p2 = 0.4 * np.ones_like(y) + 0.05 * rng.random(len(y))
>>> ax = plot_reliability_diagram(
...     y, p1, p2,
...     names=['Wide', 'Tight'],
...     n_bins=12,
...     strategy='quantile',
...     error_bars='wilson',
...     counts_panel='bottom',
...     show_ece=True,
...     show_brier=True,
...     title=('Reliability Diagram '
...            '(Quantile bins + Wilson CIs)'),
... )