kdiagram.metrics.cluster_aware_severity_score

kdiagram.metrics.cluster_aware_severity_score(y_true, y_pred, *, sample_weight=None, window_size=21, sort_by=None, normalize='band', density_source='indicator', kernel='box', lambda_=1.0, gamma=1.0, eps=1e-12, multioutput='uniform_average', nan_policy='omit', return_details=False)

Compute the Cluster-Aware Severity (CAS) score.

This metric evaluates prediction intervals by penalizing not only the magnitude of interval failures (anomalies) but also their local concentration in time or space. CAS highlights models that generate runs of misses, which are often more operationally risky than isolated errors of similar size.

Formally, for observation \(y_t\) and interval \([L_t, U_t]\) at level \(1-\alpha\), define the excess magnitude \(m_t=\max(L_t-y_t,0)+\max(y_t-U_t,0)\), which is zero whenever \(y_t\) lies inside the interval. With the band width \(w_t=U_t-L_t\) and small \(\varepsilon>0\), the normalized excess (for normalize='band') is \(\tilde m_t = m_t / (w_t+\varepsilon)\). Let \(A_t=\mathbf{1}\{y_t<L_t \text{ or } y_t>U_t\}\) and let \(d_t\) be a centered kernel average of either the indicators (\(A_t\)) or the magnitudes (\(\tilde m_t\)) over a window of size window_size. The pointwise severity is

\[S_t \;=\; \tilde m_t \Bigl(1 + \lambda\, d_t^{\gamma}\Bigr), \tag{1}\]

with \(\lambda\ge 0\) and \(\gamma\ge 1\). The CAS score is the average \(n^{-1}\sum_t S_t\). Lower values indicate fewer and less clustered violations.
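The definition above can be traced step by step in a few lines of NumPy. This is a minimal sketch, not the library implementation: the helper name cas_sketch is hypothetical, and it assumes a box kernel, the indicator density, band normalization, an odd window no longer than the series, and zero padding at the edges.

```python
import numpy as np

def cas_sketch(y_true, lower, upper, window_size=3, lam=1.0, gamma=1.0, eps=1e-12):
    """Illustrative CAS: box kernel, indicator density, band normalization."""
    y = np.asarray(y_true, dtype=float)
    lo = np.asarray(lower, dtype=float)
    hi = np.asarray(upper, dtype=float)
    # Excess magnitude m_t: distance below L_t or above U_t, zero inside the band.
    m = np.maximum(lo - y, 0.0) + np.maximum(y - hi, 0.0)
    # Band-normalized excess m_t / (w_t + eps).
    m_tilde = m / ((hi - lo) + eps)
    # Miss indicator A_t.
    a = (m > 0).astype(float)
    # Centered box average of A_t; zero padding at the edges is an assumption.
    weights = np.ones(window_size) / window_size
    d = np.convolve(a, weights, mode="same")
    # Pointwise severity from equation (1), then the unweighted mean.
    s = m_tilde * (1.0 + lam * d ** gamma)
    return s.mean(), s

y_true = np.array([10, 25, 30, 45, 50])
bounds = np.array([[8, 12], [24, 26], [32, 33], [44, 46], [48, 52]])
score, severity = cas_sketch(y_true, bounds[:, 0], bounds[:, 1], window_size=3)
```

Only the third sample falls outside its interval here, so a single entry of severity is nonzero. The value this sketch produces depends on its edge handling and need not match the library's output exactly.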

Parameters:
y_true : array_like of shape (n_samples,) or (n_samples, n_outputs)

Ground-truth targets. For multioutput, the same prediction interval (from y_pred) is applied to each output unless your wrapper expands bounds per output.

y_pred : array_like of shape (n_samples, 2)

Predicted interval bounds. Column 0 is the lower bound \(L_t\); column 1 is the upper bound \(U_t\).

sample_weight : array_like of shape (n_samples,), default=None

Optional weights for averaging the final severities.

window_size : int, default=21

Size of the centered smoothing window used to compute \(d_t\). Larger values capture longer runs.

sort_by : array_like of shape (n_samples,), optional

Key used to order samples before computing \(d_t\). Typical choices are time, a spatial coordinate, or any ordering that makes clustering meaningful.
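The effect of ordering on the local density can be seen in a small sketch. The miss pattern and key below are made up for illustration, and box_density is a hypothetical helper that assumes a box kernel with zero padding at the edges.

```python
import numpy as np

def box_density(a, window_size):
    """Centered box average with zero padding at the edges (an assumption)."""
    return np.convolve(a, np.ones(window_size) / window_size, mode="same")

# Hypothetical miss indicators in storage order, and a key giving the true order.
miss = np.array([1, 0, 1, 0, 1, 0], dtype=float)
sort_key = np.array([0, 3, 1, 4, 2, 5])

# Reorder the samples by the key; a stable sort keeps ties in their input order.
order = np.argsort(sort_key, kind="stable")
miss_sorted = miss[order]

d_unsorted = box_density(miss, 3)
d_sorted = box_density(miss_sorted, 3)

# Mean local density at the violating points, before and after sorting.
mean_unsorted = d_unsorted[miss == 1].mean()
mean_sorted = d_sorted[miss_sorted == 1].mean()
```

In storage order the misses alternate with hits, while in key order they form one run, so the mean density at the violating points is higher after sorting. The same set of misses can therefore yield a different score depending on the ordering supplied.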

normalize : {‘band’, ‘mad’, ‘none’}, default=‘band’

Normalization for the excess \(m_t\).

  • ‘band’: divide by \(w_t=U_t-L_t\) (unit-free).

  • ‘mad’: divide by a robust global scale (median absolute deviation).

  • ‘none’: no normalization (units of the series).

density_source : {‘indicator’, ‘magnitude’}, default=‘indicator’

Source for computing \(d_t\).

  • ‘indicator’: kernel average of \(A_t\) (0/1 misses), matching the CAS definition in the paper.

  • ‘magnitude’: kernel average of normalized magnitude (more sensitive to large single misses).

kernel : {‘box’, ‘triangular’, ‘epan’, ‘gaussian’}, default=‘box’

Smoothing kernel used to compute the local density \(d_t\). The kernel’s shape determines how neighboring points are weighted when calculating the concentration of anomalies.

  • ‘box’: A rectangular (or uniform) kernel that gives equal weight to all points inside the window. This kernel is best for emphasizing the raw run length of anomalies, as it effectively counts misses within a fixed-size region.

  • ‘triangular’: A simple linear kernel where the central point receives the maximum weight, which then decreases linearly to zero at the window’s edges. It provides a smoother density estimate than the ‘box’ kernel.

  • ‘epan’: The Epanechnikov kernel, which assigns weights using an inverted parabola. It is statistically efficient and gives more weight to central points while smoothly tapering to zero. It’s a good choice for emphasizing the local prevalence of anomalies near the center of a cluster.

  • ‘gaussian’: A smooth kernel that assigns weights using a Gaussian (bell curve) function. It provides the smoothest density estimate, implying that an anomaly’s influence decays exponentially with distance from the center point.
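The four shapes described above can be compared with a short sketch. The helper kernel_weights is hypothetical, and its offset scaling and Gaussian bandwidth are assumptions chosen for illustration; the library's exact constants may differ.

```python
import numpy as np

def kernel_weights(name, window_size):
    """Normalized weights on a centered window (illustrative scaling)."""
    half = window_size // 2
    # Offsets scaled into (-1, 1) so every weight inside the window is positive;
    # the tapered kernels reach zero one step beyond the window.
    u = np.arange(-half, half + 1) / (half + 1)
    if name == "box":
        w = np.ones_like(u)                 # equal weight everywhere
    elif name == "triangular":
        w = 1.0 - np.abs(u)                 # linear decay from the center
    elif name == "epan":
        w = 1.0 - u ** 2                    # inverted parabola
    elif name == "gaussian":
        w = np.exp(-0.5 * (3.0 * u) ** 2)   # assumed bandwidth: edge near 3 sigma
    else:
        raise ValueError(f"unknown kernel: {name}")
    # Normalize so the weighted average of 0/1 indicators stays in [0, 1].
    return w / w.sum()

weights = {k: kernel_weights(k, 5) for k in ("box", "triangular", "epan", "gaussian")}
```

Because each weight vector sums to one, a density computed from 0/1 indicators stays between 0 and 1 whichever kernel is used. The tapered kernels put more mass on the center point than the box kernel does.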

lambda_ : float, default=1.0

Cluster penalty weight \(\lambda\). Larger values increase the contribution of \(d_t\).

gamma : float, default=1.0

Density nonlinearity \(\gamma\). Values \(>1\) accentuate dense clusters relative to sparse ones.

eps : float, default=1e-12

Small positive number used in the band normalization denominator \((w_t+\varepsilon)\).

multioutput : {‘raw_values’, ‘uniform_average’}, default=‘uniform_average’

Aggregation across outputs when y_true is 2D.

  • ‘raw_values’: return per-output scores.

  • ‘uniform_average’: return the average score.

nan_policy : {‘omit’, ‘propagate’, ‘raise’}, default=‘omit’

How to handle NaN/inf in any inputs (y_true, bounds, sort_by, sample_weight). After optional sorting, a mask is built over all required columns.

  • ‘omit’ : drop invalid rows before computing CAS.

  • ‘propagate’ : return NaN (and None for details).

  • ‘raise’ : raise ValueError with a row count.
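The three policies can be pictured as a row mask over the inputs. The sketch below is illustrative only: apply_nan_policy is a hypothetical helper, it covers just y_true and the two bounds, and it signals ‘propagate’ by returning None so that a caller could return NaN.

```python
import numpy as np

def apply_nan_policy(y_true, y_pred, nan_policy="omit"):
    """Filter rows with NaN/inf according to the policy (illustrative)."""
    y = np.asarray(y_true, dtype=float)
    b = np.asarray(y_pred, dtype=float)
    # A row is valid only if the target and both bounds are finite.
    valid = np.isfinite(y) & np.isfinite(b).all(axis=1)
    if valid.all():
        return y, b
    if nan_policy == "omit":
        return y[valid], b[valid]
    if nan_policy == "propagate":
        return None  # caller would return NaN (and None for details)
    if nan_policy == "raise":
        raise ValueError(f"{int((~valid).sum())} row(s) contain NaN or inf")
    raise ValueError(f"unknown nan_policy: {nan_policy}")

y_demo = np.array([1.0, np.nan, 3.0])
b_demo = np.array([[0.0, 2.0], [0.0, 2.0], [2.0, np.inf]])
kept = apply_nan_policy(y_demo, b_demo, nan_policy="omit")
```

With ‘omit’, only the first row of this hypothetical data survives, since the second row has a NaN target and the third an infinite upper bound.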

return_details : bool, default=False

If True, also return a DataFrame with per-sample fields (is_anomaly, type, magnitude, local_density, severity). For multioutput, a list of DataFrames may be returned.

Returns:
score : float or ndarray of shape (n_outputs,)

The CAS score. Smaller is better.

(score, details) : tuple

Returned if return_details=True. details contains the per-sample components used to compute CAS.

See also

clustered_anomaly_severity

Helper that accepts arrays or DataFrame columns and returns the CAS score (and details if requested).

kdiagram.utils.plot.plot_cas_layers

Layered, publication-ready line plot of intervals, severity stems, and anomalies.

kdiagram.utils.plot.plot_anomaly_glyphs

Polar glyph visualization that emphasizes clustering.

Notes

CAS complements proper scoring rules and coverage by focusing on how errors are organized rather than only on their average frequency or size. It is translation-invariant and, with normalize='band', unit-free. Setting lambda_=0 reduces CAS to an average normalized excess outside the interval, akin to the distance penalty in interval/Winkler scores. In contrast, lambda_>0 increases the score when violations cluster, capturing burstiness that aggregate scores may blur. The default density source (‘indicator’) follows the definition in the paper and is recommended for diagnostics.
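The role of lambda_ can be made explicit by expanding equation (1) inside the average. The split below assumes the unweighted mean (no sample_weight); the underbrace labels are descriptive names added here, not terms from the source.

```latex
\[
\mathrm{CAS}
= \frac{1}{n}\sum_{t} \tilde m_t \bigl(1 + \lambda\, d_t^{\gamma}\bigr)
= \underbrace{\frac{1}{n}\sum_{t} \tilde m_t}_{\text{average normalized excess}}
\;+\; \lambda\, \underbrace{\frac{1}{n}\sum_{t} \tilde m_t\, d_t^{\gamma}}_{\text{cluster penalty}} .
\]
```

The first term is all that remains at lambda_=0. The second term weights each violation by the local density around it, so it grows when the same misses occur in runs.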

Time complexity for a box kernel with window W is \(\mathcal{O}(nW)\) and memory \(\mathcal{O}(n)\). With FFT-based convolution for smooth kernels, the cost is typically \(\mathcal{O}(n\log n)\).
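The two costs correspond to two ways of computing the same centered smoothing. The sketch below compares a direct convolution with an FFT-based one using NumPy only; the function names are hypothetical, and it assumes an odd kernel no longer than the signal, with zero padding at the edges.

```python
import numpy as np

def smooth_direct(a, w):
    """Direct centered convolution, O(n * W)."""
    return np.convolve(a, w, mode="same")

def smooth_fft(a, w):
    """FFT-based centered convolution, O(n log n)."""
    n_full = len(a) + len(w) - 1
    # Multiply the spectra of the zero-padded inputs, then invert.
    full = np.fft.irfft(np.fft.rfft(a, n_full) * np.fft.rfft(w, n_full), n_full)
    # Keep the centered slice so the result lines up with mode="same".
    start = (len(w) - 1) // 2
    return full[start:start + len(a)]

# Hypothetical miss indicators and a normalized Gaussian-shaped kernel.
rng = np.random.default_rng(0)
a = (rng.random(500) < 0.1).astype(float)
offsets = np.arange(-10, 11)
w = np.exp(-0.5 * (offsets / 4.0) ** 2)
w /= w.sum()

d_direct = smooth_direct(a, w)
d_fft = smooth_fft(a, w)
```

The two results agree up to floating-point error. For small windows the direct form is usually simpler and fast enough; the FFT form tends to pay off only when the window is large relative to the logarithm of the series length.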

References

[1] Gneiting, T., Balabdaoui, F., & Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. JRSS Series B, 69(2), 243–268.

[2] Koenker, R., & Xiao, Z. (2006). Quantile autoregression. JASA, 101, 980–990.

[3] Podsztavek, O., Jordan, A. I., Tvrdík, P., & Polsterer, K. L. (2024). Automatic Miscalibration Diagnosis: Interpreting PIT Histograms. ESANN.

[4] Sokol, A. (2025). Fan charts 2.0: Flexible forecast distributions with expert judgement. International Journal of Forecasting, 41(3), 1148–1164.

Examples

Basic usage

>>> import numpy as np
>>> from kdiagram.metrics import cluster_aware_severity_score
>>> y_true = np.array([10, 25, 30, 45, 50])
>>> y_pred = np.array([[8, 12], [24, 26], [32, 33],
...                    [44, 46], [48, 52]])
>>> cas = cluster_aware_severity_score(
...     y_true, y_pred, window_size=3
... )
>>> float(cas)  

Sorting to control clustering

>>> sort_key = np.array([0, 2, 4, 1, 3])
>>> cas_unsorted = cluster_aware_severity_score(
...     y_true, y_pred, window_size=3
... )
>>> cas_sorted = cluster_aware_severity_score(
...     y_true, y_pred, window_size=3, sort_by=sort_key
... )
>>> (float(cas_sorted), float(cas_unsorted))  

Adjusting density source and kernel

>>> cas_mag = cluster_aware_severity_score(
...     y_true, y_pred, window_size=5,
...     density_source="magnitude", kernel="triangular"
... )
>>> float(cas_mag)  

Weighting and stronger cluster penalty

>>> w = np.array([1, 1, 5, 1, 1])
>>> cas_w = cluster_aware_severity_score(
...     y_true, y_pred, sample_weight=w,
...     lambda_=2.0, gamma=2.0
... )
>>> float(cas_w)