kdiagram.metrics.clustered_anomaly_severity

kdiagram.metrics.clustered_anomaly_severity(y_true: str, y_qlow: str, y_qup: str, data: pd.DataFrame, window_size: int = 21, return_details: Literal[False] = False) → float
kdiagram.metrics.clustered_anomaly_severity(y_true: str, y_qlow: str, y_qup: str, data: pd.DataFrame, return_details: Literal[True], window_size: int = 21) → tuple[float, pd.DataFrame]
kdiagram.metrics.clustered_anomaly_severity(y_true: ArrayLike, y_qlow: ArrayLike, y_qup: ArrayLike, data: Literal[None] = None, window_size: int = 21, return_details: Literal[False] = False) → float
kdiagram.metrics.clustered_anomaly_severity(y_true: ArrayLike, y_qlow: ArrayLike, y_qup: ArrayLike, return_details: Literal[True], data: Literal[None] = None, window_size: int = 21) → tuple[float, pd.DataFrame]

Computes the Clustered Anomaly Severity (CAS) score.

This function is a direct helper for calculating the CAS score. It penalizes both the magnitude of prediction-interval failures and their concentration, so a run of neighboring misses scores worse than isolated misses of the same size.

Parameters:
y_true : str or array_like

The ground truth (correct) target values. Can be a column name if data is provided, or a 1D array-like object.

y_qlow : str or array_like

The lower bound of the prediction interval. Can be a column name if data is provided, or a 1D array-like object.

y_qup : str or array_like

The upper bound of the prediction interval. Can be a column name if data is provided, or a 1D array-like object.

data : pd.DataFrame, optional

DataFrame containing the columns for y_true, y_qlow, and y_qup. If provided, those three parameters must be strings naming columns of this DataFrame.

window_size : int, default=21

The size of the moving window used to calculate the local density of anomalies. A larger window considers a wider neighborhood for defining a “cluster”.

return_details : bool, default=False

If True, the function returns a tuple containing the final CAS score and a DataFrame with detailed intermediate calculations (magnitude, density, severity, etc.).

Returns:
cas_score : float

The calculated Clustered Anomaly Severity (CAS) score. A lower score indicates better performance (less severe and less clustered anomalies).

(cas_score, details_df) : tuple of (float, pd.DataFrame)

Returned only if return_details is True. The second element is a DataFrame containing per-sample calculations.

See also

cluster_aware_severity_score

A Scikit-learn compliant version of this metric with extended functionality (e.g., sample_weight, sort_by).

kdiagram.plot.uncertainty.plot_anomaly_severity

The specialized polar plot to visualize this metric.

Notes

The CAS score is composed of two main components, calculated for each data point \(i\):

  1. Anomaly Magnitude (\(m_i\)): The absolute distance from the true value to the nearest violated interval bound. It is zero if the point is covered.

    \[m_i = \begin{cases} y_{qlow,i} - y_{true,i} & \text{if } y_{true,i} < y_{qlow,i} \\ y_{true,i} - y_{qup,i} & \text{if } y_{true,i} > y_{qup,i} \\ 0 & \text{otherwise} \end{cases} \tag{1}\]
  2. Local Cluster Density (\(d_i\)): A measure of how concentrated anomalies are around point \(i\). It is the centered moving average of the magnitudes over a window of window_size points.

The final severity for each point is \(s_i = m_i \cdot d_i\), and the overall CAS score is the mean of these severities.
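
The two components above can be sketched in a few lines of NumPy and pandas. This is a minimal illustration of the formulas in these Notes, not the library's implementation: the name cas_sketch is hypothetical, the bounds are assumed to satisfy y_qlow <= y_qup, and averaging partial windows at the edges (min_periods=1) is an assumption. The library may handle edges or scaling differently, so values from this sketch need not match the function's output.

```python
import numpy as np
import pandas as pd


def cas_sketch(y_true, y_qlow, y_qup, window_size=21):
    """Illustrative CAS computation following the Notes (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_qlow = np.asarray(y_qlow, dtype=float)
    y_qup = np.asarray(y_qup, dtype=float)

    # Anomaly magnitude m_i: distance to the violated bound, 0 when covered.
    # Assumes y_qlow <= y_qup, so at most one of the two terms is nonzero.
    under = np.clip(y_qlow - y_true, 0.0, None)
    over = np.clip(y_true - y_qup, 0.0, None)
    magnitude = under + over

    # Local density d_i: centered moving average of the magnitudes.
    # Assumption: edge windows are averaged over the points that exist.
    density = (
        pd.Series(magnitude)
        .rolling(window=window_size, center=True, min_periods=1)
        .mean()
        .to_numpy()
    )

    # Per-point severity s_i = m_i * d_i; the score is their mean.
    severity = magnitude * density
    return float(severity.mean())


# Same three misses of magnitude 1, placed adjacently vs. spread apart.
low, up = np.zeros(12), np.ones(12)
clustered = np.full(12, 0.5)
clustered[[5, 6, 7]] = 2.0
scattered = np.full(12, 0.5)
scattered[[1, 6, 10]] = 2.0

print(cas_sketch(clustered, low, up, window_size=3))
print(cas_sketch(scattered, low, up, window_size=3))
```

Under these assumptions the clustered series scores higher than the scattered one, even though both contain the same number and size of misses, which is the behavior the density term is meant to capture.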

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from kdiagram.metrics import clustered_anomaly_severity
>>> # Example 1: Using NumPy arrays
>>> y_true = np.array([10, 25, 30, 45, 50])
>>> y_qlow = np.array([8, 24, 32, 44, 48])
>>> y_qup = np.array([12, 26, 33, 46, 52])
>>> cas = clustered_anomaly_severity(
...     y_true, y_qlow, y_qup, window_size=3
... )
>>> print(f"CAS Score (from arrays): {cas:.4f}")
CAS Score (from arrays): 0.2222
>>> # Example 2: Using a DataFrame and column names
>>> df = pd.DataFrame({
...     'actual': y_true, 'lower_bound': y_qlow, 'upper_bound': y_qup
... })
>>> cas_df, details = clustered_anomaly_severity(
...     'actual', 'lower_bound', 'upper_bound',
...     data=df, window_size=3, return_details=True
... )
>>> print(f"CAS Score (from DataFrame): {cas_df:.4f}")
CAS Score (from DataFrame): 0.2222
>>> print(details.head())
   y_true  y_qlow  y_qup  magnitude  is_anomaly   type  local_density  severity
0      10       8     12          0       False   none       0.000000  0.000000
1      25      24     26          0       False   none       0.666667  0.000000
2      30      32     33          2        True  under       0.666667  1.333333
3      45      44     46          0       False   none       0.666667  0.000000
4      50      48     52          0       False   none       0.000000  0.000000