kdiagram.metrics.clustered_anomaly_severity
- kdiagram.metrics.clustered_anomaly_severity(y_true: str, y_qlow: str, y_qup: str, data: pd.DataFrame, window_size: int = 21, return_details: Literal[False] = False) -> float
- kdiagram.metrics.clustered_anomaly_severity(y_true: str, y_qlow: str, y_qup: str, data: pd.DataFrame, return_details: Literal[True], window_size: int = 21) -> tuple[float, pd.DataFrame]
- kdiagram.metrics.clustered_anomaly_severity(y_true: ArrayLike, y_qlow: ArrayLike, y_qup: ArrayLike, data: Literal[None] = None, window_size: int = 21, return_details: Literal[False] = False) -> float
- kdiagram.metrics.clustered_anomaly_severity(y_true: ArrayLike, y_qlow: ArrayLike, y_qup: ArrayLike, return_details: Literal[True], data: Literal[None] = None, window_size: int = 21) -> tuple[float, pd.DataFrame]
Computes the Clustered Anomaly Severity (CAS) score.
This function serves as a direct helper for calculating the CAS score. It is designed to penalize not just the magnitude of forecast failures but also their concentration, providing a more nuanced view of reliability than standard metrics.
- Parameters:
- y_true : str or array_like
The ground truth (correct) target values. Can be a column name if data is provided, or a 1D array-like object.
- y_qlow : str or array_like
The lower bound of the prediction interval. Can be a column name if data is provided, or a 1D array-like object.
- y_qup : str or array_like
The upper bound of the prediction interval. Can be a column name if data is provided, or a 1D array-like object.
- data : pd.DataFrame, optional
DataFrame containing the data for y_true, y_qlow, and y_qup. If provided, those three parameters must be strings naming columns of data.
- window_size : int, default=21
The size of the moving window used to calculate the local density of anomalies. A larger window considers a wider neighborhood when defining a "cluster".
- return_details : bool, default=False
If True, the function returns a tuple containing the final CAS score and a DataFrame with detailed intermediate calculations (magnitude, density, severity, etc.).
- Returns:
- cas_score : float
The calculated Clustered Anomaly Severity (CAS) score. A lower score indicates better performance (less severe and less clustered anomalies).
- (cas_score, details_df) : tuple of (float, pd.DataFrame)
Returned only if return_details is True. The second element is a DataFrame containing per-sample calculations.
See also
cluster_aware_severity_score : A scikit-learn compliant version of this metric with extended functionality (e.g., sample_weight, sort_by).
kdiagram.plot.uncertainty.plot_anomaly_severity : The specialized polar plot to visualize this metric.
Notes
The CAS score is composed of two main components, calculated for each data point \(i\):
Anomaly Magnitude (\(m_i\)): The absolute distance from the true value to the nearest violated interval bound. It is zero if the point is covered.
\[m_i = \begin{cases} y_{qlow,i} - y_{true,i} & \text{if } y_{true,i} < y_{qlow,i} \\ y_{true,i} - y_{qup,i} & \text{if } y_{true,i} > y_{qup,i} \\ 0 & \text{otherwise} \end{cases} \tag{1}\]
Local Cluster Density (\(d_i\)): A measure of how concentrated anomalies are around point \(i\). It is calculated as a centered moving average of the magnitudes over a window of window_size points.
The final severity for each point is \(s_i = m_i \cdot d_i\), and the overall CAS score is the mean of these severities.
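The formulas in these Notes can be sketched in a few lines of NumPy and pandas. This is an illustration only, not the library's implementation: the function name `cas_sketch` is hypothetical, and averaging partial windows at the array edges over the available points (`min_periods=1`) is an assumption, so values from this sketch may differ from what `clustered_anomaly_severity` returns.

```python
import numpy as np
import pandas as pd


def cas_sketch(y_true, y_qlow, y_qup, window_size=21):
    """Hypothetical sketch of the Notes formulas (not the kdiagram code)."""
    y_true = np.asarray(y_true, dtype=float)
    y_qlow = np.asarray(y_qlow, dtype=float)
    y_qup = np.asarray(y_qup, dtype=float)

    # m_i: distance to the violated bound, zero when the point is covered.
    magnitude = np.where(
        y_true < y_qlow,
        y_qlow - y_true,
        np.where(y_true > y_qup, y_true - y_qup, 0.0),
    )

    # d_i: centered moving average of the magnitudes.
    # Assumption: edge windows are averaged over the points that exist.
    density = (
        pd.Series(magnitude)
        .rolling(window=window_size, center=True, min_periods=1)
        .mean()
        .to_numpy()
    )

    # s_i = m_i * d_i, and the score is the mean severity.
    severity = magnitude * density
    details = pd.DataFrame(
        {"magnitude": magnitude, "local_density": density, "severity": severity}
    )
    return float(severity.mean()), details


# Two series with the same two misses (each 2 units above the upper bound):
# once adjacent, once far apart.
y_qlow = np.zeros(8)
y_qup = np.full(8, 10.0)
clustered_truth = np.array([5, 5, 12, 12, 5, 5, 5, 5])
isolated_truth = np.array([5, 12, 5, 5, 5, 5, 12, 5])

clustered, _ = cas_sketch(clustered_truth, y_qlow, y_qup, window_size=3)
isolated, _ = cas_sketch(isolated_truth, y_qlow, y_qup, window_size=3)
print(f"clustered={clustered:.4f}, isolated={isolated:.4f}")
```

Under these assumptions the adjacent misses score higher than the isolated ones even though the total miss magnitude is identical, which is the concentration penalty the metric is designed to capture.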
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from kdiagram.metrics import clustered_anomaly_severity
>>> # Example 1: Using NumPy arrays
>>> y_true = np.array([10, 25, 30, 45, 50])
>>> y_qlow = np.array([8, 24, 32, 44, 48])
>>> y_qup = np.array([12, 26, 33, 46, 52])
>>> cas = clustered_anomaly_severity(
...     y_true, y_qlow, y_qup, window_size=3
... )
>>> print(f"CAS Score (from arrays): {cas:.4f}")
CAS Score (from arrays): 0.2222

>>> # Example 2: Using a DataFrame and column names
>>> df = pd.DataFrame({
...     'actual': y_true, 'lower_bound': y_qlow, 'upper_bound': y_qup
... })
>>> cas_df, details = clustered_anomaly_severity(
...     'actual', 'lower_bound', 'upper_bound',
...     data=df, window_size=3, return_details=True
... )
>>> print(f"CAS Score (from DataFrame): {cas_df:.4f}")
CAS Score (from DataFrame): 0.2222
>>> print(details.head())
   y_true  y_qlow  y_qup  magnitude  is_anomaly   type  local_density  severity
0      10       8     12          0       False   none       0.000000  0.000000
1      25      24     26          0       False   none       0.666667  0.000000
2      30      32     33          2        True  under       0.666667  1.333333
3      45      44     46          0       False   none       0.666667  0.000000
4      50      48     52          0       False   none       0.000000  0.000000