kdiagram.utils.compute_forecast_errors
- kdiagram.utils.compute_forecast_errors(df, actual_col, *pred_cols, error_type='raw', prefix='error_', inplace=False)
Computes forecast errors for one or more models.
This is a core data preparation utility that calculates the difference between true and predicted values. It supports several common error types and can operate on multiple prediction columns at once, making it easy to prepare data for the diagnostic plots in the kdiagram.plot.errors module.
- Parameters:
- df
pd.DataFrame The input DataFrame containing the actual and predicted values.
- actual_col
str The name of the column containing the true observed values.
- *pred_cols
str One or more column names containing the predicted values from different models.
- error_type
{‘raw’, ‘absolute’, ‘squared’, ‘percentage’}, default=’raw’ The type of error to calculate:
‘raw’: \(y_{true} - y_{pred}\)
‘absolute’: \(|y_{true} - y_{pred}|\)
‘squared’: \((y_{true} - y_{pred})^2\)
‘percentage’: \(100 \cdot (y_{true} - y_{pred}) / y_{true}\)
- prefix
str, default=’error_’ The prefix to add to the new error column names. For example, a prediction column ‘Model_A’ will become ‘error_Model_A’.
- inplace
bool, default=False If True, modifies the original DataFrame by adding the new columns. If False (default), returns a new DataFrame.
- Returns:
pd.DataFrame The DataFrame with the new error column(s) added.
- Raises:
ValueError If no prediction columns are provided or if the specified error_type is invalid.
- Return type:
DataFrame
See also
plot_error_violins A plot that directly uses these error columns.
plot_error_bands A plot that uses these errors for aggregation.
Notes
The forecast error (or residual), \(e_i\), for an observation \(i\) is the fundamental quantity for diagnosing model performance. This function calculates it in several forms:
Raw Error: The simple difference, which preserves the direction of the error (positive for under-prediction, negative for over-prediction).
\[e_i = y_{true,i} - y_{pred,i} \tag{1}\]
Absolute Error: The magnitude of the error, which is always non-negative.
\[e_{abs,i} = |y_{true,i} - y_{pred,i}| \tag{2}\]
Squared Error: Penalizes larger errors more heavily.
\[e_{sq,i} = (y_{true,i} - y_{pred,i})^2 \tag{3}\]
Percentage Error: Expresses the error as a percentage of the true value. Note that this can be unstable if \(y_{true,i}\) is close to zero.
\[e_{\%,i} = 100 \cdot \frac{y_{true,i} - y_{pred,i}}{y_{true,i}} \tag{4}\]
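The four definitions in equations (1) to (4) can be reproduced with plain pandas, which is useful for sanity-checking results. The sketch below is an illustration only and is not the library's implementation; the helper name `sketch_errors` and the sample values are hypothetical.

```python
import pandas as pd


def sketch_errors(y_true: pd.Series, y_pred: pd.Series,
                  error_type: str = "raw") -> pd.Series:
    """Hypothetical helper mirroring equations (1)-(4); not part of kdiagram."""
    diff = y_true - y_pred  # Eq. (1): positive means under-prediction
    if error_type == "raw":
        return diff
    if error_type == "absolute":
        return diff.abs()  # Eq. (2)
    if error_type == "squared":
        return diff ** 2  # Eq. (3)
    if error_type == "percentage":
        return 100 * diff / y_true  # Eq. (4)
    raise ValueError(f"Unknown error_type: {error_type!r}")


# Sample values chosen for illustration
y_true = pd.Series([10, 20, 30])
y_pred = pd.Series([12, 18, 33])

raw = sketch_errors(y_true, y_pred, "raw")
absolute = sketch_errors(y_true, y_pred, "absolute")
squared = sketch_errors(y_true, y_pred, "squared")
percentage = sketch_errors(y_true, y_pred, "percentage")
```

With these inputs the raw errors are -2, 2, -3, so the first and third forecasts over-predict and the second under-predicts. The absolute and squared forms discard that sign.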
Examples
>>> import pandas as pd
>>> from kdiagram.utils.forecast_utils import compute_forecast_errors
>>>
>>> df = pd.DataFrame({
...     'actual': [10, 20, 30],
...     'model_A_preds': [12, 18, 33],
...     'model_B_preds': [10, 25, 28],
... })
>>>
>>> # Calculate raw and absolute errors for both models
>>> df_errors_raw = compute_forecast_errors(
...     df, 'actual', 'model_A_preds', 'model_B_preds'
... )
>>> df_errors_abs = compute_forecast_errors(
...     df, 'actual', 'model_A_preds', 'model_B_preds',
...     error_type='absolute', prefix='abs_error_'
... )
>>> print(df_errors_raw)
   actual  model_A_preds  model_B_preds  error_model_A_preds  error_model_B_preds
0      10             12             10                   -2                    0
1      20             18             25                    2                   -5
2      30             33             28                   -3                    2
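As the Notes point out, the percentage error is unstable when the true value is at or near zero. The following pandas-only sketch shows the effect and one possible mitigation, masking rows with a small denominator. The threshold `eps` and the sample values are assumptions chosen for illustration; the function itself is not documented to apply any such masking.

```python
import numpy as np
import pandas as pd

# Sample values chosen for illustration
y_true = pd.Series([0.0, 0.001, 20.0])
y_pred = pd.Series([1.0, 1.0, 18.0])

# Percentage error applied directly: a zero denominator yields -inf,
# and a tiny denominator yields a very large magnitude.
pct = 100 * (y_true - y_pred) / y_true

# One possible mitigation (assumption): treat rows with |y_true| < eps as missing.
eps = 1e-2  # hypothetical threshold
pct_masked = pct.where(y_true.abs() >= eps, np.nan)
```

Whether masking, clipping, or switching to a different error type is appropriate depends on the data; the sketch only makes the failure mode visible before the errors are passed on to a plot.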