kdiagram.utils.compute_forecast_errors

kdiagram.utils.compute_forecast_errors(df, actual_col, *pred_cols, error_type='raw', prefix='error_', inplace=False)

Computes forecast errors for one or more models.

This is a core data preparation utility that calculates the difference between true and predicted values. It supports several common error types and can operate on multiple prediction columns at once, making it easy to prepare data for the diagnostic plots in the kdiagram.plot.errors module.

Parameters:
df : pd.DataFrame

The input DataFrame containing the actual and predicted values.

actual_col : str

The name of the column containing the true observed values.

*pred_cols : str

One or more column names containing the predicted values from different models.

error_type : {'raw', 'absolute', 'squared', 'percentage'}, default='raw'

The type of error to calculate:

  • ‘raw’: \(y_{true} - y_{pred}\)

  • ‘absolute’: \(|y_{true} - y_{pred}|\)

  • ‘squared’: \((y_{true} - y_{pred})^2\)

  • ‘percentage’: \(100 \cdot (y_{true} - y_{pred}) / y_{true}\)

prefix : str, default='error_'

The prefix to add to the new error column names. For example, a prediction column ‘Model_A’ will become ‘error_Model_A’.

inplace : bool, default=False

If True, modifies the original DataFrame by adding the new columns. If False (default), leaves the original unchanged and returns a new DataFrame with the columns added.

Returns:
pd.DataFrame

The DataFrame with the new error column(s) added.

Raises:
ValueError

If no prediction columns are provided or if the specified error_type is invalid.

Parameters:
  • df (DataFrame)

  • actual_col (str)

  • pred_cols (str)

  • error_type (Literal['raw', 'absolute', 'squared', 'percentage'])

  • prefix (str)

  • inplace (bool)

Return type:

DataFrame

See also

plot_error_violins

A plot that directly uses these error columns.

plot_error_bands

A plot that uses these errors for aggregation.

Notes

The forecast error (or residual), \(e_i\), for an observation \(i\) is the fundamental quantity for diagnosing model performance. This function calculates it in several forms:

  1. Raw Error: The simple difference, which preserves the direction of the error (positive for under-prediction, negative for over-prediction).

    \[e_i = y_{true,i} - y_{pred,i} \tag{1}\]

  2. Absolute Error: The magnitude of the error, which is always non-negative.

    \[e_{abs,i} = |y_{true,i} - y_{pred,i}| \tag{2}\]

  3. Squared Error: Penalizes larger errors more heavily.

    \[e_{sq,i} = (y_{true,i} - y_{pred,i})^2 \tag{3}\]

  4. Percentage Error: Expresses the error as a percentage of the true value. Note that this can be unstable if \(y_{true,i}\) is close to zero.

    \[e_{\%,i} = 100 \cdot \frac{y_{true,i} - y_{pred,i}}{y_{true,i}} \tag{4}\]
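
As a quick check of the four definitions, the following plain-Python sketch evaluates equations (1) to (4) by hand on a small sample. It does not call kdiagram and is not the library's implementation; the variable names are illustrative only.

```python
# Illustrative only: hand-computed versions of equations (1)-(4),
# not kdiagram's implementation.
y_true = [10, 20, 30]
y_pred = [12, 18, 33]

raw = [t - p for t, p in zip(y_true, y_pred)]            # eq. (1)
absolute = [abs(e) for e in raw]                          # eq. (2)
squared = [e ** 2 for e in raw]                           # eq. (3)
percentage = [100 * e / t for e, t in zip(raw, y_true)]   # eq. (4)

print(raw)         # [-2, 2, -3]
print(absolute)    # [2, 2, 3]
print(squared)     # [4, 4, 9]
print(percentage)  # [-20.0, 10.0, -10.0]

# Near-zero true values inflate the percentage error: a raw error of -2
# is -20% of a true value of 10, but roughly -2000% of a true value of 0.1.
near_zero_pct = 100 * (0.1 - 2.1) / 0.1
print(abs(near_zero_pct) > 1000)  # True
```

The last lines show why the percentage form needs care: the raw error is the same size in both cases, and only the denominator changes.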

Examples

>>> import pandas as pd
>>> from kdiagram.utils.forecast_utils import compute_forecast_errors
>>>
>>> df = pd.DataFrame({
...     'actual': [10, 20, 30],
...     'model_A_preds': [12, 18, 33],
...     'model_B_preds': [10, 25, 28],
... })
>>>
>>> # Calculate raw and absolute errors for both models
>>> df_errors_raw = compute_forecast_errors(
...     df, 'actual', 'model_A_preds', 'model_B_preds'
... )
>>> df_errors_abs = compute_forecast_errors(
...     df, 'actual', 'model_A_preds', 'model_B_preds',
...     error_type='absolute', prefix='abs_error_'
... )
>>> print(df_errors_raw)
   actual  model_A_preds  model_B_preds  error_model_A_preds  error_model_B_preds
0      10             12             10                   -2                    0
1      20             18             25                    2                   -5
2      30             33             28                   -3                    2
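
The prefix and inplace behaviour described under Parameters can be sketched with a small pandas stand-in. The helper `add_raw_errors` below is hypothetical: it is not kdiagram's code and handles only the raw error type. It also assumes that the same DataFrame object is returned when inplace=True, which the documentation above does not state explicitly.

```python
import pandas as pd


def add_raw_errors(df, actual_col, *pred_cols, prefix="error_", inplace=False):
    """Hypothetical stand-in, NOT kdiagram's code: raw errors only."""
    if not pred_cols:
        raise ValueError("At least one prediction column is required.")
    # Work on the caller's frame when inplace=True, otherwise on a copy.
    out = df if inplace else df.copy()
    for col in pred_cols:
        # New column name is the prefix followed by the prediction column name.
        out[f"{prefix}{col}"] = out[actual_col] - out[col]
    return out


df = pd.DataFrame({"actual": [10, 20, 30], "model_A_preds": [12, 18, 33]})

# inplace=False: the result is a copy and df keeps its original columns.
copy_result = add_raw_errors(df, "actual", "model_A_preds", prefix="raw_")
print(list(copy_result.columns))  # ['actual', 'model_A_preds', 'raw_model_A_preds']
print(list(df.columns))           # ['actual', 'model_A_preds']

# inplace=True: the new column is written into df itself.
same_result = add_raw_errors(df, "actual", "model_A_preds", inplace=True)
print(same_result is df)          # True
print(list(df.columns))           # ['actual', 'model_A_preds', 'error_model_A_preds']
```

Using a distinct prefix per error type, as the example above does with 'abs_error_', keeps several error variants side by side without overwriting each other.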