kdiagram.utils.get_forecast_arrays

kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: str, pred_cols: None = None, *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['numpy'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) → ndarray

kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: None, pred_cols: str, *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['numpy'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) → ndarray

kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: None, pred_cols: list[str], *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['numpy'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) → ndarray

kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: str, pred_cols: str | list[str], *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['numpy'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) → tuple[ndarray, ndarray]

kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: str, pred_cols: None = None, *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['pandas'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) → Series

kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: None, pred_cols: str, *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['pandas'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) → Series

kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: None, pred_cols: list[str], *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['pandas'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) → DataFrame

kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: str, pred_cols: str | list[str], *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['pandas'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) → tuple[Series, Series | DataFrame]

Extract true and/or predicted values from a DataFrame.

This function bridges DataFrame-centric workflows and NumPy-based utilities. It can fill or drop missing values, validate or coerce column types to numeric, and optionally return the index alongside the values, so that data reach downstream analysis in a consistent form.

Parameters:

df : pd.DataFrame

The source DataFrame.

actual_col : str, optional

The name of the column holding the ground-truth values.

pred_cols : str or list of str, optional

The name(s) of the prediction column(s). A string implies a single point forecast; a list implies multiple columns, such as for quantile forecasts.

drop_na : bool, default=True

If True, drop rows with missing data according to the na_policy.

na_policy : {"any", "all", "none"}, default="any"

The policy for dropping rows with NA values:

“any”: Drop rows if any selected column has an NA.
“all”: Drop rows only if all selected columns are NA.
“none”: Do not drop rows based on NAs.
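The three policies can be approximated in plain pandas. This is a sketch assuming they map onto `DataFrame.dropna` restricted to the selected columns; `apply_na_policy` is a hypothetical helper, not part of kdiagram:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "actual": [10.0, np.nan, np.nan, 40.0],
    "pred": [12.0, 18.0, np.nan, 42.0],
})
cols = ["actual", "pred"]

def apply_na_policy(frame, cols, na_policy="any"):
    # Hypothetical helper mirroring the documented na_policy semantics.
    if na_policy == "none":
        return frame  # keep every row
    # "any": drop a row if any selected column is NA.
    # "all": drop a row only when every selected column is NA.
    return frame.dropna(subset=cols, how=na_policy)

print(len(apply_na_policy(df, cols, "any")))   # 2 (rows 1 and 2 dropped)
print(len(apply_na_policy(df, cols, "all")))   # 3 (only row 2 dropped)
print(len(apply_na_policy(df, cols, "none")))  # 4
```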

fillna : scalar, dict, or {"ffill", "bfill"}, optional

A value or method to use for filling NA values before any dropping occurs.
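Because filling happens first, rows whose gaps are filled survive the drop step. A plain-pandas sketch, assuming `"ffill"` behaves like `DataFrame.ffill` and a scalar or dict like `DataFrame.fillna` (assumptions, not confirmed by the source):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "actual": [10.0, np.nan, 30.0, np.nan],
    "pred": [12.0, 18.0, 33.0, 42.0],
})

# Fill first (forward fill here), then apply the "any" drop policy:
# rows whose gaps were filled are kept.
filled = df[["actual", "pred"]].ffill()
result = filled.dropna(how="any")
print(result["actual"].tolist())  # [10.0, 10.0, 30.0, 30.0]

# Without filling, the same policy drops rows 1 and 3.
print(len(df.dropna(how="any")))  # 2

# A dict fills column by column, as DataFrame.fillna does.
per_col = df.fillna({"actual": 0.0})
print(per_col["actual"].tolist())  # [10.0, 0.0, 30.0, 0.0]
```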

return_as : {"numpy", "pandas"}, default="numpy"

The desired container type for the output.

squeeze : bool, default=True

If True and a single prediction column is requested, the output will be squeezed to a 1D array or Series.

with_index : bool, default=False

If True, the DataFrame index is returned as the first item in the output tuple.

sort_index : bool, default=False

If True, the DataFrame is sorted by its index before extracting the data.

dtype : object, optional

The target data type for the output arrays or Series.

ensure_numeric : bool, default=False

If True, raises an error if any selected column is not of a numeric data type.

coerce_numeric : bool, default=False

If True and ensure_numeric=True, attempts to convert non-numeric columns to a numeric type, with invalid parsing resulting in NaN.
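The coercion described above matches what `pandas.to_numeric(..., errors="coerce")` does. The sketch below assumes kdiagram uses equivalent logic, which the source does not confirm:

```python
import pandas as pd

df = pd.DataFrame({"actual": ["10", "20", "n/a", "40"]})

# An object-dtype column like this would fail an ensure_numeric-style check.
print(pd.api.types.is_numeric_dtype(df["actual"]))  # False

# Coercion: parse what can be parsed, turn invalid entries into NaN.
coerced = pd.to_numeric(df["actual"], errors="coerce")
print(coerced.tolist())      # [10.0, 20.0, nan, 40.0]
print(coerced.isna().sum())  # 1
```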

copy : bool, default=True

If True, operates on a copy of the data, ensuring the original DataFrame is not modified.

Returns:

np.ndarray, pd.Series, pd.DataFrame, or tuple

The return type depends on the input parameters:

If only actual_col is provided -> y_true
If only pred_cols is provided -> y_pred(s)
If both are provided -> (y_true, y_pred(s))

If with_index=True, the index is prepended to the return value(s).
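The return-shape rules can be sketched as follows. `extract_sketch` is a hypothetical helper that covers only the NumPy path and omits NA handling, dtype conversion, and the other options; it is an illustration, not kdiagram's implementation:

```python
import numpy as np
import pandas as pd

def extract_sketch(df, actual_col=None, pred_cols=None, with_index=False):
    """Hypothetical sketch of the documented return-shape rules (NumPy only)."""
    if actual_col is None and pred_cols is None:
        # This sketch's own choice; the source does not specify this case.
        raise ValueError("Provide actual_col, pred_cols, or both.")
    out = []
    if actual_col is not None:
        out.append(df[actual_col].to_numpy())
    if pred_cols is not None:
        # A string selects one column (1D); a list selects several (2D).
        out.append(df[pred_cols].to_numpy())
    if with_index:
        out.insert(0, df.index.to_numpy())  # index goes first
    return out[0] if len(out) == 1 else tuple(out)

df = pd.DataFrame({"actual": [1.0, 2.0], "q10": [0.5, 1.5], "q90": [1.5, 2.5]})
y_true, y_pred = extract_sketch(df, "actual", ["q10", "q90"])
idx, y_true, y_pred = extract_sketch(df, "actual", ["q10", "q90"], with_index=True)
print(y_pred.shape)  # (2, 2)
print(idx)           # [0 1]
```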

See also

compute_forecast_errors: A utility that uses this function’s output.
compute_pit: Another utility that benefits from this data extraction.

Notes

This function is designed to be the primary entry point for extracting data before passing it to the mathematical or plotting functions in k-diagram. It provides a single, consistent interface for handling various data cleaning and formatting tasks.

For return_as="numpy" with a single prediction column, the output is a 1D array by default. To preserve the 2D column vector shape (n, 1), set squeeze=False.
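In NumPy terms, the difference is a trailing axis of length 1. This sketch assumes `squeeze=True` behaves like `np.squeeze(..., axis=1)` applied to the selected column (an assumption about the implementation):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"pred_point": [12, 18, 33]})

# Selecting with a list keeps the 2D column-vector shape (n, 1).
col_2d = df[["pred_point"]].to_numpy()
print(col_2d.shape)  # (3, 1)

# Dropping the length-1 axis gives the 1D shape that squeeze=True documents.
col_1d = np.squeeze(col_2d, axis=1)
print(col_1d.shape)  # (3,)

# Restoring (n, 1) from a 1D result, if a downstream function needs it.
print(col_1d.reshape(-1, 1).shape)  # (3, 1)
```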

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.utils.mathext import get_forecast_arrays
>>>
>>> df = pd.DataFrame({
...     'actual': [10, 20, 30, 40, np.nan],
...     'pred_point': [12, 18, 33, 42, 48],
...     'q10': [8, 15, 25, 35, 45],
...     'q90': [12, 25, 35, 45, 55],
... })
>>>
>>> # Example 1: Get both true and quantile predictions as NumPy arrays
>>> y_true, y_preds_q = get_forecast_arrays(
...     df, actual_col='actual', pred_cols=['q10', 'q90']
... )
>>> print("--- True Values (NumPy) ---")
>>> print(y_true)
>>> print("\n--- Quantile Predictions (NumPy) ---")
>>> print(y_preds_q)

Expected Output for Example 1

--- True Values (NumPy) ---
[10. 20. 30. 40.]

--- Quantile Predictions (NumPy) ---
[[ 8 12]
 [15 25]
 [25 35]
 [35 45]]

>>> # Example 2: Get a single prediction as a pandas Series
>>> y_preds_series = get_forecast_arrays(
...     df, pred_cols='pred_point', return_as='pandas', drop_na=False
... )
>>> print("\n--- Point Predictions (pandas Series) ---")
>>> print(y_preds_series)

Expected Output for Example 2

--- Point Predictions (pandas Series) ---
0    12
1    18
2    33
3    42
4    48
Name: pred_point, dtype: int64