kdiagram.utils.get_forecast_arrays
- kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: str, pred_cols: None = None, *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['numpy'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) ndarray
- kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: None, pred_cols: str, *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['numpy'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) ndarray
- kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: None, pred_cols: list[str], *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['numpy'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) ndarray
- kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: str, pred_cols: str | list[str], *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['numpy'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) tuple[ndarray, ndarray]
- kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: str, pred_cols: None = None, *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['pandas'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) Series
- kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: None, pred_cols: str, *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['pandas'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) Series
- kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: None, pred_cols: list[str], *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['pandas'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) DataFrame
- kdiagram.utils.get_forecast_arrays(df: DataFrame, actual_col: str, pred_cols: str | list[str], *, drop_na: bool = True, na_policy: Literal['any', 'all', 'none'] = 'any', fillna: object | None = None, return_as: Literal['pandas'] = 'numpy', squeeze: bool = True, with_index: bool = False, sort_index: bool = False, dtype: object | None = None, ensure_numeric: bool = False, coerce_numeric: bool = False, copy: bool = True) tuple[Series, Series | DataFrame]
Extract true and/or predicted values from a DataFrame.
This is a flexible bridge between a DataFrame-centric workflow and NumPy-based utilities. It supports dropping or filling NAs, numeric coercion, and optional index return, providing a robust way to prepare data for analysis.
- Parameters:
- df : pd.DataFrame
The source DataFrame.
- actual_col : str, optional
The name of the column holding the ground-truth values.
- pred_cols : str or list of str, optional
The name(s) of the prediction column(s). A string implies a single point forecast; a list implies multiple columns, such as for quantile forecasts.
- drop_na : bool, default=True
If True, drop rows with missing data according to the na_policy.
- na_policy : {"any", "all", "none"}, default="any"
The policy for dropping rows with NA values:
"any": Drop rows if any selected column has an NA.
"all": Drop rows only if all selected columns are NA.
"none": Do not drop rows based on NAs.
- fillna : scalar, dict or {"ffill", "bfill"}, optional
A value or method to use for filling NA values before any dropping occurs.
- return_as : {"numpy", "pandas"}, default="numpy"
The desired container type for the output.
- squeeze : bool, default=True
If True and a single prediction column is requested, the output will be squeezed to a 1D array or Series.
- with_index : bool, default=False
If True, the DataFrame index is returned as the first item in the output tuple.
- sort_index : bool, default=False
If True, the DataFrame is sorted by its index before extracting the data.
- dtype : object, optional
The target data type for the output arrays or Series.
- ensure_numeric : bool, default=False
If True, raises an error if any selected column is not of a numeric data type.
- coerce_numeric : bool, default=False
If True and ensure_numeric=True, attempts to convert non-numeric columns to a numeric type, with invalid parsing resulting in NaN.
- copy : bool, default=True
If True, operates on a copy of the data, ensuring the original DataFrame is not modified.
- Returns:
np.ndarray, pd.Series, pd.DataFrame, or tuple
The return type depends on the input parameters:
If only actual_col is provided -> y_true
If only pred_cols is provided -> y_pred(s)
If both are provided -> (y_true, y_pred(s))
If with_index=True, the index is prepended to the return value(s).
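The row-dropping policies and the numeric coercion described under Parameters can be illustrated with plain pandas. This is only a sketch of the documented semantics, assuming that "any" and "all" behave like `DataFrame.dropna(how=...)` and that coercion behaves like `pd.to_numeric(errors="coerce")`; it does not call `get_forecast_arrays` and is not taken from the library's implementation. The sample frame and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one object column with an unparseable entry.
df = pd.DataFrame({
    "actual": [10.0, np.nan, 30.0, np.nan],
    "q10": [8.0, 15.0, np.nan, np.nan],
    "q90": ["12", "25", "bad", None],
})
cols = ["actual", "q10", "q90"]

# Coercion in the spirit of coerce_numeric: invalid entries become NaN.
numeric = df[cols].apply(pd.to_numeric, errors="coerce")

# Policy "any": drop a row if any selected column is NA.
kept_any = numeric.dropna(how="any")

# Policy "all": drop a row only if every selected column is NA.
kept_all = numeric.dropna(how="all")

print(kept_any.index.tolist())  # [0]
print(kept_all.index.tolist())  # [0, 1, 2]
```

With "any", only the fully populated first row survives; with "all", only the last row (NA in every selected column) is removed.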
See also
compute_forecast_errors : A utility that uses this function's output.
compute_pit : Another utility that benefits from this data extraction.
Notes
This function is designed to be the primary entry point for extracting data before passing it to the mathematical or plotting functions in k-diagram. It provides a single, consistent interface for handling various data cleaning and formatting tasks.
For return_as="numpy" with a single prediction column, the output is a 1D array by default. To preserve the 2D column-vector shape (n, 1), set squeeze=False.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.utils.mathext import get_forecast_arrays
>>>
>>> df = pd.DataFrame({
...     'actual': [10, 20, 30, 40, np.nan],
...     'pred_point': [12, 18, 33, 42, 48],
...     'q10': [8, 15, 25, 35, 45],
...     'q90': [12, 25, 35, 45, 55],
... })
>>>
>>> # Example 1: Get both true and quantile predictions as NumPy arrays
>>> y_true, y_preds_q = get_forecast_arrays(
...     df, actual_col='actual', pred_cols=['q10', 'q90']
... )
>>> print("--- True Values (NumPy) ---")
>>> print(y_true)
>>> print("\n--- Quantile Predictions (NumPy) ---")
>>> print(y_preds_q)
Expected Output for Example 1
--- True Values (NumPy) ---
[10. 20. 30. 40.]

--- Quantile Predictions (NumPy) ---
[[ 8 12]
 [15 25]
 [25 35]
 [35 45]]
>>> # Example 2: Get a single prediction as a pandas Series
>>> y_preds_series = get_forecast_arrays(
...     df, pred_cols='pred_point', return_as='pandas', drop_na=False
... )
>>> print("\n--- Point Predictions (pandas Series) ---")
>>> print(y_preds_series)
Expected Output for Example 2
--- Point Predictions (pandas Series) ---
0    12
1    18
2    33
3    42
4    48
Name: pred_point, dtype: int64
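As a supplement to the squeeze remark in the Notes, the difference between the 2D column-vector shape (n, 1) and the squeezed 1D shape (n,) can be reproduced with pandas and NumPy alone. This sketch does not call `get_forecast_arrays`; it only shows the two shapes the note refers to, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"pred_point": [12, 18, 33, 42, 48]})

# Selecting with a list keeps the column axis: shape (n, 1).
column_vector = df[["pred_point"]].to_numpy()

# Squeezing the column axis gives a flat 1D array: shape (n,).
flat = column_vector.squeeze(axis=1)

print(column_vector.shape)  # (5, 1)
print(flat.shape)           # (5,)
```

The (n, 1) form is convenient when downstream code expects a matrix of prediction columns, while the flat form matches the shape of a single target vector.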