kdiagram.utils.bin_by_feature
- kdiagram.utils.bin_by_feature(df, bin_on_col, target_cols, n_bins=10, agg_funcs='mean')
Bins data by a feature and computes aggregate statistics.
This utility groups a DataFrame into bins based on the values in a specified column (bin_on_col), then calculates aggregate statistics (such as mean or std) for one or more target columns within each bin. This is the core logic behind plots like plot_error_bands.
- Parameters:
- df
pd.DataFrame The input DataFrame.
- bin_on_col
str The name of the column whose values will be used for binning. This column must contain numeric data.
- target_cols
str or list of str The name(s) of the column(s) for which to compute statistics.
- n_bins
int, default=10 The number of equal-width bins to create.
- agg_funcs
str, list of str, or dict, default='mean' The aggregation function(s) to apply. Can be any function accepted by pandas' .agg() method (e.g., 'mean', 'std', ['mean', 'std'], or {'col_A': 'sum'}).
- Returns:
pd.DataFrame A DataFrame containing the aggregate statistics for each bin.
- Return type:
DataFrame
See also
pandas.cut: The underlying pandas function used for binning.
pandas.DataFrame.groupby: The underlying pandas function for aggregation.
plot_error_bands: A plot that uses this binning logic.
Notes
This function first uses pandas.cut to partition the values in bin_on_col into n_bins discrete, equal-width intervals. It then uses pandas.DataFrame.groupby to group the DataFrame by these new bins and applies the specified aggregation function(s) to the target_cols for each group.
Examples
>>> import pandas as pd
>>> from kdiagram.utils.forecast_utils import bin_by_feature
>>>
>>> df = pd.DataFrame({
...     'forecast_value': [10, 12, 20, 22, 30, 32],
...     'error': [-1, 1.5, -2, 2.5, -3, 3.5]
... })
>>>
>>> # Calculate the mean and standard deviation of the error,
>>> # binned by the forecast value.
>>> binned_stats = bin_by_feature(
...     df,
...     bin_on_col='forecast_value',
...     target_cols='error',
...     n_bins=3,
...     agg_funcs=['mean', 'std']
... )
>>> print(binned_stats)
  forecast_value_bin  mean       std
0    (9.978, 17.333]  0.25  1.767767
1   (17.333, 24.667]  0.25  3.181981
2     (24.667, 32.0]  0.25  4.596194
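The two steps described in the Notes (binning with pandas.cut, then aggregating with groupby) can be approximated with plain pandas. This is a minimal sketch for illustration, not the library's actual implementation: the variable names, the observed=False setting, and the renaming of the bin column to forecast_value_bin are assumptions chosen to mirror the example output above.

```python
import pandas as pd

df = pd.DataFrame({
    "forecast_value": [10, 12, 20, 22, 30, 32],
    "error": [-1, 1.5, -2, 2.5, -3, 3.5],
})

# Step 1: partition forecast_value into 3 equal-width intervals.
bins = pd.cut(df["forecast_value"], bins=3)

# Step 2: group the rows by interval and aggregate the target column.
# observed=False (an assumption here) keeps bins that contain no rows.
sketch = (
    df.groupby(bins, observed=False)["error"]
    .agg(["mean", "std"])
    .reset_index()
    # Hypothetical column name, chosen to match the example output.
    .rename(columns={"forecast_value": "forecast_value_bin"})
)
print(sketch)
```

With this data each bin holds two rows, so every mean is 0.25 and the std values are sample standard deviations (ddof=1, the pandas default).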