kdiagram.utils.detect_quantiles_in
- kdiagram.utils.detect_quantiles_in(df, col_prefix=None, dt_value=None, mode='soft', return_types='columns', verbose=0)
Detect quantile columns in a DataFrame using naming patterns and value validation.
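Identifies columns that hold quantile data by matching a structured naming convention (a _q{value} suffix) and then validating the parsed quantile values. The mode argument controls how quantile values outside the [0, 1] range are handled, so both normalized quantiles (e.g., q0.25) and out-of-range labels (e.g., q150) can be processed.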
- Parameters:
  - df (pd.DataFrame) – Input DataFrame containing potential quantile columns. Column names must be strings.
  - col_prefix (str, optional) – Column name prefix for a targeted search (e.g., 'price' for price_q0.25). If None, all columns are scanned.
  - dt_value (list of str, optional) – Date filters for temporal quantile detection (e.g., ['2023'] matches columns like price_2023_q0.5).
  - mode ({'soft', 'strict'}, default 'soft') – Value handling strategy:
    - 'soft': Clips quantile values into [0, 1], so values > 1 become 1.0
    - 'strict': Excludes values outside the [0, 1] range
  - return_types ({'columns', 'q_val', 'values', 'frame'}, default 'columns') – Return format specification:
    - 'columns': List of column names
    - 'q_val': Sorted unique quantile values
    - 'values': Column data arrays
    - 'frame': DataFrame subset
  - verbose ({0, 1, 2, 3}, default 0) – Output verbosity:
    - 0: Silent
    - 1: Basic scan info
    - 2: Per-column matches
    - 3: Full diagnostic output
- Returns:
  Quantile data in the format specified by return_types. Returns None if no quantiles are detected.
- Return type:
  Union[List[str], List[float], List[np.ndarray], pd.DataFrame, None]
Notes
The value adjustment applied during detection can be formulated as:
\[\begin{split}q_{\text{adj}} = \begin{cases} \min(1, \max(0, q_{\text{raw}})) & \text{if } \text{mode} = \text{'soft'} \\ q_{\text{raw}} & \text{if } q_{\text{raw}} \in [0,1] \text{ and } \text{mode} = \text{'strict'} \end{cases}\end{split}\]
Examples
>>> from kdiagram.utils.diagnose_q import detect_quantiles_in
>>> import pandas as pd

>>> # Basic detection
>>> df = pd.DataFrame({'sales_q0.25': [4.2], 'sales_q0.75': [5.8]})
>>> detect_quantiles_in(df, col_prefix='sales')
['sales_q0.25', 'sales_q0.75']

>>> # Temporal quantile filtering
>>> df = pd.DataFrame({'temp_2023_q0.5': [22.1], 'temp_2024_q0.5': [23.4]})
>>> detect_quantiles_in(df, dt_value=['2023'], return_types='q_val')
[0.5]

>>> # Value normalization
>>> df = pd.DataFrame({'risk_q150': [0.8]})
>>> detect_quantiles_in(df, mode='soft', return_types='q_val')
[1.0]
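The piecewise adjustment given in the Notes can be sketched in plain Python. This is only an illustration of the formula, not the library's implementation: the helper name adjust_quantile is hypothetical, and returning None to signal an excluded value in strict mode is an assumption.

```python
from typing import Optional


def adjust_quantile(q_raw: float, mode: str = "soft") -> Optional[float]:
    """Hypothetical helper mirroring the piecewise formula in the Notes."""
    if mode == "soft":
        # Clip into [0, 1]; values already inside the range are unchanged.
        return min(1.0, max(0.0, q_raw))
    if mode == "strict":
        # Keep in-range values; None marks an excluded value (assumption).
        return q_raw if 0.0 <= q_raw <= 1.0 else None
    raise ValueError(f"mode must be 'soft' or 'strict', got {mode!r}")


print(adjust_quantile(150.0, "soft"))    # 1.0
print(adjust_quantile(0.25, "soft"))     # 0.25
print(adjust_quantile(150.0, "strict"))  # None
```

Under this reading, a column such as risk_q150 yields the quantile value 1.0 in soft mode and is dropped in strict mode.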
Notes
Column name pattern requirements:
- Requires a _qX suffix, where X is numeric
- Temporal format: {prefix}_{date}_q{value}
- Non-temporal format: {prefix}_q{value}

Value adjustment in soft mode uses a piecewise function:
- Clips values to the [0, 1] range
- Preserves original values already within the valid range
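The naming conventions listed above can be illustrated with a small regular expression. This is a sketch under stated assumptions, not the pattern kdiagram actually uses: the names Q_PATTERN and parse_quantile_column are hypothetical, and the date token is assumed to be a four-digit year.

```python
import re
from typing import Optional, Tuple

# Assumed layout: {prefix}[_{date}]_q{value}, with {date} taken to be a
# four-digit year. The real library may accept other date formats.
Q_PATTERN = re.compile(
    r"^(?P<prefix>.+?)(?:_(?P<date>\d{4}))?_q(?P<q>\d+(?:\.\d+)?)$"
)


def parse_quantile_column(name: str) -> Optional[Tuple[str, Optional[str], float]]:
    """Split a column name into (prefix, date, raw quantile value).

    Returns None when the name does not end in a numeric _q suffix.
    """
    match = Q_PATTERN.match(name)
    if match is None:
        return None
    return match.group("prefix"), match.group("date"), float(match.group("q"))


print(parse_quantile_column("price_2023_q0.5"))  # ('price', '2023', 0.5)
print(parse_quantile_column("sales_q0.25"))      # ('sales', None, 0.25)
print(parse_quantile_column("price"))            # None
```

The raw value is returned unadjusted here, so any clipping or exclusion according to mode would be applied as a separate step.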
See also
gofast.utils.validate_quantiles
  For quantile value validation.
pandas.DataFrame.filter
  For column selection by pattern.