kdiagram.utils.detect_quantiles_in

kdiagram.utils.detect_quantiles_in(df, col_prefix=None, dt_value=None, mode='soft', return_types='columns', verbose=0)

Detect quantile columns in a DataFrame using naming patterns and value validation.

Identifies columns containing quantile data through structured naming conventions and value validation [1]. Supports both absolute and normalized quantile representations through mode-based value adjustment [2].

Parameters:
df : pd.DataFrame

Input DataFrame containing potential quantile columns. Column names must be strings.

col_prefix : str, optional

Column name prefix for targeted search (e.g., 'price' for price_q0.25). If None, scans all columns.

dt_value : list of str, optional

Date filters for temporal quantile detection (e.g., ['2023'] matches columns like price_2023_q0.5).

mode : {'soft', 'strict'}, default='soft'

Value handling strategy:

- 'soft': Clips quantile values outside [0, 1] to the nearest bound (e.g., 150 becomes 1.0)
- 'strict': Excludes values outside the [0, 1] range

return_types : {'columns', 'q_val', 'values', 'frame'}, default='columns'

Return format specification:

- 'columns': List of column names
- 'q_val': Sorted unique quantile values
- 'values': Column data arrays
- 'frame': DataFrame subset

verbose : {0, 1, 2, 3}, default=0

Output verbosity:

- 0: Silent
- 1: Basic scan info
- 2: Per-column matches
- 3: Full diagnostic output

Returns:
Union[List[str], List[float], List[np.ndarray], pd.DataFrame, None]

Quantile data in format specified by return_types. Returns None if no quantiles detected.


See also

kdiagram.utils.validate_quantiles

For quantile value validation

pandas.DataFrame.filter

For column selection by pattern

Notes

The detection adjustment can be formulated as:

\[
q_{\text{adj}} =
\begin{cases}
\min(1, \max(0, q_{\text{raw}})) & \text{if mode} = \text{'soft'} \\
q_{\text{raw}} & \text{if mode} = \text{'strict' and } q_{\text{raw}} \in [0, 1]
\end{cases}
\tag{1}
\]

In 'strict' mode, values with \(q_{\text{raw}} \notin [0, 1]\) are excluded.
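As a rough illustration of Eq. (1), the adjustment can be sketched in plain Python. The function name `adjust_quantile` and the use of `None` to mark an excluded value are assumptions made for this sketch; they are not taken from the kdiagram source.

```python
from typing import Optional


def adjust_quantile(q_raw: float, mode: str = "soft") -> Optional[float]:
    """Illustrative version of Eq. (1); hypothetical helper, not kdiagram's code."""
    if mode == "soft":
        # Clip into [0, 1]: values below 0 become 0.0, values above 1 become 1.0.
        return min(1.0, max(0.0, q_raw))
    if mode == "strict":
        # Keep in-range values unchanged; None marks an excluded value (assumption).
        return q_raw if 0.0 <= q_raw <= 1.0 else None
    raise ValueError(f"unknown mode: {mode!r}")


print(adjust_quantile(150.0, "soft"))    # 1.0
print(adjust_quantile(0.25, "strict"))   # 0.25
print(adjust_quantile(150.0, "strict"))  # None
```

The soft branch never rejects a value, which is why a column such as risk_q150 still yields a quantile of 1.0 in that mode.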
  1. Column name pattern requirements:

     - Requires a _qX suffix, where X is numeric
     - Temporal format: {prefix}_{date}_q{value}
     - Non-temporal format: {prefix}_q{value}

  2. Value adjustment in soft mode uses a piecewise function:

     - Clips values to the [0, 1] range
     - Preserves original values within the valid range
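The naming rules in note 1 can be approximated with a single regular expression. The sketch below uses only the standard library and operates on a plain list of column names; the helper `match_quantile_columns` and the exact pattern are assumptions for illustration, and the library's real pattern may differ (for example in how it treats dates containing underscores).

```python
import re
from typing import List, Optional, Tuple


def match_quantile_columns(
    columns: List[str],
    col_prefix: Optional[str] = None,
    dt_value: Optional[List[str]] = None,
) -> List[Tuple[str, float]]:
    """Return (column name, quantile value) pairs; hypothetical helper."""
    # A given prefix is matched literally; otherwise any non-empty prefix is allowed.
    prefix = re.escape(col_prefix) if col_prefix else r".+?"
    # {prefix}_q{value} with an optional _{date} segment before the _q suffix.
    pattern = re.compile(
        rf"^{prefix}(?:_(?P<date>[^_]+))?_q(?P<q>\d+(?:\.\d+)?)$"
    )
    matches = []
    for name in columns:
        m = pattern.match(name)
        if m is None:
            continue  # Name does not follow the _qX convention.
        if dt_value is not None and m.group("date") not in dt_value:
            continue  # Date segment missing or not among the requested dates.
        matches.append((name, float(m.group("q"))))
    return matches


cols = ["sales_q0.25", "sales_q0.75", "temp_2023_q0.5", "temp_2024_q0.5", "notes"]
print(match_quantile_columns(cols, col_prefix="sales"))
print(match_quantile_columns(cols, dt_value=["2023"]))
```

In this sketch, the first call keeps only the two sales columns and the second keeps only temp_2023_q0.5, since non-temporal names carry no date segment to compare against the filter.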

References

[1] Regular Expression HOWTO, Python Documentation.

[2] Pandas API Reference: DataFrame operations.

Examples

>>> from kdiagram.utils.diagnose_q import detect_quantiles_in
>>> import pandas as pd
>>>
>>> # Basic detection
>>> df = pd.DataFrame({'sales_q0.25': [4.2], 'sales_q0.75': [5.8]})
>>> detect_quantiles_in(df, col_prefix='sales')
['sales_q0.25', 'sales_q0.75']
>>>
>>> # Temporal quantile filtering
>>> df = pd.DataFrame({'temp_2023_q0.5': [22.1], 'temp_2024_q0.5': [23.4]})
>>> detect_quantiles_in(df, dt_value=['2023'], return_types='q_val')
[0.5]
>>>
>>> # Value normalization
>>> df = pd.DataFrame({'risk_q150': [0.8]})
>>> detect_quantiles_in(df, mode='soft', return_types='q_val')
[1.0]