kdiagram.utils.minmax_scaler

kdiagram.utils.minmax_scaler(X: ndarray | DataFrame | Series, y: None = None, feature_range: tuple[float, float] = (0.0, 1.0), eps: float = 1e-08) → ndarray

kdiagram.utils.minmax_scaler(X: ndarray | DataFrame | Series, y: ndarray | DataFrame | Series, feature_range: tuple[float, float] = (0.0, 1.0), eps: float = 1e-08) → tuple[ndarray, ndarray]

Scale features to a specified range using a Min-Max approach.

Each feature is rescaled to the given range, [0, 1] by default. A small epsilon is added to the denominator, so a feature with zero variance does not trigger a division-by-zero error.

Parameters:

X ({numpy.ndarray, pandas.DataFrame, pandas.Series}): The input data to scale. Can be a 1D array or a 2D matrix of features.
y ({numpy.ndarray, pandas.DataFrame, pandas.Series}, optional): Target values to scale with the same approach. If provided, y is scaled independently of X.
feature_range (tuple of (float, float), default=(0.0, 1.0)): The desired range of the transformed data.
eps (float, default=1e-8): A small constant added to the denominator to ensure numerical stability when a feature has zero variance.

Returns:

X_scaled (numpy.ndarray): The transformed version of X, with each feature scaled to the specified feature_range.
y_scaled (numpy.ndarray, optional): The scaled version of y, returned only if y is provided.

See also

sklearn.preprocessing.MinMaxScaler: The scikit-learn equivalent.

Notes

Min-max scaling is a common preprocessing step for machine learning algorithms that are sensitive to the magnitude of features.

For each feature (column) in the input data \(\mathbf{X}\), the transformation is calculated as:

\[X_{\text{scaled}} = \text{min}_{\text{range}} + (\text{max}_{\text{range}} - \text{min}_{\text{range}}) \cdot \frac{\mathbf{X} - \min(\mathbf{X})}{(\max(\mathbf{X}) - \min(\mathbf{X})) + \varepsilon} \tag{1}\]

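where \(\text{min}_{\text{range}}\) and \(\text{max}_{\text{range}}\) are the bounds of the feature_range, and \(\varepsilon\) is the small constant eps that prevents division by zero. Under this formula, \(\varepsilon\) stays in the denominator for every feature, so the largest value of a non-constant feature maps to just under \(\text{max}_{\text{range}}\), and a constant feature maps entirely to \(\text{min}_{\text{range}}\).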
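The formula above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: `minmax_sketch` is a hypothetical helper, and it assumes that "per feature" means the minimum and maximum are taken along axis 0.

```python
import numpy as np


def minmax_sketch(X, feature_range=(0.0, 1.0), eps=1e-8):
    """Hypothetical helper: apply the min-max formula column by column."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    # Assumption: one minimum and one maximum per column (feature).
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # eps is added to the denominator, exactly as in the formula.
    return lo + (hi - lo) * (X - x_min) / ((x_max - x_min) + eps)


X = np.array([[1, 10], [2, 20], [3, 30]])
X_scaled = minmax_sketch(X)
X_custom = minmax_sketch(X, feature_range=(-1, 1))
```

Broadcasting subtracts each column's minimum from every row, so the same expression handles 1D input, where the minimum and maximum are scalars.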

Examples

>>> import numpy as np
>>> from kdiagram.utils.mathext import minmax_scaler
>>>
>>> # Scale a 2D array
>>> X = np.array([[1, 10], [2, 20], [3, 30]])
>>> X_scaled = minmax_scaler(X)
>>> print(X_scaled)
[[0.  0. ]
 [0.5 0.5]
 [1.  1. ]]

>>> # Scale to a different range
>>> X_scaled_custom = minmax_scaler(X, feature_range=(-1, 1))
>>> print(X_scaled_custom)
[[-1. -1.]
 [ 0.  0.]
 [ 1.  1.]]
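
The role of eps for a zero-variance feature can be checked by hand. The sketch below applies the formula from the Notes directly with NumPy rather than calling minmax_scaler, so it illustrates the documented formula, not the library's code; the variable names are illustrative only.

```python
import numpy as np

eps = 1e-8
const = np.array([5.0, 5.0, 5.0])   # a feature with zero variance
span = const.max() - const.min()    # 0.0

# With eps in the denominator, 0 / eps is a well-defined 0.
guarded = (const - const.min()) / (span + eps)

# Without eps, the same expression is 0 / 0 and evaluates to nan.
with np.errstate(invalid="ignore", divide="ignore"):
    unguarded = (const - const.min()) / span

# Side effect of eps on a non-constant feature: the maximum lands
# just below the upper bound instead of exactly on it.
x = np.array([1.0, 2.0, 3.0])
top = (x.max() - x.min()) / ((x.max() - x.min()) + eps)
```

Here `guarded` is all zeros, `unguarded` is all nan, and `top` falls short of 1 by less than eps, which is why printed results still round to the range bounds.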