kdiagram.utils.minmax_scaler
- kdiagram.utils.minmax_scaler(X: ndarray | DataFrame | Series, y: None = None, feature_range: tuple[float, float] = (0.0, 1.0), eps: float = 1e-08) → ndarray
- kdiagram.utils.minmax_scaler(X: ndarray | DataFrame | Series, y: ndarray | DataFrame | Series, feature_range: tuple[float, float] = (0.0, 1.0), eps: float = 1e-08) → tuple[ndarray, ndarray]
Scale features to a specified range using a Min-Max approach.
This function scales each feature to a given range, typically [0, 1]. A small epsilon is added to the denominator, so features with zero variance do not cause division-by-zero errors.
- Parameters:
  - X : {numpy.ndarray, pandas.DataFrame, pandas.Series}
    The input data to scale. Can be a 1D array or a 2D matrix of features.
  - y : {numpy.ndarray, pandas.DataFrame, pandas.Series}, optional
    Optional target values to scale using the same approach. If provided, it is scaled independently of X.
  - feature_range : tuple of (float, float), default=(0.0, 1.0)
    The desired range of the transformed data.
  - eps : float, default=1e-8
    A small constant added to the denominator to ensure numerical stability when a feature has zero variance.
- Returns:
  - X_scaled : numpy.ndarray
    The transformed version of X, with each feature scaled to the specified feature_range.
  - y_scaled : numpy.ndarray, optional
    The scaled version of y, returned only if y is provided.
See also
sklearn.preprocessing.MinMaxScaler : The scikit-learn equivalent.
Notes
Min-Max scaling is a common preprocessing step for machine learning algorithms that are sensitive to the magnitude of features.
For each feature (column) in the input data \(\mathbf{X}\), the transformation is calculated as:
\[X_{\text{scaled}} = \text{min}_{\text{range}} + (\text{max}_{\text{range}} - \text{min}_{\text{range}}) \cdot \frac{\mathbf{X} - \min(\mathbf{X})}{(\max(\mathbf{X}) - \min(\mathbf{X})) + \varepsilon} \tag{1}\]
where \(\text{min}_{\text{range}}\) and \(\text{max}_{\text{range}}\) are the bounds of feature_range, and \(\varepsilon\) is a small constant (eps) that prevents division by zero.
Examples
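As a worked instance of Eq. (1), take a single column with values 1, 2, 3 and the default feature_range=(0.0, 1.0) and eps=1e-8. This is only arithmetic on the formula as documented, not a statement about the exact values the implementation returns:

\[x = 2:\quad 0 + (1 - 0) \cdot \frac{2 - 1}{(3 - 1) + 10^{-8}} \approx 0.4999999975, \qquad x = 3:\quad 0 + (1 - 0) \cdot \frac{3 - 1}{(3 - 1) + 10^{-8}} \approx 0.999999995\]

Taken at face value, the formula maps the column maximum very slightly below the upper bound; the gap is on the order of eps divided by the column's range.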
>>> import numpy as np
>>> from kdiagram.utils.mathext import minmax_scaler
>>>
>>> # Scale a 2D array
>>> X = np.array([[1, 10], [2, 20], [3, 30]])
>>> X_scaled = minmax_scaler(X)
>>> print(X_scaled)
[[0.  0. ]
 [0.5 0.5]
 [1.  1. ]]
>>> # Scale to a different range
>>> X_scaled_custom = minmax_scaler(X, feature_range=(-1, 1))
>>> print(X_scaled_custom)
[[-1. -1.]
 [ 0.  0.]
 [ 1.  1.]]
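The effect of eps on a zero-variance column can be checked with a small NumPy sketch of Eq. (1). The helper name minmax_scale_sketch is hypothetical, and the code is an illustrative re-implementation of the formula under the assumption that extrema are taken per column; it is not kdiagram's actual source.

```python
import numpy as np


def minmax_scale_sketch(X, feature_range=(0.0, 1.0), eps=1e-8):
    """Illustrative re-implementation of Eq. (1); hypothetical helper."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    # Column-wise extrema (assumption); for 1D input this is the global min/max.
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # eps keeps the denominator non-zero when x_max == x_min.
    return lo + (hi - lo) * (X - x_min) / ((x_max - x_min) + eps)


# The second column is constant, so its range (max - min) is zero.
X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
X_scaled = minmax_scale_sketch(X)
```

In this sketch the constant column has a zero numerator everywhere, so every entry maps to the lower bound of feature_range rather than producing NaN or inf.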