
StandardScaler

yohou.preprocessing.sklearn_wrappers.StandardScaler

Bases: SklearnScaler

Standardize features by removing the mean and scaling to unit variance.

The standard score of a sample x is calculated as:

z = (x - u) / s

where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.
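As a minimal sketch of this formula, the pure-Python function below computes standard scores for a single feature. The helper name `standard_scores` is hypothetical and not part of Yohou; it assumes the population standard deviation (dividing by n), which is the estimator scikit-learn's StandardScaler uses.

```python
from statistics import fmean, pstdev


def standard_scores(values, with_mean=True, with_std=True):
    """Illustrative z = (x - u) / s for one feature (hypothetical helper)."""
    # u is the mean of the samples, or zero when centering is disabled.
    u = fmean(values) if with_mean else 0.0
    # s is the population standard deviation, or one when scaling is disabled.
    s = pstdev(values) if with_std else 1.0
    return [(x - u) / s for x in values]


values = [10.0, 20.0, 30.0, 40.0, 50.0]
z = standard_scores(values)                              # centered and scaled
centered_only = standard_scores(values, with_std=False)  # centered, not scaled
```

With `with_std=False` the divisor is one, so the result is simply each value minus the mean.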

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using transform().
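The fit-then-transform split described above can be sketched in plain Python. This is an illustration of the arithmetic only, not the wrapper's implementation; the variable names `train` and `later` are assumptions made for the example.

```python
from statistics import fmean, pstdev

train = [10.0, 20.0, 30.0, 40.0, 50.0]
later = [60.0, 30.0]

# "Fit": compute and store the statistics from the training samples only.
mean_ = fmean(train)
scale_ = pstdev(train)

# "Transform": reuse the stored statistics; nothing is re-estimated from `later`.
later_scaled = [(x - mean_) / scale_ for x in later]
```

Because the statistics come from the training set, a later value equal to the training mean maps to zero regardless of what else appears in the later data.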

Standardization is a common requirement for many machine learning estimators: they may perform poorly if individual features do not look roughly like standard normally distributed data (for example, Gaussian with zero mean and unit variance).

StandardScaler is sensitive to outliers, and the features may scale differently from each other in the presence of outliers. For outlier-robust scaling, use RobustScaler instead.
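To make the outlier sensitivity concrete, the sketch below standardizes the same feature with and without one extreme value. It is an illustration in plain Python, not Yohou code; the helper `zscores` is hypothetical. The median is included for comparison because median-based statistics are the kind of outlier-robust statistics that RobustScaler relies on.

```python
from statistics import fmean, median, pstdev

clean = [10.0, 20.0, 30.0, 40.0, 50.0]
with_outlier = [10.0, 20.0, 30.0, 40.0, 5000.0]  # one extreme value


def zscores(values):
    """Standardize one feature with its own mean and population std."""
    u, s = fmean(values), pstdev(values)
    return [(x - u) / s for x in values]


# Distance between the first and fourth points after standardization.
spread_clean = zscores(clean)[3] - zscores(clean)[0]
spread_outlier = zscores(with_outlier)[3] - zscores(with_outlier)[0]

# The median is unchanged here, while the mean is dragged toward the outlier.
median_shift = median(with_outlier) - median(clean)
mean_shift = fmean(with_outlier) - fmean(clean)
```

The single outlier inflates the standard deviation, so the four ordinary points end up squeezed into a much narrower range than in the clean case.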

This is a Yohou wrapper that preserves the polars DataFrame structure and "time" column.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| with_mean | bool | If True, center the data before scaling. | True |
| with_std | bool | If True, scale the data to unit variance (or equivalently, unit standard deviation). | True |

Attributes

| Name | Type | Description |
| --- | --- | --- |
| instance_ | StandardScaler | The fitted sklearn StandardScaler instance. |
| scale_ | ndarray of shape (n_features,) or None | Per feature relative scaling of the data to achieve zero mean and unit variance. Equal to None when with_std=False. |
| mean_ | ndarray of shape (n_features,) or None | The mean value for each feature in the training set. Equal to None when with_mean=False and with_std=False. |
| var_ | ndarray of shape (n_features,) or None | The variance for each feature in the training set. Equal to None when with_mean=False and with_std=False. |
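The fitted statistics are related to one another in a simple way. The sketch below recomputes them for a single feature in plain Python, assuming the population variance that scikit-learn uses; it does not call Yohou or scikit-learn, and the names simply mirror the attributes above.

```python
from math import sqrt
from statistics import fmean, pvariance

train = [10.0, 20.0, 30.0, 40.0, 50.0]

mean_ = fmean(train)      # per-feature mean
var_ = pvariance(train)   # per-feature population variance (divides by n)
scale_ = sqrt(var_)       # per-feature divisor applied by the transform

# scikit-learn treats (near-)zero scales as 1 so that constant features
# do not cause a division by zero; this guard mimics that idea.
if scale_ == 0.0:
    scale_ = 1.0
```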

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.preprocessing import StandardScaler
>>> X = pl.DataFrame({
...     "time": [datetime(2024, 1, i) for i in range(1, 6)],
...     "value": [10.0, 20.0, 30.0, 40.0, 50.0],
... })
>>> scaler = StandardScaler()
>>> scaler.fit(X)
StandardScaler(...)
>>> X_scaled = scaler.transform(X)
>>> # Values are standardized (mean 0, population std 1)
>>> round(X_scaled["value"].mean(), 10)
0.0

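See Also

MinMaxScaler : Scale features to a given range.
RobustScaler : Scale using statistics robust to outliers.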

Source Code

class StandardScaler(SklearnScaler):
    """Standardize features by removing the mean and scaling to unit variance.

    The standard score of a sample ``x`` is calculated as::

        z = (x - u) / s

    where ``u`` is the mean of the training samples or zero if ``with_mean=False``,
    and ``s`` is the standard deviation of the training samples or one if
    ``with_std=False``.

    Centering and scaling happen independently on each feature by computing the
    relevant statistics on the samples in the training set. Mean and standard
    deviation are then stored to be used on later data using ``transform()``.

    Standardization of a dataset is a common requirement for many machine learning
    estimators: they might behave badly if the individual features do not more or
    less look like standard normally distributed data (e.g. Gaussian with 0 mean
    and unit variance).

    ``StandardScaler`` is sensitive to outliers, and the features may scale
    differently from each other in the presence of outliers. For outlier-robust
    scaling, use ``RobustScaler`` instead.

    This is a Yohou wrapper that preserves the polars DataFrame structure and
    "time" column.

    Parameters
    ----------
    with_mean : bool, default=True
        If True, center the data before scaling.

    with_std : bool, default=True
        If True, scale the data to unit variance (or equivalently, unit standard
        deviation).

    Attributes
    ----------
    instance_ : sklearn.preprocessing.StandardScaler
        The fitted sklearn StandardScaler instance.

    scale_ : ndarray of shape (n_features,) or None
        Per feature relative scaling of the data to achieve zero mean and unit
        variance. Equal to ``None`` when ``with_std=False``.

    mean_ : ndarray of shape (n_features,) or None
        The mean value for each feature in the training set. Equal to ``None``
        when ``with_mean=False`` and ``with_std=False``.

    var_ : ndarray of shape (n_features,) or None
        The variance for each feature in the training set. Equal to ``None``
        when ``with_mean=False`` and ``with_std=False``.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.preprocessing import StandardScaler
    >>> X = pl.DataFrame({
    ...     "time": [datetime(2024, 1, i) for i in range(1, 6)],
    ...     "value": [10.0, 20.0, 30.0, 40.0, 50.0],
    ... })
    >>> scaler = StandardScaler()
    >>> scaler.fit(X)  # doctest: +ELLIPSIS
    StandardScaler(...)
    >>> X_scaled = scaler.transform(X)
    >>> # Values are standardized (mean 0, population std 1)
    >>> round(X_scaled["value"].mean(), 10)
    0.0

    See Also
    --------
    - [`MinMaxScaler`][yohou.preprocessing.sklearn_wrappers.MinMaxScaler] : Scale features to a given range.
    - [`RobustScaler`][yohou.preprocessing.sklearn_wrappers.RobustScaler] : Scale using statistics robust to outliers.

    """

    _estimator_default_class = sklearn_StandardScaler

    def __init__(self, with_mean=True, with_std=True, copy=True, **kwargs):
        super().__init__(with_mean=with_mean, with_std=with_std, copy=copy, **kwargs)

    @property
    def scale_(self) -> np.ndarray:
        """Per feature relative scaling of the data."""
        check_is_fitted(self, ["instance_"])
        return self.instance_.scale_

    @property
    def mean_(self) -> np.ndarray:
        """The mean value for each feature in the training set."""
        check_is_fitted(self, ["instance_"])
        return self.instance_.mean_

    @property
    def var_(self) -> np.ndarray:
        """The variance for each feature in the training set."""
        check_is_fitted(self, ["instance_"])
        return self.instance_.var_

Methods

scale_ property

Per feature relative scaling of the data.

mean_ property

The mean value for each feature in the training set.

var_ property

The variance for each feature in the training set.
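Together, the mean and scale are enough to undo the transform. The sketch below shows the round trip for one feature in plain Python; it illustrates the arithmetic only and makes no claim about how the wrapper implements its inverse.

```python
from statistics import fmean, pstdev

train = [10.0, 20.0, 30.0, 40.0, 50.0]
mean_, scale_ = fmean(train), pstdev(train)

# Forward: z = (x - u) / s
scaled = [(x - mean_) / scale_ for x in train]

# Inverse: x = z * s + u
restored = [z * scale_ + mean_ for z in scaled]
```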

Tutorials

The following example notebooks use this component:

  • How to Use Scikit-learn Scalers


    Data-Features

    Wrap sklearn scalers (StandardScaler, MinMaxScaler, RobustScaler, PowerTransformer, PolynomialFeatures) for polars DataFrames with inverse transforms.