StandardScaler

`yohou.preprocessing.sklearn_wrappers.StandardScaler`

Bases: SklearnScaler

Standardize features by removing the mean and scaling to unit variance.

The standard score of a sample `x` is calculated as:

`z = (x - u) / s`

where `u` is the mean of the training samples (or zero if `with_mean=False`) and `s` is the standard deviation of the training samples (or one if `with_std=False`).
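
As a rough illustration of the formula, the sketch below computes standard scores with only the Python standard library. The helper name `standard_scores` is hypothetical and not part of Yohou or scikit-learn. It assumes the population standard deviation (`ddof=0`), which is what scikit-learn's `StandardScaler` uses.

```python
from statistics import fmean, pstdev


def standard_scores(train, values, with_mean=True, with_std=True):
    """Return z = (x - u) / s for each x in values, with u and s taken from train.

    Hypothetical helper for illustration only.
    """
    u = fmean(train) if with_mean else 0.0
    # pstdev is the population standard deviation (ddof=0).
    # A zero spread would divide by zero, so fall back to 1.0 in that case.
    s = (pstdev(train) or 1.0) if with_std else 1.0
    return [(x - u) / s for x in values]


train = [10.0, 20.0, 30.0, 40.0, 50.0]
z = standard_scores(train, train)

# The scaled training data has mean 0 and population standard deviation 1,
# up to floating-point error.
print(fmean(z), pstdev(z))
```

With both flags set to `False` the helper returns the values unchanged, which matches the "zero" and "one" fallbacks described above.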

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using transform().
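
The fit-then-transform behaviour can be sketched with the underlying scikit-learn class directly on NumPy arrays. This is not the Yohou wrapper itself, and the array values are made up for illustration. Each column gets its own mean and standard deviation, and later data is scaled with the statistics learned from the training set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales (illustrative values).
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_later = np.array([[4.0, 200.0]])

# fit() stores per-feature statistics from the training data only.
scaler = StandardScaler().fit(X_train)

# transform() reuses the stored statistics; nothing is re-estimated from X_later.
Z_later = scaler.transform(X_later)

# The same result computed by hand, column by column (np.std defaults to ddof=0).
expected = (X_later - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(Z_later, expected))
```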

Standardization is a common requirement for many machine learning estimators: they may behave poorly if the individual features do not look roughly like standard normally distributed data (Gaussian with zero mean and unit variance).

StandardScaler is sensitive to outliers, and the features may scale differently from each other in the presence of outliers. For outlier-robust scaling, use RobustScaler instead.
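
To make the outlier sensitivity concrete, the sketch below uses only the standard library, with made-up values and a hypothetical helper `inlier_spread`. It standardizes two series that differ in a single point and compares how far apart the four ordinary values end up after scaling.

```python
from statistics import fmean, pstdev


def inlier_spread(xs):
    """Standardize xs and return the range covered by its first four values."""
    u, s = fmean(xs), pstdev(xs)
    z = [(x - u) / s for x in xs]
    return max(z[:4]) - min(z[:4])


clean = [10.0, 20.0, 30.0, 40.0, 50.0]
with_outlier = [10.0, 20.0, 30.0, 40.0, 5000.0]

spread_clean = inlier_spread(clean)           # roughly 2.12
spread_outlier = inlier_spread(with_outlier)  # roughly 0.015

# One extreme value inflates the mean and standard deviation, so the
# ordinary values are squeezed into a tiny interval after scaling.
print(spread_clean, spread_outlier)
```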

This Yohou wrapper around scikit-learn's StandardScaler accepts and returns polars DataFrames, preserving the DataFrame structure and the "time" column.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `with_mean` | `bool` | If True, center the data before scaling. | `True` |
| `with_std` | `bool` | If True, scale the data to unit variance (or equivalently, unit standard deviation). | `True` |

Attributes

| Name | Type | Description |
| --- | --- | --- |
| `instance_` | `sklearn.preprocessing.StandardScaler` | The fitted sklearn StandardScaler instance. |
| `scale_` | `ndarray of shape (n_features,) or None` | Per-feature relative scaling of the data to achieve zero mean and unit variance. Equal to `None` when `with_std=False`. |
| `mean_` | `ndarray of shape (n_features,) or None` | The mean value for each feature in the training set. Equal to `None` when `with_mean=False` and `with_std=False`. |
| `var_` | `ndarray of shape (n_features,) or None` | The variance for each feature in the training set. Equal to `None` when `with_mean=False` and `with_std=False`. |
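
The `None` cases can be checked against the underlying scikit-learn class. The sketch below uses `sklearn.preprocessing.StandardScaler` directly on a NumPy array. It assumes the Yohou properties simply forward to the fitted `instance_`, as the source further down suggests.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])

# Defaults: all three statistics are populated.
full = StandardScaler().fit(X)
print(full.mean_, full.var_, full.scale_)  # mean 30, variance 200, scale sqrt(200)

# with_mean=False alone: the mean is still computed, because the variance needs it.
no_center = StandardScaler(with_mean=False).fit(X)
print(no_center.mean_)

# with_std=False: no scaling is applied, so scale_ is None.
no_scale = StandardScaler(with_std=False).fit(X)
print(no_scale.scale_)

# Both disabled: nothing is estimated, so mean_, var_ and scale_ are all None.
neither = StandardScaler(with_mean=False, with_std=False).fit(X)
print(neither.mean_, neither.var_, neither.scale_)
```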

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.preprocessing import StandardScaler
>>> X = pl.DataFrame({
...     "time": [datetime(2024, 1, i) for i in range(1, 6)],
...     "value": [10.0, 20.0, 30.0, 40.0, 50.0],
... })
>>> scaler = StandardScaler()
>>> scaler.fit(X)
StandardScaler(...)
>>> X_scaled = scaler.transform(X)
>>> # Values are standardized (mean=0, std=1)
>>> round(X_scaled["value"].mean(), 10)
0.0

Source Code

class StandardScaler(SklearnScaler):
    """Standardize features by removing the mean and scaling to unit variance.

    The standard score of a sample ``x`` is calculated as::

        z = (x - u) / s

    where ``u`` is the mean of the training samples or zero if ``with_mean=False``,
    and ``s`` is the standard deviation of the training samples or one if
    ``with_std=False``.

    Centering and scaling happen independently on each feature by computing the
    relevant statistics on the samples in the training set. Mean and standard
    deviation are then stored to be used on later data using ``transform()``.

    Standardization of a dataset is a common requirement for many machine learning
    estimators: they might behave badly if the individual features do not more or
    less look like standard normally distributed data (e.g. Gaussian with 0 mean
    and unit variance).

    ``StandardScaler`` is sensitive to outliers, and the features may scale
    differently from each other in the presence of outliers. For outlier-robust
    scaling, use ``RobustScaler`` instead.

    This is a Yohou wrapper that preserves the polars DataFrame structure and
    "time" column.

    Parameters
    ----------
    with_mean : bool, default=True
        If True, center the data before scaling.

    with_std : bool, default=True
        If True, scale the data to unit variance (or equivalently, unit standard
        deviation).

    Attributes
    ----------
    instance_ : sklearn.preprocessing.StandardScaler
        The fitted sklearn StandardScaler instance.

    scale_ : ndarray of shape (n_features,) or None
        Per feature relative scaling of the data to achieve zero mean and unit
        variance. Equal to ``None`` when ``with_std=False``.

    mean_ : ndarray of shape (n_features,) or None
        The mean value for each feature in the training set. Equal to ``None``
        when ``with_mean=False`` and ``with_std=False``.

    var_ : ndarray of shape (n_features,) or None
        The variance for each feature in the training set. Equal to ``None``
        when ``with_mean=False`` and ``with_std=False``.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.preprocessing import StandardScaler
    >>> X = pl.DataFrame({
    ...     "time": [datetime(2024, 1, i) for i in range(1, 6)],
    ...     "value": [10.0, 20.0, 30.0, 40.0, 50.0],
    ... })
    >>> scaler = StandardScaler()
    >>> scaler.fit(X)  # doctest: +ELLIPSIS
    StandardScaler(...)
    >>> X_scaled = scaler.transform(X)
    >>> # Values are standardized (mean=0, std=1)
    >>> round(X_scaled["value"].mean(), 10)
    0.0

    See Also
    --------
    - [`MinMaxScaler`][yohou.preprocessing.sklearn_wrappers.MinMaxScaler] : Scale features to a given range.
    - [`RobustScaler`][yohou.preprocessing.sklearn_wrappers.RobustScaler] : Scale using statistics robust to outliers.

    """

    _estimator_default_class = sklearn_StandardScaler

    def __init__(self, with_mean=True, with_std=True, copy=True, **kwargs):
        super().__init__(with_mean=with_mean, with_std=with_std, copy=copy, **kwargs)

    @property
    def scale_(self) -> np.ndarray:
        """Per feature relative scaling of the data."""
        check_is_fitted(self, ["instance_"])
        return self.instance_.scale_

    @property
    def mean_(self) -> np.ndarray:
        """The mean value for each feature in the training set."""
        check_is_fitted(self, ["instance_"])
        return self.instance_.mean_

    @property
    def var_(self) -> np.ndarray:
        """The variance for each feature in the training set."""
        check_is_fitted(self, ["instance_"])
        return self.instance_.var_

Methods

`scale_` `property`

Per feature relative scaling of the data.

`mean_` `property`

The mean value for each feature in the training set.

`var_` `property`

The variance for each feature in the training set.

Tutorials

The following example notebooks use this component:

How to Use Scikit-learn Scalers

Data-Features

Wrap sklearn scalers (StandardScaler, MinMaxScaler, RobustScaler, PowerTransformer, PolynomialFeatures) for polars DataFrames with inverse transforms.
