
Normalizer

yohou.preprocessing.sklearn_wrappers.Normalizer

Bases: SklearnTransformer

Normalize samples individually to unit norm.

Each sample (i.e. each row of the data matrix) with at least one non-zero component is rescaled independently of other samples so that its norm (l1, l2 or max) equals one.
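To make the row-wise rescaling concrete, the arithmetic behind the three norms can be sketched in plain Python. The helper `unit_norm_row` is hypothetical and exists only for illustration; it is not part of Yohou or scikit-learn, and it only mirrors the behaviour described above, including leaving all-zero rows unchanged.

```python
import math


def unit_norm_row(row, norm="l2"):
    """Hypothetical helper: rescale one row so that its chosen norm equals one."""
    if norm == "l1":
        scale = sum(abs(v) for v in row)            # sum of absolute values
    elif norm == "l2":
        scale = math.sqrt(sum(v * v for v in row))  # Euclidean length
    elif norm == "max":
        scale = max(abs(v) for v in row)            # largest absolute value
    else:
        raise ValueError(f"unsupported norm: {norm!r}")
    if scale == 0:
        return list(row)  # a row with no non-zero component is left as-is
    return [v / scale for v in row]


row = [3.0, -4.0]
print(unit_norm_row(row, "l1"))   # components divided by 7.0
print(unit_norm_row(row, "l2"))   # [0.6, -0.8]
print(unit_norm_row(row, "max"))  # [0.75, -1.0]
```

Each row is processed without reference to any other row, which is what distinguishes this from column-wise scalers.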

This normalizer can be useful as a preprocessing step for classifiers or other algorithms that rely on the angle between vectors, such as cosine similarity for document classification.
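The link to angle-based methods can be illustrated with a small sketch: once two rows have unit L2 norm, their plain dot product equals their cosine similarity. The function names below are hypothetical helpers written for this illustration, not Yohou or scikit-learn API.

```python
import math


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(row):
    """Hypothetical helper: rescale a row to unit Euclidean length."""
    length = math.sqrt(dot(row, row))
    return [x / length for x in row] if length else list(row)


def cosine_similarity(u, v):
    """Cosine of the angle between u and v, computed from the raw vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


u = [1.0, 2.0]
v = [2.0, 1.0]

direct = cosine_similarity(u, v)
via_unit_rows = dot(l2_normalize(u), l2_normalize(v))

# Both routes give the same value, up to floating-point rounding.
print(math.isclose(direct, via_unit_rows))
```

Normalizing once up front therefore lets later steps use cheap dot products in place of repeated norm computations.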

This is a Yohou wrapper that preserves the polars DataFrame structure and "time" column.

Parameters

Name Type Description Default
norm ('l1', 'l2', 'max')

The norm to use to normalize each non-zero sample. If norm='max' is used, values will be rescaled by the maximum of the absolute values.

'l2'

Attributes

Name Type Description
instance_ Normalizer

The fitted sklearn Normalizer instance.

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.preprocessing import Normalizer
>>> X = pl.DataFrame({
...     "time": [datetime(2024, 1, i) for i in range(1, 4)],
...     "a": [1.0, 2.0, 3.0],
...     "b": [2.0, 4.0, 6.0],
... })
>>> normalizer = Normalizer(norm="l2")
>>> normalizer.fit(X)
Normalizer(...)
>>> X_norm = normalizer.transform(X)
>>> # Each row normalized to unit L2 norm
>>> "time" in X_norm.columns
True
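As a rough cross-check of what `X_norm` should contain, the expected values can be worked out by hand in plain Python. This sketch assumes the wrapper normalizes across the feature columns "a" and "b" of each row and leaves "time" untouched; that behaviour is inferred from the description above, not verified against the library.

```python
import math

# Feature columns from the example above (the "time" column is excluded).
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]

expected = []
for a_i, b_i in zip(a, b):
    length = math.hypot(a_i, b_i)  # L2 norm of the row (a_i, b_i)
    expected.append((a_i / length, b_i / length))

for row in expected:
    print(row)
```

Because every row in this example is a multiple of (1, 2), all three rows collapse to approximately (0.447, 0.894): unit-norm scaling keeps direction and discards magnitude.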

See Also

  • StandardScaler : Standardize features by removing mean and scaling to unit variance.

Source Code

class Normalizer(SklearnTransformer):
    """Normalize samples individually to unit norm.

    Each sample (i.e. each row of the data matrix) with at least one non-zero
    component is rescaled independently of other samples so that its norm
    (l1, l2 or max) equals one.

    This normalizer can be useful as a preprocessing step for classifiers or
    other algorithms that rely on the angle between vectors, such as cosine
    similarity for document classification.

    This is a Yohou wrapper that preserves the polars DataFrame structure and
    "time" column.

    Parameters
    ----------
    norm : {'l1', 'l2', 'max'}, default='l2'
        The norm to use to normalize each non-zero sample. If norm='max' is
        used, values will be rescaled by the maximum of the absolute values.

    Attributes
    ----------
    instance_ : sklearn.preprocessing.Normalizer
        The fitted sklearn Normalizer instance.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.preprocessing import Normalizer
    >>> X = pl.DataFrame({
    ...     "time": [datetime(2024, 1, i) for i in range(1, 4)],
    ...     "a": [1.0, 2.0, 3.0],
    ...     "b": [2.0, 4.0, 6.0],
    ... })
    >>> normalizer = Normalizer(norm="l2")
    >>> normalizer.fit(X)  # doctest: +ELLIPSIS
    Normalizer(...)
    >>> X_norm = normalizer.transform(X)
    >>> # Each row normalized to unit L2 norm
    >>> "time" in X_norm.columns
    True

    See Also
    --------
    - [`StandardScaler`][yohou.preprocessing.sklearn_wrappers.StandardScaler] : Standardize features by removing mean and scaling to unit variance.

    """

    _estimator_default_class = sklearn_Normalizer

    def __init__(self, norm="l2", copy=True, **kwargs):
        super().__init__(norm=norm, copy=copy, **kwargs)