QuantileTransformer

`yohou.preprocessing.sklearn_wrappers.QuantileTransformer`

Bases: SklearnTransformer

Transform features using quantile information.

This method transforms the features to follow a uniform or a normal distribution. As a result, for a given feature, the transformation tends to spread out the most frequent values. It also reduces the impact of (marginal) outliers, which makes it a robust preprocessing scheme.

The transformation is applied to each feature independently. First, an estimate of the feature's cumulative distribution function (CDF) is used to map the original values to a uniform distribution. The resulting values are then mapped to the requested output distribution using the associated quantile function (the inverse CDF).
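The two-step mapping described above can be sketched in plain Python for a single feature. This is a simplified illustration, not yohou's or scikit-learn's implementation: scikit-learn additionally averages a forward and a backward interpolation to handle repeated values, and the function names and the `1e-7` clipping bound used here are assumptions.

```python
from bisect import bisect_right
from statistics import NormalDist

# Assumed bound that keeps probabilities away from 0 and 1 before the
# normal quantile function is applied (inv_cdf rejects exactly 0 or 1).
BOUNDS_THRESHOLD = 1e-7


def linear_percentile(sorted_values, q):
    """Linearly interpolated quantile of pre-sorted data, for q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def interp(x, xp, fp):
    """Piecewise-linear interpolation of x on the non-decreasing grid xp."""
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    i = bisect_right(xp, x) - 1  # now xp[i] <= x < xp[i + 1]
    t = (x - xp[i]) / (xp[i + 1] - xp[i])
    return fp[i] + t * (fp[i + 1] - fp[i])


def quantile_transform_1d(values, n_quantiles=1000, output_distribution="uniform"):
    """Map one feature to a uniform or a standard normal distribution."""
    if output_distribution not in ("uniform", "normal"):
        raise ValueError("output_distribution must be 'uniform' or 'normal'")
    # Never use more landmarks than samples (this sketch needs at least two).
    n_q = max(2, min(n_quantiles, len(values)))
    references = [i / (n_q - 1) for i in range(n_q)]  # levels evenly spaced in [0, 1]
    sorted_values = sorted(values)
    # Fit: the quantiles at the reference levels discretize the empirical CDF.
    quantiles = [linear_percentile(sorted_values, r) for r in references]
    # Transform, step 1: push each value through the estimated CDF -> uniform.
    uniform = [interp(v, quantiles, references) for v in values]
    if output_distribution == "uniform":
        return uniform
    # Transform, step 2: apply the standard normal quantile function (inverse CDF).
    inv_cdf = NormalDist().inv_cdf
    clipped = [min(max(u, BOUNDS_THRESHOLD), 1 - BOUNDS_THRESHOLD) for u in uniform]
    return [inv_cdf(p) for p in clipped]


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
print([round(u, 3) for u in quantile_transform_1d(values, n_quantiles=10)])
# [0.0, 0.111, 0.222, 0.333, 0.444, 0.556, 0.667, 0.778, 0.889, 1.0]
```

Interpolating between stored quantiles, rather than ranking the training data directly, is what lets a fitted transformer map values it has not seen during `fit`.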

This Yohou wrapper preserves the polars DataFrame structure and the "time" column.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `n_quantiles` | `int` | Number of quantiles to compute, i.e. the number of landmarks used to discretize the cumulative distribution function. If `n_quantiles` is larger than the number of samples, it is set to the number of samples, since more quantiles would not improve the estimate of the cumulative distribution function. | `1000` |
| `output_distribution` | `{'uniform', 'normal'}` | Marginal distribution of the transformed data: `'uniform'` or `'normal'`. | `'uniform'` |
| `ignore_implicit_zeros` | `bool` | Only applies to sparse matrices. If `True`, the implicit (unstored) zero entries are discarded when computing the quantile statistics; if `False`, they are treated as zeros. | `False` |
| `subsample` | `int` | Maximum number of samples used to estimate the quantiles, for computational efficiency. | `10_000` |
| `random_state` | `int, RandomState instance or None` | Determines random number generation for subsampling and smoothing noise. Pass an int for reproducible results across calls. | `None` |
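How `n_quantiles`, `subsample` and `random_state` interact can be sketched with two hypothetical helpers. This only models the documented rules (the cap on `n_quantiles`, and drawing at most `subsample` rows) and uses Python's `random` module instead of NumPy's random generator, so the rows it draws will not match scikit-learn's.

```python
import random


def effective_n_quantiles(n_quantiles, n_samples):
    """Landmarks actually used: capped at the number of samples."""
    return max(1, min(n_quantiles, n_samples))


def draw_subsample(values, subsample=10_000, random_state=None):
    """Rows used to estimate the quantiles of one feature (hypothetical helper).

    With more rows than `subsample`, draw `subsample` of them without
    replacement; an integer `random_state` makes the draw reproducible.
    """
    values = list(values)
    if len(values) <= subsample:
        return values
    rng = random.Random(random_state)
    return rng.sample(values, subsample)


print(effective_n_quantiles(1000, 10))  # 10
rows = list(range(50_000))
first = draw_subsample(rows, subsample=10_000, random_state=0)
second = draw_subsample(rows, subsample=10_000, random_state=0)
print(len(first), first == second)  # 10000 True
```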

Attributes

| Name | Type | Description |
| --- | --- | --- |
| `instance_` | `sklearn.preprocessing.QuantileTransformer` | The fitted sklearn `QuantileTransformer` instance. |
| `n_quantiles_` | `int` | The actual number of quantiles used to discretize the cumulative distribution function. |
| `quantiles_` | `ndarray of shape (n_quantiles, n_features)` | The feature values at the reference quantile levels: row `i` holds, for every feature, the value at level `references_[i]`. |
| `references_` | `ndarray of shape (n_quantiles,)` | Reference quantile levels, evenly spaced in [0, 1]. |
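The fitted attributes relate as follows: `references_` holds `n_quantiles_` evenly spaced levels in [0, 1], and row `i` of `quantiles_` holds each feature's value at level `references_[i]`. The plain-Python sketch below is illustrative only; it uses a hypothetical dict of numeric columns in place of a DataFrame and linear interpolation for the percentiles (NumPy's default method), and its running maximum mirrors scikit-learn's step that forces each column of quantiles to be non-decreasing.

```python
from itertools import accumulate


def linear_percentile(sorted_values, q):
    """Linearly interpolated quantile of pre-sorted data, for q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def fit_quantile_attributes(columns, n_quantiles=1000):
    """Hypothetical helper returning (n_quantiles_, references_, quantiles_).

    `columns` maps feature names to equally long lists of numbers.
    """
    n_samples = len(next(iter(columns.values())))
    n_q = max(2, min(n_quantiles, n_samples))  # this sketch needs >= 2 levels
    references = [i / (n_q - 1) for i in range(n_q)]
    per_feature = []
    for values in columns.values():
        s = sorted(values)
        qs = [linear_percentile(s, r) for r in references]
        # Running maximum guards against tiny floating-point dips.
        per_feature.append(list(accumulate(qs, max)))
    # Transpose to shape (n_quantiles_, n_features).
    quantiles = [list(row) for row in zip(*per_feature)]
    return n_q, references, quantiles


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
n_q, references, quantiles = fit_quantile_attributes(
    {"value": values, "double": [2 * v for v in values]}
)
print(n_q, len(quantiles), len(quantiles[0]))  # 10 10 2
print(quantiles[-1])  # [100.0, 200.0]
```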

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.preprocessing import QuantileTransformer
>>> X = pl.DataFrame({
...     "time": [datetime(2024, 1, i) for i in range(1, 11)],
...     "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0],  # 100 is outlier
... })
>>> qt = QuantileTransformer(n_quantiles=10, output_distribution="uniform")
>>> qt.fit(X)
QuantileTransformer(...)
>>> X_transformed = qt.transform(X)
>>> # The "time" column is preserved
>>> "time" in X_transformed.columns
True
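To make "reduces the impact of outliers" concrete for the example data: when all values are distinct and `n_quantiles` equals the number of rows, the uniform output reduces to rank / (n - 1). The plain-Python sketch below does not call yohou; it assumes the wrapper delegates to scikit-learn's transform, and it compares that shortcut with min-max scaling.

```python
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
n = len(values)

# Quantile transform (uniform output): shortcut valid for distinct values
# with n_quantiles == n, where the i-th smallest value maps to i / (n - 1).
rank = {v: i for i, v in enumerate(sorted(values))}
uniform = [rank[v] / (n - 1) for v in values]

# Min-max scaling for comparison: the single extreme value sets the range.
lo, hi = min(values), max(values)
minmax = [(v - lo) / (hi - lo) for v in values]

raw_gap = values[-1] - values[-2]  # distance between 9 and the outlier 100
print(f"raw gap: {raw_gap}")  # raw gap: 91.0
print(f"uniform: 9 -> {uniform[-2]:.3f}, 100 -> {uniform[-1]:.3f}")  # 0.889, 1.000
print(f"min-max: 9 -> {minmax[-2]:.3f}, 100 -> {minmax[-1]:.3f}")  # 0.081, 1.000
```

Under min-max scaling the nine ordinary values are squeezed into [0, 0.081], whereas the quantile transform keeps them evenly spread and places the outlier only one step above 9.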

Source Code

class QuantileTransformer(SklearnTransformer):
    """Transform features using quantiles information.

    This method transforms the features to follow a uniform or a normal
    distribution. Therefore, for a given feature, this transformation tends
    to spread out the most frequent values. It also reduces the impact of
    (marginal) outliers: this is therefore a robust preprocessing scheme.

    The transformation is applied to each feature independently. First, an
    estimate of the cumulative distribution function of a feature is used to
    map the original values to a uniform distribution. The obtained values are
    then mapped to the desired output distribution using the associated
    quantile function.

    This is a Yohou wrapper that preserves the polars DataFrame structure and
    "time" column.

    Parameters
    ----------
    n_quantiles : int, default=1000
        Number of quantiles to be computed. It corresponds to the number of
        landmarks used to discretize the cumulative distribution function.
        If n_quantiles is larger than the number of samples, n_quantiles is set
        to the number of samples.

    output_distribution : {'uniform', 'normal'}, default='uniform'
        Marginal distribution for the transformed data. The choices are
        'uniform' (default) or 'normal'.

    ignore_implicit_zeros : bool, default=False
        Only applies to sparse matrices. If True, the implicit (unstored) zero
        entries are discarded when computing the quantile statistics. If False,
        they are treated as zeros.

    subsample : int, default=10_000
        Maximum number of samples used to estimate the quantiles for
        computational efficiency.

    random_state : int, RandomState instance or None, default=None
        Determines random number generation for subsampling and smoothing
        noise.

    Attributes
    ----------
    instance_ : sklearn.preprocessing.QuantileTransformer
        The fitted sklearn QuantileTransformer instance.

    n_quantiles_ : int
        The actual number of quantiles used to discretize the cumulative
        distribution function.

    quantiles_ : ndarray of shape (n_quantiles, n_features)
        The values corresponding to the quantiles of reference.

    references_ : ndarray of shape (n_quantiles,)
        Reference quantile levels, evenly spaced in [0, 1].

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.preprocessing import QuantileTransformer
    >>> X = pl.DataFrame({
    ...     "time": [datetime(2024, 1, i) for i in range(1, 11)],
    ...     "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0],  # 100 is outlier
    ... })
    >>> qt = QuantileTransformer(n_quantiles=10, output_distribution="uniform")
    >>> qt.fit(X)  # doctest: +ELLIPSIS
    QuantileTransformer(...)
    >>> X_transformed = qt.transform(X)
    >>> # The "time" column is preserved
    >>> "time" in X_transformed.columns
    True

    See Also
    --------
    - [`PowerTransformer`][yohou.preprocessing.sklearn_wrappers.PowerTransformer] : Apply a power transform to make data more Gaussian-like.

    """

    _estimator_default_class = sklearn_QuantileTransformer

    def __init__(
        self,
        n_quantiles=1000,
        output_distribution="uniform",
        ignore_implicit_zeros=False,
        subsample=10_000,
        random_state=None,
        copy=True,
        **kwargs,
    ):
        super().__init__(
            n_quantiles=n_quantiles,
            output_distribution=output_distribution,
            ignore_implicit_zeros=ignore_implicit_zeros,
            subsample=subsample,
            random_state=random_state,
            copy=copy,
            **kwargs,
        )

    @property
    def n_quantiles_(self) -> int:
        """The actual number of quantiles used."""
        check_is_fitted(self, ["instance_"])
        return self.instance_.n_quantiles_

    @property
    def quantiles_(self) -> np.ndarray:
        """The values corresponding to the quantiles of reference."""
        check_is_fitted(self, ["instance_"])
        return self.instance_.quantiles_

    @property
    def references_(self) -> np.ndarray:
        """Quantiles of references."""
        check_is_fitted(self, ["instance_"])
        return self.instance_.references_

Methods

`n_quantiles_` `property`

The actual number of quantiles used.

`quantiles_` `property`

The values corresponding to the quantiles of reference.

`references_` `property`

Reference quantile levels, evenly spaced in [0, 1].
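The properties above read fitted values from `instance_` and fail before `fit`. A minimal plain-Python sketch of that delegation pattern follows; `DelegatingWrapper`, `_InnerEstimator` and this `NotFittedError` are hypothetical stand-ins (scikit-learn's own `NotFittedError` likewise subclasses both `ValueError` and `AttributeError`), and the inner estimator only learns `n_quantiles_`.

```python
class NotFittedError(ValueError, AttributeError):
    """Hypothetical stand-in for scikit-learn's NotFittedError."""


class _InnerEstimator:
    """Hypothetical inner estimator that learns n_quantiles_ in fit."""

    def __init__(self, n_quantiles=1000):
        self.n_quantiles = n_quantiles

    def fit(self, values):
        self.n_quantiles_ = max(1, min(self.n_quantiles, len(values)))
        return self


class DelegatingWrapper:
    """Hypothetical wrapper that exposes the inner estimator's fitted state."""

    def __init__(self, n_quantiles=1000):
        self.n_quantiles = n_quantiles

    def fit(self, values):
        # Keep the fitted inner estimator in `instance_`, like the wrapper above.
        self.instance_ = _InnerEstimator(self.n_quantiles).fit(values)
        return self

    @property
    def n_quantiles_(self):
        # Same idea as check_is_fitted(self, ["instance_"]): fail before fit.
        if not hasattr(self, "instance_"):
            raise NotFittedError("call fit() before reading n_quantiles_")
        return self.instance_.n_quantiles_


wrapper = DelegatingWrapper()
print(hasattr(wrapper, "n_quantiles_"))  # False (not fitted yet)
print(wrapper.fit(list(range(10))).n_quantiles_)  # 10
```

Reading the attributes through properties, rather than copying them onto the wrapper after `fit`, keeps a single source of truth in `instance_`.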
