SklearnTransformer¶

`yohou.preprocessing.sklearn_base.SklearnTransformer` ¶

Bases: BaseClassWrapper, BaseTransformer

Wrapper to integrate sklearn transformers into the Yohou pipeline.

Preserves the polars DataFrame structure and the "time" column while applying the wrapped sklearn transformer (typically a scaler) to the remaining, non-time columns.

This class can be used to:

1. Wrap any sklearn-compatible transformer for use in yohou pipelines
2. Serve as a base class for creating yohou transformer extensions

Parameters¶

Name	Type	Description	Default
`transformer`	`type`	The sklearn transformer class to wrap. Must be a subclass of `sklearn.base.TransformerMixin`. If not provided, `_estimator_default_class` is used (subclasses define this).	`None`
`**params`	`dict`	Parameters passed to the underlying sklearn transformer constructor. See the documentation of the specific transformer for available parameters.	`{}`
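
The interplay between `transformer`, `**params`, and `_estimator_default_class` can be pictured with a small plain-Python sketch. `HalfScaler`, `MiniWrapper`, `MiniHalfScaler`, and `build_instance` are hypothetical names used only for illustration; the real instantiation logic lives in `BaseClassWrapper` and may differ in detail.

```python
class HalfScaler:
    """Hypothetical stand-in for an sklearn transformer class."""

    def __init__(self, factor=0.5):
        self.factor = factor

    def fit(self, rows):
        return self

    def transform(self, rows):
        return [[value * self.factor for value in row] for row in rows]


class MiniWrapper:
    """Illustrative wrapper: stores a class plus its constructor kwargs."""

    # Subclasses may set this so that `transformer` can be omitted.
    _estimator_default_class = None

    def __init__(self, transformer=None, **params):
        if transformer is None:
            transformer = self._estimator_default_class
        if transformer is None:
            raise ValueError("No transformer class given and no default defined.")
        self.transformer = transformer
        self.params = params

    def build_instance(self):
        # The wrapped class is only instantiated here, with the stored kwargs.
        return self.transformer(**self.params)


class MiniHalfScaler(MiniWrapper):
    """Pre-configured subclass: the default class replaces the argument."""

    _estimator_default_class = HalfScaler


explicit = MiniWrapper(transformer=HalfScaler, factor=0.1).build_instance()
preset = MiniHalfScaler(factor=2.0).build_instance()
print(explicit.factor, preset.factor)  # 0.1 2.0
```

The point of the sketch is that a class, not an instance, is passed in, and the keyword arguments are forwarded to its constructor later.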

Attributes¶

Name	Type	Description
`instance_`	`TransformerMixin`	The fitted sklearn transformer instance (created by `BaseClassWrapper`).

Examples¶

>>> import polars as pl
>>> from datetime import datetime
>>> from sklearn.preprocessing import StandardScaler as SklearnStandardScaler
>>> from yohou.preprocessing import SklearnTransformer
>>> X = pl.DataFrame({
...     "time": [datetime(2024, 1, i) for i in range(1, 6)],
...     "value": [10.0, 20.0, 30.0, 40.0, 50.0],
... })
>>> transformer = SklearnTransformer(transformer=SklearnStandardScaler, with_mean=True)
>>> transformer.fit(X)
SklearnTransformer(...)
>>> X_transformed = transformer.transform(X)
>>> "time" in X_transformed.columns
True
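
For orientation, the values that a default `StandardScaler` would produce for the `"value"` column above can be reproduced by hand. This is a dependency-free sketch of the arithmetic only, assuming the scaler's default settings (centering on the mean, dividing by the population standard deviation); it does not call yohou or sklearn.

```python
from statistics import fmean, pstdev

values = [10.0, 20.0, 30.0, 40.0, 50.0]

# StandardScaler centres on the mean and divides by the population
# standard deviation (ddof=0), not the sample standard deviation.
mean = fmean(values)   # 30.0
std = pstdev(values)   # sqrt(200), roughly 14.142

scaled = [(v - mean) / std for v in values]
print([round(z, 4) for z in scaled])
# [-1.4142, -0.7071, 0.0, 0.7071, 1.4142]
```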

Source Code¶

class SklearnTransformer(BaseClassWrapper, BaseTransformer):
    """Wrapper to integrate sklearn transformers into the Yohou pipeline.

    Preserves the polars DataFrame structure and the "time" column while
    applying the wrapped sklearn transformer (typically a scaler) to the
    remaining, non-time columns.

    This class can be used to:

    1. Wrap any sklearn-compatible transformer for use in yohou pipelines
    2. Serve as a base class for creating yohou transformer extensions

    Parameters
    ----------
    transformer : type, default=None
        The sklearn transformer class to wrap. Must be a subclass of
        ``sklearn.base.TransformerMixin``. If not provided,
        ``_estimator_default_class`` is used (subclasses define this).

    **params : dict
        Parameters passed to the underlying sklearn transformer constructor.
        See the documentation of the specific transformer for available parameters.

    Attributes
    ----------
    instance_ : TransformerMixin
        The fitted sklearn transformer instance (created by ``BaseClassWrapper``).

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from sklearn.preprocessing import StandardScaler as SklearnStandardScaler
    >>> from yohou.preprocessing import SklearnTransformer
    >>> X = pl.DataFrame({
    ...     "time": [datetime(2024, 1, i) for i in range(1, 6)],
    ...     "value": [10.0, 20.0, 30.0, 40.0, 50.0],
    ... })
    >>> transformer = SklearnTransformer(transformer=SklearnStandardScaler, with_mean=True)
    >>> transformer.fit(X)  # doctest: +ELLIPSIS
    SklearnTransformer(...)
    >>> X_transformed = transformer.transform(X)
    >>> "time" in X_transformed.columns
    True

    See Also
    --------
    - [`StandardScaler`][yohou.preprocessing.sklearn_wrappers.StandardScaler] : Pre-configured wrapper for sklearn's StandardScaler.
    - [`MinMaxScaler`][yohou.preprocessing.sklearn_wrappers.MinMaxScaler] : Pre-configured wrapper for sklearn's MinMaxScaler.
    - [`RobustScaler`][yohou.preprocessing.sklearn_wrappers.RobustScaler] : Pre-configured wrapper for sklearn's RobustScaler.
    - [`MaxAbsScaler`][yohou.preprocessing.sklearn_wrappers.MaxAbsScaler] : Pre-configured wrapper for sklearn's MaxAbsScaler.

    """

    _estimator_name = "transformer"
    _estimator_base_class = TransformerMixin
    _estimator_default_class: type | None = None

    _parameter_constraints: dict = {
        "transformer": [HasMethods(["fit", "transform"]), None],
    }

    def __init__(self, transformer=None, **params):
        if transformer is not None:
            super().__init__(transformer=transformer, **params)
        else:
            super().__init__(**params)

    def __sklearn_tags__(self):
        """Get estimator tags.

        Override to ensure stateful=False before and after fit. The invertible tag
        is set dynamically based on whether the wrapped transformer has inverse_transform.

        Returns
        -------
        Tags
            Estimator tags with stateful=False and invertible based on underlying transformer.

        """
        tags = super().__sklearn_tags__()
        # transformers are always stateless (no memory / observation horizon)
        if tags.transformer_tags is not None:
            tags.transformer_tags.stateful = False
            # Invertible only if underlying transformer has inverse_transform
            tags.transformer_tags.invertible = _transformer_has_inverse(self)
        return tags

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params) -> "SklearnTransformer":
        """Fit the transformer to the data.

        Computes scaling parameters (e.g., mean, std, min, max) from the
        training data, excluding the "time" column.

        Parameters
        ----------
        X : pl.DataFrame
            Input time series with "time" column.

        y : pl.DataFrame or None, default=None
            Target time series. Ignored and only present for API consistency.

        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        self
            Fitted transformer.

        Raises
        ------
        ValueError
            If X does not have a "time" column.

        """
        # Validate input data (checks time column, schema, etc.)
        X = validate_transformer_data(self, X=X, reset=True)

        # Call parent fit (stores schema, memory, etc.)
        BaseTransformer.fit(self, X, y, **params)

        # Strip time column before fitting sklearn transformer
        X_no_time = X.select(~cs.by_name("time"))

        # Configure transformer output and fit (instance_ created by _fit_context)
        self.instance_.set_output(transform="polars")
        self.instance_.fit(X_no_time)

        return self

    def transform(self, X: pl.DataFrame, **params) -> pl.DataFrame:
        """Transform the input time series.

        Applies the learned scaling transformation to each feature.

        Parameters
        ----------
        X : pl.DataFrame
            Feature time series with "time" column.

        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        pl.DataFrame
            Transformed time series with "time" column preserved.

        """
        check_is_fitted(self, ["instance_", "X_schema_", "feature_names_in_"])

        # Validate input data
        X = validate_transformer_data(self, X=X, reset=False, check_continuity=False)

        # Strip time column before transforming
        time = X.select(cs.by_name("time"))
        X_no_time = X.select(~cs.by_name("time"))

        # Apply scaling transformation
        X_scaled_no_time = self.instance_.transform(X_no_time)

        # Reattach time column to the scaled features
        return pl.concat([time, X_scaled_no_time], how="horizontal")

    @available_if(_transformer_has_inverse)
    def inverse_transform(self, X_t: pl.DataFrame, X_p: pl.DataFrame | None = None, **params) -> pl.DataFrame:
        """Apply the inverse transformation to the data.

        This method is only available if the underlying sklearn transformer
        supports inverse_transform (e.g., StandardScaler, PowerTransformer).

        Reverts the scaling transformation, restoring the original data scale.

        Parameters
        ----------
        X_t : pl.DataFrame
            Scaled features with "time" column.

        X_p : pl.DataFrame or None, default=None
            Past observations for stateful inverse transformation. Ignored for
            sklearn wrappers since sklearn transformers are stateless.

        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        pl.DataFrame
            Unscaled features with "time" column preserved.

        """
        check_is_fitted(self, ["instance_"])
        X_t, _ = validate_transformer_data(self, X=X_t, reset=False, inverse=True, check_continuity=False)

        # Strip time column before inverse transforming
        time = X_t.select(cs.by_name("time"))
        X_no_time = X_t.select(~cs.by_name("time"))

        # Apply inverse scaling transformation (returns numpy array)
        X_unscaled_array = self.instance_.inverse_transform(X_no_time)

        # Convert back to DataFrame with original column names
        X_unscaled_no_time = pl.DataFrame(X_unscaled_array, schema=X_no_time.columns, orient="row")

        # Reattach time column to the unscaled features
        return pl.concat([time, X_unscaled_no_time], how="horizontal")

    def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
        """Get output feature names for transformation.

        Parameters
        ----------
        input_features : list of str or None, default=None
            Input features. If None, uses feature names from fit.

        Returns
        -------
        list of str
            Output feature names as reported by the wrapped transformer
            (identical to the input names for one-to-one transformers such as scalers).

        """
        check_is_fitted(self, ["instance_"])
        return list(self.instance_.get_feature_names_out(input_features))

Methods¶

`__sklearn_tags__()` ¶

Get estimator tags.

Override to ensure stateful=False before and after fit. The invertible tag is set dynamically based on whether the wrapped transformer has inverse_transform.

Returns¶

Type	Description
`Tags`	Estimator tags with stateful=False and invertible based on underlying transformer.
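
The dynamic `invertible` tag can be sketched as follows. The source does not show the body of `_transformer_has_inverse`, so the `hasattr` check here is an assumption, and `describe_tags`, `WithInverse`, and `WithoutInverse` are hypothetical names.

```python
class WithInverse:
    """Toy transformer that can undo its own transformation."""

    def transform(self, x):
        return x * 2

    def inverse_transform(self, x):
        return x / 2


class WithoutInverse:
    """Toy transformer with no way back (information is lost)."""

    def transform(self, x):
        return abs(x)


def describe_tags(wrapped):
    """Return a dict mimicking the two tags discussed above."""
    return {
        "stateful": False,  # fixed, regardless of the wrapped object
        # Assumed check: invertible only if the method exists.
        "invertible": hasattr(wrapped, "inverse_transform"),
    }


print(describe_tags(WithInverse()))     # {'stateful': False, 'invertible': True}
print(describe_tags(WithoutInverse()))  # {'stateful': False, 'invertible': False}
```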

Source Code¶

def __sklearn_tags__(self):
    """Get estimator tags.

    Override to ensure stateful=False before and after fit. The invertible tag
    is set dynamically based on whether the wrapped transformer has inverse_transform.

    Returns
    -------
    Tags
        Estimator tags with stateful=False and invertible based on underlying transformer.

    """
    tags = super().__sklearn_tags__()
    # transformers are always stateless (no memory / observation horizon)
    if tags.transformer_tags is not None:
        tags.transformer_tags.stateful = False
        # Invertible only if underlying transformer has inverse_transform
        tags.transformer_tags.invertible = _transformer_has_inverse(self)
    return tags

`fit(X, y=None, **params)` ¶

Fit the transformer to the data.

Computes scaling parameters (e.g., mean, std, min, max) from the training data, excluding the "time" column.

Parameters¶

Name	Type	Description	Default
`X`	`DataFrame`	Input time series with "time" column.	required
`y`	`DataFrame or None`	Target time series. Ignored and only present for API consistency.	`None`
`**params`	`dict`	Metadata to route to nested estimators.	`{}`

Returns¶

Type	Description
`self`	Fitted transformer.

Raises¶

Type	Description
`ValueError`	If X does not have a "time" column.
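
A rough picture of what fitting amounts to for a standard scaler: statistics are learned from every column except `"time"`, and a missing time column is an error. The sketch uses a dict of lists as a dependency-free stand-in for a polars DataFrame; `fit_column_stats` and the error message are hypothetical, and the real validation is done by `validate_transformer_data`.

```python
from statistics import fmean, pstdev


def fit_column_stats(frame):
    """Learn per-column (mean, std) from every column except "time".

    `frame` maps column names to lists of values.
    """
    if "time" not in frame:
        # Illustrative message; the library's wording may differ.
        raise ValueError('Input must contain a "time" column.')
    return {
        name: (fmean(values), pstdev(values))
        for name, values in frame.items()
        if name != "time"  # the time axis never reaches the scaler
    }


frame = {"time": [1, 2, 3], "value": [1.0, 2.0, 3.0]}
stats = fit_column_stats(frame)
print(sorted(stats))  # ['value']
```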

Source Code¶

@_fit_context(prefer_skip_nested_validation=True)
def fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params) -> "SklearnTransformer":
    """Fit the transformer to the data.

    Computes scaling parameters (e.g., mean, std, min, max) from the
    training data, excluding the "time" column.

    Parameters
    ----------
    X : pl.DataFrame
        Input time series with "time" column.

    y : pl.DataFrame or None, default=None
        Target time series. Ignored and only present for API consistency.

    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    self
        Fitted transformer.

    Raises
    ------
    ValueError
        If X does not have a "time" column.

    """
    # Validate input data (checks time column, schema, etc.)
    X = validate_transformer_data(self, X=X, reset=True)

    # Call parent fit (stores schema, memory, etc.)
    BaseTransformer.fit(self, X, y, **params)

    # Strip time column before fitting sklearn transformer
    X_no_time = X.select(~cs.by_name("time"))

    # Configure transformer output and fit (instance_ created by _fit_context)
    self.instance_.set_output(transform="polars")
    self.instance_.fit(X_no_time)

    return self

`transform(X, **params)` ¶

Transform the input time series.

Applies the learned scaling transformation to each feature.

Parameters¶

Name	Type	Description	Default
`X`	`DataFrame`	Feature time series with "time" column.	required
`**params`	`dict`	Metadata to route to nested estimators.	`{}`

Returns¶

Type	Description
`DataFrame`	Transformed time series with "time" column preserved.
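
The strip, transform, and reattach sequence can be illustrated without polars. In this sketch a dict of lists stands in for the DataFrame and the fitted parameters are supplied by hand; `transform_frame` is a hypothetical helper, whereas the library itself uses polars selectors and a horizontal `pl.concat`.

```python
def transform_frame(frame, stats):
    """Scale non-time columns and put "time" back as the first column."""
    time = frame["time"]  # set the time axis aside
    scaled = {
        name: [(v - stats[name][0]) / stats[name][1] for v in values]
        for name, values in frame.items()
        if name != "time"
    }
    return {"time": time, **scaled}  # time first, then the scaled features


frame = {"value": [0.0, 10.0], "time": ["t0", "t1"]}
stats = {"value": (5.0, 5.0)}  # (mean, std), assumed to be already learned
out = transform_frame(frame, stats)
print(list(out), out["value"])
# ['time', 'value'] [-1.0, 1.0]
```

Note that the time column ends up first in the output regardless of where it sat in the input, which matches the order of the concatenation in the source.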

Source Code¶

def transform(self, X: pl.DataFrame, **params) -> pl.DataFrame:
    """Transform the input time series.

    Applies the learned scaling transformation to each feature.

    Parameters
    ----------
    X : pl.DataFrame
        Feature time series with "time" column.

    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    pl.DataFrame
        Transformed time series with "time" column preserved.

    """
    check_is_fitted(self, ["instance_", "X_schema_", "feature_names_in_"])

    # Validate input data
    X = validate_transformer_data(self, X=X, reset=False, check_continuity=False)

    # Strip time column before transforming
    time = X.select(cs.by_name("time"))
    X_no_time = X.select(~cs.by_name("time"))

    # Apply scaling transformation
    X_scaled_no_time = self.instance_.transform(X_no_time)

    # Reattach time column to the scaled features
    return pl.concat([time, X_scaled_no_time], how="horizontal")

`inverse_transform(X_t, X_p=None, **params)` ¶

Apply the inverse transformation to the data.

This method is only available if the underlying sklearn transformer supports inverse_transform (e.g., StandardScaler, PowerTransformer).

Reverts the scaling transformation, restoring the original data scale.

Parameters¶

Name	Type	Description	Default
`X_t`	`DataFrame`	Scaled features with "time" column.	required
`X_p`	`DataFrame or None`	Past observations for stateful inverse transformation. Ignored for sklearn wrappers since sklearn transformers are stateless.	`None`
`**params`	`dict`	Metadata to route to nested estimators.	`{}`

Returns¶

Type	Description
`DataFrame`	Unscaled features with "time" column preserved.
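
For a standard scaler, the inverse is simply `z * std + mean`, so a transform followed by its inverse should return the original values up to floating-point error. The sketch below checks that round trip with hand-picked parameters; `scale` and `unscale` are hypothetical helpers, and the `past` argument only mirrors the unused `X_p`.

```python
import math


def scale(values, mean, std):
    return [(v - mean) / std for v in values]


def unscale(values, mean, std, past=None):
    # `past` mirrors the unused X_p argument: it is accepted and ignored.
    return [z * std + mean for z in values]


original = [10.0, 20.0, 30.0]
mean, std = 20.0, 10.0  # illustrative fitted parameters
restored = unscale(scale(original, mean, std), mean, std, past=[0.0])

print(all(math.isclose(a, b) for a, b in zip(original, restored)))
# True
```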

Source Code¶

@available_if(_transformer_has_inverse)
def inverse_transform(self, X_t: pl.DataFrame, X_p: pl.DataFrame | None = None, **params) -> pl.DataFrame:
    """Apply the inverse transformation to the data.

    This method is only available if the underlying sklearn transformer
    supports inverse_transform (e.g., StandardScaler, PowerTransformer).

    Reverts the scaling transformation, restoring the original data scale.

    Parameters
    ----------
    X_t : pl.DataFrame
        Scaled features with "time" column.

    X_p : pl.DataFrame or None, default=None
        Past observations for stateful inverse transformation. Ignored for
        sklearn wrappers since sklearn transformers are stateless.

    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    pl.DataFrame
        Unscaled features with "time" column preserved.

    """
    check_is_fitted(self, ["instance_"])
    X_t, _ = validate_transformer_data(self, X=X_t, reset=False, inverse=True, check_continuity=False)

    # Strip time column before inverse transforming
    time = X_t.select(cs.by_name("time"))
    X_no_time = X_t.select(~cs.by_name("time"))

    # Apply inverse scaling transformation (returns numpy array)
    X_unscaled_array = self.instance_.inverse_transform(X_no_time)

    # Convert back to DataFrame with original column names
    X_unscaled_no_time = pl.DataFrame(X_unscaled_array, schema=X_no_time.columns, orient="row")

    # Reattach time column to the unscaled features
    return pl.concat([time, X_unscaled_no_time], how="horizontal")

`get_feature_names_out(input_features=None)` ¶

Get output feature names for transformation.

Parameters¶

Name	Type	Description	Default
`input_features`	`list of str or None`	Input features. If None, uses feature names from fit.	`None`

Returns¶

Type	Description
`list of str`	Output feature names as reported by the wrapped transformer (identical to the input names for one-to-one transformers such as scalers).

Source Code¶

def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
    """Get output feature names for transformation.

    Parameters
    ----------
    input_features : list of str or None, default=None
        Input features. If None, uses feature names from fit.

    Returns
    -------
    list of str
        Output feature names as reported by the wrapped transformer
        (identical to the input names for one-to-one transformers such as scalers).

    """
    check_is_fitted(self, ["instance_"])
    return list(self.instance_.get_feature_names_out(input_features))
