ColumnTransformer

`yohou.compose.column_transformer.ColumnTransformer`

Bases: BaseTransformer, _BaseComposition

Applies transformers to columns of a polars DataFrame.

This estimator allows different columns or column subsets of the input to be transformed separately and the features generated by each transformer will be concatenated to form a single feature space.

This is useful for heterogeneous or columnar data, to combine several feature extraction mechanisms or transformations into a single transformer.

Parameters

Name	Type	Description	Default
`transformers`	`list of tuples`	List of (name, transformer, columns) tuples specifying the transformer objects to be applied to subsets of the data. `name` (str): as in FeaturePipeline and FeatureUnion, this allows the transformer and its parameters to be set using `set_params` and searched in grid search. `transformer` ({'drop', 'passthrough'} or estimator): the estimator must support `fit` and `transform`; the special-cased strings 'drop' and 'passthrough' are also accepted, to drop the columns or to pass them through untransformed, respectively. `columns` (str, array-like of str, int, array-like of int, array-like of bool, slice or callable): indexes the data on its second axis. Integers are interpreted as positional columns, while strings can reference DataFrame columns by name. A scalar string or int should be used where `transformer` expects X to be a 1d array-like (vector), otherwise a 2d array will be passed to the transformer. A callable is passed the input data `X` and can return any of the above. To select multiple columns by name or dtype, you can use `make_column_selector`.	required
`remainder`	`{'drop', 'passthrough'} or estimator`	By default (`'drop'`), only the specified columns in `transformers` are transformed and combined in the output, and the non-specified columns are dropped. By specifying `remainder='passthrough'`, all remaining columns that were not specified in `transformers`, but present in the data passed to `fit`, will be automatically passed through. This subset of columns is concatenated with the output of the transformers. For dataframes, extra columns not seen during `fit` will be excluded from the output of `transform`. By setting `remainder` to be an estimator, the remaining non-specified columns will use the `remainder` estimator. The estimator must support `fit` and `transform`. Note that using this feature requires that the DataFrame columns input at `fit` and `transform` have identical order.	`'drop'`
`n_jobs`	`int`	Number of jobs to run in parallel. `None` means 1 unless in a `joblib.parallel_backend` context. `-1` means using all processors.	`None`
`transformer_weights`	`dict`	Multiplicative weights for features per transformer. The output of the transformer is multiplied by these weights. Keys are transformer names, values the weights.	`None`
`verbose`	`bool`	If True, the time elapsed while fitting each transformer will be printed as it is completed.	`False`
`verbose_feature_names_out`	`bool`	If True, `ColumnTransformer.get_feature_names_out` will prefix all feature names with the name of the transformer that generated that feature. If False, `ColumnTransformer.get_feature_names_out` will not prefix any feature names and will error if feature names are not unique.	`True`
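
The uniqueness requirement that applies when `verbose_feature_names_out=False` can be pictured with a small plain-Python sketch. The helper `check_unique_feature_names` is hypothetical and only illustrates the rule described in the table; it is not yohou's implementation.

```python
from collections import Counter


def check_unique_feature_names(names: list[str]) -> list[str]:
    """Return `names` unchanged, or raise if any name occurs more than once."""
    duplicates = sorted(name for name, count in Counter(names).items() if count > 1)
    if duplicates:
        raise ValueError(f"Output feature names are not unique: {duplicates}")
    return names


# Two hypothetical transformers that both emit a column called "sales".
unprefixed = ["sales", "sales"]
# With prefixes, the same outputs no longer collide.
prefixed = ["diff_sales", "log_sales"]

check_unique_feature_names(prefixed)  # passes silently
try:
    check_unique_feature_names(unprefixed)
except ValueError as exc:
    print(exc)
```

Prefixing with the transformer name is what makes the default setting safe when several transformers emit identically named columns.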

Attributes

Name	Type	Description
`transformers_`	`list`	The collection of fitted transformers as tuples of (name, fitted_transformer, column). `fitted_transformer` can be an estimator, or `'drop'`; `'passthrough'` is replaced with an equivalent `FunctionTransformer`. In case there were no columns selected, this will be the unfitted transformer. If there are remaining columns, the final element is a tuple of the form: ('remainder', transformer, remaining_columns) corresponding to the `remainder` parameter. If there are remaining columns, then `len(transformers_)==len(transformers)+1`, otherwise `len(transformers_)==len(transformers)`.
`named_transformers_`	`Bunch`	Read-only attribute to access any transformer by given name. Keys are transformer names and values are the fitted transformer objects.
`output_indices_`	`dict`	A dictionary from each transformer name to a slice, where the slice corresponds to indices in the transformed output. This is useful to inspect which transformer is responsible for which transformed feature(s).
`n_features_in_`	`int`	Number of features seen during `fit`. Only defined if the underlying transformers expose such an attribute when fit.
`feature_names_in_`	ndarray of shape (`n_features_in_`,)	Names of features seen during `fit`. Defined only when `X` has feature names that are all strings.
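
As a rough illustration of what `output_indices_` describes, the sketch below derives contiguous slices from the number of output columns each transformer produces. The helper `build_output_indices` and the column counts are hypothetical; the real attribute is populated during fitting.

```python
def build_output_indices(widths: dict[str, int]) -> dict[str, slice]:
    """Map each transformer name to the slice of output columns it fills.

    `widths` maps transformer name -> number of output columns, in the
    order the transformers were specified (hypothetical input).
    """
    indices: dict[str, slice] = {}
    start = 0
    for name, width in widths.items():
        indices[name] = slice(start, start + width)
        start += width
    return indices


# Two transformers producing 1 and 2 columns, plus 1 passed-through column.
output_indices = build_output_indices({"sales_diff": 1, "temp_feats": 2, "remainder": 1})
columns = ["a", "b", "c", "d"]
print(columns[output_indices["temp_feats"]])
```

Indexing the output columns with one of these slices returns exactly the columns that transformer produced.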

Notes

The order of the columns in the transformed feature matrix follows the order in which the columns are specified in the transformers list. Columns of the original feature matrix that are not specified are dropped from the resulting transformed feature matrix, unless remainder='passthrough' is set. Passed-through columns are added at the right of the output of the transformers.
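
The ordering rule can be sketched in plain Python. The function `output_column_order` is a hypothetical illustration of the column bookkeeping only: it omits the transformer objects and the handling of the time column.

```python
def output_column_order(
    input_columns: list[str],
    transformers: list[tuple[str, list[str]]],
    remainder: str = "drop",
) -> list[str]:
    """Return input columns in the order they would appear in the output.

    `transformers` holds (name, columns) pairs; the transformer objects are
    omitted because only the column bookkeeping is illustrated here.
    """
    ordered = [col for _, cols in transformers for col in cols]
    if remainder == "passthrough":
        selected = set(ordered)
        # Unspecified columns keep their input order and go to the right.
        ordered += [col for col in input_columns if col not in selected]
    return ordered


cols = ["sales", "temperature", "promo"]
specs = [("temp_diff", ["temperature"]), ("sales_diff", ["sales"])]
print(output_column_order(cols, specs))
print(output_column_order(cols, specs, remainder="passthrough"))
```

The first call follows the order of the transformers list rather than the input order; the second appends the unspecified column on the right.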

A typical use is applying heterogeneous preprocessing to different columns when different time series have different characteristics (e.g., different seasonal patterns).

Column selection by name (string) works seamlessly with polars DataFrames, allowing intuitive column-specific transformations.

Time alignment across columns with different observation horizons is handled automatically by the internal _hstack() function, ensuring all transformed columns are properly aligned in time.
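
One way to picture why alignment is needed: transformers with different observation horizons drop a different number of leading rows. The sketch below uses plain dictionaries keyed by integer time steps and assumes that alignment keeps only the time steps present in every output. That rule is an assumption made for illustration; the actual `_hstack()` logic is not shown on this page, and both helper functions are hypothetical.

```python
def seasonal_difference(series: dict[int, float], lag: int) -> dict[int, float]:
    """Difference each value with the one `lag` steps earlier.

    The first `lag` time steps have no reference value and are dropped.
    """
    return {t: v - series[t - lag] for t, v in series.items() if t - lag in series}


def align_on_common_time(outputs: dict[str, dict[int, float]]) -> dict[int, dict[str, float]]:
    """Keep only time steps present in every output (assumed alignment rule)."""
    common = set.intersection(*(set(out) for out in outputs.values()))
    return {t: {name: out[t] for name, out in outputs.items()} for t in sorted(common)}


sales = {t: float(t + 1) for t in range(12)}
temperature = {t: float(t + 10) for t in range(12)}

aligned = align_on_common_time({
    "sales": seasonal_difference(sales, lag=4),
    "temperature": seasonal_difference(temperature, lag=7),
})
print(min(aligned), len(aligned))
```

Under this assumption the combined output starts at the first time step for which the most demanding transformer has a value.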

Setting remainder='passthrough' (default is 'drop') preserves untransformed columns in the output, useful for keeping auxiliary columns that don't require transformation.

The verbose_feature_names_out parameter (default=True) prefixes output column names with transformer names using a single underscore separator (e.g., 'deseason_sales') to prevent name collisions when multiple transformers produce columns with the same names. For panel data columns, the prefix is inserted after the group separator to preserve panel structure (e.g., 'store_1__deseason_sales').
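
The naming convention can be sketched as follows. The function `prefix_feature_name` is a hypothetical re-implementation written from the description above, not the library's own helper, and it assumes that the text before the first `__` is the panel group.

```python
def prefix_feature_name(column: str, transformer_name: str, group_sep: str = "__") -> str:
    """Prefix `column` with `transformer_name`, keeping any panel group first."""
    if group_sep in column:
        # Assumption: the text before the first separator is the group.
        group, series = column.split(group_sep, 1)
        return f"{group}{group_sep}{transformer_name}_{series}"
    return f"{transformer_name}_{column}"


print(prefix_feature_name("sales", "deseason"))
print(prefix_feature_name("store_1__sales", "deseason"))
```

Keeping the group in front means downstream code that splits on `__` to recover the panel group still works after prefixing.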

The observation_horizon property returns the maximum across all column transformers, as the transformer needs enough history to satisfy the most demanding column-specific transformation.
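
A minimal sketch of this aggregation, using a hypothetical stand-in class `LagTransformer` that exposes only an `observation_horizon` attribute. Special strings such as 'drop' and 'passthrough' are treated as needing no history.

```python
from dataclasses import dataclass


@dataclass
class LagTransformer:
    """Hypothetical stand-in exposing only an `observation_horizon`."""

    observation_horizon: int


def max_observation_horizon(transformers: list[object]) -> int:
    """Largest history requirement; strings such as 'passthrough' need none."""
    horizons = [
        0 if isinstance(t, str) else getattr(t, "observation_horizon", 0)
        for t in transformers
    ]
    return max(horizons, default=0)


print(max_observation_horizon([LagTransformer(4), LagTransformer(7), "passthrough"]))
```

Taking the maximum rather than the sum reflects that all transformers read from the same history window.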

force_int_remainder_cols is a class attribute set to True for compatibility with sklearn versions that reference it internally.

All columns must share the same time index. The time column is automatically handled and preserved in the output.

Examples

>>> import polars as pl
>>> from datetime import datetime, timedelta
>>> from yohou.compose import ColumnTransformer
>>> from yohou.stationarity import SeasonalDifferencing
>>>
>>> # Create sample weekly time series data with multiple columns (52 weeks)
>>> time = pl.datetime_range(
...     start=datetime(2023, 1, 1),
...     end=datetime(2023, 1, 1) + timedelta(weeks=51),
...     interval="1w",
...     eager=True
... )
>>> data = pl.DataFrame({
...     "time": time,
...     "sales": range(1, 53),
...     "temperature": range(10, 62)
... })
>>>
>>> # Example 1: Apply different seasonal differencing to different columns
>>> ct = ColumnTransformer([
...     ('sales_diff', SeasonalDifferencing(seasonality=4), 'sales'),
...     ('temp_diff', SeasonalDifferencing(seasonality=7), 'temperature')
... ])
>>>
>>> # Example 2: Use remainder='passthrough' to keep auxiliary columns
>>> ct_passthrough = ColumnTransformer(
...     [('sales_diff', SeasonalDifferencing(seasonality=4), 'sales')],
...     remainder='passthrough'
... )
>>>
>>> # Example 3: Disable verbose_feature_names_out for cleaner names
>>> ct_clean = ColumnTransformer(
...     [('diff', SeasonalDifferencing(seasonality=4), 'sales')],
...     verbose_feature_names_out=False
... )

Source Code

class ColumnTransformer(BaseTransformer, _BaseComposition):
    """Applies transformers to columns of a polars DataFrame.

    This estimator allows different columns or column subsets of the input
    to be transformed separately and the features generated by each transformer
    will be concatenated to form a single feature space.

    This is useful for heterogeneous or columnar data, to combine several
    feature extraction mechanisms or transformations into a single transformer.

    Parameters
    ----------
    transformers : list of tuples
        List of (name, transformer, columns) tuples specifying the
        transformer objects to be applied to subsets of the data.

        name : str
            Like in FeaturePipeline and FeatureUnion, this allows the transformer and
            its parameters to be set using ``set_params`` and searched in grid
            search.
        transformer : {'drop', 'passthrough'} or estimator
            Estimator must support ``fit`` and ``transform``.
            Special-cased strings 'drop' and 'passthrough' are accepted as
            well, to indicate to drop the columns or to pass them through
            untransformed, respectively.
        columns :  str, array-like of str, int, array-like of int, \
                array-like of bool, slice or callable
            Indexes the data on its second axis. Integers are interpreted as
            positional columns, while strings can reference DataFrame columns
            by name.  A scalar string or int should be used where
            ``transformer`` expects X to be a 1d array-like (vector),
            otherwise a 2d array will be passed to the transformer.
            A callable is passed the input data `X` and can return any of the
            above. To select multiple columns by name or dtype, you can use
            ``make_column_selector``.

    remainder : {'drop', 'passthrough'} or estimator, default='drop'
        By default (``'drop'``), only the specified columns in `transformers`
        are transformed and combined in the output, and the non-specified
        columns are dropped.
        By specifying ``remainder='passthrough'``, all remaining columns that
        were not specified in `transformers`, but present in the data passed
        to `fit` will be automatically passed through. This subset of columns
        is concatenated with the output of the transformers. For dataframes,
        extra columns not seen during `fit` will be excluded from the output
        of `transform`.
        By setting ``remainder`` to be an estimator, the remaining
        non-specified columns will use the ``remainder`` estimator. The
        estimator must support ``fit`` and ``transform``.
        Note that using this feature requires that the DataFrame columns
        input at ``fit`` and ``transform`` have identical order.

    n_jobs : int, default=None
        Number of jobs to run in parallel.
        ``None`` means 1 unless in a ``joblib.parallel_backend`` context.
        ``-1`` means using all processors.

    transformer_weights : dict, default=None
        Multiplicative weights for features per transformer. The output of the
        transformer is multiplied by these weights. Keys are transformer names,
        values the weights.

    verbose : bool, default=False
        If True, the time elapsed while fitting each transformer will be
        printed as it is completed.

    verbose_feature_names_out : bool, default=True
        If True, `ColumnTransformer.get_feature_names_out` will prefix
        all feature names with the name of the transformer that generated that
        feature.
        If False, `ColumnTransformer.get_feature_names_out` will not
        prefix any feature names and will error if feature names are not
        unique.

    Attributes
    ----------
    transformers_ : list
        The collection of fitted transformers as tuples of (name,
        fitted_transformer, column). `fitted_transformer` can be an estimator,
        or `'drop'`; `'passthrough'` is replaced with an equivalent
        `FunctionTransformer`. In case there were
        no columns selected, this will be the unfitted transformer. If there
        are remaining columns, the final element is a tuple of the form:
        ('remainder', transformer, remaining_columns) corresponding to the
        ``remainder`` parameter. If there are remaining columns, then
        ``len(transformers_)==len(transformers)+1``, otherwise
        ``len(transformers_)==len(transformers)``.

    named_transformers_ : `Bunch`
        Read-only attribute to access any transformer by given name.
        Keys are transformer names and values are the fitted transformer
        objects.

    output_indices_ : dict
        A dictionary from each transformer name to a slice, where the slice
        corresponds to indices in the transformed output. This is useful to
        inspect which transformer is responsible for which transformed
        feature(s).

    n_features_in_ : int
        Number of features seen during ``fit``. Only defined if the
        underlying transformers expose such an attribute when fit.

    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during ``fit``. Defined only when `X`
        has feature names that are all strings.

    See Also
    --------
    - `sklearn.compose.ColumnTransformer` : Underlying scikit-learn column transformer.
    - [`FeaturePipeline`][yohou.compose.feature_pipeline.FeaturePipeline] : Sequential transformation.
    - [`BaseTransformer`][yohou.base.transformer.BaseTransformer] : Base transformer interface.
    - [`SeasonalDifferencing`][yohou.stationarity.transformers.SeasonalDifferencing] : Common column-wise transformer.

    Notes
    -----
    The order of the columns in the transformed feature matrix follows the
    order of how the columns are specified in the `transformers` list.
    Columns of the original feature matrix that are not specified are
    dropped from the resulting transformed feature matrix, unless
    ``remainder='passthrough'`` is set. Passed-through columns are added
    at the right of the output of the transformers.

    A typical use is applying heterogeneous preprocessing to different columns
    when different time series have different characteristics (e.g., different
    seasonal patterns).

    Column selection by name (string) works seamlessly with polars DataFrames,
    allowing intuitive column-specific transformations.

    Time alignment across columns with different observation horizons is handled
    automatically by the internal `_hstack()` function, ensuring all transformed
    columns are properly aligned in time.

    Setting `remainder='passthrough'` (default is 'drop') preserves untransformed
    columns in the output, useful for keeping auxiliary columns that don't require
    transformation.

    The `verbose_feature_names_out` parameter (default=True) prefixes output column
    names with transformer names using a single underscore separator
    (e.g., 'deseason_sales') to prevent name collisions when multiple
    transformers produce columns with the same names. For panel data columns,
    the prefix is inserted after the group separator to preserve panel structure
    (e.g., 'store_1__deseason_sales').

    The `observation_horizon` property returns the maximum across all column
    transformers, as the transformer needs enough history to satisfy the most
    demanding column-specific transformation.

    ``force_int_remainder_cols`` is a class attribute set to ``True`` for
    compatibility with sklearn versions that reference it internally.

    All columns must share the same `time` index. The `time` column is automatically
    handled and preserved in the output.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime, timedelta
    >>> from yohou.compose import ColumnTransformer
    >>> from yohou.stationarity import SeasonalDifferencing
    >>>
    >>> # Create sample weekly time series data with multiple columns (52 weeks)
    >>> time = pl.datetime_range(
    ...     start=datetime(2023, 1, 1),
    ...     end=datetime(2023, 1, 1) + timedelta(weeks=51),
    ...     interval="1w",
    ...     eager=True
    ... )
    >>> data = pl.DataFrame({
    ...     "time": time,
    ...     "sales": range(1, 53),
    ...     "temperature": range(10, 62)
    ... })
    >>>
    >>> # Example 1: Apply different seasonal differencing to different columns
    >>> ct = ColumnTransformer([
    ...     ('sales_diff', SeasonalDifferencing(seasonality=4), 'sales'),
    ...     ('temp_diff', SeasonalDifferencing(seasonality=7), 'temperature')
    ... ])
    >>>
    >>> # Example 2: Use remainder='passthrough' to keep auxiliary columns
    >>> ct_passthrough = ColumnTransformer(
    ...     [('sales_diff', SeasonalDifferencing(seasonality=4), 'sales')],
    ...     remainder='passthrough'
    ... )
    >>>
    >>> # Example 3: Disable verbose_feature_names_out for cleaner names
    >>> ct_clean = ColumnTransformer(
    ...     [('diff', SeasonalDifferencing(seasonality=4), 'sales')],
    ...     verbose_feature_names_out=False
    ... )
    """

    _parameter_constraints: dict[str, Any] = {
        "transformers": [list, Hidden(tuple)],
        "remainder": [
            StrOptions({"drop", "passthrough"}),
            HasMethods(["fit", "transform"]),
            HasMethods(["fit_transform", "transform"]),
        ],
        "n_jobs": [Integral, None],
        "transformer_weights": [dict, None],
        "verbose": ["verbose"],
        "verbose_feature_names_out": ["boolean"],
    }

    def get_params(self, deep: bool = True) -> dict[str, Any]:
        """Get parameters for this estimator.

        Parameters
        ----------
        deep : bool, default=True
            If True, will return the parameters for this estimator and
            contained subobjects that are estimators.

        Returns
        -------
        params : dict[str, Any]
            Parameter names mapped to their values.

        """
        return _BaseComposition._get_params(self, attr="transformers", deep=deep)

    def set_params(self, **params: Any) -> "ColumnTransformer":
        """Set the parameters of this estimator.

        Parameters
        ----------
        **params : dict
            Estimator parameters.

        Returns
        -------
        self : ColumnTransformer
            ColumnTransformer instance.

        """
        _BaseComposition._set_params(self, attr="transformers", **params)
        return self

    def __sklearn_tags__(self) -> Tags:
        """Get estimator tags.

        Returns
        -------
        Tags
            Estimator tags with yohou-specific attributes.

        """
        tags = super().__sklearn_tags__()

        # Aggregate tags from contained transformers (static capability check)
        if hasattr(self, "transformers") and self.transformers is not None:
            transformers = [t for _, t, _ in self.transformers if t not in ("drop", "passthrough") and t is not None]

            # Include remainder if it's an estimator
            if hasattr(self, "remainder") and self.remainder not in ("drop", "passthrough", None):
                transformers.append(self.remainder)

            if transformers:
                assert tags.transformer_tags is not None
                assert tags.input_tags is not None
                # Stateful if any transformer is stateful
                tags.transformer_tags.stateful = any(
                    t.__sklearn_tags__().transformer_tags.stateful for t in transformers
                )

                # Not invertible: column transformer cannot generally invert
                # since columns may be dropped or reordered
                tags.transformer_tags.invertible = False

                # Aggregate min_value: take the maximum (most restrictive)
                # All transformers receive subsets of the same input
                min_values = [t.__sklearn_tags__().input_tags.min_value for t in transformers]
                non_none_min_values = [v for v in min_values if v is not None]
                tags.input_tags.min_value = max(non_none_min_values) if non_none_min_values else None

        return tags

    @property
    def _transformers(self) -> list[tuple[str, Any, Any]]:
        """List of (name, fitted_transformer, column) tuples.

        Returns
        -------
        transformers : list[tuple[str, Any, Any]]
            The fitted transformers.

        """
        return sklearn_ColumnTransformer._transformers.fget(self)  # ty: ignore[invalid-argument-type]

    def _iter(
        self,
        fitted: bool = False,
        column_as_labels: bool = False,
        skip_drop: bool = False,
        skip_empty_columns: bool = True,
    ) -> Iterator[tuple[str, Any, Any, Any]]:
        """Generate (name, trans, column, weight) tuples.

        Parameters
        ----------
        fitted : bool, default=False
            Whether to iterate over fitted transformers.
        column_as_labels : bool, default=False
            Whether to return columns as labels.
        skip_drop : bool, default=False
            Whether to skip 'drop' transformers.
        skip_empty_columns : bool, default=True
            Whether to skip transformers with empty columns.

        Yields
        ------
        name : str
            Transformer name.
        trans : Any
            Transformer instance.
        column : Any
            Column specification.
        weight : Any
            Transformer weight.

        """
        return sklearn_ColumnTransformer._iter(
            self,  # ty: ignore[invalid-argument-type]
            fitted=fitted,
            column_as_labels=column_as_labels,
            skip_drop=skip_drop,
            skip_empty_columns=skip_empty_columns,
        )

    def __getitem__(self, ind: int | str | slice) -> Any:
        """Return a sub-transformer or a single transformer.

        Parameters
        ----------
        ind : int, str, or slice
            Index, name, or slice of the transformer to retrieve.

        Returns
        -------
        transformer : Any
            The transformer or sub-transformer.

        """
        if isinstance(ind, slice):
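            # Accept an explicit step of 1, matching the error message below.
            if ind.step not in (1, None):
                raise ValueError("ColumnTransformer slicing only supports a step of 1")
            return self.__class__(
                transformers=self.transformers[ind],
                remainder=self.remainder,
                n_jobs=self.n_jobs,
                transformer_weights=self.transformer_weights,
                verbose=self.verbose,
                # Carry over the naming option so the slice behaves like its parent.
                verbose_feature_names_out=self.verbose_feature_names_out,
            )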
        elif isinstance(ind, int):
            name, trans, _ = self.transformers[ind]
            # If fitted, use named_transformers_, otherwise return from transformers
            if hasattr(self, "named_transformers_"):
                return self.named_transformers_[name]
            return trans
        else:
            # String case - get by name
            if hasattr(self, "named_transformers_"):
                return self.named_transformers_[ind]
            # Not fitted yet, search in transformers list
            for name, trans, _ in self.transformers:
                if name == ind:
                    return trans
            raise KeyError(f"Transformer {ind} not found")

    def _log_message(self, name: str, idx: int, total: int) -> str:
        """Get log message for a transformer.

        Parameters
        ----------
        name : str
            Transformer name.
        idx : int
            Current index.
        total : int
            Total number of transformers.

        Returns
        -------
        message : str
            Log message.

        """
        return f"(step {idx} of {total}) Processing {name}"

    def _update_fitted_transformers(self, transformers: Any) -> None:
        """Update fitted transformers.

        Parameters
        ----------
        transformers : Any
            Fitted transformers.


        """
        # Directly use sklearn's implementation - it's tightly coupled with internal state
        sklearn_ColumnTransformer._update_fitted_transformers(self, transformers)  # ty: ignore[invalid-argument-type]

    def _get_feature_name_out_for_transformer(self, name: str, trans: Any, feature_names_in: Any) -> Any:
        """Get feature names for a transformer.

        Parameters
        ----------
        name : str
            Transformer name.
        trans : Any
            Transformer instance.
        feature_names_in : Any
            Input feature names.

        Returns
        -------
        feature_names_out : Any
            Output feature names.

        """
        return sklearn_ColumnTransformer._get_feature_name_out_for_transformer(
            cast(sklearn_ColumnTransformer, self),
            name,
            trans,
            feature_names_in,
        )

    def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
        """Get output feature names.

        Collects output feature names from each fitted sub-transformer,
        optionally prefixing them with the transformer name when
        ``verbose_feature_names_out`` is True.

        Parameters
        ----------
        input_features : list[str] | None, default=None
            Input feature names. If None, uses ``feature_names_in_`` from fit.

        Returns
        -------
        list of str
            Output feature names.

        """
        check_is_fitted(self, "transformers_")
        feature_names_out: list[str] = []
        for name, trans, columns in self.transformers_:  # ty: ignore[unresolved-attribute]
            if trans == "drop" or (isinstance(columns, list) and len(columns) == 0):
                continue
            col_list = list(columns) if isinstance(columns, list) else [columns]
            names: list[str] = col_list  # ty: ignore[invalid-assignment]
            if hasattr(trans, "get_feature_names_out"):
                result = trans.get_feature_names_out()
                if result is not None:
                    # Sub-transformers may include "time" in their output; strip it.
                    filtered = [f for f in result if f != "time"]
                    if filtered:
                        names = filtered
            if self.verbose_feature_names_out:
                names = [f"{name}_{f}" for f in names]
            feature_names_out.extend(names)
        return feature_names_out

    def _get_remainder_cols(self, indices: Any) -> Any:
        """Get remainder columns.

        Parameters
        ----------
        indices : Any
            Column indices.

        Returns
        -------
        remainder_cols : Any
            Remainder columns.

        """
        # Directly use sklearn's implementation - it calls _get_remainder_cols_dtype internally
        return sklearn_ColumnTransformer._get_remainder_cols(self, indices)  # ty: ignore[invalid-argument-type]

    def _get_remainder_cols_dtype(self) -> Any:
        """Get dtype of remainder columns.

        Returns
        -------
        dtype : Any
            Data type of remainder columns.

        """
        return sklearn_ColumnTransformer._get_remainder_cols_dtype(self)  # ty: ignore[invalid-argument-type]

    def _add_prefix_for_feature_names_out(self, feature_names_out: list) -> list[str]:
        """Add prefixes to feature names.

        Uses single underscore ``_`` as separator (not ``__``) to avoid
        conflicts with the panel data ``<GROUP>__<SERIES>`` convention.
        For panel columns, the prefix is inserted after the group separator
        (e.g., ``store_1__deseason_sales``).

        Parameters
        ----------
        feature_names_out : list of (str, list of str)
            Pairs of transformer name and the feature names it produced.

        Returns
        -------
        prefixed_names : list of str
            Feature names with prefixes.

        """
        return [panel_aware_prefix(col, name) for name, cols in feature_names_out for col in cols]

    def _sk_visual_block_(self) -> Any:
        """Get visual block representation.

        Returns
        -------
        visual_block : Any
            Visual block representation.

        """
        return sklearn_ColumnTransformer._sk_visual_block_(self)  # ty: ignore[invalid-argument-type]

    def _validate_remainder(self, X: Any) -> None:
        """Validate remainder parameter.

        Parameters
        ----------
        X : Any
            Input data.

        """
        # Let sklearn handle validation completely
        sklearn_ColumnTransformer._validate_remainder(self, X)  # ty: ignore[invalid-argument-type]

    def _validate_column_callables(self, X: Any) -> None:
        """Validate column callables.

        Parameters
        ----------
        X : Any
            Input data.

        """
        # Let sklearn handle validation
        sklearn_ColumnTransformer._validate_column_callables(self, X)  # ty: ignore[invalid-argument-type]

    def _record_output_indices(self, Xs: Any) -> None:
        """Record output indices for each transformer.

        Parameters
        ----------
        Xs : Any
            Transformed outputs.

        """
        # Let sklearn handle recording
        sklearn_ColumnTransformer._record_output_indices(self, Xs)  # ty: ignore[invalid-argument-type]

    # Required by sklearn <1.8 _get_remainder_cols; unused by >=1.8.
    force_int_remainder_cols = FORCE_INT_REMAINDER_COLS

    def __init__(
        self,
        transformers: list[tuple[str, Any, Any]],
        *,
        remainder: str | Any = "drop",
        n_jobs: int | None = None,
        transformer_weights: dict[str, float] | None = None,
        verbose: bool = False,
        verbose_feature_names_out: bool = True,
    ) -> None:
        self.transformers = transformers
        self.remainder = remainder
        self.n_jobs = n_jobs
        self.transformer_weights = transformer_weights
        self.verbose = verbose
        self.verbose_feature_names_out = verbose_feature_names_out

    def _get_observation_horizons(self) -> list[int]:
        """Get observation horizons from all fitted transformers.

        Returns
        -------
        observation_horizons : list[int]
            List of observation horizons from each transformer.

        """
        observation_horizons = []
        for _, t, _, _ in self._iter(
            fitted=True,
            column_as_labels=True,
            skip_drop=False,
            skip_empty_columns=False,
        ):
            observation_horizon = 0
            if t not in ("drop", "passthrough") and t is not None and hasattr(t, "observation_horizon"):
                observation_horizon = t.observation_horizon

            observation_horizons.append(observation_horizon)

        return observation_horizons

    @property
    def observation_horizon(self) -> int:
        """Maximum observation horizon across all transformers.

        Returns
        -------
        int
            Maximum observation horizon needed.

        Raises
        ------
        NotFittedError
            If the column transformer has not been fitted yet.

        """
        check_is_fitted(self)

        observation_horizons = self._get_observation_horizons()
        # Fall back to 0 when there are no transformers to inspect.
        observation_horizon = max(observation_horizons, default=0)

        return observation_horizon

    @property
    def named_transformers_(self) -> Bunch:
        """Access the fitted transformer by name.

        Read-only attribute to access any transformer by given name.
        Keys are transformer names and values are the fitted transformer
        objects.

        Returns
        -------
        Bunch
            Dict-like object of fitted transformers keyed by name.

        """
        transformers = getattr(self, "transformers_", self.transformers)
        return Bunch(**{name: trans for name, trans, _ in transformers})

    def _validate_transformers(self) -> None:
        """Validate names of transformers and the transformers themselves.

        This checks whether given transformers have the required methods, i.e.
        `fit` or `fit_transform` and `transform` implemented.
        """
        if not self.transformers:
            return

        names, transformers, _ = zip(*self.transformers, strict=False)

        # validate names
        self._validate_names(names)

        # validate estimators
        for t in transformers:
            if t == "passthrough":
                continue
            if not isinstance(t, BaseTransformer):
                # Used to validate the transformers in the `transformers` list
                raise TypeError(
                    "All estimators should be instances of `BaseTransformer` "
                    "or be the string 'passthrough'. "
                    f"'{t}' (type {type(t)}) doesn't."
                )

    def _call_func_on_transformers(
        self,
        X: pl.DataFrame,
        y: pl.DataFrame | None,
        func: Callable,
        column_as_labels: bool,
        routed_params: dict[str, dict[str, dict[str, Any]]],
        time_column: pl.DataFrame | None = None,
    ) -> list[pl.DataFrame]:
        """
        Private function to fit and/or transform on demand.

        Parameters
        ----------
        X : {array-like, dataframe} of shape (n_samples, n_features)
            The data to be used in fit and/or transform. Should NOT include "time" column.

        y : array-like of shape (n_samples,)
            Targets.

        func : callable
            Function to call, which can be _fit_transform_one or
            _transform_one.

        column_as_labels : bool
            Used to iterate through transformers. If True, columns are returned
            as strings. If False, columns are returned as they were given by
            the user. Can be True only if the ``ColumnTransformer`` is already
            fitted.

        routed_params : dict
            The routed parameters as the output from ``process_routing``.

        time_column : pl.DataFrame, optional
            The time column to concatenate with each transformer's input.
            If None, uses self._time_column_ (set during fit).

        Returns
        -------
        Return value (transformers and/or transformed X data) depends
        on the passed function.
        """
        # Use provided time_column or fall back to stored one from fit
        if time_column is None:
            time_column = self._time_column_

        fitted = func is not _fit_transform_one

        def safe_indexing(X: pl.DataFrame, columns: object, axis: int) -> object:
            """Safe indexing helper for polars DataFrames."""
            Xi = _safe_indexing(X, columns, axis=axis)

            if isinstance(Xi, pl.Series):
                Xi = Xi.to_frame()

            return Xi

        transformers = list(
            self._iter(
                fitted=fitted,
                column_as_labels=column_as_labels,
                skip_drop=True,
                skip_empty_columns=True,
            )
        )
        try:
            jobs = []
            for idx, (name, trans, column, weight) in enumerate(transformers, start=1):
                transformer_to_use = trans
                if func is _fit_transform_one:
                    if transformer_to_use == "passthrough":
                        output_config = _get_output_config("transform", self)
                        transformer_to_use = FunctionTransformer(
                            check_inverse=False,
                            feature_names_out="one-to-one",
                        ).set_output(transform=output_config["dense"])

                    extra_args = {
                        "message_clsname": "ColumnTransformer",
                        "message": self._log_message(name, idx, len(transformers)),
                    }
                else:  # func is _transform_one
                    extra_args = {}
                jobs.append(
                    delayed(func)(
                        transformer=clone(transformer_to_use) if not fitted else transformer_to_use,
                        X=pl.concat(
                            [time_column, safe_indexing(X, column, axis=1)],
                            how="horizontal",
                        ),
                        y=y,
                        weight=weight,
                        **extra_args,
                        params=routed_params[name],
                    )
                )

            return Parallel(n_jobs=self.n_jobs)(jobs)

        except ValueError as e:
            if "Expected 2D array, got 1D array instead" in str(e):
                raise ValueError(_ERR_MSG_1DCOLUMN) from e
            else:
                raise

    def fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params: Any) -> "ColumnTransformer":
        """Fit all transformers using X.

        Parameters
        ----------
        X : {array-like, dataframe} of shape (n_samples, n_features)
            Input data, of which specified subsets are used to fit the
            transformers.

        y : array-like of shape (n_samples,...), default=None
            Targets for supervised learning.

        **params : dict, default=None
            Parameters to be passed to the underlying transformers' ``fit`` and
            ``transform`` methods.

            You can only pass this if metadata routing is enabled, which you
            can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

        Returns
        -------
        self : ColumnTransformer
            This estimator.
        """
        _raise_for_params(params, self, "fit")
        # we use fit_transform to make sure to set sparse_output_ (for which we
        # need the transformed data) to have consistent output type in transform
        self.fit_transform(X, y=y, **params)
        return self

    @_fit_context(
        # estimators in ColumnTransformer.transformers are not validated yet
        prefer_skip_nested_validation=False
    )
    def fit_transform(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params: Any) -> pl.DataFrame:
        """Fit all transformers, transform the data and concatenate results.

        Parameters
        ----------
        X : {array-like, dataframe} of shape (n_samples, n_features)
            Input data, of which specified subsets are used to fit the
            transformers.

        y : array-like of shape (n_samples,), default=None
            Targets for supervised learning.

        **params : dict, default=None
            Parameters to be passed to the underlying transformers' ``fit`` and
            ``transform`` methods.

            You can only pass this if metadata routing is enabled, which you
            can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

        Returns
        -------
        X_t : pl.DataFrame of shape (n_samples, sum_n_components)
            Horizontally stacked results of transformers. sum_n_components is the
            sum of n_components (output dimension) over transformers.
        """
        _raise_for_params(params, self, "fit_transform")

        X = _check_X(X)

        # Strip time column early - before sklearn validation which stores column indices
        # This ensures all column references are to the non-time columns
        self._time_column_ = X.select(cs.by_name("time"))
        X_no_time = X.select(~cs.by_name("time"))

        # Set feature_names_in_ and n_features_in_ on the stripped data
        _check_feature_names(self, X_no_time, reset=True)
        _check_n_features(self, X_no_time, reset=True)
        self._validate_transformers()
        n_samples = _num_samples(X_no_time)

        self._validate_column_callables(X_no_time)
        self._validate_remainder(X_no_time)

        routed_params = process_routing(self, "fit_transform", **params)

        result = self._call_func_on_transformers(
            X_no_time,
            y,
            _fit_transform_one,
            column_as_labels=False,
            routed_params=routed_params,
        )

        if not result:
            self._update_fitted_transformers([])
            # All transformers are None
            return self._time_column_

        Xs, transformers = zip(*result, strict=False)

        self.sparse_output_ = False

        self._update_fitted_transformers(transformers)
        self._record_output_indices(Xs)

        result = self._hstack(list(Xs), n_samples=n_samples)
        return result

    def transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
        """Transform X separately by each transformer, concatenate results.

        Parameters
        ----------
        X : {array-like, dataframe} of shape (n_samples, n_features)
            The data to be transformed by subset.

        **params : dict, default=None
            Parameters to be passed to the underlying transformers' ``transform``
            method.

            You can only pass this if metadata routing is enabled, which you
            can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

        Returns
        -------
        X_t : pl.DataFrame of shape (n_samples, sum_n_components)
            Horizontally stacked results of transformers. sum_n_components is the
            sum of n_components (output dimension) over transformers.
        """
        _raise_for_params(params, self, "transform")
        check_is_fitted(self)
        X = _check_X(X)

        # Strip time column early, consistent with fit_transform
        time_column = X.select(cs.by_name("time"))
        X_no_time = X.select(~cs.by_name("time"))

        # If ColumnTransformer is fit using a dataframe, and now a dataframe is
        # passed to be transformed, we select columns by name instead. This
        # enables the user to pass X at transform time with extra columns which
        # were not present in fit time, and the order of the columns doesn't
        # matter.
        fit_dataframe_and_transform_dataframe = hasattr(self, "feature_names_in_") and (
            _is_pandas_df(X_no_time) or hasattr(X_no_time, "__dataframe__")
        )

        n_samples = _num_samples(X_no_time)
        column_names = _get_feature_names(X_no_time)

        if fit_dataframe_and_transform_dataframe:
            named_transformers = self.named_transformers_
            # check that all names seen in fit are in transform, unless
            # they were dropped
            non_dropped_indices = [
                ind
                for name, ind in self._transformer_to_input_indices.items()  # ty: ignore[unresolved-attribute]
                if name in named_transformers and named_transformers[name] != "drop"
            ]

            all_indices = set(chain(*non_dropped_indices))
            all_names = {self.feature_names_in_[ind] for ind in all_indices}

            diff = all_names - set(column_names)
            if diff:
                raise ValueError(f"columns are missing: {diff}")
        else:
            # ndarray was used for fitting or transforming, thus we only
            # check that n_features_in_ is consistent
            self._check_n_features(X_no_time, reset=False)  # ty: ignore[unresolved-attribute]

        routed_params = process_routing(self, "transform", **params)

        Xs = self._call_func_on_transformers(
            X_no_time,
            None,
            _transform_one,
            column_as_labels=fit_dataframe_and_transform_dataframe,
            routed_params=routed_params,
            time_column=time_column,
        )

        if not Xs:
            # All transformers are None
            return time_column

        result = self._hstack(list(Xs), n_samples=n_samples)
        return result

    def observe_transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
        """Observe and transform X by each transformer, concatenate results.

        This method atomically observes each column transformer with new data and
        transforms it. The transformation uses the pre-observe state, then updates
        the memory. This is more efficient and correct than calling observe() then
        transform() separately.

        Parameters
        ----------
        X : pl.DataFrame
            New data to observe with and transform.

        **params : dict, default=None
            Parameters routed to the `transform` methods of the transformers.

            You can only pass this if metadata routing is enabled, which you
            can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

        Returns
        -------
        X_t : pl.DataFrame
            Horizontally stacked results of transformers.

        """
        _raise_for_params(params, self, "observe_transform")
        check_is_fitted(self)
        X = _check_X(X)

        # Strip time column early, consistent with fit_transform and transform
        time_column = X.select(cs.by_name("time"))
        X_no_time = X.select(~cs.by_name("time"))

        n_samples = _num_samples(X_no_time)

        routed_params = process_routing(self, "observe_transform", **params)

        Xs = self._call_func_on_transformers(
            X_no_time,
            None,
            _observe_transform_one,
            column_as_labels=False,
            routed_params=routed_params,
            time_column=time_column,
        )

        if not Xs:
            # All transformers are None
            return time_column

        # For observe_transform, skip sample count check since transformers handle buffering internally
        result = self._hstack(list(Xs), n_samples=n_samples, check_samples=False)

        return result

    def rewind_transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
        """Rewind internal state and transform using only observation horizon rows.

        Discards accumulated observations and rewinds to a clean state using
        the last `observation_horizon` rows for each transformer. This provides
        a stateless transformation that can be used for reproducible results.

        Parameters
        ----------
        X : pl.DataFrame
            Input DataFrame with "time" column. The last `observation_horizon`
            rows of each transformer will be used to initialize state.
        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        pl.DataFrame
            Transformed output with "time" column, after rewinding state.

        """
        check_is_fitted(self)
        time_column = X.select(cs.by_name("time"))
        X_no_time = X.select(~cs.by_name("time"))

        n_samples = _num_samples(X_no_time)

        routed_params = process_routing(self, "rewind_transform", **params)

        Xs = self._call_func_on_transformers(
            X_no_time,
            None,
            _rewind_transform_one,
            column_as_labels=False,
            routed_params=routed_params,
            time_column=time_column,
        )

        if not Xs:
            # All transformers are None
            return time_column

        # For rewind_transform, skip sample count check since transformers discard warmup rows
        result = self._hstack(list(Xs), n_samples=n_samples, check_samples=False)

        return result

    def _hstack(self, Xs: list[pl.DataFrame], *, n_samples: int, check_samples: bool = True) -> pl.DataFrame:
        """Stacks Xs horizontally.

        This allows subclasses to control the stacking behavior, while reusing
        everything else from ColumnTransformer.

        Parameters
        ----------
        Xs : list of {array-like, sparse matrix, dataframe}
            The container to concatenate.

        n_samples : int
            The number of samples in the input data, used to check the
            consistency of the transformation.

        check_samples : bool, default=True
            Whether to check that the number of output samples does not exceed
            ``n_samples``. Set to False for observe_transform and
            rewind_transform, whose transformers buffer rows or discard warmup
            rows internally.
        """
        # Rename before stacking to avoid errors caused by temporarily
        # duplicated columns.
        transformer_names = [
            t[0]
            for t in self._iter(
                fitted=True,
                column_as_labels=False,
                skip_drop=True,
                skip_empty_columns=True,
            )
        ]
        # feature_names_outs is a list of lists - one list per transformer
        feature_names_outs = [[col for col in X.columns if col != "time"] for X in Xs if X.shape[1] != 1]
        # Track the original column counts per transformer for re-grouping after prefixing
        column_counts = [len(cols) for cols in feature_names_outs]

        if self.verbose_feature_names_out:
            # `_add_prefix_for_feature_names_out` returns a flat list of prefixed names
            flat_feature_names = self._add_prefix_for_feature_names_out(
                list(zip(transformer_names, feature_names_outs, strict=False))
            )
            # Convert back to list of lists using the original column counts
            feature_names_outs = []
            idx = 0
            for count in column_counts:
                feature_names_outs.append(flat_feature_names[idx : idx + count])
                idx += count
        else:
            # check for duplicated columns and raise if any
            flat_feature_names = list(chain.from_iterable(feature_names_outs))
            feature_names_count = Counter(flat_feature_names)
            if any(count > 1 for count in feature_names_count.values()):
                duplicated_feature_names = sorted(name for name, count in feature_names_count.items() if count > 1)
                err_msg = (
                    "Duplicated feature names found before concatenating the"
                    " outputs of the transformers:"
                    f" {duplicated_feature_names}.\n"
                )
                for transformer_name, X in zip(transformer_names, Xs, strict=False):
                    if X.shape[1] == 1:
                        continue
                    dup_cols_in_transformer = sorted(set(X.columns).intersection(duplicated_feature_names))
                    if dup_cols_in_transformer:
                        err_msg += (
                            f"Transformer {transformer_name} has conflicting "
                            f"column names: {dup_cols_in_transformer}.\n"
                        )
                raise ValueError(
                    err_msg + "Either make sure that the transformers named above "
                    "do not generate columns with conflicting names or set "
                    "verbose_feature_names_out=True to automatically "
                    "prefix the output feature names with the name "
                    "of the transformer to prevent any conflicting "
                    "names."
                )

        output = _hstack(
            Xs,
            column_names=feature_names_outs,
            observation_horizons=self._get_observation_horizons(),
        )
        output_samples = output.shape[0]
        if check_samples and output_samples > n_samples:
            raise ValueError(
                "Concatenating DataFrames from the transformers' outputs led to an inconsistent number of samples."
            )

        return output

    def get_metadata_routing(self) -> MetadataRouter:
        """Get metadata routing of this object.

        See the [Metadata Routing User Guide](https://scikit-learn.org/stable/metadata_routing.html) for details on how the
        routing mechanism works.

        Returns
        -------
        routing : MetadataRouter
            A `MetadataRouter` encapsulating
            routing information.
        """
        router = MetadataRouter(owner=self)
        # Here we don't care about which columns are used for which
        # transformers, and whether or not a transformer is used at all, which
        # might happen if no columns are selected for that transformer. We
        # request all metadata requested by all transformers.
        transformers = chain(self.transformers, [("remainder", self.remainder, None)])
        for name, step, _ in transformers:
            method_mapping = MethodMapping()
            if hasattr(step, "fit_transform"):
                (
                    method_mapping.add(caller="fit", callee="fit_transform").add(
                        caller="fit_transform", callee="fit_transform"
                    )
                )
            else:
                (
                    method_mapping
                    .add(caller="fit", callee="fit")
                    .add(caller="fit", callee="transform")
                    .add(caller="fit_transform", callee="fit")
                    .add(caller="fit_transform", callee="transform")
                )
            method_mapping.add(caller="transform", callee="transform")
            router.add(method_mapping=method_mapping, **{name: step})

        return router
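
The name bookkeeping done in `_hstack` above can be looked at in isolation. The following is a minimal sketch that works on plain lists of column names rather than polars DataFrames. `regroup` and `find_duplicates` are hypothetical helper names introduced here for illustration; they are not part of yohou.

```python
from collections import Counter
from itertools import chain


def regroup(flat_names: list[str], counts: list[int]) -> list[list[str]]:
    """Split a flat list of names back into groups of the given sizes."""
    groups: list[list[str]] = []
    idx = 0
    for count in counts:
        groups.append(flat_names[idx : idx + count])
        idx += count
    return groups


def find_duplicates(names_per_transformer: list[list[str]]) -> list[str]:
    """Return, sorted, every name that appears more than once overall."""
    counter = Counter(chain.from_iterable(names_per_transformer))
    return sorted(name for name, count in counter.items() if count > 1)


# Two transformers that both emit a column called "b" (made-up names).
names = [["a", "b"], ["b", "c"]]
duplicates = find_duplicates(names)

# Re-group a flat list using the per-transformer column counts.
regrouped = regroup(["x1", "x2", "x3"], [2, 1])
```

In the source above, a non-empty duplicate list is what triggers the `ValueError` when `verbose_feature_names_out` is False, and the re-grouping step is what restores one list of names per transformer after prefixing.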

Methods¶

`observation_horizon` `property` ¶

Maximum observation horizon across all transformers.

Returns¶

Type	Description
`int`	Maximum observation horizon needed.

Raises¶

Type	Description
`NotFittedError`	If the column transformer has not been fitted yet.
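
As a rough illustration of how the maximum is formed, the sketch below mirrors the rule used in `_get_observation_horizons`: entries such as `'drop'` and `'passthrough'`, and transformers without an `observation_horizon` attribute, count as 0. `FakeLag` and `max_observation_horizon` are hypothetical names used only for this example.

```python
class FakeLag:
    """Hypothetical transformer that needs a number of past rows."""

    def __init__(self, observation_horizon: int) -> None:
        self.observation_horizon = observation_horizon


def max_observation_horizon(transformers: list) -> int:
    """Largest horizon across entries; strings and None contribute 0."""
    horizons = []
    for t in transformers:
        horizon = 0
        if t not in ("drop", "passthrough") and t is not None and hasattr(t, "observation_horizon"):
            horizon = t.observation_horizon
        horizons.append(horizon)
    return max(horizons)


mixed = max_observation_horizon([FakeLag(3), "passthrough", FakeLag(7), "drop"])
only_passthrough = max_observation_horizon(["passthrough"])
```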

`named_transformers_` `property` ¶

Access the fitted transformer by name.

Read-only attribute to access any transformer by given name. Keys are transformer names and values are the fitted transformer objects.

Returns¶

Type	Description
`Bunch`	Dict-like object of fitted transformers keyed by name.

`get_params(deep=True)` ¶

Get parameters for this estimator.

Parameters¶

Name	Type	Description	Default
`deep`	`bool`	If True, will return the parameters for this estimator and contained subobjects that are estimators.	`True`

Returns¶

Name	Type	Description
`params`	`dict[str, Any]`	Parameter names mapped to their values.

Source Code¶

def get_params(self, deep: bool = True) -> dict[str, Any]:
    """Get parameters for this estimator.

    Parameters
    ----------
    deep : bool, default=True
        If True, will return the parameters for this estimator and
        contained subobjects that are estimators.

    Returns
    -------
    params : dict[str, Any]
        Parameter names mapped to their values.

    """
    return _BaseComposition._get_params(self, attr="transformers", deep=deep)

`set_params(**params)` ¶

Set the parameters of this estimator.

Parameters¶

Name	Type	Description	Default
`**params`	`dict`	Estimator parameters.	`{}`

Returns¶

Name	Type	Description
`self`	`ColumnTransformer`	ColumnTransformer instance.

Source Code¶

def set_params(self, **params: Any) -> "ColumnTransformer":
    """Set the parameters of this estimator.

    Parameters
    ----------
    **params : dict
        Estimator parameters.

    Returns
    -------
    self : ColumnTransformer
        ColumnTransformer instance.

    """
    _BaseComposition._set_params(self, attr="transformers", **params)
    return self

`__sklearn_tags__()` ¶

Get estimator tags.

Returns¶

Type	Description
`Tags`	Estimator tags with yohou-specific attributes.

Source Code¶

def __sklearn_tags__(self) -> Tags:
    """Get estimator tags.

    Returns
    -------
    Tags
        Estimator tags with yohou-specific attributes.

    """
    tags = super().__sklearn_tags__()

    # Aggregate tags from contained transformers (static capability check)
    if hasattr(self, "transformers") and self.transformers is not None:
        transformers = [t for _, t, _ in self.transformers if t not in ("drop", "passthrough") and t is not None]

        # Include remainder if it's an estimator
        if hasattr(self, "remainder") and self.remainder not in ("drop", "passthrough", None):
            transformers.append(self.remainder)

        if transformers:
            assert tags.transformer_tags is not None
            assert tags.input_tags is not None
            # Stateful if any transformer is stateful
            tags.transformer_tags.stateful = any(
                t.__sklearn_tags__().transformer_tags.stateful for t in transformers
            )

            # Not invertible: column transformer cannot generally invert
            # since columns may be dropped or reordered
            tags.transformer_tags.invertible = False

            # Aggregate min_value: take the maximum (most restrictive)
            # All transformers receive subsets of the same input
            min_values = [t.__sklearn_tags__().input_tags.min_value for t in transformers]
            non_none_min_values = [v for v in min_values if v is not None]
            tags.input_tags.min_value = max(non_none_min_values) if non_none_min_values else None

    return tags

`__getitem__(ind)` ¶

Return a sub-transformer or a single transformer.

Parameters¶

Name	Type	Description	Default
`ind`	`int, str, or slice`	Index, name, or slice of the transformer to retrieve.	required

Returns¶

Name	Type	Description
`transformer`	`Any`	The transformer or sub-transformer.

Source Code¶

def __getitem__(self, ind: int | str | slice) -> Any:
    """Return a sub-transformer or a single transformer.

    Parameters
    ----------
    ind : int, str, or slice
        Index, name, or slice of the transformer to retrieve.

    Returns
    -------
    transformer : Any
        The transformer or sub-transformer.

    """
    if isinstance(ind, slice):
        if ind.step not in (1, None):
            raise ValueError("ColumnTransformer slicing only supports a step of 1")
        return self.__class__(
            transformers=self.transformers[ind],
            remainder=self.remainder,
            n_jobs=self.n_jobs,
            transformer_weights=self.transformer_weights,
            verbose=self.verbose,
        )
    elif isinstance(ind, int):
        name, trans, _ = self.transformers[ind]
        # If fitted, use named_transformers_, otherwise return from transformers
        if hasattr(self, "named_transformers_"):
            return self.named_transformers_[name]
        return trans
    else:
        # String case - get by name
        if hasattr(self, "named_transformers_"):
            return self.named_transformers_[ind]
        # Not fitted yet, search in transformers list
        for name, trans, _ in self.transformers:
            if name == ind:
                return trans
        raise KeyError(f"Transformer {ind} not found")
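
The three lookup modes (slice, integer position, name) can be tried on a toy container. This is a simplified sketch, not the yohou class: `TinyComposite` is a hypothetical stand-in that holds `(name, transformer, columns)` triples with strings in place of transformers, and it ignores fitted state. Accepting an explicit step of 1 is a choice made in this sketch.

```python
class TinyComposite:
    """Hypothetical stand-in holding (name, transformer, columns) triples."""

    def __init__(self, transformers: list[tuple]) -> None:
        self.transformers = transformers

    def __getitem__(self, ind):
        if isinstance(ind, slice):
            # A slice returns a new composite over the selected triples.
            if ind.step not in (1, None):
                raise ValueError("slicing only supports a step of 1")
            return TinyComposite(self.transformers[ind])
        if isinstance(ind, int):
            # An integer returns the transformer at that position.
            _, trans, _ = self.transformers[ind]
            return trans
        # Anything else is treated as a name.
        for name, trans, _ in self.transformers:
            if name == ind:
                return trans
        raise KeyError(f"Transformer {ind} not found")


composite = TinyComposite(
    [("scale", "scaler", ["a"]), ("lag", "lagger", ["b"]), ("raw", "passthrough", ["c"])]
)
first_two = composite[:2]
by_position = composite[1]
by_name = composite["raw"]
```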

`get_feature_names_out(input_features=None)` ¶

Get output feature names.

Collects output feature names from each fitted sub-transformer, optionally prefixing them with the transformer name when verbose_feature_names_out is True.

Parameters¶

Name	Type	Description	Default
`input_features`	`list[str] \| None`	Input feature names. If None, uses `feature_names_in_` from fit.	`None`

Returns¶

Type	Description
`list of str`	Output feature names.

Source Code¶

def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
    """Get output feature names.

    Collects output feature names from each fitted sub-transformer,
    optionally prefixing them with the transformer name when
    ``verbose_feature_names_out`` is True.

    Parameters
    ----------
    input_features : list[str] | None, default=None
        Input feature names. If None, uses ``feature_names_in_`` from fit.

    Returns
    -------
    list of str
        Output feature names.

    """
    check_is_fitted(self, "transformers_")
    feature_names_out: list[str] = []
    for name, trans, columns in self.transformers_:  # ty: ignore[unresolved-attribute]
        if trans == "drop" or (isinstance(columns, list) and len(columns) == 0):
            continue
        col_list = list(columns) if isinstance(columns, list) else [columns]
        names: list[str] = col_list  # ty: ignore[invalid-assignment]
        if hasattr(trans, "get_feature_names_out"):
            result = trans.get_feature_names_out()
            if result is not None:
                # Sub-transformers may include "time" in their output; strip it.
                filtered = [f for f in result if f != "time"]
                if filtered:
                    names = filtered
        if self.verbose_feature_names_out:
            names = [f"{name}_{f}" for f in names]
        feature_names_out.extend(names)
    return feature_names_out
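
To make the naming rule concrete, here is a minimal sketch of the same loop on plain data. Each entry is a `(name, reported_names, columns)` triple, where `reported_names` stands in for what a sub-transformer's `get_feature_names_out()` would return (`None` if it has none, or the string `"drop"` for a dropped entry). `collect_feature_names` and all column names are hypothetical.

```python
def collect_feature_names(transformers: list[tuple], verbose_feature_names_out: bool = True) -> list[str]:
    """Collect output names, stripping "time" and optionally prefixing."""
    out: list[str] = []
    for name, reported, columns in transformers:
        # Dropped entries and empty column selections contribute nothing.
        if reported == "drop" or (isinstance(columns, list) and len(columns) == 0):
            continue
        # Fall back to the input columns when nothing usable is reported.
        names = list(columns) if isinstance(columns, list) else [columns]
        if reported is not None:
            filtered = [f for f in reported if f != "time"]
            if filtered:
                names = filtered
        if verbose_feature_names_out:
            names = [f"{name}_{f}" for f in names]
        out.extend(names)
    return out


entries = [
    ("lag", ["time", "y_lag_1"], ["y"]),
    ("raw", None, "x"),
    ("gone", "drop", ["z"]),
]
prefixed = collect_feature_names(entries)
plain = collect_feature_names(entries, verbose_feature_names_out=False)
```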

`fit(X, y=None, **params)` ¶

Fit all transformers using X.

Parameters¶

Name	Type	Description	Default
`X`	`{array-like, dataframe} of shape (n_samples, n_features)`	Input data, of which specified subsets are used to fit the transformers.	required
`y`	`array-like of shape (n_samples,...)`	Targets for supervised learning.	`None`
`**params`	`dict`	Parameters to be passed to the underlying transformers' `fit` and `transform` methods. You can only pass this if metadata routing is enabled, which you can enable using `sklearn.set_config(enable_metadata_routing=True)`.	`None`

Returns¶

Name	Type	Description
`self`	`ColumnTransformer`	This estimator.

Source Code¶

def fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params: Any) -> "ColumnTransformer":
    """Fit all transformers using X.

    Parameters
    ----------
    X : {array-like, dataframe} of shape (n_samples, n_features)
        Input data, of which specified subsets are used to fit the
        transformers.

    y : array-like of shape (n_samples,...), default=None
        Targets for supervised learning.

    **params : dict, default=None
        Parameters to be passed to the underlying transformers' ``fit`` and
        ``transform`` methods.

        You can only pass this if metadata routing is enabled, which you
        can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

    Returns
    -------
    self : ColumnTransformer
        This estimator.
    """
    _raise_for_params(params, self, "fit")
    # we use fit_transform to make sure to set sparse_output_ (for which we
    # need the transformed data) to have consistent output type in transform
    self.fit_transform(X, y=y, **params)
    return self

`fit_transform(X, y=None, **params)` ¶

Fit all transformers, transform the data and concatenate results.

Parameters¶

Name	Type	Description	Default
`X`	`{array-like, dataframe} of shape (n_samples, n_features)`	Input data, of which specified subsets are used to fit the transformers.	required
`y`	`array-like of shape (n_samples,)`	Targets for supervised learning.	`None`
`**params`	`dict`	Parameters to be passed to the underlying transformers' `fit` and `transform` methods. You can only pass this if metadata routing is enabled, which you can enable using `sklearn.set_config(enable_metadata_routing=True)`.	`None`

Returns¶

Name	Type	Description
`X_t`	`pl.DataFrame of shape (n_samples, sum_n_components)`	Horizontally stacked results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers.
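
The overall flow (set the `"time"` column aside, hand each transformer its column subset together with the time column, then stack the results) can be sketched without polars. In this simplified illustration a dict of lists stands in for a DataFrame and plain functions stand in for transformers. `split_time` and `apply_by_columns` are hypothetical names, and the real `_hstack` also handles renaming and observation horizons, which are omitted here.

```python
def split_time(table: dict[str, list]) -> tuple[dict[str, list], dict[str, list]]:
    """Separate the "time" column from the feature columns."""
    time = {"time": table["time"]}
    features = {k: v for k, v in table.items() if k != "time"}
    return time, features


def apply_by_columns(table: dict[str, list], steps: list[tuple]) -> dict[str, list]:
    """Apply each (name, func, columns) step to its subset and stack results."""
    time, features = split_time(table)
    out = dict(time)  # the time column appears once in the output
    for _name, func, columns in steps:
        # Each step sees the time column plus only its selected columns.
        subset = {"time": time["time"], **{c: features[c] for c in columns}}
        result = func(subset)
        for col, values in result.items():
            if col != "time":
                out[col] = values
    return out


def double_a(t: dict[str, list]) -> dict[str, list]:
    """Toy transformation: double every value in column "a"."""
    return {"time": t["time"], "a": [2 * v for v in t["a"]]}


table = {"time": [1, 2, 3], "a": [1.0, 2.0, 3.0], "b": [10, 20, 30]}
stacked = apply_by_columns(table, [("double", double_a, ["a"])])
```

Column `"b"` is not selected by any step, so it is absent from the result, which matches the documented default of `remainder='drop'`.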

Source Code¶

@_fit_context(
    # estimators in ColumnTransformer.transformers are not validated yet
    prefer_skip_nested_validation=False
)
def fit_transform(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params: Any) -> pl.DataFrame:
    """Fit all transformers, transform the data and concatenate results.

    Parameters
    ----------
    X : {array-like, dataframe} of shape (n_samples, n_features)
        Input data, of which specified subsets are used to fit the
        transformers.

    y : array-like of shape (n_samples,), default=None
        Targets for supervised learning.

    **params : dict, default=None
        Parameters to be passed to the underlying transformers' ``fit`` and
        ``transform`` methods.

        You can only pass this if metadata routing is enabled, which you
        can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

    Returns
    -------
    X_t : {array-like, sparse matrix} of \
            shape (n_samples, sum_n_components)
        Horizontally stacked results of transformers. sum_n_components is the
        sum of n_components (output dimension) over transformers. If
        any result is a sparse matrix, everything will be converted to
        sparse matrices.
    """
    _raise_for_params(params, self, "fit_transform")

    X = _check_X(X)

    # Strip time column early - before sklearn validation which stores column indices
    # This ensures all column references are to the non-time columns
    self._time_column_ = X.select(cs.by_name("time"))
    X_no_time = X.select(~cs.by_name("time"))

    # Set feature_names_in_ and n_features_in_ on the stripped data
    _check_feature_names(self, X_no_time, reset=True)
    _check_n_features(self, X_no_time, reset=True)
    self._validate_transformers()
    n_samples = _num_samples(X_no_time)

    self._validate_column_callables(X_no_time)
    self._validate_remainder(X_no_time)

    routed_params = process_routing(self, "fit_transform", **params)

    result = self._call_func_on_transformers(
        X_no_time,
        y,
        _fit_transform_one,
        column_as_labels=False,
        routed_params=routed_params,
    )

    if not result:
        self._update_fitted_transformers([])
        # All transformers are None
        return self._time_column_

    Xs, transformers = zip(*result, strict=False)

    self.sparse_output_ = False

    self._update_fitted_transformers(transformers)
    self._record_output_indices(Xs)

    result = self._hstack(list(Xs), n_samples=n_samples)
    return result

`transform(X, **params)` ¶

Transform X separately by each transformer, concatenate results.

Parameters¶

Name	Type	Description	Default
`X`	`{array-like, dataframe} of shape (n_samples, n_features)`	The data to be transformed by subset.	required
`**params`	`dict`	Parameters to be passed to the underlying transformers' `transform` method. You can only pass this if metadata routing is enabled, which you can enable using `sklearn.set_config(enable_metadata_routing=True)`.	`None`

Returns¶

Name	Type	Description
`X_t`	`{array-like, sparse matrix} of shape (n_samples, sum_n_components)`	Horizontally stacked results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers. If any result is a sparse matrix, everything will be converted to sparse matrices.
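
When the transformer was fitted on a dataframe, `transform` checks that every column used by a non-dropped transformer is still present, while extra columns and column order are ignored. The sketch below reproduces that check in isolation. The function name `find_missing_columns` and the example inputs are hypothetical; only the selection logic follows the source of `transform`.

```python
from itertools import chain


def find_missing_columns(
    feature_names_in, transformer_to_input_indices, named_transformers, column_names
):
    """Return names seen during fit, used by a non-dropped transformer,
    and absent from the columns offered at transform time."""
    non_dropped_indices = [
        indices
        for name, indices in transformer_to_input_indices.items()
        if name in named_transformers and named_transformers[name] != "drop"
    ]
    needed = {feature_names_in[i] for i in chain(*non_dropped_indices)}
    return needed - set(column_names)


# Hypothetical fitted state: three input columns, one of them dropped.
feature_names_in = ["a", "b", "c"]
indices = {"scale": [0], "unused": [1], "lags": [2]}
named = {"scale": object(), "unused": "drop", "lags": object()}

# "b" may be absent because its transformer is "drop"; "extra" is ignored.
ok = find_missing_columns(feature_names_in, indices, named, ["c", "a", "extra"])

# "c" is required by "lags", so it is reported as missing.
missing = find_missing_columns(feature_names_in, indices, named, ["a", "extra"])
```

A non-empty result corresponds to the case in which `transform` raises `ValueError` with the message `columns are missing: ...`.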

Source Code¶

def transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
    """Transform X separately by each transformer, concatenate results.

    Parameters
    ----------
    X : {array-like, dataframe} of shape (n_samples, n_features)
        The data to be transformed by subset.

    **params : dict, default=None
        Parameters to be passed to the underlying transformers' ``transform``
        method.

        You can only pass this if metadata routing is enabled, which you
        can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

    Returns
    -------
    X_t : {array-like, sparse matrix} of \
            shape (n_samples, sum_n_components)
        Horizontally stacked results of transformers. sum_n_components is the
        sum of n_components (output dimension) over transformers. If
        any result is a sparse matrix, everything will be converted to
        sparse matrices.
    """
    _raise_for_params(params, self, "transform")
    check_is_fitted(self)
    X = _check_X(X)

    # Strip time column early, consistent with fit_transform
    time_column = X.select(cs.by_name("time"))
    X_no_time = X.select(~cs.by_name("time"))

    # If ColumnTransformer is fit using a dataframe, and now a dataframe is
    # passed to be transformed, we select columns by name instead. This
    # enables the user to pass X at transform time with extra columns which
    # were not present in fit time, and the order of the columns doesn't
    # matter.
    fit_dataframe_and_transform_dataframe = hasattr(self, "feature_names_in_") and (
        _is_pandas_df(X_no_time) or hasattr(X_no_time, "__dataframe__")
    )

    n_samples = _num_samples(X_no_time)
    column_names = _get_feature_names(X_no_time)

    if fit_dataframe_and_transform_dataframe:
        named_transformers = self.named_transformers_
        # check that all names seen in fit are in transform, unless
        # they were dropped
        non_dropped_indices = [
            ind
            for name, ind in self._transformer_to_input_indices.items()  # ty: ignore[unresolved-attribute]
            if name in named_transformers and named_transformers[name] != "drop"
        ]

        all_indices = set(chain(*non_dropped_indices))
        all_names = {self.feature_names_in_[ind] for ind in all_indices}

        diff = all_names - set(column_names)
        if diff:
            raise ValueError(f"columns are missing: {diff}")
    else:
        # ndarray was used for fitting or transforming, thus we only
        # check that n_features_in_ is consistent
        self._check_n_features(X_no_time, reset=False)  # ty: ignore[unresolved-attribute]

    routed_params = process_routing(self, "transform", **params)

    Xs = self._call_func_on_transformers(
        X_no_time,
        None,
        _transform_one,
        column_as_labels=fit_dataframe_and_transform_dataframe,
        routed_params=routed_params,
        time_column=time_column,
    )

    if not Xs:
        # All transformers are None
        return time_column

    result = self._hstack(list(Xs), n_samples=n_samples)
    return result

`observe_transform(X, **params)` ¶

Observe and transform X by each transformer, concatenate results.

This method observes new data and transforms it in a single atomic step for each column transformer. The transformation uses the pre-observe state, and the memory is updated afterwards. This is more efficient than calling observe() and then transform() separately, and it avoids transforming against an already-updated state.

Parameters¶

Name	Type	Description	Default
`X`	`DataFrame`	New data to observe and transform.	required
`**params`	`dict`	Parameters routed to the `transform` methods of the transformers. You can only pass this if metadata routing is enabled, which you can enable using `sklearn.set_config(enable_metadata_routing=True)`.	`None`

Returns¶

Name	Type	Description
`X_t`	`DataFrame`	Horizontally stacked results of transformers.
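
To make the ordering concrete (transform with the pre-observe state first, then update the memory), here is a minimal sketch built around a toy first-difference transformer. `ToyDifference` and its one-value memory are hypothetical and are not a yohou class. The example only shows why a separate observe followed by transform can give a different answer for a stateful transformer.

```python
class ToyDifference:
    """Hypothetical stateful transformer: first difference with a one-value memory."""

    def __init__(self):
        self.memory = None

    def fit(self, values):
        # Remember the last fitted value as the predecessor of future data.
        self.memory = values[-1]
        return self

    def transform(self, values):
        # The current memory acts as the predecessor of the first new value.
        previous = [self.memory] + list(values[:-1])
        return [v - p for v, p in zip(values, previous)]

    def observe(self, values):
        # Update the memory with the most recent value.
        self.memory = values[-1]
        return self

    def observe_transform(self, values):
        out = self.transform(values)  # uses the pre-observe memory
        self.observe(values)          # memory is updated afterwards
        return out


combined = ToyDifference().fit([1, 2, 4])
combined_out = combined.observe_transform([7, 11])

separate = ToyDifference().fit([1, 2, 4])
separate.observe([7, 11])
separate_out = separate.transform([7, 11])
```

In this toy setup `combined_out` differences the first new value against the last fitted value, whereas `separate_out` differences it against the last new value, because the memory was already overwritten.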

Source Code¶

def observe_transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
    """Observe and transform X by each transformer, concatenate results.

    This method atomically observes each column transformer with new data and
    transforms it. The transformation uses the pre-observe state, then updates
    the memory. This is more efficient and correct than calling observe() then
    transform() separately.

    Parameters
    ----------
    X : pl.DataFrame
        New data to observe with and transform.

    **params : dict, default=None
        Parameters routed to the `transform` methods of the transformers.

        You can only pass this if metadata routing is enabled, which you
        can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

    Returns
    -------
    X_t : pl.DataFrame
        Horizontally stacked results of transformers.

    """
    _raise_for_params(params, self, "observe_transform")
    check_is_fitted(self)
    X = _check_X(X)

    # Strip time column early, consistent with fit_transform and transform
    time_column = X.select(cs.by_name("time"))
    X_no_time = X.select(~cs.by_name("time"))

    n_samples = _num_samples(X_no_time)

    routed_params = process_routing(self, "observe_transform", **params)

    Xs = self._call_func_on_transformers(
        X_no_time,
        None,
        _observe_transform_one,
        column_as_labels=False,
        routed_params=routed_params,
        time_column=time_column,
    )

    if not Xs:
        # All transformers are None
        return time_column

    # For observe_transform, skip sample count check since transformers handle buffering internally
    result = self._hstack(list(Xs), n_samples=n_samples, check_samples=False)

    return result

`rewind_transform(X, **params)` ¶

Rewind internal state and transform using only observation horizon rows.

Discards accumulated observations and rewinds to a clean state using the last `observation_horizon` rows for each transformer. This provides a stateless transformation that can be used for reproducible results.

Parameters¶

Name	Type	Description	Default
`X`	`DataFrame`	Input DataFrame with "time" column. For each transformer, the last `observation_horizon` rows are used to initialize its state.	required
`**params`	`dict`	Metadata to route to nested estimators.	`{}`

Returns¶

Type	Description
`DataFrame`	Transformed output with "time" column, after rewinding state.

Source Code¶

def rewind_transform(self, X: pl.DataFrame, **params) -> pl.DataFrame:
    """Rewind internal state and transform using only observation horizon rows.

    Discards accumulated observations and rewinds to a clean state using
    the last `observation_horizon` rows for each transformer. This provides
    a stateless transformation that can be used for reproducible results.

    Parameters
    ----------
    X : pl.DataFrame
        Input DataFrame with "time" column. The last `observation_horizon`
        rows of each transformer will be used to initialize state.
    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    pl.DataFrame
        Transformed output with "time" column, after rewinding state.

    """
    check_is_fitted(self)
    time_column = X.select(cs.by_name("time"))
    X_no_time = X.select(~cs.by_name("time"))

    n_samples = _num_samples(X_no_time)

    routed_params = process_routing(self, "rewind_transform", **params)

    Xs = self._call_func_on_transformers(
        X_no_time,
        None,
        _rewind_transform_one,
        column_as_labels=False,
        routed_params=routed_params,
        time_column=time_column,
    )

    if not Xs:
        # All transformers are None
        return time_column

    # For rewind_transform, skip sample count check since transformers discard warmup rows
    result = self._hstack(list(Xs), n_samples=n_samples, check_samples=False)

    return result

`get_metadata_routing()` ¶

Get metadata routing of this object.

See the [Metadata Routing User Guide](https://scikit-learn.org/stable/metadata_routing.html) for how the routing mechanism works.

Returns¶

Name	Type	Description
`routing`	`MetadataRouter`	A `MetadataRouter` encapsulating routing information.
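
The caller-to-callee mapping that this method registers for each step depends on whether the step exposes `fit_transform`. The sketch below restates that branching with plain tuples instead of scikit-learn's `MethodMapping`. The function `method_pairs` and the two example classes are hypothetical and exist only for illustration.

```python
def method_pairs(step):
    """Return the (caller, callee) pairs registered for one step."""
    if hasattr(step, "fit_transform"):
        pairs = [
            ("fit", "fit_transform"),
            ("fit_transform", "fit_transform"),
        ]
    else:
        pairs = [
            ("fit", "fit"),
            ("fit", "transform"),
            ("fit_transform", "fit"),
            ("fit_transform", "transform"),
        ]
    # Added for every step, regardless of the branch taken above.
    pairs.append(("transform", "transform"))
    return pairs


class WithFitTransform:
    """Hypothetical step that defines fit_transform."""

    def fit_transform(self, X, y=None):
        return X


class FitAndTransformOnly:
    """Hypothetical step that defines fit and transform separately."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


short = method_pairs(WithFitTransform())
long = method_pairs(FitAndTransformOnly())
string_step = method_pairs("passthrough")  # strings have no fit_transform
```

Note that the special strings `'drop'` and `'passthrough'` have no `fit_transform` attribute, so they take the second branch.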

Source Code¶

def get_metadata_routing(self) -> MetadataRouter:
    """Get metadata routing of this object.

    Please check [Metadata Routing User Guide](https://scikit-learn.org/stable/metadata_routing.html) on how the routing
    mechanism works.

    Returns
    -------
    routing : MetadataRouter
        A `MetadataRouter` encapsulating
        routing information.
    """
    router = MetadataRouter(owner=self)
    # Here we don't care about which columns are used for which
    # transformers, and whether or not a transformer is used at all, which
    # might happen if no columns are selected for that transformer. We
    # request all metadata requested by all transformers.
    transformers = chain(self.transformers, [("remainder", self.remainder, None)])
    for name, step, _ in transformers:
        method_mapping = MethodMapping()
        if hasattr(step, "fit_transform"):
            (
                method_mapping.add(caller="fit", callee="fit_transform").add(
                    caller="fit_transform", callee="fit_transform"
                )
            )
        else:
            (
                method_mapping
                .add(caller="fit", callee="fit")
                .add(caller="fit", callee="transform")
                .add(caller="fit_transform", callee="fit")
                .add(caller="fit_transform", callee="transform")
            )
        method_mapping.add(caller="transform", callee="transform")
        router.add(method_mapping=method_mapping, **{name: step})

    return router

Tutorials¶

The following example notebooks use this component:

How to Use ColumnTransformer

Data-Features

Route columns through distinct transformers with ColumnTransformer, including remainder handling and automatic panel-aware column detection.
