FeatureUnion

yohou.compose.feature_union.FeatureUnion

Bases: BaseTransformer, _BaseComposition

Concatenates results of multiple transformer objects.

This estimator applies a list of transformer objects in parallel to the input data, then concatenates the results. This is useful to combine several feature extraction mechanisms into a single transformer.

A transformer's parameters may be set using the transformer name and the parameter name joined by '__'. A transformer may be replaced entirely by setting the parameter with its name to another transformer, removed by setting it to 'drop', or bypassed by setting it to 'passthrough' (features are passed through without transformation).
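The `<name>__<parameter>` addressing described above can be illustrated without the library. The sketch below is not yohou's or scikit-learn's implementation; `split_nested_key` is a hypothetical helper that only shows how a key such as `short_lags__lag` separates into a transformer name and a parameter name.

```python
from __future__ import annotations


def split_nested_key(key: str) -> tuple[str, str | None]:
    """Split 'name__param' into (name, param). Hypothetical helper, for illustration only."""
    name, sep, param = key.partition("__")
    # Without a '__' separator the key addresses the transformer slot itself,
    # e.g. to replace it or to set it to 'drop' / 'passthrough'.
    return (name, param) if sep else (name, None)


nested = split_nested_key("short_lags__lag")
whole = split_nested_key("long_lags")
print(nested)  # ('short_lags', 'lag')
print(whole)   # ('long_lags', None)
```

With the real class, such keys would be passed as keyword arguments to `set_params`, for example `union.set_params(short_lags__lag=[1, 2])` to change one parameter or `union.set_params(long_lags="drop")` to remove a transformer.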

Parameters

transformer_list : list of (str, transformer) tuples, required
    List of transformer objects to be applied to the data. The first half of each tuple is the name of the transformer. The transformer can be 'drop' for it to be ignored or can be 'passthrough' for features to be passed unchanged.

n_jobs : int, default=None
    Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

transformer_weights : dict, default=None
    Multiplicative weights for features per transformer. Keys are transformer names, values the weights. Raises ValueError if key not present in transformer_list.

verbose : bool, default=False
    If True, the time elapsed while fitting each transformer will be printed as it is completed.

verbose_feature_names_out : bool, default=True
    If True, get_feature_names_out will prefix all feature names with the name of the transformer that generated that feature using a single underscore separator (e.g., lags_sales). For panel data columns, the prefix is inserted after the group separator to preserve panel structure (e.g., store_1__lags_sales). If False, get_feature_names_out will not prefix any feature names and will error if feature names are not unique.
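The naming rule for `verbose_feature_names_out=True` can be sketched in a few lines. This is an illustration only: `sketch_panel_prefix` is a hypothetical stand-in, not the library's helper, and it assumes that a panel column is recognised by the first `__` in its name, as in `store_1__sales`.

```python
def sketch_panel_prefix(column: str, transformer_name: str) -> str:
    """Hypothetical stand-in for the documented naming rule; not the library's helper."""
    group, sep, series = column.partition("__")
    if sep:
        # Panel column '<group>__<series>': keep the group first, prefix the series.
        return f"{group}__{transformer_name}_{series}"
    # Plain column: transformer name, a single underscore, then the column name.
    return f"{transformer_name}_{column}"


plain = sketch_panel_prefix("sales", "lags")
panel = sketch_panel_prefix("store_1__sales", "lags")
print(plain)  # lags_sales
print(panel)  # store_1__lags_sales
```

Using a single underscore for the transformer prefix keeps `__` reserved for the group separator, so the group of a panel column can still be recovered from the output name.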

Attributes

named_transformers : Bunch
    Dictionary-like object giving read-only access to each transformer by its user-given name. Keys are transformer names and values are the transformers themselves.

n_features_in_ : int
    Number of features seen during fit. Taken from the first transformer in transformer_list that exposes this attribute once fitted; accessing it raises AttributeError if none does.

feature_names_in_ : ndarray of shape (n_features_in_,)
    Names of features seen during fit. Defined only when X has feature names that are all strings.

See Also

- sklearn.pipeline.FeatureUnion : Underlying scikit-learn feature union class.
- FeaturePipeline : Sequential transformer chaining.
- BaseTransformer : Base class for transformers.
- LagTransformer : Common transformer for lag features.

Notes

Transformers run in parallel when n_jobs is set to a value other than None or 1 (None means 1 unless inside a joblib.parallel_backend context). This can significantly improve performance for computationally expensive transformers.

Results are concatenated horizontally with automatic time alignment. The internal _hstack() function handles transformers with different observation horizons by aligning their outputs to the maximum observation horizon.

The observation_horizon property returns the MAXIMUM across all transformers (not the sum). This is because all transformers operate on the same input data, and the union needs enough history to satisfy the most demanding transformer.
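The arithmetic behind the maximum and the alignment can be worked through in plain Python. The sketch rests on two assumptions that this page does not confirm: that a lag transformer's observation horizon equals its largest lag, and that each transformer drops its own warm-up rows from the top of its output. The names `horizons`, `rows_out`, and `aligned_rows` are illustrative, not part of the API.

```python
# Assumed per-transformer horizons: a lag transformer is taken to need as many
# past rows as its largest lag (an assumption made for this illustration).
horizons = {"short_lags": max([1, 2, 3]), "long_lags": max([7, 14, 21])}

# The union needs enough history for the most demanding transformer,
# so it takes the maximum rather than the sum.
union_horizon = max(horizons.values(), default=0)

n_rows = 52  # length of the weekly series used in the Examples section
# Assumption: each transformer drops its own warm-up rows from its output.
rows_out = {name: n_rows - h for name, h in horizons.items()}
# Aligning to the maximum horizon trims the extra leading rows from the others.
extra_trim = {name: union_horizon - h for name, h in horizons.items()}
aligned_rows = {name: rows_out[name] - extra_trim[name] for name in horizons}

print(union_horizon)  # 21
print(aligned_rows)   # {'short_lags': 31, 'long_lags': 31}
```

Under these assumptions every output ends up with the same 31 rows, which is what makes horizontal concatenation possible.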

Useful for multi-scale feature engineering, such as combining short-term and long-term lag features, or mixing different preprocessing approaches in parallel.

All transformers must accept the same input time series with a time column.

Examples

>>> import polars as pl
>>> from datetime import datetime, timedelta
>>> from yohou.compose import FeatureUnion
>>> from yohou.preprocessing import LagTransformer
>>>
>>> # Create sample weekly time series data (52 weeks)
>>> time = pl.datetime_range(
...     start=datetime(2023, 1, 1),
...     end=datetime(2023, 1, 1) + timedelta(weeks=51),
...     interval="1w",
...     eager=True,
... )
>>> data = pl.DataFrame({"time": time, "demand": range(1, 53)})
>>>
>>> # Example 1: Combine short-term and long-term lags for multi-scale features
>>> union = FeatureUnion([
...     ("short_lags", LagTransformer(lag=[1, 2, 3])),
...     ("long_lags", LagTransformer(lag=[7, 14, 21])),
... ])
>>>
>>> # Example 2: Access transformers by name
>>> union.named_transformers["short_lags"]
LagTransformer(...)
>>>
>>> # Example 3: Access transformers by position
>>> union[0]
LagTransformer(...)

Source Code

class FeatureUnion(BaseTransformer, _BaseComposition):
    """Concatenates results of multiple transformer objects.

    This estimator applies a list of transformer objects in parallel to the
    input data, then concatenates the results. This is useful to combine
    several feature extraction mechanisms into a single transformer.

    A transformer's parameters may be set using the transformer name and the
    parameter name joined by '__'. A transformer may be replaced entirely by
    setting the parameter with its name to another transformer, removed by
    setting it to 'drop', or bypassed by setting it to 'passthrough' (features
    are passed through without transformation).

    Parameters
    ----------
    transformer_list : list of (str, transformer) tuples
        List of transformer objects to be applied to the data. The first
        half of each tuple is the name of the transformer. The transformer can
        be 'drop' for it to be ignored or can be 'passthrough' for features to
        be passed unchanged.

    n_jobs : int, default=None
        Number of jobs to run in parallel.
        ``None`` means 1 unless in a ``joblib.parallel_backend`` context.
        ``-1`` means using all processors.

    transformer_weights : dict, default=None
        Multiplicative weights for features per transformer.
        Keys are transformer names, values the weights.
        Raises ValueError if key not present in ``transformer_list``.

    verbose : bool, default=False
        If True, the time elapsed while fitting each transformer will be
        printed as it is completed.

    verbose_feature_names_out : bool, default=True
        If True, `get_feature_names_out` will prefix all feature names
        with the name of the transformer that generated that feature
        using a single underscore separator (e.g., ``lags_sales``).
        For panel data columns, the prefix is inserted after the group
        separator to preserve panel structure
        (e.g., ``store_1__lags_sales``).
        If False, `get_feature_names_out` will not prefix any feature
        names and will error if feature names are not unique.

    Attributes
    ----------
    named_transformers : `Bunch`
        Dictionary-like object giving read-only access to each transformer
        by its user-given name. Keys are transformer names and values are
        the transformers themselves.

    n_features_in_ : int
        Number of features seen during ``fit``. Only defined if the
        underlying first transformer in `transformer_list` exposes such an
        attribute when fit.

    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during ``fit``. Defined only when
        `X` has feature names that are all strings.

    See Also
    --------
    - `sklearn.pipeline.FeatureUnion` : Underlying scikit-learn feature union class.
    - [`FeaturePipeline`][yohou.compose.feature_pipeline.FeaturePipeline] : Sequential transformer chaining.
    - [`BaseTransformer`][yohou.base.transformer.BaseTransformer] : Base class for transformers.
    - [`LagTransformer`][yohou.preprocessing.window.LagTransformer] : Common transformer for lag features.

    Notes
    -----
    Transformers run in parallel when `n_jobs` is set to a value other than
    ``None`` or 1 (``None`` means 1 unless inside a ``joblib.parallel_backend``
    context). This can significantly improve performance for computationally
    expensive transformers.

    Results are concatenated horizontally with automatic time alignment. The
    internal `_hstack()` function handles transformers with different observation
    horizons by aligning their outputs to the maximum observation horizon.

    The `observation_horizon` property returns the MAXIMUM across all transformers
    (not the sum). This is because all transformers operate on the same input data,
    and the union needs enough history to satisfy the most demanding transformer.

    Useful for multi-scale feature engineering, such as combining short-term and
    long-term lag features, or mixing different preprocessing approaches in parallel.

    All transformers must accept the same input time series with a `time` column.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime, timedelta
    >>> from yohou.compose import FeatureUnion
    >>> from yohou.preprocessing import LagTransformer
    >>>
    >>> # Create sample weekly time series data (52 weeks)
    >>> time = pl.datetime_range(
    ...     start=datetime(2023, 1, 1),
    ...     end=datetime(2023, 1, 1) + timedelta(weeks=51),
    ...     interval="1w",
    ...     eager=True,
    ... )
    >>> data = pl.DataFrame({"time": time, "demand": range(1, 53)})
    >>>
    >>> # Example 1: Combine short-term and long-term lags for multi-scale features
    >>> union = FeatureUnion([
    ...     ("short_lags", LagTransformer(lag=[1, 2, 3])),
    ...     ("long_lags", LagTransformer(lag=[7, 14, 21])),
    ... ])
    >>>
    >>> # Example 2: Access transformers by name
    >>> union.named_transformers["short_lags"]  # doctest: +ELLIPSIS
    LagTransformer(...)
    >>>
    >>> # Example 3: Access transformers by position
    >>> union[0]  # doctest: +ELLIPSIS
    LagTransformer(...)

    """

    _required_parameters = ["transformer_list"]

    def get_params(self, deep: bool = True) -> dict[str, Any]:
        """Get parameters for this estimator.

        Parameters
        ----------
        deep : bool, default=True
            If True, will return the parameters for this estimator and
            contained subobjects that are estimators.

        Returns
        -------
        params : dict[str, Any]
            Parameter names mapped to their values.

        """
        return _BaseComposition._get_params(self, attr="transformer_list", deep=deep)

    def set_params(self, **params: Any) -> "FeatureUnion":
        """Set the parameters of this estimator.

        Parameters
        ----------
        **params : dict
            Estimator parameters.

        Returns
        -------
        self : FeatureUnion
            FeatureUnion instance.

        """
        _BaseComposition._set_params(self, attr="transformer_list", **params)
        return self

    def _iter(self) -> Iterator[tuple[str, Any, float]]:
        """Generate (name, trans, weight) tuples excluding None and 'drop' transformers.

        Yields
        ------
        name : str
            Transformer name.
        trans : Any
            Transformer instance.
        weight : float
            Transformer weight.

        """
        return sklearn_FeatureUnion._iter(self)  # ty: ignore[invalid-argument-type]

    def __getitem__(self, ind: int | str | slice) -> Any:
        """Return a sub-union or a single transformer.

        Parameters
        ----------
        ind : int, str, or slice
            Index, name, or slice of the transformer to retrieve.

        Returns
        -------
        transformer : Any
            The transformer or sub-union.

        """
        if isinstance(ind, slice):
            # Accept the default step (None) as well as an explicit step of 1.
            if ind.step not in (1, None):
                raise ValueError("FeatureUnion slicing only supports a step of 1")
            return self.__class__(
                transformer_list=self.transformer_list[ind],
                n_jobs=self.n_jobs,
                transformer_weights=self.transformer_weights,
                verbose=self.verbose,
                verbose_feature_names_out=self.verbose_feature_names_out,
            )
        elif isinstance(ind, int):
            _, est = self.transformer_list[ind]
            return est
        else:
            # String case - get by name
            return self.named_transformers[ind]

    @property
    def named_transformers(self) -> Bunch:
        """Access the transformers by name.

        Returns
        -------
        named_transformers : Bunch
            Dictionary-like object with transformer names as keys.

        """
        return Bunch(**dict(self.transformer_list))

    def _log_message(self, name: str, idx: int, total: int) -> str:
        """Get log message for a transformer.

        Parameters
        ----------
        name : str
            Transformer name.
        idx : int
            Current index.
        total : int
            Total number of transformers.

        Returns
        -------
        message : str
            Log message.

        """
        return f"(step {idx} of {total}) Processing {name}"

    def _parallel_func(self, X: pl.DataFrame, y: pl.DataFrame | None, func: Any, routed_params: Any) -> Any:
        """Run func in parallel on X and y.

        Parameters
        ----------
        X : pl.DataFrame
            Input data.
        y : pl.DataFrame | None
            Target data.
        func : Any
            Function to apply.
        routed_params : Any
            Routed parameters.

        Returns
        -------
        results : Any
            Results from parallel execution.

        """
        return sklearn_FeatureUnion._parallel_func(self, X, y, func, routed_params)  # ty: ignore[invalid-argument-type]

    def _update_transformer_list(self, transformers: Any) -> None:
        """Update transformer_list with fitted transformers.

        Parameters
        ----------
        transformers : Any
            Fitted transformers.

        """
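        transformers_iter = iter(transformers)
        # `_iter()` skips 'drop' entries, so they have no fitted counterpart:
        # keep them (and None) in place instead of consuming a fitted transformer.
        self.transformer_list[:] = [
            (name, old if old is None or old == "drop" else next(transformers_iter))
            for name, old in self.transformer_list
        ]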

    def get_feature_names_out(self, input_features: list[str] | None = None) -> Any:
        """Get output feature names.

        Parameters
        ----------
        input_features : list[str] | None, default=None
            Input feature names.

        Returns
        -------
        feature_names_out : Any
            Output feature names.

        """
        return super().get_feature_names_out(input_features)

    @property
    def n_features_in_(self) -> int:
        """Number of features seen during fit.

        Returns
        -------
        n_features_in_ : int
            Number of input features.

        """
        # Delegate to first transformer
        for _, trans in self.transformer_list:
            if hasattr(trans, "n_features_in_"):
                return trans.n_features_in_
        raise AttributeError("n_features_in_ not available")

    @property
    def feature_names_in_(self) -> Any:
        """Names of features seen during fit.

        Returns
        -------
        feature_names_in_ : Any
            Names of input features.

        """
        for _, trans in self.transformer_list:
            if hasattr(trans, "feature_names_in_"):
                return trans.feature_names_in_
        raise AttributeError("feature_names_in_ not available")

    def _add_prefix_for_feature_names_out(self, feature_names_out: list[list[str]]) -> list[str]:
        """Add prefixes to feature names.

        Uses single underscore ``_`` as separator (not ``__``) to avoid
        conflicts with the panel data ``<GROUP>__<SERIES>`` convention.
        For panel columns, the prefix is inserted after the group separator
        (e.g., ``store_1__lags_sales``).

        Parameters
        ----------
        feature_names_out : list[list[str]]
            Feature names from each transformer.

        Returns
        -------
        prefixed_names : list[str]
            Feature names with prefixes.

        """
        return [panel_aware_prefix(col, name) for name, cols in feature_names_out for col in cols]

    def __sklearn_tags__(self) -> Tags:
        """Get estimator tags.

        Returns
        -------
        Tags
            Estimator tags with yohou-specific attributes.

        """
        tags = super().__sklearn_tags__()

        # Aggregate tags from transformers (static capability check)
        if hasattr(self, "transformer_list") and self.transformer_list is not None:
            transformers = [t for _, t in self.transformer_list if t not in ("drop", "passthrough") and t is not None]
            if transformers:
                assert tags.transformer_tags is not None
                assert tags.input_tags is not None
                # Stateful if any transformer is stateful
                tags.transformer_tags.stateful = any(
                    t.__sklearn_tags__().transformer_tags.stateful for t in transformers
                )

                # Not invertible unless there is only one transformer and it is invertible
                tags.transformer_tags.invertible = (
                    len(transformers) == 1 and transformers[0].__sklearn_tags__().transformer_tags.invertible
                )

                # Aggregate min_value: take the maximum (most restrictive)
                # All transformers receive the same input, so we need to satisfy all constraints
                min_values = [t.__sklearn_tags__().input_tags.min_value for t in transformers]
                non_none_min_values = [v for v in min_values if v is not None]
                tags.input_tags.min_value = max(non_none_min_values) if non_none_min_values else None

        return tags

    def __sklearn_is_fitted__(self) -> bool:
        """Check if fitted.

        Returns
        -------
        is_fitted : bool
            True if the union is fitted.

        """
        return sklearn_FeatureUnion.__sklearn_is_fitted__(self)  # ty: ignore[invalid-argument-type]

    def _sk_visual_block_(self) -> Any:
        """Get visual block representation.

        Returns
        -------
        visual_block : Any
            Visual block representation.

        """
        return sklearn_FeatureUnion._sk_visual_block_(self)  # ty: ignore[invalid-argument-type]

    def _get_observation_horizons(self) -> list[int]:
        """Get observation horizons from all transformers.

        Returns
        -------
        observation_horizons : list[int]
            List of observation horizons from each transformer.

        """
        observation_horizons = []
        for _, t, _ in self._iter():
            observation_horizon = 0
            if t != "passthrough" and t is not None and hasattr(t, "observation_horizon"):
                observation_horizon = t.observation_horizon

            observation_horizons.append(observation_horizon)

        return observation_horizons

    @property
    def observation_horizon(self) -> int:
        """Maximum observation horizon across all transformers.

        Returns
        -------
        int
            Maximum observation horizon needed.

        Raises
        ------
        NotFittedError
            If the feature union has not been fitted yet.

        """
        check_is_fitted(self)

        observation_horizons = self._get_observation_horizons()
        observation_horizon = max(observation_horizons, default=0)

        return observation_horizon

    _parameter_constraints: dict = {
        "transformer_list": [list],
        "n_jobs": [numbers.Integral, None],
        "transformer_weights": [dict, None],
        "verbose": ["boolean"],
        "verbose_feature_names_out": ["boolean"],
    }

    def __init__(
        self,
        transformer_list: list[tuple[str, Any]],
        *,
        n_jobs: int | None = None,
        transformer_weights: dict[str, float] | None = None,
        verbose: bool = False,
        verbose_feature_names_out: bool = True,
    ) -> None:
        self.transformer_list = transformer_list
        self.n_jobs = n_jobs
        self.transformer_weights = transformer_weights
        self.verbose = verbose
        self.verbose_feature_names_out = verbose_feature_names_out

    def _validate_transformers(self) -> None:
        """Validate transformer names and that each transformer implements fit and transform.

        Raises
        ------
        TypeError
            If any transformer is invalid.

        """
        names, transformers = zip(*self.transformer_list, strict=False)

        # validate names
        self._validate_names(names)

        # validate estimators
        for t in transformers:
            if t in ("drop", "passthrough"):
                continue
            if not (hasattr(t, "fit") or hasattr(t, "fit_transform")) or not hasattr(t, "transform"):
                raise TypeError(f"All estimators should implement fit and transform. '{t}' (type {type(t)}) doesn't")

    def _validate_transformer_weights(self) -> None:
        """Validate transformer weights dictionary.

        Raises
        ------
        ValueError
            If weight keys don't match transformer names.

        """
        if not self.transformer_weights:
            return

        transformer_names = {name for name, _ in self.transformer_list}
        for name in self.transformer_weights:
            if name not in transformer_names:
                raise ValueError(
                    f'Attempting to weight transformer "{name}", but it is not present in transformer_list.'
                )

    def fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **fit_params: Any) -> "FeatureUnion":
        """Fit all transformers using X.

        Parameters
        ----------
        X : pl.DataFrame
            Input time series, used to fit the transformers.

        y : pl.DataFrame or None, default=None
            Targets for supervised learning.

        **fit_params : dict, default=None
            - If `enable_metadata_routing=False` (default):
              Parameters directly passed to the `fit` methods of the
              sub-transformers.

            - If `enable_metadata_routing=True`:
              Parameters safely routed to the `fit` methods of the
              sub-transformers. See the sklearn Metadata Routing User Guide
              for more details.

        Returns
        -------
        self : object
            FeatureUnion class instance.
        """
        _raise_for_params(fit_params, self, "fit")
        routed_params = process_routing(self, "fit", **fit_params)
        transformers = self._parallel_func(X, y, _fit_one, routed_params)

        if not transformers:
            # All transformers are None
            return self

        self._update_transformer_list(transformers)
        return self

    def fit_transform(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params: object) -> pl.DataFrame:
        """Fit all transformers, transform the data and concatenate results.

        Parameters
        ----------
        X : pl.DataFrame
            Input time series to be transformed.

        y : pl.DataFrame or None, default=None
            Targets for supervised learning.

        **params : dict, default=None
            - If `enable_metadata_routing=False` (default):
              Parameters directly passed to the `fit` methods of the
              sub-transformers.

            - If `enable_metadata_routing=True`:
              Parameters safely routed to the `fit` methods of the
              sub-transformers. See the sklearn Metadata Routing User Guide
              for more details.

        Returns
        -------
        X_t : pl.DataFrame
            Horizontally stacked results of transformers, aligned by
            observation horizons.
        """
        routed_params = process_routing(self, "fit_transform", **params)
        results = self._parallel_func(X, y, _fit_transform_one, routed_params)
        if not results:
            # All transformers are None
            time = X.select(cs.by_name("time"))
            return time

        Xs, transformers = zip(*results, strict=False)
        self._update_transformer_list(transformers)

        # Extract actual column names from each DataFrame (excluding time)
        transformer_names = [name for name, _, _ in self._iter()]
        raw_column_names = [[col for col in X_t.columns if col != "time"] for X_t in Xs]

        # Apply prefixes if verbose_feature_names_out is True
        if self.verbose_feature_names_out:
            column_names = []
            for name, cols in zip(transformer_names, raw_column_names, strict=False):
                column_names.append([panel_aware_prefix(col, name) for col in cols])
        else:
            column_names = raw_column_names
            # Check for duplicates
            flat_names = [col for cols in column_names for col in cols]
            counts = Counter(flat_names)
            duplicates = [name for name, count in counts.items() if count > 1]
            if duplicates:
                raise ValueError(
                    f"Duplicate feature names found: {duplicates}. "
                    "Either use transformers that produce unique names or set "
                    "verbose_feature_names_out=True to add transformer name prefixes."
                )

        result = _hstack(
            list(Xs),
            column_names=column_names,
            observation_horizons=self._get_observation_horizons(),
        )
        return result

    def transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
        """Transform X separately by each transformer, concatenate results.

        Parameters
        ----------
        X : pl.DataFrame
            Input time series to be transformed.

        **params : dict, default=None
            Parameters routed to the `transform` method of the sub-transformers via the
            metadata routing API. See [Metadata Routing User Guide](https://scikit-learn.org/stable/metadata_routing.html) for more details.

        Returns
        -------
        X_t : pl.DataFrame
            Horizontally stacked results of transformers, aligned by
            observation horizons.
        """
        _raise_for_params(params, self, "transform")
        routed_params = process_routing(self, "transform", **params)

        Xs = Parallel(n_jobs=self.n_jobs)(
            delayed(_transform_one)(trans, X, None, weight, routed_params[name]) for name, trans, weight in self._iter()
        )
        if not Xs:
            # All transformers are None
            time = X.select(cs.by_name("time"))
            return time

        # Extract actual column names from each DataFrame (excluding time)
        transformer_names = [name for name, _, _ in self._iter()]
        raw_column_names = [[col for col in X_t.columns if col != "time"] for X_t in Xs]

        # Apply prefixes if verbose_feature_names_out is True
        if self.verbose_feature_names_out:
            column_names = []
            for name, cols in zip(transformer_names, raw_column_names, strict=False):
                column_names.append([panel_aware_prefix(col, name) for col in cols])
        else:
            column_names = raw_column_names

        result = _hstack(
            Xs,
            column_names=column_names,
            observation_horizons=self._get_observation_horizons(),
        )
        return result

    def observe_transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
        """Observe and transform X in parallel for each transformer, concatenate results.

        Each transformer observes the new data and transforms it in a single
        atomic step, with the transformers running in parallel. The
        transformation uses the pre-observe state, after which the memory is
        updated. This is more efficient and correct than calling observe()
        then transform() separately.

        Parameters
        ----------
        X : pl.DataFrame
            New data to observe and transform.

        **params : dict, default=None
            Parameters routed to the `transform` methods of the sub-transformers
            via the metadata routing API. See [Metadata Routing User Guide](https://scikit-learn.org/stable/metadata_routing.html) for more details.

        Returns
        -------
        X_t : pl.DataFrame
            Horizontally stacked results of transformers, aligned by observation horizons.

        """
        _raise_for_params(params, self, "observe_transform")
        routed_params = process_routing(self, "observe_transform", **params)

        # Parallel execution of observe_transform on all transformers
        Xs = Parallel(n_jobs=self.n_jobs)(
            delayed(_observe_transform_one)(trans, X, None, weight, routed_params[name])
            for name, trans, weight in self._iter()
        )

        if not Xs:
            # All transformers are None
            time = X.select(cs.by_name("time"))
            return time

        # Extract actual column names from each DataFrame (excluding time)
        transformer_names = [name for name, _, _ in self._iter()]
        raw_column_names = [[col for col in X_t.columns if col != "time"] for X_t in Xs]

        # Apply prefixes if verbose_feature_names_out is True
        if self.verbose_feature_names_out:
            column_names = []
            for name, cols in zip(transformer_names, raw_column_names, strict=False):
                column_names.append([panel_aware_prefix(col, name) for col in cols])
        else:
            column_names = raw_column_names

        result = _hstack(
            Xs,
            column_names=column_names,
            # observe_transform returns the same number of rows as the input
            # for every sub-transformer (alignment is handled internally via
            # each transformer's observation memory), so no observation-horizon
            # trimming is needed here.
            observation_horizons=[0] * len(Xs),
        )

        return result

    def rewind_transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
        """Rewind and transform X in parallel for each transformer, concatenate results.

        This method applies rewind_transform semantics to each transformer in parallel:
        transforms from scratch without using pre-existing memory, discards warmup rows,
        and rewinds the internal state with the input data.

        Parameters
        ----------
        X : pl.DataFrame
            Data to transform and use for rewinding state.

        **params : dict, default=None
            Parameters routed to the `rewind_transform` methods of the sub-transformers
            via the metadata routing API. See [Metadata Routing User Guide](https://scikit-learn.org/stable/metadata_routing.html) for more details.

        Returns
        -------
        X_t : pl.DataFrame
            Horizontally stacked results of transformers, aligned by observation horizons,
            with warmup rows discarded.

        """
        _raise_for_params(params, self, "rewind_transform")
        routed_params = process_routing(self, "rewind_transform", **params)

        # Parallel execution of rewind_transform on all transformers
        Xs = Parallel(n_jobs=self.n_jobs)(
            delayed(_rewind_transform_one)(trans, X, None, weight, routed_params[name])
            for name, trans, weight in self._iter()
        )

        if not Xs:
            # All transformers are None
            time = X.select(cs.by_name("time"))
            return time

        # Extract actual column names from each DataFrame (excluding time)
        transformer_names = [name for name, _, _ in self._iter()]
        raw_column_names = [[col for col in X_t.columns if col != "time"] for X_t in Xs]

        # Apply prefixes if verbose_feature_names_out is True
        if self.verbose_feature_names_out:
            column_names = []
            for name, cols in zip(transformer_names, raw_column_names, strict=False):
                column_names.append([panel_aware_prefix(col, name) for col in cols])
        else:
            column_names = raw_column_names

        result = _hstack(
            Xs,
            column_names=column_names,
            observation_horizons=self._get_observation_horizons(),
        )

        return result

    def get_metadata_routing(self) -> MetadataRouter:
        """Get metadata routing of this object.

        Please check [Metadata Routing User Guide](https://scikit-learn.org/stable/metadata_routing.html) on how the routing
        mechanism works.

        Returns
        -------
        routing : MetadataRouter
            A `MetadataRouter` encapsulating
            routing information.
        """
        router = MetadataRouter(owner=self)

        for name, transformer in self.transformer_list:
            router.add(
                **{name: transformer},
                method_mapping=MethodMapping()
                .add(caller="fit", callee="fit")
                .add(caller="fit_transform", callee="fit_transform")
                .add(caller="fit_transform", callee="fit")
                .add(caller="fit_transform", callee="transform")
                .add(caller="transform", callee="transform"),
            )

        return router

Methods

named_transformers property

Access the transformers by name.

Returns
Name Type Description
named_transformers Bunch

Dictionary-like object with transformer names as keys.

n_features_in_ property

Number of features seen during fit.

Returns
Name Type Description
n_features_in_ int

Number of input features.

feature_names_in_ property

Names of features seen during fit.

Returns
Name Type Description
feature_names_in_ Any

Names of input features.

observation_horizon property

Maximum observation horizon across all transformers.

Returns
Type Description
int

Maximum observation horizon needed.

Raises
Type Description
NotFittedError

If the feature union has not been fitted yet.
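
As a rough illustration of the "maximum across transformers" rule, the sketch below uses a hypothetical stand-in class (`StubTransformer`) and helper (`union_observation_horizon`), neither of which is part of yohou. Returning 0 for an empty list is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass
class StubTransformer:
    """Hypothetical stand-in exposing only an observation horizon."""

    observation_horizon: int


def union_observation_horizon(transformers):
    """Return the largest per-transformer horizon (0 if the list is empty)."""
    return max((t.observation_horizon for t in transformers), default=0)


stubs = [StubTransformer(3), StubTransformer(7), StubTransformer(0)]
print(union_observation_horizon(stubs))
```

The union has to satisfy the most demanding sub-transformer, which is why the maximum is taken rather than a sum or an average.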

get_params(deep=True)

Get parameters for this estimator.

Parameters
Name Type Description Default
deep bool

If True, will return the parameters for this estimator and contained subobjects that are estimators.

True
Returns
Name Type Description
params dict[str, Any]

Parameter names mapped to their values.

Source Code
def get_params(self, deep: bool = True) -> dict[str, Any]:
    """Get parameters for this estimator.

    Parameters
    ----------
    deep : bool, default=True
        If True, will return the parameters for this estimator and
        contained subobjects that are estimators.

    Returns
    -------
    params : dict[str, Any]
        Parameter names mapped to their values.

    """
    return _BaseComposition._get_params(self, attr="transformer_list", deep=deep)

set_params(**params)

Set the parameters of this estimator.

Parameters
Name Type Description Default
**params dict

Estimator parameters.

{}
Returns
Name Type Description
self FeatureUnion

FeatureUnion instance.

Source Code
def set_params(self, **params: Any) -> "FeatureUnion":
    """Set the parameters of this estimator.

    Parameters
    ----------
    **params : dict
        Estimator parameters.

    Returns
    -------
    self : FeatureUnion
        FeatureUnion instance.

    """
    _BaseComposition._set_params(self, attr="transformer_list", **params)
    return self
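
The `name__param` convention used by `set_params` can be pictured with a small key-splitting helper. This is only a sketch: `split_param_key` is a hypothetical function, not the logic inside `_BaseComposition._set_params`, and the names `lags` and `lag` are made up for illustration.

```python
def split_param_key(key):
    """Split 'name__param' into (name, param); param is None for a bare name."""
    name, sep, param = key.partition("__")
    return name, (param if sep else None)


# A suffixed key targets a parameter of the named transformer.
print(split_param_key("lags__lag"))
# A bare name targets the transformer itself (e.g. to replace or drop it).
print(split_param_key("lags"))
```

Only the first `__` is treated as the separator here, so deeper keys such as `a__b__c` would be handed on to the nested object as `b__c`.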

__getitem__(ind)

Return a sub-union or a single transformer.

Parameters
Name Type Description Default
ind int, str, or slice

Index, name, or slice of the transformer to retrieve.

required
Returns
Name Type Description
transformer Any

The transformer or sub-union.

Source Code
def __getitem__(self, ind: int | str | slice) -> Any:
    """Return a sub-union or a single transformer.

    Parameters
    ----------
    ind : int, str, or slice
        Index, name, or slice of the transformer to retrieve.

    Returns
    -------
    transformer : Any
        The transformer or sub-union.

    """
    if isinstance(ind, slice):
        if ind.step is not None:
            raise ValueError("FeatureUnion slicing only supports a step of 1")
        return self.__class__(
            transformer_list=self.transformer_list[ind],
            n_jobs=self.n_jobs,
            transformer_weights=self.transformer_weights,
            verbose=self.verbose,
        )
    elif isinstance(ind, int):
        _, est = self.transformer_list[ind]
        return est
    else:
        # String case - get by name
        return self.named_transformers[ind]
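
The three indexing modes can be mimicked on a plain list of `(name, object)` tuples. The helper `get_item` and the string placeholders below are hypothetical and stand in for real transformers. Read literally, the check `ind.step is not None` in the listing rejects any explicit step, including a step of 1, and the sketch reproduces that behaviour.

```python
def get_item(transformer_list, ind):
    """Mimic the three indexing modes on a plain list of (name, obj) tuples."""
    if isinstance(ind, slice):
        # As in the listing, any explicit step is rejected.
        if ind.step is not None:
            raise ValueError("slicing with an explicit step is not supported")
        return transformer_list[ind]
    if isinstance(ind, int):
        # Integer index: return the object at that position.
        return transformer_list[ind][1]
    # Anything else is treated as a name lookup.
    return dict(transformer_list)[ind]


steps = [("lags", "lag-obj"), ("rolling", "rolling-obj"), ("ema", "ema-obj")]
print(get_item(steps, 1))
print(get_item(steps, "ema"))
print(get_item(steps, slice(0, 2)))
```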

get_feature_names_out(input_features=None)

Get output feature names.

Parameters
Name Type Description Default
input_features list[str] | None

Input feature names.

None
Returns
Name Type Description
feature_names_out Any

Output feature names.

Source Code
def get_feature_names_out(self, input_features: list[str] | None = None) -> Any:
    """Get output feature names.

    Parameters
    ----------
    input_features : list[str] | None, default=None
        Input feature names.

    Returns
    -------
    feature_names_out : Any
        Output feature names.

    """
    return super().get_feature_names_out(input_features)

__sklearn_tags__()

Get estimator tags.

Returns
Type Description
Tags

Estimator tags with yohou-specific attributes.

Source Code
def __sklearn_tags__(self) -> Tags:
    """Get estimator tags.

    Returns
    -------
    Tags
        Estimator tags with yohou-specific attributes.

    """
    tags = super().__sklearn_tags__()

    # Aggregate tags from transformers (static capability check)
    if hasattr(self, "transformer_list") and self.transformer_list is not None:
        transformers = [t for _, t in self.transformer_list if t not in ("drop", "passthrough") and t is not None]
        if transformers:
            assert tags.transformer_tags is not None
            assert tags.input_tags is not None
            # Stateful if any transformer is stateful
            tags.transformer_tags.stateful = any(
                t.__sklearn_tags__().transformer_tags.stateful for t in transformers
            )

            # Not invertible unless there is only one transformer and it is invertible
            tags.transformer_tags.invertible = (
                len(transformers) == 1 and transformers[0].__sklearn_tags__().transformer_tags.invertible
            )

            # Aggregate min_value: take the maximum (most restrictive)
            # All transformers receive the same input, so we need to satisfy all constraints
            min_values = [t.__sklearn_tags__().input_tags.min_value for t in transformers]
            non_none_min_values = [v for v in min_values if v is not None]
            tags.input_tags.min_value = max(non_none_min_values) if non_none_min_values else None

    return tags
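
The aggregation rules in the listing (stateful if any transformer is, invertible only for a single invertible transformer, and the largest non-`None` `min_value`) can be tried out on plain data. `StubTags` and `aggregate_tags` are hypothetical simplifications that flatten the nested tag objects into one record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubTags:
    """Hypothetical, flattened stand-in for the tags read in the listing."""

    stateful: bool
    invertible: bool
    min_value: Optional[float]


def aggregate_tags(tags):
    """Combine per-transformer tags following the rules in the listing."""
    non_none = [t.min_value for t in tags if t.min_value is not None]
    return StubTags(
        stateful=any(t.stateful for t in tags),
        invertible=len(tags) == 1 and tags[0].invertible,
        min_value=max(non_none) if non_none else None,
    )


combined = aggregate_tags([StubTags(True, True, 0.0), StubTags(False, True, 1.0)])
print(combined)
```

Taking the maximum of `min_value` reflects that every sub-transformer receives the same input, so the strictest lower bound has to hold.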

__sklearn_is_fitted__()

Check if fitted.

Returns
Name Type Description
is_fitted bool

True if the union is fitted.

Source Code
def __sklearn_is_fitted__(self) -> bool:
    """Check if fitted.

    Returns
    -------
    is_fitted : bool
        True if the union is fitted.

    """
    return sklearn_FeatureUnion.__sklearn_is_fitted__(self)  # ty: ignore[invalid-argument-type]

fit(X, y=None, **fit_params)

Fit all transformers using X.

Parameters
Name Type Description Default
X DataFrame

Input data, used to fit transformers.

required
y array-like of shape (n_samples, n_outputs)

Targets for supervised learning.

None
**fit_params dict
  • If enable_metadata_routing=False (default): Parameters directly passed to the fit methods of the sub-transformers.

  • If enable_metadata_routing=True: Parameters safely routed to the fit methods of the sub-transformers. See the sklearn Metadata Routing User Guide for more details.

None
Returns
Name Type Description
self object

FeatureUnion class instance.

Source Code
def fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **fit_params: Any) -> "FeatureUnion":
    """Fit all transformers using X.

    Parameters
    ----------
    X : pl.DataFrame
        Input data, used to fit transformers.

    y : array-like of shape (n_samples, n_outputs), default=None
        Targets for supervised learning.

    **fit_params : dict, default=None
        - If `enable_metadata_routing=False` (default):
          Parameters directly passed to the `fit` methods of the
          sub-transformers.

        - If `enable_metadata_routing=True`:
          Parameters safely routed to the `fit` methods of the
          sub-transformers. See the sklearn Metadata Routing User Guide
          for more details.

    Returns
    -------
    self : object
        FeatureUnion class instance.
    """
    _raise_for_params(fit_params, self, "fit")
    routed_params = process_routing(self, "fit", **fit_params)
    transformers = self._parallel_func(X, y, _fit_one, routed_params)

    if not transformers:
        # All transformers are None
        return self

    self._update_transformer_list(transformers)
    return self

fit_transform(X, y=None, **params)

Fit all transformers, transform the data and concatenate results.

Parameters
Name Type Description Default
X DataFrame

Input data to be transformed.

required
y array-like of shape (n_samples, n_outputs)

Targets for supervised learning.

None
**params dict
  • If enable_metadata_routing=False (default): Parameters directly passed to the fit methods of the sub-transformers.

  • If enable_metadata_routing=True: Parameters safely routed to the fit methods of the sub-transformers. See the sklearn Metadata Routing User Guide for more details.

None
Returns
Name Type Description
X_t DataFrame

Horizontally stacked results of transformers, aligned by observation horizons.

Source Code
def fit_transform(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params: object) -> pl.DataFrame:
    """Fit all transformers, transform the data and concatenate results.

    Parameters
    ----------
    X : pl.DataFrame
        Input data to be transformed.

    y : array-like of shape (n_samples, n_outputs), default=None
        Targets for supervised learning.

    **params : dict, default=None
        - If `enable_metadata_routing=False` (default):
          Parameters directly passed to the `fit` methods of the
          sub-transformers.

        - If `enable_metadata_routing=True`:
          Parameters safely routed to the `fit` methods of the
          sub-transformers. See the sklearn Metadata Routing User Guide
          for more details.

    Returns
    -------
    X_t : pl.DataFrame
        Horizontally stacked results of transformers, aligned by observation horizons.
    """
    routed_params = process_routing(self, "fit_transform", **params)
    results = self._parallel_func(X, y, _fit_transform_one, routed_params)
    if not results:
        # All transformers are None
        time = X.select(cs.by_name("time"))
        return time

    Xs, transformers = zip(*results, strict=False)
    self._update_transformer_list(transformers)

    # Extract actual column names from each DataFrame (excluding time)
    transformer_names = [name for name, _, _ in self._iter()]
    raw_column_names = [[col for col in X_t.columns if col != "time"] for X_t in Xs]

    # Apply prefixes if verbose_feature_names_out is True
    if self.verbose_feature_names_out:
        column_names = []
        for name, cols in zip(transformer_names, raw_column_names, strict=False):
            column_names.append([panel_aware_prefix(col, name) for col in cols])
    else:
        column_names = raw_column_names
        # Check for duplicates
        flat_names = [col for cols in column_names for col in cols]
        counts = Counter(flat_names)
        duplicates = [name for name, count in counts.items() if count > 1]
        if duplicates:
            raise ValueError(
                f"Duplicate feature names found: {duplicates}. "
                "Either use transformers that produce unique names or set "
                "verbose_feature_names_out=True to add transformer name prefixes."
            )

    result = _hstack(
        list(Xs),
        column_names=column_names,
        observation_horizons=self._get_observation_horizons(),
    )
    return result
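
The duplicate-name check that runs when `verbose_feature_names_out=False` can be isolated as follows. `find_duplicates` is a hypothetical helper that repeats the `Counter` logic from the listing on plain lists of column names; the column names are invented for the example.

```python
from collections import Counter


def find_duplicates(column_names):
    """Return names that occur more than once across all transformers."""
    flat = [col for cols in column_names for col in cols]
    return [name for name, count in Counter(flat).items() if count > 1]


# Two transformers that both emit "sales_lag_1" would trigger the ValueError.
print(find_duplicates([["sales_lag_1"], ["sales_lag_1", "sales_mean"]]))
# Unique names pass the check.
print(find_duplicates([["sales_lag_1"], ["sales_mean"]]))
```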

transform(X, **params)

Transform X separately by each transformer, concatenate results.

Parameters
Name Type Description Default
X DataFrame

Input data to be transformed.

required
**params dict

Parameters routed to the transform method of the sub-transformers via the metadata routing API. See Metadata Routing User Guide for more details.

None
Returns
Name Type Description
X_t DataFrame

Horizontally stacked results of transformers, aligned by observation horizons.

Source Code
def transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
    """Transform X separately by each transformer, concatenate results.

    Parameters
    ----------
    X : pl.DataFrame
        Input data to be transformed.

    **params : dict, default=None
        Parameters routed to the `transform` method of the sub-transformers via the
        metadata routing API. See [Metadata Routing User Guide](https://scikit-learn.org/stable/metadata_routing.html) for more details.

    Returns
    -------
    X_t : pl.DataFrame
        Horizontally stacked results of transformers, aligned by observation horizons.
    """
    _raise_for_params(params, self, "transform")
    routed_params = process_routing(self, "transform", **params)

    Xs = Parallel(n_jobs=self.n_jobs)(
        delayed(_transform_one)(trans, X, None, weight, routed_params[name]) for name, trans, weight in self._iter()
    )
    if not Xs:
        # All transformers are None
        time = X.select(cs.by_name("time"))
        return time

    # Extract actual column names from each DataFrame (excluding time)
    transformer_names = [name for name, _, _ in self._iter()]
    raw_column_names = [[col for col in X_t.columns if col != "time"] for X_t in Xs]

    # Apply prefixes if verbose_feature_names_out is True
    if self.verbose_feature_names_out:
        column_names = []
        for name, cols in zip(transformer_names, raw_column_names, strict=False):
            column_names.append([panel_aware_prefix(col, name) for col in cols])
    else:
        column_names = raw_column_names

    result = _hstack(
        Xs,
        column_names=column_names,
        observation_horizons=self._get_observation_horizons(),
    )
    return result
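
The naming scheme described for `verbose_feature_names_out` (`lags_sales` for plain columns, `store_1__lags_sales` for panel columns) might be sketched as below. `prefix_column` is a hypothetical helper and not the library's `panel_aware_prefix`; it assumes the panel group separator is `__` and that only the first occurrence matters.

```python
def prefix_column(col, name, group_sep="__"):
    """Prefix `col` with `name`, keeping any leading panel group intact."""
    group, sep, feature = col.partition(group_sep)
    if sep:
        # Panel column: insert the prefix after the group separator.
        return f"{group}{sep}{name}_{feature}"
    # Plain column: prepend the transformer name with a single underscore.
    return f"{name}_{col}"


print(prefix_column("sales", "lags"))
print(prefix_column("store_1__sales", "lags"))
```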

observe_transform(X, **params)

Observe and transform X in parallel for each transformer, concatenate results.

This method atomically observes each transformer with new data and transforms it in parallel. The transformation uses the pre-observe state, then updates the memory. This is more efficient and correct than calling observe() then transform() separately.

Parameters
Name Type Description Default
X DataFrame

New data to observe with and transform.

required
**params dict

Parameters routed to the observe_transform methods of the sub-transformers via the metadata routing API. See Metadata Routing User Guide for more details.

None
Returns
Name Type Description
X_t DataFrame

Horizontally stacked results of transformers, aligned by observation horizons.

Source Code
def observe_transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
    """Observe and transform X in parallel for each transformer, concatenate results.

    This method atomically observes each transformer with new data and
    transforms it in parallel. The transformation uses the pre-observe state,
    then updates the memory. This is more efficient and correct than calling
    observe() then transform() separately.

    Parameters
    ----------
    X : pl.DataFrame
        New data to observe with and transform.

    **params : dict, default=None
        Parameters routed to the `observe_transform` methods of the sub-transformers
        via the metadata routing API. See [Metadata Routing User Guide](https://scikit-learn.org/stable/metadata_routing.html) for more details.

    Returns
    -------
    X_t : pl.DataFrame
        Horizontally stacked results of transformers, aligned by observation horizons.

    """
    _raise_for_params(params, self, "observe_transform")
    routed_params = process_routing(self, "observe_transform", **params)

    # Parallel execution of observe_transform on all transformers
    Xs = Parallel(n_jobs=self.n_jobs)(
        delayed(_observe_transform_one)(trans, X, None, weight, routed_params[name])
        for name, trans, weight in self._iter()
    )

    if not Xs:
        # All transformers are None
        time = X.select(cs.by_name("time"))
        return time

    # Extract actual column names from each DataFrame (excluding time)
    transformer_names = [name for name, _, _ in self._iter()]
    raw_column_names = [[col for col in X_t.columns if col != "time"] for X_t in Xs]

    # Apply prefixes if verbose_feature_names_out is True
    if self.verbose_feature_names_out:
        column_names = []
        for name, cols in zip(transformer_names, raw_column_names, strict=False):
            column_names.append([panel_aware_prefix(col, name) for col in cols])
    else:
        column_names = raw_column_names

    result = _hstack(
        Xs,
        column_names=column_names,
        # observe_transform returns the same number of rows as the input
        # for every sub-transformer (alignment is handled internally via
        # each transformer's observation memory), so no observation-horizon
        # trimming is needed here.
        observation_horizons=[0] * len(Xs),
    )

    return result
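
The "transform with the pre-observe state, then update the memory" order can be seen on a toy lag-1 transformer working on plain lists. `ToyLagOne` is a hypothetical class written for this sketch and does not reflect how yohou transformers store their memory.

```python
class ToyLagOne:
    """Toy lag-1 transformer on plain lists; not a yohou class."""

    def __init__(self):
        self.memory = None

    def fit(self, values):
        # Remember the last training value as the initial memory.
        self.memory = values[-1]
        return self

    def observe_transform(self, values):
        # Transform using the memory held before this call...
        history = [self.memory] + list(values)
        lagged = history[:-1]
        # ...then update the memory with the newly observed data.
        self.memory = values[-1]
        return lagged


lag = ToyLagOne().fit([1, 2, 3])
out = lag.observe_transform([4, 5])
print(out, lag.memory)
```

Because the memory supplies the missing history, the output has as many rows as the input. This matches the comment in the listing explaining why the horizons passed to `_hstack` are all zero.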

rewind_transform(X, **params)

Rewind and transform X in parallel for each transformer, concatenate results.

This method applies rewind_transform semantics to each transformer in parallel: transforms from scratch without using pre-existing memory, discards warmup rows, and rewinds the internal state with the input data.

Parameters
Name Type Description Default
X DataFrame

Data to transform and use for rewinding state.

required
**params dict

Parameters routed to the rewind_transform methods of the sub-transformers via the metadata routing API. See Metadata Routing User Guide for more details.

None
Returns
Name Type Description
X_t DataFrame

Horizontally stacked results of transformers, aligned by observation horizons, with warmup rows discarded.

Source Code
def rewind_transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
    """Rewind and transform X in parallel for each transformer, concatenate results.

    This method applies rewind_transform semantics to each transformer in parallel:
    transforms from scratch without using pre-existing memory, discards warmup rows,
    and rewinds the internal state with the input data.

    Parameters
    ----------
    X : pl.DataFrame
        Data to transform and use for rewinding state.

    **params : dict, default=None
        Parameters routed to the `rewind_transform` methods of the sub-transformers
        via the metadata routing API. See [Metadata Routing User Guide](https://scikit-learn.org/stable/metadata_routing.html) for more details.

    Returns
    -------
    X_t : pl.DataFrame
        Horizontally stacked results of transformers, aligned by observation horizons,
        with warmup rows discarded.

    """
    _raise_for_params(params, self, "rewind_transform")
    routed_params = process_routing(self, "rewind_transform", **params)

    # Parallel execution of rewind_transform on all transformers
    Xs = Parallel(n_jobs=self.n_jobs)(
        delayed(_rewind_transform_one)(trans, X, None, weight, routed_params[name])
        for name, trans, weight in self._iter()
    )

    if not Xs:
        # All transformers are None
        time = X.select(cs.by_name("time"))
        return time

    # Extract actual column names from each DataFrame (excluding time)
    transformer_names = [name for name, _, _ in self._iter()]
    raw_column_names = [[col for col in X_t.columns if col != "time"] for X_t in Xs]

    # Apply prefixes if verbose_feature_names_out is True
    if self.verbose_feature_names_out:
        column_names = []
        for name, cols in zip(transformer_names, raw_column_names, strict=False):
            column_names.append([panel_aware_prefix(col, name) for col in cols])
    else:
        column_names = raw_column_names

    result = _hstack(
        Xs,
        column_names=column_names,
        observation_horizons=self._get_observation_horizons(),
    )

    return result
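
By contrast with `observe_transform`, the rewind semantics ignore any earlier memory, drop the warmup rows, and rebuild the state from the input alone. The function below is a hypothetical toy on plain lists, with a lag of `horizon` steps standing in for a real transformer.

```python
def toy_rewind_transform(values, horizon=1):
    """Lag `values` by `horizon` from scratch and return (rows, new_memory)."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # The first `horizon` rows have no lagged value, so they are dropped as warmup.
    lagged = list(values[:-horizon])
    # The state is rebuilt from the tail of the input alone.
    new_memory = list(values[-horizon:])
    return lagged, new_memory


rows, memory = toy_rewind_transform([10, 11, 12, 13], horizon=2)
print(rows, memory)
```

The output is shorter than the input by `horizon` rows, which is why this method, unlike `observe_transform`, passes the real observation horizons to `_hstack`.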

get_metadata_routing()

Get metadata routing of this object.

Please check Metadata Routing User Guide on how the routing mechanism works.

Returns
Name Type Description
routing MetadataRouter

A MetadataRouter encapsulating routing information.

Source Code
def get_metadata_routing(self) -> MetadataRouter:
    """Get metadata routing of this object.

    Please check [Metadata Routing User Guide](https://scikit-learn.org/stable/metadata_routing.html) on how the routing
    mechanism works.

    Returns
    -------
    routing : MetadataRouter
        A `MetadataRouter` encapsulating
        routing information.
    """
    router = MetadataRouter(owner=self)

    for name, transformer in self.transformer_list:
        router.add(
            **{name: transformer},
            method_mapping=MethodMapping()
            .add(caller="fit", callee="fit")
            .add(caller="fit_transform", callee="fit_transform")
            .add(caller="fit_transform", callee="fit")
            .add(caller="fit_transform", callee="transform")
            .add(caller="transform", callee="transform"),
        )

    return router
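
The `MethodMapping` chain in the listing can be read as a flat table of caller and callee pairs. The sketch below copies those pairs into a plain list; `callees_for` is a hypothetical lookup helper, not part of scikit-learn's routing API. As listed, the chain has no entries whose caller is `observe_transform` or `rewind_transform`.

```python
# (caller, callee) pairs copied from the MethodMapping chain in the listing.
PAIRS = [
    ("fit", "fit"),
    ("fit_transform", "fit_transform"),
    ("fit_transform", "fit"),
    ("fit_transform", "transform"),
    ("transform", "transform"),
]


def callees_for(caller):
    """Return the sub-transformer methods that may receive metadata."""
    return [callee for c, callee in PAIRS if c == caller]


print(callees_for("fit_transform"))
print(callees_for("observe_transform"))
```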

Tutorials

The following example notebooks use this component:

  • How to Compose Features with FeatureUnion


    Data-Features

    Combine lag features, rolling statistics, EMA, and scaling in parallel with FeatureUnion and automatic observation horizon resolution.


  • How to Build a Feature Pipeline


    Data-Features

    Nest FeaturePipeline, FeatureUnion, and DecompositionPipeline for multi-level feature engineering with trend-season-residual decomposition.


  • How to Add Calendar, Fourier, and Holiday Features


    Data-Features

    Enrich your feature matrix with time-derived signals using CalendarFeatureTransformer, FourierFeatureTransformer, and HolidayFeatureTransformer.


  • How to Apply Window Transformations


    Data-Features

    Feature engineering with LagTransformer, RollingStatisticsTransformer, SlidingWindowFunctionTransformer, and ExponentialMovingAverage on time series data.


  • How to Build Panel Feature Pipelines


    Panel-Data

    Combine ColumnForecaster, FeaturePipeline, FeatureUnion, and DecompositionPipeline on panel data with per-group scoring on KDD Cup air quality.
