
ColumnTransformer

yohou.compose.column_transformer.ColumnTransformer

Bases: BaseTransformer, _BaseComposition

Applies transformers to columns of a polars DataFrame.

This estimator allows different columns or column subsets of the input to be transformed separately and the features generated by each transformer will be concatenated to form a single feature space.

This is useful for heterogeneous or columnar data, to combine several feature extraction mechanisms or transformations into a single transformer.

Parameters

Name Type Description Default
transformers list of tuples

List of (name, transformer, columns) tuples specifying the transformer objects to be applied to subsets of the data.

name : str
Like in FeaturePipeline and FeatureUnion, this allows the transformer and its parameters to be set using set_params and searched in grid search.

transformer : {'drop', 'passthrough'} or estimator
Estimator must support fit and transform. Special-cased strings 'drop' and 'passthrough' are accepted as well, to indicate to drop the columns or to pass them through untransformed, respectively.

columns : str, array-like of str, int, array-like of int, array-like of bool, slice or callable
Indexes the data on its second axis. Integers are interpreted as positional columns, while strings can reference DataFrame columns by name. A scalar string or int should be used where transformer expects X to be a 1d array-like (vector), otherwise a 2d array will be passed to the transformer. A callable is passed the input data X and can return any of the above. To select multiple columns by name or dtype, you can use make_column_selector.

required
remainder {'drop', 'passthrough'} or estimator

By default, only the columns specified in transformers are transformed and combined in the output; the non-specified columns are dropped (remainder='drop'). With remainder='passthrough', all remaining columns that were not specified in transformers but were present in the data passed to fit are passed through automatically. This subset of columns is concatenated with the output of the transformers. For dataframes, extra columns not seen during fit are excluded from the output of transform. If remainder is an estimator, the remaining non-specified columns are transformed by that estimator, which must support fit and transform. Note that using this feature requires the DataFrame columns passed to fit and transform to have identical order.

'drop'
n_jobs int

Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

None
transformer_weights dict

Multiplicative weights for features per transformer. The output of the transformer is multiplied by these weights. Keys are transformer names, values the weights.

None
verbose bool

If True, the time elapsed while fitting each transformer will be printed as it is completed.

False
verbose_feature_names_out bool

If True, ColumnTransformer.get_feature_names_out will prefix all feature names with the name of the transformer that generated that feature. If False, ColumnTransformer.get_feature_names_out will not prefix any feature names and will error if feature names are not unique.

True
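
As a rough illustration of the transformer_weights parameter described above, the following sketch multiplies each transformer's output by its weight and leaves unweighted transformers unchanged. The helper name apply_transformer_weights and the dict-of-lists data layout are assumptions made for this example only; they are not part of yohou.

```python
def apply_transformer_weights(outputs, transformer_weights=None):
    """Scale each transformer's output values by its weight (default 1.0)."""
    weights = transformer_weights or {}
    return {
        name: [value * weights.get(name, 1.0) for value in values]
        for name, values in outputs.items()
    }


# Hypothetical per-transformer outputs, keyed by transformer name.
outputs = {"sales_diff": [4.0, 4.0], "temp_diff": [7.0, 7.0]}
weighted = apply_transformer_weights(outputs, {"sales_diff": 0.5})
print(weighted)
```

Only the transformers named in the weights dict are rescaled, which matches the "keys are transformer names" convention stated for the parameter.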

Attributes

Name Type Description
transformers_ list

The collection of fitted transformers as tuples of (name, fitted_transformer, column). fitted_transformer can be an estimator, or 'drop'; 'passthrough' is replaced with an equivalent FunctionTransformer. In case there were no columns selected, this will be the unfitted transformer. If there are remaining columns, the final element is a tuple of the form: ('remainder', transformer, remaining_columns) corresponding to the remainder parameter. If there are remaining columns, then len(transformers_)==len(transformers)+1, otherwise len(transformers_)==len(transformers).

named_transformers_ `Bunch`

Read-only attribute to access any transformer by given name. Keys are transformer names and values are the fitted transformer objects.

output_indices_ dict

A dictionary from each transformer name to a slice, where the slice corresponds to indices in the transformed output. This is useful to inspect which transformer is responsible for which transformed feature(s).

n_features_in_ int

Number of features seen during fit. Only defined if the underlying transformers expose such an attribute when fit.

feature_names_in_ ndarray of shape (`n_features_in_`,)

Names of features seen during fit. Defined only when X has feature names that are all strings.
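
To make the output_indices_ description more concrete, here is a minimal sketch of how a name-to-slice mapping can be built from the number of output columns each transformer produces. The function build_output_indices and the widths input are hypothetical; the actual attribute is populated during fit.

```python
def build_output_indices(widths):
    """Map each transformer name to the slice of output columns it produced."""
    indices = {}
    start = 0
    for name, width in widths.items():
        indices[name] = slice(start, start + width)
        start += width
    return indices


# Hypothetical output widths: one column from sales_diff, two from temp_diff.
output_indices = build_output_indices({"sales_diff": 1, "temp_diff": 2})
feature_names = ["sales_diff_sales", "temp_diff_a", "temp_diff_b"]
temp_features = feature_names[output_indices["temp_diff"]]
print(temp_features)
```

Indexing the list of output feature names with one of these slices recovers the features a given transformer is responsible for.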

See Also

- sklearn.compose.ColumnTransformer : Underlying scikit-learn column transformer.
- FeaturePipeline : Sequential transformation.
- BaseTransformer : Base transformer interface.
- SeasonalDifferencing : Common column-wise transformer.

Notes

The order of the columns in the transformed feature matrix follows the order in which the columns are specified in the transformers list. Columns of the original feature matrix that are not specified are dropped from the resulting transformed feature matrix unless remainder='passthrough' is set. Passed-through remainder columns are added at the right of the transformers' output.
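
The ordering rule can be sketched in plain Python as follows. The function output_columns is a hypothetical helper written for this illustration; it ignores output renaming and the time column, and only shows which input columns survive and in what order.

```python
def output_columns(all_columns, transformers, remainder="drop"):
    """Return surviving input columns: transformer order first, remainder last."""
    ordered, claimed = [], set()
    for _name, trans, cols in transformers:
        cols = [cols] if isinstance(cols, str) else list(cols)
        claimed.update(cols)
        if trans != "drop":  # explicitly dropped columns never reach the output
            ordered.extend(cols)
    if remainder == "passthrough":
        # Unclaimed columns keep their original relative order, on the right.
        ordered.extend(c for c in all_columns if c not in claimed)
    return ordered


columns = ["sales", "temperature", "promo"]
# object() stands in for any fitted transformer in this sketch.
spec = [("temp_diff", object(), "temperature"), ("sales_diff", object(), "sales")]
dropped = output_columns(columns, spec)
kept = output_columns(columns, spec, remainder="passthrough")
print(dropped, kept)
```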

ColumnTransformer applies heterogeneous preprocessing to different columns, which is useful when different time series have different characteristics (e.g., different seasonal patterns).

Column selection by name (string) works seamlessly with polars DataFrames, allowing intuitive column-specific transformations.

Time alignment across columns with different observation horizons is handled automatically by the internal _hstack() function, ensuring all transformed columns are properly aligned in time.
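
The internals of _hstack() are not shown here, so the following is only one plausible reading of "aligned in time": each transformer loses a different number of leading rows, and the combined output keeps the time steps that every transformer produced. The helpers seasonal_diff and align_on_time, and the use of integer time steps, are assumptions for illustration.

```python
def seasonal_diff(values, lag):
    """Map time step t to values[t] - values[t - lag]; the first `lag` steps are lost."""
    return {t: values[t] - values[t - lag] for t in range(lag, len(values))}


def align_on_time(outputs):
    """Keep only the time steps present in every transformer output."""
    common = set.intersection(*(set(series) for series in outputs.values()))
    times = sorted(common)
    aligned = {"time": times}
    for name, series in outputs.items():
        aligned[name] = [series[t] for t in times]
    return aligned


outputs = {
    "sales": seasonal_diff(list(range(1, 11)), lag=4),          # steps 4..9
    "temperature": seasonal_diff(list(range(10, 20)), lag=7),   # steps 7..9
}
aligned = align_on_time(outputs)
print(aligned)
```

With lags of 4 and 7 over ten observations, only the last three time steps are common to both outputs, so the aligned result has three rows.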

Setting remainder='passthrough' (default is 'drop') preserves untransformed columns in the output, useful for keeping auxiliary columns that don't require transformation.

The verbose_feature_names_out parameter (default=True) prefixes output column names with transformer names using a single underscore separator (e.g., 'deseason_sales') to prevent name collisions when multiple transformers produce columns with the same names. For panel data columns, the prefix is inserted after the group separator to preserve panel structure (e.g., 'store_1__deseason_sales').
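
The naming convention can be sketched as below. This is an approximation built from the two examples in the paragraph above, not the library's code; the function add_prefix and the assumption that the panel group separator is the first double underscore in the column name are illustrative.

```python
def add_prefix(transformer_name, feature_name, group_sep="__"):
    """Prefix a feature with its transformer name, keeping any panel group first."""
    if group_sep in feature_name:
        group, series = feature_name.split(group_sep, 1)
        return f"{group}{group_sep}{transformer_name}_{series}"
    return f"{transformer_name}_{feature_name}"


plain = add_prefix("deseason", "sales")
panel = add_prefix("deseason", "store_1__sales")
print(plain, panel)
```

Using a single underscore for the transformer prefix keeps the double underscore free to mark the panel group boundary.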

The observation_horizon property returns the MAXIMUM across all column transformers, as the transformer needs enough history to satisfy the most demanding column-specific transformation.
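
A minimal sketch of this aggregation, assuming each sub-transformer exposes an integer observation_horizon. StubTransformer and combined_observation_horizon are hypothetical stand-ins, and the horizons 4 and 7 are chosen only to mirror the seasonalities used in the examples on this page.

```python
from dataclasses import dataclass


@dataclass
class StubTransformer:
    """Stand-in for a transformer that needs `observation_horizon` past rows."""
    observation_horizon: int


def combined_observation_horizon(transformers):
    """Return the largest horizon among the non-string transformers."""
    horizons = [
        trans.observation_horizon
        for _name, trans, _cols in transformers
        if not isinstance(trans, str)  # skip 'drop' and 'passthrough'
    ]
    return max(horizons, default=0)


spec = [
    ("sales_diff", StubTransformer(4), "sales"),
    ("temp_diff", StubTransformer(7), "temperature"),
    ("keep", "passthrough", "promo"),
]
horizon = combined_observation_horizon(spec)
print(horizon)
```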

force_int_remainder_cols is a class attribute set to True for compatibility with sklearn versions that reference it internally.

All columns must share the same time index. The time column is automatically handled and preserved in the output.

Examples

>>> import polars as pl
>>> from datetime import datetime, timedelta
>>> from yohou.compose import ColumnTransformer
>>> from yohou.stationarity import SeasonalDifferencing, SeasonalLogDifferencing
>>>
>>> # Create sample weekly time series data with multiple columns (52 weeks)
>>> time = pl.datetime_range(
...     start=datetime(2023, 1, 1),
...     end=datetime(2023, 1, 1) + timedelta(weeks=51),
...     interval="1w",
...     eager=True
... )
>>> data = pl.DataFrame({
...     "time": time,
...     "sales": range(1, 53),
...     "temperature": range(10, 62)
... })
>>>
>>> # Example 1: Apply different seasonal differencing to different columns
>>> ct = ColumnTransformer([
...     ('sales_diff', SeasonalDifferencing(seasonality=4), 'sales'),
...     ('temp_diff', SeasonalDifferencing(seasonality=7), 'temperature')
... ])
>>>
>>> # Example 2: Use remainder='passthrough' to keep auxiliary columns
>>> ct_passthrough = ColumnTransformer(
...     [('sales_diff', SeasonalDifferencing(seasonality=4), 'sales')],
...     remainder='passthrough'
... )
>>>
>>> # Example 3: Disable verbose_feature_names_out for cleaner names
>>> ct_clean = ColumnTransformer(
...     [('diff', SeasonalDifferencing(seasonality=4), 'sales')],
...     verbose_feature_names_out=False
... )

Source Code

class ColumnTransformer(BaseTransformer, _BaseComposition):
    """Applies transformers to columns of a polars DataFrame.

    This estimator allows different columns or column subsets of the input
    to be transformed separately and the features generated by each transformer
    will be concatenated to form a single feature space.

    This is useful for heterogeneous or columnar data, to combine several
    feature extraction mechanisms or transformations into a single transformer.

    Parameters
    ----------
    transformers : list of tuples
        List of (name, transformer, columns) tuples specifying the
        transformer objects to be applied to subsets of the data.

        name : str
            Like in FeaturePipeline and FeatureUnion, this allows the transformer and
            its parameters to be set using ``set_params`` and searched in grid
            search.
        transformer : {'drop', 'passthrough'} or estimator
            Estimator must support ``fit`` and ``transform``.
            Special-cased strings 'drop' and 'passthrough' are accepted as
            well, to indicate to drop the columns or to pass them through
            untransformed, respectively.
        columns :  str, array-like of str, int, array-like of int, \
                array-like of bool, slice or callable
            Indexes the data on its second axis. Integers are interpreted as
            positional columns, while strings can reference DataFrame columns
            by name.  A scalar string or int should be used where
            ``transformer`` expects X to be a 1d array-like (vector),
            otherwise a 2d array will be passed to the transformer.
            A callable is passed the input data `X` and can return any of the
            above. To select multiple columns by name or dtype, you can use
            ``make_column_selector``.

    remainder : {'drop', 'passthrough'} or estimator, default='drop'
        By default, only the specified columns in `transformers` are
        transformed and combined in the output, and the non-specified
        columns are dropped (``remainder='drop'``).
        By specifying ``remainder='passthrough'``, all remaining columns that
        were not specified in `transformers`, but present in the data passed
        to `fit` will be automatically passed through. This subset of columns
        is concatenated with the output of the transformers. For dataframes,
        extra columns not seen during `fit` will be excluded from the output
        of `transform`.
        By setting ``remainder`` to be an estimator, the remaining
        non-specified columns will use the ``remainder`` estimator. The
        estimator must support ``fit`` and ``transform``.
        Note that using this feature requires that the DataFrame columns
        input at ``fit`` and ``transform`` have identical order.

    n_jobs : int, default=None
        Number of jobs to run in parallel.
        ``None`` means 1 unless in a ``joblib.parallel_backend`` context.
        ``-1`` means using all processors.

    transformer_weights : dict, default=None
        Multiplicative weights for features per transformer. The output of the
        transformer is multiplied by these weights. Keys are transformer names,
        values the weights.

    verbose : bool, default=False
        If True, the time elapsed while fitting each transformer will be
        printed as it is completed.

    verbose_feature_names_out : bool, default=True
        If True, `ColumnTransformer.get_feature_names_out` will prefix
        all feature names with the name of the transformer that generated that
        feature.
        If False, `ColumnTransformer.get_feature_names_out` will not
        prefix any feature names and will error if feature names are not
        unique.

    Attributes
    ----------
    transformers_ : list
        The collection of fitted transformers as tuples of (name,
        fitted_transformer, column). `fitted_transformer` can be an estimator,
        or `'drop'`; `'passthrough'` is replaced with an equivalent
        `FunctionTransformer`. In case there were
        no columns selected, this will be the unfitted transformer. If there
        are remaining columns, the final element is a tuple of the form:
        ('remainder', transformer, remaining_columns) corresponding to the
        ``remainder`` parameter. If there are remaining columns, then
        ``len(transformers_)==len(transformers)+1``, otherwise
        ``len(transformers_)==len(transformers)``.

    named_transformers_ : `Bunch`
        Read-only attribute to access any transformer by given name.
        Keys are transformer names and values are the fitted transformer
        objects.

    output_indices_ : dict
        A dictionary from each transformer name to a slice, where the slice
        corresponds to indices in the transformed output. This is useful to
        inspect which transformer is responsible for which transformed
        feature(s).

    n_features_in_ : int
        Number of features seen during ``fit``. Only defined if the
        underlying transformers expose such an attribute when fit.

    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during ``fit``. Defined only when `X`
        has feature names that are all strings.

    See Also
    --------
    - `sklearn.compose.ColumnTransformer` : Underlying scikit-learn column transformer.
    - [`FeaturePipeline`][yohou.compose.feature_pipeline.FeaturePipeline] : Sequential transformation.
    - [`BaseTransformer`][yohou.base.transformer.BaseTransformer] : Base transformer interface.
    - [`SeasonalDifferencing`][yohou.stationarity.transformers.SeasonalDifferencing] : Common column-wise transformer.

    Notes
    -----
    The order of the columns in the transformed feature matrix follows the
    order of how the columns are specified in the `transformers` list.
    Columns of the original feature matrix that are not specified are
    dropped from the resulting transformed feature matrix unless
    ``remainder='passthrough'`` is set. Passed-through remainder columns
    are added at the right of the transformers' output.

    ColumnTransformer applies heterogeneous preprocessing to different columns,
    which is useful when different time series have different characteristics
    (e.g., different seasonal patterns).

    Column selection by name (string) works seamlessly with polars DataFrames,
    allowing intuitive column-specific transformations.

    Time alignment across columns with different observation horizons is handled
    automatically by the internal `_hstack()` function, ensuring all transformed
    columns are properly aligned in time.

    Setting `remainder='passthrough'` (default is 'drop') preserves untransformed
    columns in the output, useful for keeping auxiliary columns that don't require
    transformation.

    The `verbose_feature_names_out` parameter (default=True) prefixes output column
    names with transformer names using a single underscore separator
    (e.g., 'deseason_sales') to prevent name collisions when multiple
    transformers produce columns with the same names. For panel data columns,
    the prefix is inserted after the group separator to preserve panel structure
    (e.g., 'store_1__deseason_sales').

    The `observation_horizon` property returns the MAXIMUM across all column
    transformers, as the transformer needs enough history to satisfy the most
    demanding column-specific transformation.

    ``force_int_remainder_cols`` is a class attribute set to ``True`` for
    compatibility with sklearn versions that reference it internally.

    All columns must share the same `time` index. The `time` column is automatically
    handled and preserved in the output.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime, timedelta
    >>> from yohou.compose import ColumnTransformer
    >>> from yohou.stationarity import SeasonalDifferencing, SeasonalLogDifferencing
    >>>
    >>> # Create sample weekly time series data with multiple columns (52 weeks)
    >>> time = pl.datetime_range(
    ...     start=datetime(2023, 1, 1),
    ...     end=datetime(2023, 1, 1) + timedelta(weeks=51),
    ...     interval="1w",
    ...     eager=True
    ... )
    >>> data = pl.DataFrame({
    ...     "time": time,
    ...     "sales": range(1, 53),
    ...     "temperature": range(10, 62)
    ... })
    >>>
    >>> # Example 1: Apply different seasonal differencing to different columns
    >>> ct = ColumnTransformer([
    ...     ('sales_diff', SeasonalDifferencing(seasonality=4), 'sales'),
    ...     ('temp_diff', SeasonalDifferencing(seasonality=7), 'temperature')
    ... ])
    >>>
    >>> # Example 2: Use remainder='passthrough' to keep auxiliary columns
    >>> ct_passthrough = ColumnTransformer(
    ...     [('sales_diff', SeasonalDifferencing(seasonality=4), 'sales')],
    ...     remainder='passthrough'
    ... )
    >>>
    >>> # Example 3: Disable verbose_feature_names_out for cleaner names
    >>> ct_clean = ColumnTransformer(
    ...     [('diff', SeasonalDifferencing(seasonality=4), 'sales')],
    ...     verbose_feature_names_out=False
    ... )
    """

    _parameter_constraints: dict[str, Any] = {
        "transformers": [list, Hidden(tuple)],
        "remainder": [
            StrOptions({"drop", "passthrough"}),
            HasMethods(["fit", "transform"]),
            HasMethods(["fit_transform", "transform"]),
        ],
        "n_jobs": [Integral, None],
        "transformer_weights": [dict, None],
        "verbose": ["verbose"],
        "verbose_feature_names_out": ["boolean"],
    }

    def get_params(self, deep: bool = True) -> dict[str, Any]:
        """Get parameters for this estimator.

        Parameters
        ----------
        deep : bool, default=True
            If True, will return the parameters for this estimator and
            contained subobjects that are estimators.

        Returns
        -------
        params : dict[str, Any]
            Parameter names mapped to their values.

        """
        return _BaseComposition._get_params(self, attr="transformers", deep=deep)

    def set_params(self, **params: Any) -> "ColumnTransformer":
        """Set the parameters of this estimator.

        Parameters
        ----------
        **params : dict
            Estimator parameters.

        Returns
        -------
        self : ColumnTransformer
            ColumnTransformer instance.

        """
        _BaseComposition._set_params(self, attr="transformers", **params)
        return self

    def __sklearn_tags__(self) -> Tags:
        """Get estimator tags.

        Returns
        -------
        Tags
            Estimator tags with yohou-specific attributes.

        """
        tags = super().__sklearn_tags__()

        # Aggregate tags from contained transformers (static capability check)
        if hasattr(self, "transformers") and self.transformers is not None:
            transformers = [t for _, t, _ in self.transformers if t not in ("drop", "passthrough") and t is not None]

            # Include remainder if it's an estimator
            if hasattr(self, "remainder") and self.remainder not in ("drop", "passthrough", None):
                transformers.append(self.remainder)

            if transformers:
                assert tags.transformer_tags is not None
                assert tags.input_tags is not None
                # Stateful if any transformer is stateful
                tags.transformer_tags.stateful = any(
                    t.__sklearn_tags__().transformer_tags.stateful for t in transformers
                )

                # Not invertible: column transformer cannot generally invert
                # since columns may be dropped or reordered
                tags.transformer_tags.invertible = False

                # Aggregate min_value: take the maximum (most restrictive)
                # All transformers receive subsets of the same input
                min_values = [t.__sklearn_tags__().input_tags.min_value for t in transformers]
                non_none_min_values = [v for v in min_values if v is not None]
                tags.input_tags.min_value = max(non_none_min_values) if non_none_min_values else None

        return tags

    @property
    def _transformers(self) -> list[tuple[str, Any, Any]]:
        """List of (name, fitted_transformer, column) tuples.

        Returns
        -------
        transformers : list[tuple[str, Any, Any]]
            The fitted transformers.

        """
        return sklearn_ColumnTransformer._transformers.fget(self)  # ty: ignore[invalid-argument-type]

    def _iter(
        self,
        fitted: bool = False,
        column_as_labels: bool = False,
        skip_drop: bool = False,
        skip_empty_columns: bool = True,
    ) -> Iterator[tuple[str, Any, Any, Any]]:
        """Generate (name, trans, column, weight) tuples.

        Parameters
        ----------
        fitted : bool, default=False
            Whether to iterate over fitted transformers.
        column_as_labels : bool, default=False
            Whether to return columns as labels.
        skip_drop : bool, default=False
            Whether to skip 'drop' transformers.
        skip_empty_columns : bool, default=True
            Whether to skip transformers with empty columns.

        Yields
        ------
        name : str
            Transformer name.
        trans : Any
            Transformer instance.
        column : Any
            Column specification.
        weight : Any
            Transformer weight.

        """
        return sklearn_ColumnTransformer._iter(
            self,  # ty: ignore[invalid-argument-type]
            fitted=fitted,
            column_as_labels=column_as_labels,
            skip_drop=skip_drop,
            skip_empty_columns=skip_empty_columns,
        )

    def __getitem__(self, ind: int | str | slice) -> Any:
        """Return a sub-transformer or a single transformer.

        Parameters
        ----------
        ind : int, str, or slice
            Index, name, or slice of the transformer to retrieve.

        Returns
        -------
        transformer : Any
            The transformer or sub-transformer.

        """
        if isinstance(ind, slice):
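            # Accept an explicit step of 1 as well as the default (None).
            if ind.step not in (1, None):
                raise ValueError("ColumnTransformer slicing only supports a step of 1")
            return self.__class__(
                transformers=self.transformers[ind],
                remainder=self.remainder,
                n_jobs=self.n_jobs,
                transformer_weights=self.transformer_weights,
                verbose=self.verbose,
                # Carry the naming option over so the slice names features the same way.
                verbose_feature_names_out=self.verbose_feature_names_out,
            )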
        elif isinstance(ind, int):
            name, trans, _ = self.transformers[ind]
            # If fitted, use named_transformers_, otherwise return from transformers
            if hasattr(self, "named_transformers_"):
                return self.named_transformers_[name]
            return trans
        else:
            # String case - get by name
            if hasattr(self, "named_transformers_"):
                return self.named_transformers_[ind]
            # Not fitted yet, search in transformers list
            for name, trans, _ in self.transformers:
                if name == ind:
                    return trans
            raise KeyError(f"Transformer {ind} not found")

    def _log_message(self, name: str, idx: int, total: int) -> str:
        """Get log message for a transformer.

        Parameters
        ----------
        name : str
            Transformer name.
        idx : int
            Current index.
        total : int
            Total number of transformers.

        Returns
        -------
        message : str
            Log message.

        """
        return f"(step {idx} of {total}) Processing {name}"

    def _update_fitted_transformers(self, transformers: Any) -> None:
        """Update fitted transformers.

        Parameters
        ----------
        transformers : Any
            Fitted transformers.


        """
        # Directly use sklearn's implementation - it's tightly coupled with internal state
        sklearn_ColumnTransformer._update_fitted_transformers(self, transformers)  # ty: ignore[invalid-argument-type]

    def _get_feature_name_out_for_transformer(self, name: str, trans: Any, feature_names_in: Any) -> Any:
        """Get feature names for a transformer.

        Parameters
        ----------
        name : str
            Transformer name.
        trans : Any
            Transformer instance.
        feature_names_in : Any
            Input feature names.

        Returns
        -------
        feature_names_out : Any
            Output feature names.

        """
        return sklearn_ColumnTransformer._get_feature_name_out_for_transformer(
            cast(sklearn_ColumnTransformer, self),
            name,
            trans,
            feature_names_in,
        )

    def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
        """Get output feature names.

        Collects output feature names from each fitted sub-transformer,
        optionally prefixing them with the transformer name when
        ``verbose_feature_names_out`` is True.

        Parameters
        ----------
        input_features : list[str] | None, default=None
            Input feature names. If None, uses ``feature_names_in_`` from fit.

        Returns
        -------
        list of str
            Output feature names.

        """
        check_is_fitted(self, "transformers_")
        feature_names_out: list[str] = []
        for name, trans, columns in self.transformers_:  # ty: ignore[unresolved-attribute]
            if trans == "drop" or (isinstance(columns, list) and len(columns) == 0):
                continue
            col_list = list(columns) if isinstance(columns, list) else [columns]
            names: list[str] = col_list  # ty: ignore[invalid-assignment]
            if hasattr(trans, "get_feature_names_out"):
                result = trans.get_feature_names_out()
                if result is not None:
                    # Sub-transformers may include "time" in their output; strip it.
                    filtered = [f for f in result if f != "time"]
                    if filtered:
                        names = filtered
            if self.verbose_feature_names_out:
                names = [f"{name}_{f}" for f in names]
            feature_names_out.extend(names)
        return feature_names_out

    def _get_remainder_cols(self, indices: Any) -> Any:
        """Get remainder columns.

        Parameters
        ----------
        indices : Any
            Column indices.

        Returns
        -------
        remainder_cols : Any
            Remainder columns.

        """
        # Directly use sklearn's implementation - it calls _get_remainder_cols_dtype internally
        return sklearn_ColumnTransformer._get_remainder_cols(self, indices)  # ty: ignore[invalid-argument-type]

    def _get_remainder_cols_dtype(self) -> Any:
        """Get dtype of remainder columns.

        Returns
        -------
        dtype : Any
            Data type of remainder columns.

        """
        return sklearn_ColumnTransformer._get_remainder_cols_dtype(self)  # ty: ignore[invalid-argument-type]

    def _add_prefix_for_feature_names_out(self, feature_names_out: list) -> list[str]:
        """Add prefixes to feature names.

        Uses single underscore ``_`` as separator (not ``__``) to avoid
        conflicts with the panel data ``<GROUP>__<SERIES>`` convention.
        For panel columns, the prefix is inserted after the group separator
        (e.g., ``store_1__deseason_sales``).

        Parameters
        ----------
        feature_names_out : list of (str, list of str) tuples
            Pairs of transformer name and that transformer's output
            column names.

        Returns
        -------
        prefixed_names : list of str
            Flat list of feature names with prefixes.

        """
        return [panel_aware_prefix(col, name) for name, cols in feature_names_out for col in cols]

    def _sk_visual_block_(self) -> Any:
        """Get visual block representation.

        Returns
        -------
        visual_block : Any
            Visual block representation.

        """
        return sklearn_ColumnTransformer._sk_visual_block_(self)  # ty: ignore[invalid-argument-type]

    def _validate_remainder(self, X: Any) -> None:
        """Validate remainder parameter.

        Parameters
        ----------
        X : Any
            Input data.

        """
        # Let sklearn handle validation completely
        sklearn_ColumnTransformer._validate_remainder(self, X)  # ty: ignore[invalid-argument-type]

    def _validate_column_callables(self, X: Any) -> None:
        """Validate column callables.

        Parameters
        ----------
        X : Any
            Input data.

        """
        # Let sklearn handle validation
        sklearn_ColumnTransformer._validate_column_callables(self, X)  # ty: ignore[invalid-argument-type]

    def _record_output_indices(self, Xs: Any) -> None:
        """Record output indices for each transformer.

        Parameters
        ----------
        Xs : Any
            Transformed outputs.

        """
        # Let sklearn handle recording
        sklearn_ColumnTransformer._record_output_indices(self, Xs)  # ty: ignore[invalid-argument-type]

    # Required by sklearn <1.8 _get_remainder_cols; unused by >=1.8.
    force_int_remainder_cols = FORCE_INT_REMAINDER_COLS

    def __init__(
        self,
        transformers: list[tuple[str, Any, Any]],
        *,
        remainder: str | Any = "drop",
        n_jobs: int | None = None,
        transformer_weights: dict[str, float] | None = None,
        verbose: bool = False,
        verbose_feature_names_out: bool = True,
    ) -> None:
        self.transformers = transformers
        self.remainder = remainder
        self.n_jobs = n_jobs
        self.transformer_weights = transformer_weights
        self.verbose = verbose
        self.verbose_feature_names_out = verbose_feature_names_out

    def _get_observation_horizons(self) -> list[int]:
        """Get observation horizons from all fitted transformers.

        Returns
        -------
        observation_horizons : list[int]
            List of observation horizons from each transformer.

        """
        observation_horizons = []
        for _, t, _, _ in self._iter(
            fitted=True,
            column_as_labels=True,
            skip_drop=False,
            skip_empty_columns=False,
        ):
            observation_horizon = 0
            if t not in ("drop", "passthrough") and t is not None and hasattr(t, "observation_horizon"):
                observation_horizon = t.observation_horizon

            observation_horizons.append(observation_horizon)

        return observation_horizons

    @property
    def observation_horizon(self) -> int:
        """Maximum observation horizon across all transformers.

        Returns
        -------
        int
            Maximum observation horizon needed.

        Raises
        ------
        NotFittedError
            If the column transformer has not been fitted yet.

        """
        check_is_fitted(self)

        observation_horizons = self._get_observation_horizons()
        observation_horizon = max(observation_horizons)

        return observation_horizon

    @property
    def named_transformers_(self) -> Bunch:
        """Access the fitted transformer by name.

        Read-only attribute to access any transformer by given name.
        Keys are transformer names and values are the fitted transformer
        objects.

        Returns
        -------
        Bunch
            Dict-like object of fitted transformers keyed by name.

        """
        transformers = getattr(self, "transformers_", self.transformers)
        return Bunch(**{name: trans for name, trans, _ in transformers})

    def _validate_transformers(self) -> None:
        """Validate names of transformers and the transformers themselves.

        This checks that every transformer is either a `BaseTransformer`
        instance or the string 'passthrough'.
        """
        if not self.transformers:
            return

        names, transformers, _ = zip(*self.transformers, strict=False)

        # validate names
        self._validate_names(names)

        # validate estimators
        for t in transformers:
            if t == "passthrough":
                continue
            if not isinstance(t, BaseTransformer):
                # Used to validate the transformers in the `transformers` list
                raise TypeError(
                    "All estimators should be instances of `BaseTransformer` "
                    "or be the string 'passthrough'; "
                    f"got '{t}' (type {type(t)})."
                )

    def _call_func_on_transformers(
        self,
        X: pl.DataFrame,
        y: pl.DataFrame | None,
        func: Callable,
        column_as_labels: bool,
        routed_params: dict[str, dict[str, dict[str, Any]]],
        time_column: pl.DataFrame | None = None,
    ) -> list[pl.DataFrame]:
        """
        Private function to fit and/or transform on demand.

        Parameters
        ----------
        X : {array-like, dataframe} of shape (n_samples, n_features)
            The data to be used in fit and/or transform. Should NOT include "time" column.

        y : array-like of shape (n_samples,)
            Targets.

        func : callable
            Function to call, which can be _fit_transform_one,
            _transform_one, _observe_transform_one or
            _rewind_transform_one.

        column_as_labels : bool
            Used to iterate through transformers. If True, columns are returned
            as strings. If False, columns are returned as they were given by
            the user. Can be True only if the ``ColumnTransformer`` is already
            fitted.

        routed_params : dict
            The routed parameters as the output from ``process_routing``.

        time_column : pl.DataFrame, optional
            The time column to concatenate with each transformer's input.
            If None, uses self._time_column_ (set during fit).

        Returns
        -------
        Return value (transformers and/or transformed X data) depends
        on the passed function.
        """
        # Use provided time_column or fall back to stored one from fit
        if time_column is None:
            time_column = self._time_column_

        fitted = func is not _fit_transform_one

        def safe_indexing(X: pl.DataFrame, columns: object, axis: int) -> object:
            """Safe indexing helper for polars DataFrames."""
            Xi = _safe_indexing(X, columns, axis=axis)

            if isinstance(Xi, pl.Series):
                Xi = Xi.to_frame()

            return Xi

        transformers = list(
            self._iter(
                fitted=fitted,
                column_as_labels=column_as_labels,
                skip_drop=True,
                skip_empty_columns=True,
            )
        )
        try:
            jobs = []
            for idx, (name, trans, column, weight) in enumerate(transformers, start=1):
                transformer_to_use = trans
                if func is _fit_transform_one:
                    if transformer_to_use == "passthrough":
                        output_config = _get_output_config("transform", self)
                        transformer_to_use = FunctionTransformer(
                            check_inverse=False,
                            feature_names_out="one-to-one",
                        ).set_output(transform=output_config["dense"])

                    extra_args = {
                        "message_clsname": "ColumnTransformer",
                        "message": self._log_message(name, idx, len(transformers)),
                    }
                else:  # any func other than _fit_transform_one
                    extra_args = {}
                jobs.append(
                    delayed(func)(
                        transformer=clone(transformer_to_use) if not fitted else transformer_to_use,
                        X=pl.concat(
                            [time_column, safe_indexing(X, column, axis=1)],
                            how="horizontal",
                        ),
                        y=y,
                        weight=weight,
                        **extra_args,
                        params=routed_params[name],
                    )
                )

            return Parallel(n_jobs=self.n_jobs)(jobs)

        except ValueError as e:
            if "Expected 2D array, got 1D array instead" in str(e):
                raise ValueError(_ERR_MSG_1DCOLUMN) from e
            else:
                raise

    def fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params: Any) -> "ColumnTransformer":
        """Fit all transformers using X.

        Parameters
        ----------
        X : {array-like, dataframe} of shape (n_samples, n_features)
            Input data, of which specified subsets are used to fit the
            transformers.

        y : array-like of shape (n_samples,...), default=None
            Targets for supervised learning.

        **params : dict, default=None
            Parameters to be passed to the underlying transformers' ``fit`` and
            ``transform`` methods.

            You can only pass this if metadata routing is enabled, which you
            can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

        Returns
        -------
        self : ColumnTransformer
            This estimator.
        """
        _raise_for_params(params, self, "fit")
        # we use fit_transform to make sure to set sparse_output_ (for which we
        # need the transformed data) to have consistent output type in predict
        self.fit_transform(X, y=y, **params)
        return self

    @_fit_context(
        # estimators in ColumnTransformer.transformers are not validated yet
        prefer_skip_nested_validation=False
    )
    def fit_transform(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params: Any) -> pl.DataFrame:
        """Fit all transformers, transform the data and concatenate results.

        Parameters
        ----------
        X : {array-like, dataframe} of shape (n_samples, n_features)
            Input data, of which specified subsets are used to fit the
            transformers.

        y : array-like of shape (n_samples,), default=None
            Targets for supervised learning.

        **params : dict, default=None
            Parameters to be passed to the underlying transformers' ``fit`` and
            ``transform`` methods.

            You can only pass this if metadata routing is enabled, which you
            can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

        Returns
        -------
        X_t : pl.DataFrame
            Horizontally stacked results of transformers.
        """
        _raise_for_params(params, self, "fit_transform")

        X = _check_X(X)

        # Strip time column early - before sklearn validation which stores column indices
        # This ensures all column references are to the non-time columns
        self._time_column_ = X.select(cs.by_name("time"))
        X_no_time = X.select(~cs.by_name("time"))

        # Set feature_names_in_ and n_features_in_ on the stripped data
        _check_feature_names(self, X_no_time, reset=True)
        _check_n_features(self, X_no_time, reset=True)
        self._validate_transformers()
        n_samples = _num_samples(X_no_time)

        self._validate_column_callables(X_no_time)
        self._validate_remainder(X_no_time)

        routed_params = process_routing(self, "fit_transform", **params)

        result = self._call_func_on_transformers(
            X_no_time,
            y,
            _fit_transform_one,
            column_as_labels=False,
            routed_params=routed_params,
        )

        if not result:
            self._update_fitted_transformers([])
            # All transformers are None
            return self._time_column_

        Xs, transformers = zip(*result, strict=False)

        self.sparse_output_ = False

        self._update_fitted_transformers(transformers)
        self._record_output_indices(Xs)

        result = self._hstack(list(Xs), n_samples=n_samples)
        return result

    def transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
        """Transform X separately by each transformer, concatenate results.

        Parameters
        ----------
        X : {array-like, dataframe} of shape (n_samples, n_features)
            The data to be transformed by subset.

        **params : dict, default=None
            Parameters to be passed to the underlying transformers' ``transform``
            method.

            You can only pass this if metadata routing is enabled, which you
            can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

        Returns
        -------
        X_t : pl.DataFrame
            Horizontally stacked results of transformers.
        """
        _raise_for_params(params, self, "transform")
        check_is_fitted(self)
        X = _check_X(X)

        # Strip time column early, consistent with fit_transform
        time_column = X.select(cs.by_name("time"))
        X_no_time = X.select(~cs.by_name("time"))

        # If ColumnTransformer is fit using a dataframe, and now a dataframe is
        # passed to be transformed, we select columns by name instead. This
        # enables the user to pass X at transform time with extra columns which
        # were not present in fit time, and the order of the columns doesn't
        # matter.
        fit_dataframe_and_transform_dataframe = hasattr(self, "feature_names_in_") and (
            _is_pandas_df(X_no_time) or hasattr(X_no_time, "__dataframe__")
        )

        n_samples = _num_samples(X_no_time)
        column_names = _get_feature_names(X_no_time)

        if fit_dataframe_and_transform_dataframe:
            named_transformers = self.named_transformers_
            # check that all names seen in fit are in transform, unless
            # they were dropped
            non_dropped_indices = [
                ind
                for name, ind in self._transformer_to_input_indices.items()  # ty: ignore[unresolved-attribute]
                if name in named_transformers and named_transformers[name] != "drop"
            ]

            all_indices = set(chain(*non_dropped_indices))
            all_names = {self.feature_names_in_[ind] for ind in all_indices}

            diff = all_names - set(column_names)
            if diff:
                raise ValueError(f"columns are missing: {diff}")
        else:
            # ndarray was used for fitting or transforming, thus we only
            # check that n_features_in_ is consistent
            self._check_n_features(X_no_time, reset=False)  # ty: ignore[unresolved-attribute]

        routed_params = process_routing(self, "transform", **params)

        Xs = self._call_func_on_transformers(
            X_no_time,
            None,
            _transform_one,
            column_as_labels=fit_dataframe_and_transform_dataframe,
            routed_params=routed_params,
            time_column=time_column,
        )

        if not Xs:
            # All transformers are None
            return time_column

        result = self._hstack(list(Xs), n_samples=n_samples)
        return result

    def observe_transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
        """Observe and transform X by each transformer, concatenate results.

        This method observes the new data and transforms it in a single step
        for each sub-transformer. The transformation uses the state from
        before the observation, and the memory is updated afterwards. This is
        more efficient and correct than calling observe() and then
        transform() separately.

        Parameters
        ----------
        X : pl.DataFrame
            New data to observe and transform.

        **params : dict, default=None
            Parameters routed to the `transform` methods of the transformers.

            You can only pass this if metadata routing is enabled, which you
            can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

        Returns
        -------
        X_t : pl.DataFrame
            Horizontally stacked results of transformers.

        """
        _raise_for_params(params, self, "observe_transform")
        check_is_fitted(self)
        X = _check_X(X)

        # Strip time column early, consistent with fit_transform and transform
        time_column = X.select(cs.by_name("time"))
        X_no_time = X.select(~cs.by_name("time"))

        n_samples = _num_samples(X_no_time)

        routed_params = process_routing(self, "observe_transform", **params)

        Xs = self._call_func_on_transformers(
            X_no_time,
            None,
            _observe_transform_one,
            column_as_labels=False,
            routed_params=routed_params,
            time_column=time_column,
        )

        if not Xs:
            # All transformers are None
            return time_column

        # For observe_transform, skip sample count check since transformers handle buffering internally
        result = self._hstack(list(Xs), n_samples=n_samples, check_samples=False)

        return result

    def rewind_transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
        """Rewind internal state and transform using only observation horizon rows.

        Discards accumulated observations and rewinds to a clean state using
        the last `observation_horizon` rows for each transformer. This provides
        a stateless transformation that can be used for reproducible results.

        Parameters
        ----------
        X : pl.DataFrame
            Input DataFrame with "time" column. The last `observation_horizon`
            rows of each transformer will be used to initialize state.
        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        pl.DataFrame
            Transformed output with "time" column, after rewinding state.

        """
        check_is_fitted(self)
        time_column = X.select(cs.by_name("time"))
        X_no_time = X.select(~cs.by_name("time"))

        n_samples = _num_samples(X_no_time)

        routed_params = process_routing(self, "rewind_transform", **params)

        Xs = self._call_func_on_transformers(
            X_no_time,
            None,
            _rewind_transform_one,
            column_as_labels=False,
            routed_params=routed_params,
            time_column=time_column,
        )

        if not Xs:
            # All transformers are None
            return time_column

        # For rewind_transform, skip sample count check since transformers discard warmup rows
        result = self._hstack(list(Xs), n_samples=n_samples, check_samples=False)

        return result

    def _hstack(self, Xs: list[pl.DataFrame], *, n_samples: int, check_samples: bool = True) -> pl.DataFrame:
        """Stacks Xs horizontally.

        This allows subclasses to control the stacking behavior, while reusing
        everything else from ColumnTransformer.

        Parameters
        ----------
        Xs : list of {array-like, sparse matrix, dataframe}
            The container to concatenate.

        n_samples : int
            The number of samples in the input data, used to check the
            consistency of the transformation.

        check_samples : bool, default=True
            Whether to check that the output has no more samples than the input.
            Set to False for observe_transform and rewind_transform, whose
            transformers handle buffering or discard warmup rows.
        """
        # Rename before stacking to avoid errors from temporarily duplicated
        # columns.
        transformer_names = [
            t[0]
            for t in self._iter(
                fitted=True,
                column_as_labels=False,
                skip_drop=True,
                skip_empty_columns=True,
            )
        ]
        # feature_names_outs is a list of lists - one list per transformer
        feature_names_outs = [[col for col in X.columns if col != "time"] for X in Xs if X.shape[1] != 1]
        # Track the original column counts per transformer for re-grouping after prefixing
        column_counts = [len(cols) for cols in feature_names_outs]

        if self.verbose_feature_names_out:
            # `_add_prefix_for_feature_names_out` returns a flat list of prefixed names
            flat_feature_names = self._add_prefix_for_feature_names_out(
                list(zip(transformer_names, feature_names_outs, strict=False))
            )
            # Convert back to list of lists using the original column counts
            feature_names_outs = []
            idx = 0
            for count in column_counts:
                feature_names_outs.append(flat_feature_names[idx : idx + count])
                idx += count
        else:
            # check for duplicated columns and raise if any
            flat_feature_names = list(chain.from_iterable(feature_names_outs))
            feature_names_count = Counter(flat_feature_names)
            if any(count > 1 for count in feature_names_count.values()):
                duplicated_feature_names = sorted(name for name, count in feature_names_count.items() if count > 1)
                err_msg = (
                    "Duplicated feature names found before concatenating the"
                    " outputs of the transformers:"
                    f" {duplicated_feature_names}.\n"
                )
                for transformer_name, X in zip(transformer_names, Xs, strict=False):
                    if X.shape[1] == 1:
                        continue
                    dup_cols_in_transformer = sorted(set(X.columns).intersection(duplicated_feature_names))
                    if dup_cols_in_transformer:
                        err_msg += (
                            f"Transformer {transformer_name} has conflicting "
                            f"column names: {dup_cols_in_transformer}.\n"
                        )
                raise ValueError(
                    err_msg + "Either make sure that the transformers named above "
                    "do not generate columns with conflicting names or set "
                    "verbose_feature_names_out=True to automatically "
                    "prefix the output feature names with the name "
                    "of the transformer to prevent any conflicting "
                    "names."
                )

        output = _hstack(
            Xs,
            column_names=feature_names_outs,
            observation_horizons=self._get_observation_horizons(),
        )
        output_samples = output.shape[0]
        if check_samples and output_samples > n_samples:
            raise ValueError(
                "Concatenating DataFrames from the transformers' outputs led to an inconsistent number of samples."
            )

        return output

    def get_metadata_routing(self) -> MetadataRouter:
        """Get metadata routing of this object.

        Please check [Metadata Routing User Guide](https://scikit-learn.org/stable/metadata_routing.html) on how the routing
        mechanism works.

        Returns
        -------
        routing : MetadataRouter
            A `MetadataRouter` encapsulating
            routing information.
        """
        router = MetadataRouter(owner=self)
        # Here we don't care about which columns are used for which
        # transformers, and whether or not a transformer is used at all, which
        # might happen if no columns are selected for that transformer. We
        # request all metadata requested by all transformers.
        transformers = chain(self.transformers, [("remainder", self.remainder, None)])
        for name, step, _ in transformers:
            method_mapping = MethodMapping()
            if hasattr(step, "fit_transform"):
                (
                    method_mapping.add(caller="fit", callee="fit_transform").add(
                        caller="fit_transform", callee="fit_transform"
                    )
                )
            else:
                (
                    method_mapping
                    .add(caller="fit", callee="fit")
                    .add(caller="fit", callee="transform")
                    .add(caller="fit_transform", callee="fit")
                    .add(caller="fit_transform", callee="transform")
                )
            method_mapping.add(caller="transform", callee="transform")
            router.add(method_mapping=method_mapping, **{name: step})

        return router
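
The `_add_prefix_for_feature_names_out` docstring in the source above describes a single-underscore prefix that is inserted after the group separator for panel columns. The following is a minimal sketch of that naming rule, not the library's `panel_aware_prefix` implementation: the function name `panel_prefix_sketch` and its `sep` parameter are hypothetical, and the real helper may treat edge cases differently.

```python
def panel_prefix_sketch(column: str, prefix: str, sep: str = "__") -> str:
    """Prefix a column name, keeping any leading ``<GROUP>__`` part in front.

    Hypothetical illustration of the rule described in the docstring;
    not the library's own helper.
    """
    group, found, series = column.partition(sep)
    if found:
        # Panel column "<GROUP>__<SERIES>": prefix only the series part.
        return f"{group}{sep}{prefix}_{series}"
    # Plain column: prefix the whole name.
    return f"{prefix}_{column}"


print(panel_prefix_sketch("store_1__sales", "deseason"))  # store_1__deseason_sales
print(panel_prefix_sketch("sales", "deseason"))  # deseason_sales
```

Splitting on the first separator keeps the group visible at the front of the name, which is what lets the `<GROUP>__<SERIES>` convention survive prefixing.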

Methods

observation_horizon property

Maximum observation horizon across all transformers.

Returns
Type Description
int

Maximum observation horizon needed.

Raises
Type Description
NotFittedError

If the column transformer has not been fitted yet.
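
As a rough illustration of how this value is derived, the sketch below mirrors the logic of `_get_observation_horizons` from the class source: entries that are `'drop'`, `'passthrough'`, `None`, or lack an `observation_horizon` attribute count as 0, and the property returns the maximum. `LagSketch` and `max_observation_horizon` are hypothetical stand-ins, not library objects.

```python
from dataclasses import dataclass


@dataclass
class LagSketch:
    """Hypothetical stand-in for a fitted transformer that needs history."""

    observation_horizon: int


def max_observation_horizon(transformers: list) -> int:
    """Return the largest horizon, counting special entries as 0."""
    horizons = []
    for t in transformers:
        horizon = 0
        if t not in ("drop", "passthrough") and t is not None and hasattr(t, "observation_horizon"):
            horizon = t.observation_horizon
        horizons.append(horizon)
    return max(horizons)


print(max_observation_horizon([LagSketch(3), "passthrough", LagSketch(7)]))  # 7
```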

named_transformers_ property

Access the fitted transformer by name.

Read-only attribute to access any transformer by given name. Keys are transformer names and values are the fitted transformer objects.

Returns
Type Description
Bunch

Dict-like object of fitted transformers keyed by name.

get_params(deep=True)

Get parameters for this estimator.

Parameters
Name Type Description Default
deep bool

If True, will return the parameters for this estimator and contained subobjects that are estimators.

True
Returns
Name Type Description
params dict[str, Any]

Parameter names mapped to their values.

Source Code
def get_params(self, deep: bool = True) -> dict[str, Any]:
    """Get parameters for this estimator.

    Parameters
    ----------
    deep : bool, default=True
        If True, will return the parameters for this estimator and
        contained subobjects that are estimators.

    Returns
    -------
    params : dict[str, Any]
        Parameter names mapped to their values.

    """
    return _BaseComposition._get_params(self, attr="transformers", deep=deep)

set_params(**params)

Set the parameters of this estimator.

Parameters
Name Type Description Default
**params dict

Estimator parameters.

{}
Returns
Name Type Description
self ColumnTransformer

ColumnTransformer instance.

Source Code
def set_params(self, **params: Any) -> "ColumnTransformer":
    """Set the parameters of this estimator.

    Parameters
    ----------
    **params : dict
        Estimator parameters.

    Returns
    -------
    self : ColumnTransformer
        ColumnTransformer instance.

    """
    _BaseComposition._set_params(self, attr="transformers", **params)
    return self

__sklearn_tags__()

Get estimator tags.

Returns
Type Description
Tags

Estimator tags with yohou-specific attributes.

Source Code
def __sklearn_tags__(self) -> Tags:
    """Get estimator tags.

    Returns
    -------
    Tags
        Estimator tags with yohou-specific attributes.

    """
    tags = super().__sklearn_tags__()

    # Aggregate tags from contained transformers (static capability check)
    if hasattr(self, "transformers") and self.transformers is not None:
        transformers = [t for _, t, _ in self.transformers if t not in ("drop", "passthrough") and t is not None]

        # Include remainder if it's an estimator
        if hasattr(self, "remainder") and self.remainder not in ("drop", "passthrough", None):
            transformers.append(self.remainder)

        if transformers:
            assert tags.transformer_tags is not None
            assert tags.input_tags is not None
            # Stateful if any transformer is stateful
            tags.transformer_tags.stateful = any(
                t.__sklearn_tags__().transformer_tags.stateful for t in transformers
            )

            # Not invertible: column transformer cannot generally invert
            # since columns may be dropped or reordered
            tags.transformer_tags.invertible = False

            # Aggregate min_value: take the maximum (most restrictive)
            # All transformers receive subsets of the same input
            min_values = [t.__sklearn_tags__().input_tags.min_value for t in transformers]
            non_none_min_values = [v for v in min_values if v is not None]
            tags.input_tags.min_value = max(non_none_min_values) if non_none_min_values else None

    return tags
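
The aggregation rules in the source above can be sketched in isolation, assuming simplified tag objects: the combined transformer is stateful if any child is stateful, and the combined `min_value` is the largest non-`None` child bound. `TagsSketch` and `aggregate_tags` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TagsSketch:
    """Hypothetical, simplified stand-in for the relevant tag fields."""

    stateful: bool
    min_value: Optional[float]


def aggregate_tags(children: list) -> TagsSketch:
    """Combine child tags: any() for stateful, max() over non-None bounds."""
    stateful = any(c.stateful for c in children)
    bounds = [c.min_value for c in children if c.min_value is not None]
    return TagsSketch(stateful=stateful, min_value=max(bounds) if bounds else None)


combined = aggregate_tags([TagsSketch(False, None), TagsSketch(True, 0.0), TagsSketch(False, 1.0)])
print(combined)  # TagsSketch(stateful=True, min_value=1.0)
```

Taking the maximum lower bound is the conservative choice: input that satisfies the strictest child satisfies all of them.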

__getitem__(ind)

Return a sub-ColumnTransformer (for a slice) or a single transformer (for an index or name).

Parameters
Name Type Description Default
ind int, str, or slice

Index, name, or slice of the transformer to retrieve.

required
Returns
Name Type Description
transformer Any

The transformer or sub-transformer.

Source Code
def __getitem__(self, ind: int | str | slice) -> Any:
    """Return a sub-`ColumnTransformer` (slice) or a single transformer (index or name).

    Parameters
    ----------
    ind : int, str, or slice
        Index, name, or slice of the transformer to retrieve.

    Returns
    -------
    transformer : Any
        The transformer or sub-transformer.

    """
    if isinstance(ind, slice):
        if ind.step not in (1, None):
            raise ValueError("ColumnTransformer slicing only supports a step of 1")
        return self.__class__(
            transformers=self.transformers[ind],
            remainder=self.remainder,
            n_jobs=self.n_jobs,
            transformer_weights=self.transformer_weights,
            verbose=self.verbose,
            verbose_feature_names_out=self.verbose_feature_names_out,
        )
    elif isinstance(ind, int):
        name, trans, _ = self.transformers[ind]
        # If fitted, use named_transformers_, otherwise return from transformers
        if hasattr(self, "named_transformers_"):
            return self.named_transformers_[name]
        return trans
    else:
        # String case - get by name
        if hasattr(self, "named_transformers_"):
            return self.named_transformers_[ind]
        # Not fitted yet, search in transformers list
        for name, trans, _ in self.transformers:
            if name == ind:
                return trans
        raise KeyError(f"Transformer {ind} not found")
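
The unfitted lookup branches can be mimicked on a plain list of triples. This is an illustrative sketch only: `lookup` and the placeholder strings used as transformers are hypothetical, and slice validation and the fitted `named_transformers_` path are left out.

```python
# Hypothetical (name, transformer, columns) triples; strings stand in for estimators.
steps = [
    ("scale", "scaler-object", ["a", "b"]),
    ("lag", "lag-object", ["c"]),
    ("skip", "drop", ["d"]),
]


def lookup(steps, ind):
    """Mimic the unfitted branches of __getitem__ on a plain list."""
    if isinstance(ind, slice):
        # The real method wraps the sliced list in a new ColumnTransformer.
        return steps[ind]
    if isinstance(ind, int):
        return steps[ind][1]
    # Anything else is treated as a name.
    for name, trans, _ in steps:
        if name == ind:
            return trans
    raise KeyError(f"Transformer {ind} not found")


print(lookup(steps, 0))
print(lookup(steps, "lag"))
print([name for name, _, _ in lookup(steps, slice(0, 2))])
```

An integer or a name returns a single transformer, while a slice keeps the `(name, transformer, columns)` structure so that a new composite can be built from it.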

get_feature_names_out(input_features=None)

Get output feature names.

Collects output feature names from each fitted sub-transformer, optionally prefixing them with the transformer name when verbose_feature_names_out is True.

Parameters
Name Type Description Default
input_features list[str] | None

Input feature names. If None, uses feature_names_in_ from fit.

None
Returns
Type Description
list of str

Output feature names.

Source Code
def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
    """Get output feature names.

    Collects output feature names from each fitted sub-transformer,
    optionally prefixing them with the transformer name when
    ``verbose_feature_names_out`` is True.

    Parameters
    ----------
    input_features : list[str] | None, default=None
        Input feature names. If None, uses ``feature_names_in_`` from fit.

    Returns
    -------
    list of str
        Output feature names.

    """
    check_is_fitted(self, "transformers_")
    feature_names_out: list[str] = []
    for name, trans, columns in self.transformers_:  # ty: ignore[unresolved-attribute]
        if trans == "drop" or (isinstance(columns, list) and len(columns) == 0):
            continue
        col_list = list(columns) if isinstance(columns, list) else [columns]
        names: list[str] = col_list  # ty: ignore[invalid-assignment]
        if hasattr(trans, "get_feature_names_out"):
            result = trans.get_feature_names_out()
            if result is not None:
                # Sub-transformers may include "time" in their output; strip it.
                filtered = [f for f in result if f != "time"]
                if filtered:
                    names = filtered
        if self.verbose_feature_names_out:
            names = [f"{name}_{f}" for f in names]
        feature_names_out.extend(names)
    return feature_names_out
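
The naming rules can be sketched without any fitted estimator. In the hypothetical helper below, each fitted entry is reduced to `(name, reported_names, columns)`, where `reported_names` stands for whatever the sub-transformer's `get_feature_names_out()` would return (`None` if it has no such method, `"drop"` for a dropped entry), and `columns` is always a list.

```python
def feature_names(fitted, verbose=True):
    """Sketch of the name collection loop; `fitted` is a hypothetical structure."""
    out = []
    for name, reported, columns in fitted:
        if reported == "drop" or len(columns) == 0:
            continue
        # Strip "time"; fall back to the input columns if nothing else is reported.
        kept = [f for f in (reported or []) if f != "time"]
        names = kept if kept else list(columns)
        if verbose:
            # The listing joins with a single underscore.
            names = [f"{name}_{f}" for f in names]
        out.extend(names)
    return out


fitted = [
    ("scale", ["time", "a", "b"], ["a", "b"]),
    ("lag", ["time", "c_lag_1", "c_lag_2"], ["c"]),
    ("skip", "drop", ["d"]),
    ("raw", None, ["e"]),
]
print(feature_names(fitted))
print(feature_names(fitted, verbose=False))
```

With `verbose=False` the names are returned unprefixed, so two sub-transformers that emit the same feature name would collide; the prefix is what keeps them distinct.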

fit(X, y=None, **params)

Fit all transformers using X.

Parameters
Name Type Description Default
X {array-like, dataframe} of shape (n_samples, n_features)

Input data, of which specified subsets are used to fit the transformers.

required
y array-like of shape (n_samples,...)

Targets for supervised learning.

None
**params dict

Parameters to be passed to the underlying transformers' fit and transform methods.

You can only pass this if metadata routing is enabled, which you can enable using sklearn.set_config(enable_metadata_routing=True).

None
Returns
Name Type Description
self ColumnTransformer

This estimator.

Source Code
def fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params: Any) -> "ColumnTransformer":
    """Fit all transformers using X.

    Parameters
    ----------
    X : {array-like, dataframe} of shape (n_samples, n_features)
        Input data, of which specified subsets are used to fit the
        transformers.

    y : array-like of shape (n_samples,...), default=None
        Targets for supervised learning.

    **params : dict, default=None
        Parameters to be passed to the underlying transformers' ``fit`` and
        ``transform`` methods.

        You can only pass this if metadata routing is enabled, which you
        can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

    Returns
    -------
    self : ColumnTransformer
        This estimator.
    """
    _raise_for_params(params, self, "fit")
    # we use fit_transform to make sure to set sparse_output_ (for which we
    # need the transformed data) to have consistent output type in predict
    self.fit_transform(X, y=y, **params)
    return self

fit_transform(X, y=None, **params)

Fit all transformers, transform the data and concatenate results.

Parameters
Name Type Description Default
X {array-like, dataframe} of shape (n_samples, n_features)

Input data, of which specified subsets are used to fit the transformers.

required
y array-like of shape (n_samples,)

Targets for supervised learning.

None
**params dict

Parameters to be passed to the underlying transformers' fit and transform methods.

You can only pass this if metadata routing is enabled, which you can enable using sklearn.set_config(enable_metadata_routing=True).

None
Returns
Name Type Description
X_t pl.DataFrame

Horizontally stacked results of transformers.

Source Code
@_fit_context(
    # estimators in ColumnTransformer.transformers are not validated yet
    prefer_skip_nested_validation=False
)
def fit_transform(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params: Any) -> pl.DataFrame:
    """Fit all transformers, transform the data and concatenate results.

    Parameters
    ----------
    X : {array-like, dataframe} of shape (n_samples, n_features)
        Input data, of which specified subsets are used to fit the
        transformers.

    y : array-like of shape (n_samples,), default=None
        Targets for supervised learning.

    **params : dict, default=None
        Parameters to be passed to the underlying transformers' ``fit`` and
        ``transform`` methods.

        You can only pass this if metadata routing is enabled, which you
        can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

    Returns
    -------
    X_t : pl.DataFrame
        Horizontally stacked results of transformers.
    """
    _raise_for_params(params, self, "fit_transform")

    X = _check_X(X)

    # Strip time column early - before sklearn validation which stores column indices
    # This ensures all column references are to the non-time columns
    self._time_column_ = X.select(cs.by_name("time"))
    X_no_time = X.select(~cs.by_name("time"))

    # Set feature_names_in_ and n_features_in_ on the stripped data
    _check_feature_names(self, X_no_time, reset=True)
    _check_n_features(self, X_no_time, reset=True)
    self._validate_transformers()
    n_samples = _num_samples(X_no_time)

    self._validate_column_callables(X_no_time)
    self._validate_remainder(X_no_time)

    routed_params = process_routing(self, "fit_transform", **params)

    result = self._call_func_on_transformers(
        X_no_time,
        y,
        _fit_transform_one,
        column_as_labels=False,
        routed_params=routed_params,
    )

    if not result:
        self._update_fitted_transformers([])
        # All transformers are None
        return self._time_column_

    Xs, transformers = zip(*result, strict=False)

    self.sparse_output_ = False

    self._update_fitted_transformers(transformers)
    self._record_output_indices(Xs)

    result = self._hstack(list(Xs), n_samples=n_samples)
    return result
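
The overall flow of `fit_transform` (strip `"time"`, route column subsets, stack the results) can be sketched on a plain dict of lists. Everything here is illustrative: `split_time` and `fit_transform_sketch` are hypothetical helpers, the per-value functions stand in for fitted transformers, and re-attaching the time column to the stacked output is an assumption, since `_hstack` is not shown on this page.

```python
def split_time(table):
    """Separate the "time" column from the remaining columns."""
    time = {k: v for k, v in table.items() if k == "time"}
    rest = {k: v for k, v in table.items() if k != "time"}
    return time, rest


def fit_transform_sketch(table, steps):
    """Route column subsets through per-step functions and stack the results."""
    time, rest = split_time(table)
    if not steps:
        # Mirrors the early return: with nothing to apply, only "time" comes back.
        return time
    out = dict(time)  # assumption: the stacked output carries the time column
    for name, func, columns in steps:
        for col in columns:
            out[f"{name}_{col}"] = [func(v) for v in rest[col]]
    return out


table = {"time": [1, 2, 3], "a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0], "c": [0, 0, 0]}
steps = [("double", lambda v: 2 * v, ["a"]), ("neg", lambda v: -v, ["b"])]
result = fit_transform_sketch(table, steps)
print(result)
```

Column `c` is not listed in any step, so it does not appear in the output, which corresponds to the default `remainder='drop'` behaviour.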

transform(X, **params)

Transform X separately by each transformer, concatenate results.

Parameters
Name Type Description Default
X {array-like, dataframe} of shape (n_samples, n_features)

The data to be transformed by subset.

required
**params dict

Parameters to be passed to the underlying transformers' transform method.

You can only pass this if metadata routing is enabled, which you can enable using sklearn.set_config(enable_metadata_routing=True).

None
Returns
Name Type Description
X_t pl.DataFrame

Horizontally stacked results of transformers.

Source Code
def transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
    """Transform X separately by each transformer, concatenate results.

    Parameters
    ----------
    X : {array-like, dataframe} of shape (n_samples, n_features)
        The data to be transformed by subset.

    **params : dict, default=None
        Parameters to be passed to the underlying transformers' ``transform``
        method.

        You can only pass this if metadata routing is enabled, which you
        can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

    Returns
    -------
    X_t : pl.DataFrame
        Horizontally stacked results of transformers.
    """
    _raise_for_params(params, self, "transform")
    check_is_fitted(self)
    X = _check_X(X)

    # Strip time column early, consistent with fit_transform
    time_column = X.select(cs.by_name("time"))
    X_no_time = X.select(~cs.by_name("time"))

    # If ColumnTransformer is fit using a dataframe, and now a dataframe is
    # passed to be transformed, we select columns by name instead. This
    # enables the user to pass X at transform time with extra columns which
    # were not present in fit time, and the order of the columns doesn't
    # matter.
    fit_dataframe_and_transform_dataframe = hasattr(self, "feature_names_in_") and (
        _is_pandas_df(X_no_time) or hasattr(X_no_time, "__dataframe__")
    )

    n_samples = _num_samples(X_no_time)
    column_names = _get_feature_names(X_no_time)

    if fit_dataframe_and_transform_dataframe:
        named_transformers = self.named_transformers_
        # check that all names seen in fit are in transform, unless
        # they were dropped
        non_dropped_indices = [
            ind
            for name, ind in self._transformer_to_input_indices.items()  # ty: ignore[unresolved-attribute]
            if name in named_transformers and named_transformers[name] != "drop"
        ]

        all_indices = set(chain(*non_dropped_indices))
        all_names = {self.feature_names_in_[ind] for ind in all_indices}

        diff = all_names - set(column_names)
        if diff:
            raise ValueError(f"columns are missing: {diff}")
    else:
        # ndarray was used for fitting or transforming, thus we only
        # check that n_features_in_ is consistent
        _check_n_features(self, X_no_time, reset=False)

    routed_params = process_routing(self, "transform", **params)

    Xs = self._call_func_on_transformers(
        X_no_time,
        None,
        _transform_one,
        column_as_labels=fit_dataframe_and_transform_dataframe,
        routed_params=routed_params,
        time_column=time_column,
    )

    if not Xs:
        # All transformers are None
        return time_column

    result = self._hstack(list(Xs), n_samples=n_samples)
    return result
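
The column check performed at transform time can be reproduced on plain data. The values below are hypothetical; the function follows the same steps as the listing: collect the input indices of every transformer that is not `'drop'`, map them to names seen during fit, and compare against the columns of the new frame.

```python
from itertools import chain

# Hypothetical fitted state.
feature_names_in = ["a", "b", "c", "d"]
transformer_to_input_indices = {"scale": [0, 1], "lag": [2], "skip": [3]}
named_transformers = {"scale": object(), "lag": object(), "skip": "drop"}


def missing_columns(column_names):
    """Return the fit-time columns that are required but absent at transform time."""
    non_dropped = [
        ind
        for name, ind in transformer_to_input_indices.items()
        if name in named_transformers and named_transformers[name] != "drop"
    ]
    needed = {feature_names_in[i] for i in chain(*non_dropped)}
    return needed - set(column_names)


print(missing_columns(["c", "b", "a", "extra"]))
print(missing_columns(["a", "b"]))
```

Reordered or extra columns pass the check, and column `d` is never required because its transformer is `'drop'`; only a missing column that some active transformer consumes triggers the `ValueError`.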

observe_transform(X, **params)

Observe and transform X by each transformer, concatenate results.

This method feeds new data to each sub-transformer and transforms it in one step. Each transformation uses the state from before the new data was observed, and only then is the memory updated. This is both cheaper and more correct than calling observe() then transform() separately, which would transform against the already-updated state.

Parameters
Name Type Description Default
X DataFrame

New data to observe and transform.

required
**params dict

Parameters routed to the transform methods of the transformers.

You can only pass this if metadata routing is enabled, which you can enable using sklearn.set_config(enable_metadata_routing=True).

None
Returns
Name Type Description
X_t DataFrame

Horizontally stacked results of transformers.

Source Code
def observe_transform(self, X: pl.DataFrame, **params: Any) -> pl.DataFrame:
    """Observe and transform X by each transformer, concatenate results.

    This method feeds new data to each sub-transformer and transforms it in
    one step. Each transformation uses the state from before the new data
    was observed, and only then is the memory updated. This is both cheaper
    and more correct than calling observe() then transform() separately,
    which would transform against the already-updated state.

    Parameters
    ----------
    X : pl.DataFrame
        New data to observe and transform.

    **params : dict, default=None
        Parameters routed to the `transform` methods of the transformers.

        You can only pass this if metadata routing is enabled, which you
        can enable using ``sklearn.set_config(enable_metadata_routing=True)``.

    Returns
    -------
    X_t : pl.DataFrame
        Horizontally stacked results of transformers.

    """
    _raise_for_params(params, self, "observe_transform")
    check_is_fitted(self)
    X = _check_X(X)

    # Strip time column early, consistent with fit_transform and transform
    time_column = X.select(cs.by_name("time"))
    X_no_time = X.select(~cs.by_name("time"))

    n_samples = _num_samples(X_no_time)

    routed_params = process_routing(self, "observe_transform", **params)

    Xs = self._call_func_on_transformers(
        X_no_time,
        None,
        _observe_transform_one,
        column_as_labels=False,
        routed_params=routed_params,
        time_column=time_column,
    )

    if not Xs:
        # All transformers are None
        return time_column

    # For observe_transform, skip sample count check since transformers handle buffering internally
    result = self._hstack(list(Xs), n_samples=n_samples, check_samples=False)

    return result
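
Why the order "transform with the old state, then update" matters can be seen with a toy stateful transformer. `ToyLag` below is hypothetical and not a yohou class: it lags a series by one row and remembers the last value it has observed.

```python
class ToyLag:
    """Hypothetical one-step lag with a one-row memory; not part of yohou."""

    def fit(self, values):
        self.memory = values[-1]
        return self

    def observe(self, values):
        # Remember the most recent value for the next call.
        self.memory = values[-1]

    def transform(self, values):
        # Lag by one row, filling the first row from memory.
        return [self.memory] + values[:-1]

    def observe_transform(self, values):
        out = self.transform(values)  # uses the pre-observe memory
        self.observe(values)          # then updates the memory
        return out


lag = ToyLag().fit([1, 2, 3])
combined = lag.observe_transform([4, 5])

wrong = ToyLag().fit([1, 2, 3])
wrong.observe([4, 5])
separate = wrong.transform([4, 5])

print(combined, separate)
```

In the combined call the first new row is lagged against the last fitted value. Observing first overwrites that memory with the newest value, so the separate calls pair the first new row with a value from its own future.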

rewind_transform(X, **params)

Rewind internal state and transform using only observation horizon rows.

Discards accumulated observations and rewinds to a clean state using the last observation_horizon rows for each transformer. This provides a stateless transformation that can be used for reproducible results.

Parameters
Name Type Description Default
X DataFrame

Input DataFrame with "time" column. The last observation_horizon rows of each transformer will be used to initialize state.

required
**params dict

Metadata to route to nested estimators.

{}
Returns
Type Description
DataFrame

Transformed output with "time" column, after rewinding state.

Source Code
def rewind_transform(self, X: pl.DataFrame, **params) -> pl.DataFrame:
    """Rewind internal state and transform using only observation horizon rows.

    Discards accumulated observations and rewinds to a clean state using
    the last `observation_horizon` rows for each transformer. This provides
    a stateless transformation that can be used for reproducible results.

    Parameters
    ----------
    X : pl.DataFrame
        Input DataFrame with "time" column. The last `observation_horizon`
        rows of each transformer will be used to initialize state.
    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    pl.DataFrame
        Transformed output with "time" column, after rewinding state.

    """
    check_is_fitted(self)
    time_column = X.select(cs.by_name("time"))
    X_no_time = X.select(~cs.by_name("time"))

    n_samples = _num_samples(X_no_time)

    routed_params = process_routing(self, "rewind_transform", **params)

    Xs = self._call_func_on_transformers(
        X_no_time,
        None,
        _rewind_transform_one,
        column_as_labels=False,
        routed_params=routed_params,
        time_column=time_column,
    )

    if not Xs:
        # All transformers are None
        return time_column

    # For rewind_transform, skip sample count check since transformers discard warmup rows
    result = self._hstack(list(Xs), n_samples=n_samples, check_samples=False)

    return result

get_metadata_routing()

Get metadata routing of this object.

See the Metadata Routing User Guide for details on how the routing mechanism works.

Returns
Name Type Description
routing MetadataRouter

A MetadataRouter encapsulating routing information.

Source Code
def get_metadata_routing(self) -> MetadataRouter:
    """Get metadata routing of this object.

    Please check [Metadata Routing User Guide](https://scikit-learn.org/stable/metadata_routing.html) on how the routing
    mechanism works.

    Returns
    -------
    routing : MetadataRouter
        A `MetadataRouter` encapsulating
        routing information.
    """
    router = MetadataRouter(owner=self)
    # Here we don't care about which columns are used for which
    # transformers, and whether or not a transformer is used at all, which
    # might happen if no columns are selected for that transformer. We
    # request all metadata requested by all transformers.
    transformers = chain(self.transformers, [("remainder", self.remainder, None)])
    for name, step, _ in transformers:
        method_mapping = MethodMapping()
        if hasattr(step, "fit_transform"):
            (
                method_mapping.add(caller="fit", callee="fit_transform").add(
                    caller="fit_transform", callee="fit_transform"
                )
            )
        else:
            (
                method_mapping
                .add(caller="fit", callee="fit")
                .add(caller="fit", callee="transform")
                .add(caller="fit_transform", callee="fit")
                .add(caller="fit_transform", callee="transform")
            )
        method_mapping.add(caller="transform", callee="transform")
        router.add(method_mapping=method_mapping, **{name: step})

    return router
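
The caller-to-callee mapping built in the loop above can be written out as plain pairs. This sketch does not use scikit-learn's `MethodMapping`; `caller_callee_pairs` and the two toy classes are hypothetical and only reproduce the branching on `fit_transform`.

```python
def caller_callee_pairs(step):
    """List the (caller, callee) pairs the listing registers for one step."""
    if hasattr(step, "fit_transform"):
        pairs = [("fit", "fit_transform"), ("fit_transform", "fit_transform")]
    else:
        pairs = [
            ("fit", "fit"),
            ("fit", "transform"),
            ("fit_transform", "fit"),
            ("fit_transform", "transform"),
        ]
    # Added for every step, whichever branch was taken.
    pairs.append(("transform", "transform"))
    return pairs


class WithFitTransform:
    def fit_transform(self, X, y=None):
        return X

    def transform(self, X):
        return X


class FitAndTransformOnly:
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


print(caller_callee_pairs(WithFitTransform()))
print(caller_callee_pairs(FitAndTransformOnly()))
```

A step that offers `fit_transform` receives metadata through that single method during fitting, while a step without it receives the metadata through both `fit` and `transform`.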

Tutorials

The following example notebooks use this component:

  • How to Use ColumnTransformer


    Data-Features

    Route columns through distinct transformers with ColumnTransformer, including remainder handling and automatic panel-aware column detection.
