VotingClassProbaForecaster

yohou.ensemble.voting_class_proba.VotingClassProbaForecaster

Bases: _BaseEnsembleForecaster, BaseClassProbaForecaster, _BaseComposition

Combines class-probability forecasters via voting.

Aggregates predictions from multiple BaseClassProbaForecaster instances using soft (probability averaging) or hard (majority vote) strategies.

If a base forecaster fails during fit, it is skipped and a warning is emitted. A fit failure raises a RuntimeError only when every base forecaster fails.
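
The skip-on-failure behaviour can be pictured with a small standalone sketch. It does not use yohou's internals: `fit_surviving` and `failing_fit` are hypothetical names invented for illustration, and catching every `Exception` is an assumption about how failures are detected.

```python
import warnings


def fit_surviving(named_fits):
    """Run each fit callable, skipping failures with a warning.

    ``named_fits`` is a hypothetical list of (name, zero-argument callable)
    pairs standing in for the base forecasters' ``fit`` calls.
    """
    survivors = []
    for name, fit in named_fits:
        try:
            survivors.append((name, fit()))
        except Exception as exc:  # assumption: any exception counts as a failed fit
            warnings.warn(f"Forecaster '{name}' failed during fit and was skipped: {exc}")
    if not survivors:
        # Mirrors the documented RuntimeError when no forecaster survives.
        raise RuntimeError("All base forecasters failed during fitting.")
    return survivors


def failing_fit():
    raise ValueError("not enough data")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fitted = fit_surviving([("good", lambda: "fitted-model"), ("bad", failing_fit)])

print([name for name, _ in fitted])  # ['good']
print(len(caught))  # 1
```

Only the surviving entries would end up in an attribute like forecasters_, so downstream aggregation works on fewer models than were passed in.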

Parameters

Name Type Description Default
forecasters list of (name, forecaster) tuples

Named base class-probability forecasters to combine. Each entry is a (name, forecaster) tuple where name is a unique string and forecaster is a BaseClassProbaForecaster instance.

required
method ('soft', 'hard')

Aggregation strategy:

  • "soft": weighted average of class probabilities.
  • "hard": majority vote of argmax predictions. Ties are broken deterministically by choosing the first of the tied class labels in sorted order.
"soft"
weights list of float or None

Per-forecaster weights. Raw values are passed to numpy.average which normalizes internally. Only used with method="soft". Silently ignored with method="hard".

None
n_jobs int or None

Number of parallel jobs for fitting base forecasters. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

None
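
As a rough illustration of how soft voting treats weights, the sketch below averages made-up probabilities with numpy.average. The numbers and variable names are hypothetical. The point is only that numpy.average divides by the sum of the weights, so raw and pre-normalized weights give the same result.

```python
import numpy as np

# Hypothetical probabilities of one class from three base forecasters
# (rows = forecast steps, columns = forecasters).
values = np.array([
    [0.9, 0.6, 0.3],
    [0.2, 0.4, 0.6],
])

raw_weights = [2.0, 1.0, 1.0]
normalized_weights = [0.5, 0.25, 0.25]

# Weighted soft vote: np.average normalizes by the sum of the weights.
soft_raw = np.average(values, axis=1, weights=raw_weights)
soft_normalized = np.average(values, axis=1, weights=normalized_weights)

# Unweighted soft vote (weights=None) is a plain mean across forecasters.
unweighted = np.mean(values, axis=1)

print(np.allclose(soft_raw, soft_normalized))  # True
```

Here the first forecaster counts twice as much as each of the others, which pulls the first step's average from 0.6 up to 0.675.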

Attributes

Name Type Description
forecasters_ list of (str, BaseClassProbaForecaster)

Successfully fitted base forecasters.

classes_ dict of str to list of str

Mapping from target column to sorted class labels.

n_classes_ dict of str to int

Number of classes per target column.

label_to_code_ dict of str to dict of str to float

Mapping from target column to label-to-code dict.

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.ensemble import VotingClassProbaForecaster
>>> from yohou.class_proba import ClassProbaReductionForecaster
>>> from sklearn.tree import DecisionTreeClassifier
>>>
>>> time = pl.datetime_range(
...     start=datetime(2022, 1, 1), end=datetime(2022, 4, 10), interval="1d", eager=True
... )
>>> categories = ["sunny", "rainy", "cloudy"]
>>> y = pl.DataFrame({
...     "time": time,
...     "weather": [categories[i % 3] for i in range(len(time))],
... })
>>>
>>> forecaster = VotingClassProbaForecaster(
...     forecasters=[
...         (
...             "dt_1",
...             ClassProbaReductionForecaster(
...                 estimator=DecisionTreeClassifier(random_state=42),
...                 reduction_strategy="direct",
...             ),
...         ),
...         (
...             "dt_2",
...             ClassProbaReductionForecaster(
...                 estimator=DecisionTreeClassifier(random_state=123),
...                 reduction_strategy="direct",
...             ),
...         ),
...     ],
...     method="soft",
... )
>>> forecaster.fit(y, forecasting_horizon=3)
VotingClassProbaForecaster(...)
>>> y_pred = forecaster.predict(forecasting_horizon=3)
>>> len(y_pred)
3

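See Also

  • VotingPointForecaster (yohou.ensemble.voting_point.VotingPointForecaster): Ensemble for point forecasters.
  • VotingIntervalForecaster (yohou.ensemble.voting_interval.VotingIntervalForecaster): Ensemble for interval forecasters.
  • BaseClassProbaForecaster (yohou.class_proba.base.BaseClassProbaForecaster): Base class for class-probability forecasters.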

Notes

  • All base forecasters must discover the same classes at fit time. A ValueError is raised if class sets differ.
  • Weights are only used with method="soft"; they are silently ignored with method="hard".
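
For method="hard", every base forecaster casts one unweighted vote per forecast step. The sketch below is a minimal standalone version of that majority vote, following the Counter-and-sorted logic visible in the source code on this page. The helper names `majority_vote` and `one_hot` are hypothetical and not part of yohou.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common label; ties go to the first tied label in sorted order."""
    counts = Counter(votes)
    top = max(counts.values())
    tied = sorted(label for label, n in counts.items() if n == top)
    return tied[0]


def one_hot(winner, class_labels):
    """Express the winning label as 0/1 'probabilities' over all classes."""
    return {label: 1.0 if label == winner else 0.0 for label in class_labels}


class_labels = ["cloudy", "rainy", "sunny"]

clear_winner = majority_vote(["sunny", "rainy", "sunny"])
tie_winner = majority_vote(["sunny", "rainy"])  # 1 vote each, so sorted order decides

print(clear_winner)  # sunny
print(tie_winner)  # rainy
print(one_hot(tie_winner, class_labels))  # {'cloudy': 0.0, 'rainy': 1.0, 'sunny': 0.0}
```

Because the result is one-hot, probabilities from a hard-voting ensemble carry no information about how close the vote was.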

Source Code

class VotingClassProbaForecaster(_BaseEnsembleForecaster, BaseClassProbaForecaster, _BaseComposition):
    """Combines class-probability forecasters via voting.

    Aggregates predictions from multiple ``BaseClassProbaForecaster``
    instances using soft (probability averaging) or hard (majority vote)
    strategies.

    If a base forecaster fails during ``fit``, it is skipped and a
    warning is emitted.  A fit failure raises ``RuntimeError`` only when
    every base forecaster fails.

    Parameters
    ----------
    forecasters : list of (name, forecaster) tuples
        Named base class-probability forecasters to combine. Each entry
        is a ``(name, forecaster)`` tuple where *name* is a unique string
        and *forecaster* is a `BaseClassProbaForecaster` instance.
    method : {"soft", "hard"}, default="soft"
        Aggregation strategy:

        - ``"soft"``: weighted average of class probabilities.
        - ``"hard"``: majority vote of argmax predictions. Ties are
          broken deterministically by choosing the first of the tied
          class labels in sorted order.
    weights : list of float or None, default=None
        Per-forecaster weights. Raw values are passed to
        ``numpy.average`` which normalizes internally. Only used with
        ``method="soft"``. Silently ignored with ``method="hard"``.
    n_jobs : int or None, default=None
        Number of parallel jobs for fitting base forecasters.
        ``None`` means 1 unless in a ``joblib.parallel_backend`` context.
        ``-1`` means using all processors.

    Attributes
    ----------
    forecasters_ : list of (str, BaseClassProbaForecaster)
        Successfully fitted base forecasters.
    classes_ : dict of str to list of str
        Mapping from target column to sorted class labels.
    n_classes_ : dict of str to int
        Number of classes per target column.
    label_to_code_ : dict of str to dict of str to float
        Mapping from target column to label-to-code dict.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.ensemble import VotingClassProbaForecaster
    >>> from yohou.class_proba import ClassProbaReductionForecaster
    >>> from sklearn.tree import DecisionTreeClassifier
    >>>
    >>> time = pl.datetime_range(
    ...     start=datetime(2022, 1, 1), end=datetime(2022, 4, 10), interval="1d", eager=True
    ... )
    >>> categories = ["sunny", "rainy", "cloudy"]
    >>> y = pl.DataFrame({
    ...     "time": time,
    ...     "weather": [categories[i % 3] for i in range(len(time))],
    ... })
    >>>
    >>> forecaster = VotingClassProbaForecaster(
    ...     forecasters=[
    ...         (
    ...             "dt_1",
    ...             ClassProbaReductionForecaster(
    ...                 estimator=DecisionTreeClassifier(random_state=42),
    ...                 reduction_strategy="direct",
    ...             ),
    ...         ),
    ...         (
    ...             "dt_2",
    ...             ClassProbaReductionForecaster(
    ...                 estimator=DecisionTreeClassifier(random_state=123),
    ...                 reduction_strategy="direct",
    ...             ),
    ...         ),
    ...     ],
    ...     method="soft",
    ... )
    >>> forecaster.fit(y, forecasting_horizon=3)  # doctest: +ELLIPSIS
    VotingClassProbaForecaster(...)
    >>> y_pred = forecaster.predict(forecasting_horizon=3)
    >>> len(y_pred)
    3

    See Also
    --------
    - [`VotingPointForecaster`][yohou.ensemble.voting_point.VotingPointForecaster] : Ensemble for point forecasters.
    - [`VotingIntervalForecaster`][yohou.ensemble.voting_interval.VotingIntervalForecaster] : Ensemble for interval forecasters.
    - [`BaseClassProbaForecaster`][yohou.class_proba.base.BaseClassProbaForecaster] : Base class for class-probability forecasters.

    Notes
    -----
    - All base forecasters must discover the same classes at fit time.
      A ``ValueError`` is raised if class sets differ.
    - Weights are only used with ``method="soft"``; they are silently
      ignored with ``method="hard"``.

    """

    _parameter_constraints: dict = {
        "forecasters": [list],
        "method": [StrOptions({"soft", "hard"})],
        "weights": [list, None],
        "n_jobs": [Integral, None],
    }

    def __init__(
        self,
        forecasters: list[tuple[str, BaseClassProbaForecaster]],
        *,
        method: Literal["soft", "hard"] = "soft",
        weights: list[float] | None = None,
        n_jobs: int | None = None,
    ):
        super().__init__()
        self.forecasters = forecasters
        self.method = method
        self.weights = weights
        self.n_jobs = n_jobs

    def __sklearn_tags__(self) -> Tags:
        """Get estimator tags.

        Returns
        -------
        Tags
            Estimator tags with yohou-specific attributes.

        """
        tags = super().__sklearn_tags__()
        assert tags.forecaster_tags is not None

        tags.forecaster_tags.forecaster_type = CLASS_PROBA
        tags.forecaster_tags.tracks_observations = False
        tags.forecaster_tags.supports_panel_data = True

        return tags

    def _validate_classes_consistent(self) -> None:
        """Check that all surviving forecasters discovered the same classes.

        Raises
        ------
        ValueError
            If class sets differ across base forecasters.

        """
        reference_name, reference_forecaster = self.forecasters_[0]
        reference_classes = reference_forecaster.classes_  # ty: ignore[unresolved-attribute]

        for name, forecaster in self.forecasters_[1:]:
            if forecaster.classes_ != reference_classes:  # ty: ignore[unresolved-attribute]
                raise ValueError(
                    f"Forecaster '{name}' discovered classes {forecaster.classes_} "  # ty: ignore[unresolved-attribute]
                    f"but '{reference_name}' discovered {reference_classes}. "
                    f"All base forecasters must discover the same classes."
                )

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(
        self,
        y: pl.DataFrame,
        X_actual: pl.DataFrame | None = None,
        forecasting_horizon: StrictInt = 1,
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
        **params,
    ) -> VotingClassProbaForecaster:
        """Fit all base class-probability forecasters.

        Parameters
        ----------
        y : pl.DataFrame
            Target time series with ``"time"`` column and categorical
            value columns.
        X_actual : pl.DataFrame or None, default=None
            Actual feature observations with a ``"time"`` column aligned
            with ``y``. Forwarded to each child forecaster.
        forecasting_horizon : int, default=1
            Number of steps ahead to forecast.
        X_future : pl.DataFrame or None, default=None
            Known future features with ``"time"`` column.
        X_forecast : pl.DataFrame or None, default=None
            External forecasts with ``"vintage_time"`` and ``"time"`` columns.
        **params : dict
            Metadata routing parameters.

        Returns
        -------
        self
            Fitted ensemble.

        Raises
        ------
        ValueError
            If ``weights`` length does not match the number of
            forecasters, or if base forecasters discover different
            classes.
        RuntimeError
            If all base forecasters fail during fitting.

        """
        _raise_for_params(params, self, "fit")
        routed_params = process_routing(self, "fit", **params)

        if forecasting_horizon < 1:
            raise ValueError(f"forecasting_horizon must be >= 1, got {forecasting_horizon}")

        self._validate_forecasters_list()

        if self.weights is not None and len(self.weights) != len(self.forecasters):
            raise ValueError(
                f"Number of weights ({len(self.weights)}) must match number of forecasters ({len(self.forecasters)})"
            )

        self.forecasters_ = self._fit_forecasters_parallel(
            y=y,
            X_actual=X_actual,
            forecasting_horizon=forecasting_horizon,
            routed_params=routed_params,
            n_jobs=self.n_jobs,
            X_future=X_future,
            X_forecast=X_forecast,
        )

        self._validate_classes_consistent()

        # Derive fitted attributes from first surviving forecaster
        _first_name, first_forecaster = self.forecasters_[0]
        self._derive_fitted_attributes(first_forecaster, forecasting_horizon, y, X_actual)

        self.classes_ = dict(first_forecaster.classes_)  # ty: ignore[unresolved-attribute]
        self.n_classes_ = dict(first_forecaster.n_classes_)  # ty: ignore[unresolved-attribute]
        self.label_to_code_ = dict(first_forecaster.label_to_code_)  # ty: ignore[unresolved-attribute]

        # Compute effective weights for surviving forecasters
        self._compute_effective_weights()

        return self

    def _predict_class_proba_one(
        self,
        groups: list[str],
        **params,
    ) -> pl.DataFrame:
        """Produce aggregated probability forecasts for one step.

        Parameters
        ----------
        groups : list of str
            Panel group names to predict for.
        **params : dict
            Metadata routing parameters.

        Returns
        -------
        pl.DataFrame
            Aggregated probability predictions.

        """
        predictions = []
        for _name, forecaster in self.forecasters_:
            y_proba = forecaster.predict_class_proba(  # ty: ignore[unresolved-attribute]
                groups=groups,
                **params,
            )
            predictions.append(y_proba)

        time_df = predictions[0].select(["vintage_time", "time"])
        proba_cols = [c for c in predictions[0].columns if c not in ("vintage_time", "time")]

        if self.method == "soft":
            agg_exprs = []
            for col in proba_cols:
                values = np.column_stack([pred[col].to_numpy() for pred in predictions])
                if self.weights_ is not None:
                    aggregated = np.average(values, axis=1, weights=self.weights_)
                else:
                    aggregated = np.mean(values, axis=1)
                agg_exprs.append(pl.Series(name=col, values=aggregated))
            return time_df.with_columns(agg_exprs)

        # Hard voting: majority vote converted to one-hot probabilities
        # Collect argmax predictions from each forecaster
        hard_predictions = []
        for _name, forecaster in self.forecasters_:
            y_pred = forecaster.predict(  # ty: ignore[unresolved-attribute]
                groups=groups,
                **params,
            )
            hard_predictions.append(y_pred)

        target_cols = [c for c in hard_predictions[0].columns if c not in ("vintage_time", "time")]
        result = time_df.clone()
        for target_col in target_cols:
            class_labels = self.classes_[target_col]
            n_rows = len(hard_predictions[0])
            winners = []
            for row_idx in range(n_rows):
                votes = [pred[target_col][row_idx] for pred in hard_predictions]
                vote_counts = Counter(votes)
                max_count = max(vote_counts.values())
                candidates = sorted(label for label, count in vote_counts.items() if count == max_count)
                winners.append(candidates[0])
            for label in class_labels:
                col_name = f"{target_col}_proba_{label}"
                proba_values = [1.0 if w == label else 0.0 for w in winners]
                result = result.with_columns(pl.Series(name=col_name, values=proba_values))

        return result

    def predict_class_proba(  # ty: ignore[invalid-method-override]
        self,
        forecasting_horizon: StrictInt | None = None,
        groups: list[str] | None = None,
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
        **params,
    ) -> pl.DataFrame:
        """Generate aggregated class-probability forecasts.

        Parameters
        ----------
        forecasting_horizon : int or None, default=None
            Number of steps ahead. If ``None``, uses value from ``fit``.
        groups : list of str or None, default=None
            Panel group prefixes to predict.
        X_future : pl.DataFrame or None, default=None
            Known future features override. Re-derives step columns
            without mutating forecaster state.
        X_forecast : pl.DataFrame or None, default=None
            External forecast override with ``"vintage_time"`` and
            ``"time"`` columns. Re-derives step columns without mutating
            forecaster state.
        **params : dict
            Metadata routing parameters.

        Returns
        -------
        pl.DataFrame
            Probability predictions with ``"vintage_time"``,
            ``"time"``, and ``{target}_proba_{class}`` columns.

        """
        check_is_fitted(self, ["forecasters_", "classes_"])

        if self.method == "soft":
            return self._soft_vote_predict_class_proba(
                forecasting_horizon=forecasting_horizon,
                groups=groups,
                X_future=X_future,
                X_forecast=X_forecast,
                **params,
            )
        return self._hard_vote_predict_class_proba(
            forecasting_horizon=forecasting_horizon,
            groups=groups,
            X_future=X_future,
            X_forecast=X_forecast,
            **params,
        )

    def _soft_vote_predict_class_proba(
        self,
        forecasting_horizon: StrictInt | None = None,
        groups: list[str] | None = None,
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
        **params,
    ) -> pl.DataFrame:
        """Soft vote: weighted average of class probabilities.

        Parameters
        ----------
        forecasting_horizon : int or None, default=None
            Forecasting horizon.
        groups : list of str or None, default=None
            Panel group prefixes.
        X_future : pl.DataFrame or None, default=None
            Known future features override.
        X_forecast : pl.DataFrame or None, default=None
            External forecast override.
        **params : dict
            Routing parameters.

        Returns
        -------
        pl.DataFrame
            Averaged probability predictions.

        """
        predictions = []
        for _name, forecaster in self.forecasters_:
            y_proba = forecaster.predict_class_proba(  # ty: ignore[unresolved-attribute]
                forecasting_horizon=forecasting_horizon,
                groups=groups,
                X_future=X_future,
                X_forecast=X_forecast,
                **params,
            )
            predictions.append(y_proba)

        time_df = predictions[0].select(["vintage_time", "time"])
        proba_cols = [c for c in predictions[0].columns if c not in ("vintage_time", "time")]

        agg_exprs = []
        for col in proba_cols:
            values = np.column_stack([pred[col].to_numpy() for pred in predictions])

            if self.weights_ is not None:
                aggregated = np.average(values, axis=1, weights=self.weights_)
            else:
                aggregated = np.mean(values, axis=1)

            agg_exprs.append(pl.Series(name=col, values=aggregated))

        return time_df.with_columns(agg_exprs)

    def _hard_vote_predict_class_proba(
        self,
        forecasting_horizon: StrictInt | None = None,
        groups: list[str] | None = None,
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
        **params,
    ) -> pl.DataFrame:
        """Hard vote: majority vote converted to one-hot probabilities.

        Parameters
        ----------
        forecasting_horizon : int or None, default=None
            Forecasting horizon.
        groups : list of str or None, default=None
            Panel group prefixes.
        X_future : pl.DataFrame or None, default=None
            Known future features override.
        X_forecast : pl.DataFrame or None, default=None
            External forecast override.
        **params : dict
            Routing parameters.

        Returns
        -------
        pl.DataFrame
            One-hot probability predictions from majority vote.

        """
        predictions = []
        for _name, forecaster in self.forecasters_:
            y_pred = forecaster.predict(  # ty: ignore[unresolved-attribute]
                forecasting_horizon=forecasting_horizon,
                groups=groups,
                X_future=X_future,
                X_forecast=X_forecast,
                **params,
            )
            predictions.append(y_pred)

        time_df = predictions[0].select(["vintage_time", "time"])
        target_cols = [c for c in predictions[0].columns if c not in ("vintage_time", "time")]

        result = time_df.clone()
        for target_col in target_cols:
            class_labels = self.classes_[target_col]
            n_rows = len(predictions[0])

            winners = []
            for row_idx in range(n_rows):
                votes = [pred[target_col][row_idx] for pred in predictions]
                vote_counts = Counter(votes)
                max_count = max(vote_counts.values())
                candidates = sorted(label for label, count in vote_counts.items() if count == max_count)
                winners.append(candidates[0])

            for label in class_labels:
                col_name = f"{target_col}_proba_{label}"
                proba_values = [1.0 if w == label else 0.0 for w in winners]
                result = result.with_columns(pl.Series(name=col_name, values=proba_values))

        return result

    def predict(  # ty: ignore[invalid-method-override]
        self,
        forecasting_horizon: StrictInt | None = None,
        groups: list[str] | None = None,
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
        **params,
    ) -> pl.DataFrame:
        """Generate argmax class predictions from the ensemble.

        Parameters
        ----------
        forecasting_horizon : int or None, default=None
            Number of steps ahead. If ``None``, uses value from ``fit``.
        groups : list of str or None, default=None
            Panel group prefixes.
        X_future : pl.DataFrame or None, default=None
            Known future features override. Re-derives step columns
            without mutating forecaster state.
        X_forecast : pl.DataFrame or None, default=None
            External forecast override with ``"vintage_time"`` and
            ``"time"`` columns. Re-derives step columns without mutating
            forecaster state.
        **params : dict
            Metadata routing parameters.

        Returns
        -------
        pl.DataFrame
            Predictions with ``"vintage_time"``, ``"time"``, and one
            column per target with the most likely class label.

        """
        check_is_fitted(self, ["forecasters_", "classes_"])

        if self.method == "hard":
            return self._hard_vote_predict(
                forecasting_horizon=forecasting_horizon,
                groups=groups,
                X_future=X_future,
                X_forecast=X_forecast,
                **params,
            )

        y_proba = self.predict_class_proba(
            forecasting_horizon=forecasting_horizon,
            groups=groups,
            X_future=X_future,
            X_forecast=X_forecast,
            **params,
        )
        return self._ensemble_argmax_from_proba(y_proba)

    def _ensemble_argmax_from_proba(self, y_proba: pl.DataFrame) -> pl.DataFrame:
        """Convert probability DataFrame to argmax class DataFrame.

        Panel-aware version that handles both panel-prefixed and plain
        proba column names.

        Parameters
        ----------
        y_proba : pl.DataFrame
            Probability predictions.

        Returns
        -------
        pl.DataFrame
            DataFrame with argmax class labels.

        """
        time_cols = [c for c in ("vintage_time", "time") if c in y_proba.columns]
        result = y_proba.select(time_cols)

        groups = self.groups_ or [None]

        for group in groups:
            for target_col, class_labels in self.classes_.items():
                if group is not None:
                    proba_cols = [f"{group}__{target_col}_proba_{label}" for label in class_labels]
                    out_col = f"{group}__{target_col}"
                else:
                    proba_cols = [f"{target_col}_proba_{label}" for label in class_labels]
                    out_col = target_col

                argmax_series = y_proba.select(pl.concat_list(proba_cols).list.arg_max().alias("_idx"))["_idx"]
                label_series = pl.Series(values=class_labels)
                result = result.with_columns(
                    argmax_series.map_elements(
                        lambda idx, _labels=label_series: _labels[idx],
                        return_dtype=pl.String,
                    ).alias(out_col),
                )

        return result

    def _hard_vote_predict(
        self,
        forecasting_horizon: StrictInt | None = None,
        groups: list[str] | None = None,
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
        **params,
    ) -> pl.DataFrame:
        """Hard vote: majority vote of argmax predictions.

        Parameters
        ----------
        forecasting_horizon : int or None, default=None
            Forecasting horizon.
        groups : list of str or None, default=None
            Panel group prefixes.
        X_future : pl.DataFrame or None, default=None
            Known future features override.
        X_forecast : pl.DataFrame or None, default=None
            External forecast override.
        **params : dict
            Routing parameters.

        Returns
        -------
        pl.DataFrame
            Majority vote predictions.

        """
        predictions = []
        for _name, forecaster in self.forecasters_:
            y_pred = forecaster.predict(  # ty: ignore[unresolved-attribute]
                forecasting_horizon=forecasting_horizon,
                groups=groups,
                X_future=X_future,
                X_forecast=X_forecast,
                **params,
            )
            predictions.append(y_pred)

        time_df = predictions[0].select(["vintage_time", "time"])
        target_cols = [c for c in predictions[0].columns if c not in ("vintage_time", "time")]

        result = time_df.clone()
        for target_col in target_cols:
            n_rows = len(predictions[0])
            winners = []
            for row_idx in range(n_rows):
                votes = [pred[target_col][row_idx] for pred in predictions]
                vote_counts = Counter(votes)
                max_count = max(vote_counts.values())
                candidates = sorted(label for label, count in vote_counts.items() if count == max_count)
                winners.append(candidates[0])

            result = result.with_columns(pl.Series(name=target_col, values=winners))

        return result

    def get_metadata_routing(self) -> MetadataRouter:
        """Get metadata routing configuration.

        Returns
        -------
        MetadataRouter
            Router with mappings for all base forecasters.

        """
        router = MetadataRouter(owner=self.__class__.__name__)

        for name, forecaster in self.forecasters:
            router.add(
                **{name: forecaster},
                method_mapping=MethodMapping()
                .add(caller="fit", callee="fit")
                .add(caller="predict", callee="predict")
                .add(caller="predict_class_proba", callee="predict_class_proba"),
            )

        return router

Methods

__sklearn_tags__()

Get estimator tags.

Returns
Type Description
Tags

Estimator tags with yohou-specific attributes.

Source Code
def __sklearn_tags__(self) -> Tags:
    """Get estimator tags.

    Returns
    -------
    Tags
        Estimator tags with yohou-specific attributes.

    """
    tags = super().__sklearn_tags__()
    assert tags.forecaster_tags is not None

    tags.forecaster_tags.forecaster_type = CLASS_PROBA
    tags.forecaster_tags.tracks_observations = False
    tags.forecaster_tags.supports_panel_data = True

    return tags

fit(y, X_actual=None, forecasting_horizon=1, X_future=None, X_forecast=None, **params)

Fit all base class-probability forecasters.

Parameters
Name Type Description Default
y DataFrame

Target time series with "time" column and categorical value columns.

required
X_actual DataFrame or None

Actual feature observations with a "time" column aligned with y. Forwarded to each child forecaster.

None
forecasting_horizon int

Number of steps ahead to forecast.

1
X_future DataFrame or None

Known future features with "time" column.

None
X_forecast DataFrame or None

External forecasts with "vintage_time" and "time" columns.

None
**params dict

Metadata routing parameters.

{}
Returns
Type Description
self

Fitted ensemble.

Raises
Type Description
ValueError

If weights length does not match the number of forecasters, or if base forecasters discover different classes.

RuntimeError

If all base forecasters fail during fitting.

Source Code
@_fit_context(prefer_skip_nested_validation=True)
def fit(
    self,
    y: pl.DataFrame,
    X_actual: pl.DataFrame | None = None,
    forecasting_horizon: StrictInt = 1,
    X_future: pl.DataFrame | None = None,
    X_forecast: pl.DataFrame | None = None,
    **params,
) -> VotingClassProbaForecaster:
    """Fit all base class-probability forecasters.

    Parameters
    ----------
    y : pl.DataFrame
        Target time series with ``"time"`` column and categorical
        value columns.
    X_actual : pl.DataFrame or None, default=None
        Actual feature observations with a ``"time"`` column aligned
        with ``y``. Forwarded to each child forecaster.
    forecasting_horizon : int, default=1
        Number of steps ahead to forecast.
    X_future : pl.DataFrame or None, default=None
        Known future features with ``"time"`` column.
    X_forecast : pl.DataFrame or None, default=None
        External forecasts with ``"vintage_time"`` and ``"time"`` columns.
    **params : dict
        Metadata routing parameters.

    Returns
    -------
    self
        Fitted ensemble.

    Raises
    ------
    ValueError
        If ``forecasting_horizon`` is less than 1, if the length of
        ``weights`` does not match the number of forecasters, or if
        base forecasters discover different classes.
    RuntimeError
        If all base forecasters fail during fitting.

    """
    _raise_for_params(params, self, "fit")
    routed_params = process_routing(self, "fit", **params)

    if forecasting_horizon < 1:
        raise ValueError(f"forecasting_horizon must be >= 1, got {forecasting_horizon}")

    self._validate_forecasters_list()

    if self.weights is not None and len(self.weights) != len(self.forecasters):
        raise ValueError(
            f"Number of weights ({len(self.weights)}) must match number of forecasters ({len(self.forecasters)})"
        )

    self.forecasters_ = self._fit_forecasters_parallel(
        y=y,
        X_actual=X_actual,
        forecasting_horizon=forecasting_horizon,
        routed_params=routed_params,
        n_jobs=self.n_jobs,
        X_future=X_future,
        X_forecast=X_forecast,
    )

    self._validate_classes_consistent()

    # Derive fitted attributes from first surviving forecaster
    _first_name, first_forecaster = self.forecasters_[0]
    self._derive_fitted_attributes(first_forecaster, forecasting_horizon, y, X_actual)

    self.classes_ = dict(first_forecaster.classes_)  # ty: ignore[unresolved-attribute]
    self.n_classes_ = dict(first_forecaster.n_classes_)  # ty: ignore[unresolved-attribute]
    self.label_to_code_ = dict(first_forecaster.label_to_code_)  # ty: ignore[unresolved-attribute]

    # Compute effective weights for surviving forecasters
    self._compute_effective_weights()

    return self

predict_class_proba(forecasting_horizon=None, groups=None, X_future=None, X_forecast=None, **params)

Generate aggregated class-probability forecasts.

Parameters
Name Type Description Default
forecasting_horizon int or None

Number of steps ahead. If None, uses the value passed to fit.

None
groups list of str or None

Panel group prefixes to predict.

None
X_future DataFrame or None

Known future features override. Re-derives step columns without mutating forecaster state.

None
X_forecast DataFrame or None

External forecast override with "vintage_time" and "time" columns. Re-derives step columns without mutating forecaster state.

None
**params dict

Metadata routing parameters.

{}
Returns
Type Description
DataFrame

Probability predictions with "vintage_time", "time", and {target}_proba_{class} columns.

Source Code
def predict_class_proba(  # ty: ignore[invalid-method-override]
    self,
    forecasting_horizon: StrictInt | None = None,
    groups: list[str] | None = None,
    X_future: pl.DataFrame | None = None,
    X_forecast: pl.DataFrame | None = None,
    **params,
) -> pl.DataFrame:
    """Generate aggregated class-probability forecasts.

    Parameters
    ----------
    forecasting_horizon : int or None, default=None
        Number of steps ahead. If ``None``, uses value from ``fit``.
    groups : list of str or None, default=None
        Panel group prefixes to predict.
    X_future : pl.DataFrame or None, default=None
        Known future features override. Re-derives step columns
        without mutating forecaster state.
    X_forecast : pl.DataFrame or None, default=None
        External forecast override with ``"vintage_time"`` and
        ``"time"`` columns. Re-derives step columns without mutating
        forecaster state.
    **params : dict
        Metadata routing parameters.

    Returns
    -------
    pl.DataFrame
        Probability predictions with ``"vintage_time"``,
        ``"time"``, and ``{target}_proba_{class}`` columns.

    """
    check_is_fitted(self, ["forecasters_", "classes_"])

    if self.method == "soft":
        return self._soft_vote_predict_class_proba(
            forecasting_horizon=forecasting_horizon,
            groups=groups,
            X_future=X_future,
            X_forecast=X_forecast,
            **params,
        )
    return self._hard_vote_predict_class_proba(
        forecasting_horizon=forecasting_horizon,
        groups=groups,
        X_future=X_future,
        X_forecast=X_forecast,
        **params,
    )
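
For method="soft", the documented aggregation is a weighted average of class probabilities, with raw weights normalized the way numpy.average does it (dividing by the sum of the weights). The following is a minimal sketch of that arithmetic for one target at one time step, using plain dicts instead of polars DataFrames. soft_vote is a hypothetical helper and is not part of yohou.

```python
def soft_vote(probas, weights=None):
    """Weighted average of per-forecaster class probabilities.

    probas  : list of dicts mapping class label -> probability,
              one dict per base forecaster.
    weights : optional raw weights, one per forecaster. They are
              divided by their sum, as numpy.average does.
    """
    if weights is None:
        weights = [1.0] * len(probas)
    if len(weights) != len(probas):
        raise ValueError(
            f"Number of weights ({len(weights)}) must match "
            f"number of forecasters ({len(probas)})"
        )
    total = sum(weights)
    classes = sorted(probas[0])
    return {
        label: sum(w * p[label] for w, p in zip(weights, probas)) / total
        for label in classes
    }


# Two forecasters; the first one counts three times as much as the second.
combined = soft_vote(
    [{"down": 0.2, "up": 0.8}, {"down": 0.6, "up": 0.4}],
    weights=[3, 1],
)
```

With these inputs the averaged probabilities are approximately 0.3 for "down" and 0.7 for "up", and they still sum to 1 because each input distribution does.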

predict(forecasting_horizon=None, groups=None, X_future=None, X_forecast=None, **params)

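Generate class-label predictions from the ensemble: the majority-vote label when method="hard", otherwise the argmax of the aggregated class probabilities.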

Parameters
Name Type Description Default
forecasting_horizon int or None

Number of steps ahead. If None, uses the value passed to fit.

None
groups list of str or None

Panel group prefixes.

None
X_future DataFrame or None

Known future features override. Re-derives step columns without mutating forecaster state.

None
X_forecast DataFrame or None

External forecast override with "vintage_time" and "time" columns. Re-derives step columns without mutating forecaster state.

None
**params dict

Metadata routing parameters.

{}
Returns
Type Description
DataFrame

Predictions with "vintage_time", "time", and one column per target with the most likely class label.

Source Code
def predict(  # ty: ignore[invalid-method-override]
    self,
    forecasting_horizon: StrictInt | None = None,
    groups: list[str] | None = None,
    X_future: pl.DataFrame | None = None,
    X_forecast: pl.DataFrame | None = None,
    **params,
) -> pl.DataFrame:
    """Generate argmax class predictions from the ensemble.

    Parameters
    ----------
    forecasting_horizon : int or None, default=None
        Number of steps ahead. If ``None``, uses value from ``fit``.
    groups : list of str or None, default=None
        Panel group prefixes.
    X_future : pl.DataFrame or None, default=None
        Known future features override. Re-derives step columns
        without mutating forecaster state.
    X_forecast : pl.DataFrame or None, default=None
        External forecast override with ``"vintage_time"`` and
        ``"time"`` columns. Re-derives step columns without mutating
        forecaster state.
    **params : dict
        Metadata routing parameters.

    Returns
    -------
    pl.DataFrame
        Predictions with ``"vintage_time"``, ``"time"``, and one
        column per target with the most likely class label.

    """
    check_is_fitted(self, ["forecasters_", "classes_"])

    if self.method == "hard":
        return self._hard_vote_predict(
            forecasting_horizon=forecasting_horizon,
            groups=groups,
            X_future=X_future,
            X_forecast=X_forecast,
            **params,
        )

    y_proba = self.predict_class_proba(
        forecasting_horizon=forecasting_horizon,
        groups=groups,
        X_future=X_future,
        X_forecast=X_forecast,
        **params,
    )
    return self._ensemble_argmax_from_proba(y_proba)
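
For method="hard", the class documentation describes a majority vote of argmax predictions, with ties going to the first class in sorted order. The sketch below illustrates that rule for one target at one time step. hard_vote is a hypothetical helper; the real _hard_vote_predict returns a polars DataFrame, so this is only a picture of the voting rule, not of the library's code.

```python
from collections import Counter


def hard_vote(probas, classes):
    """Majority vote over each forecaster's most likely class.

    probas  : list of dicts mapping class label -> probability,
              one dict per base forecaster.
    classes : iterable of all class labels.
    """
    ordered = sorted(classes)
    # Each forecaster votes for its argmax class. max() returns the first
    # maximal item, so equal probabilities resolve to the earliest label.
    votes = [max(ordered, key=lambda label: p[label]) for p in probas]
    counts = Counter(votes)
    # Equal vote counts also resolve to the first class in sorted order.
    return max(ordered, key=lambda label: counts[label])


classes = ["up", "down", "flat"]

# One vote for "down" and one for "up": a tie.
tie = hard_vote(
    [
        {"down": 0.7, "flat": 0.2, "up": 0.1},
        {"down": 0.1, "flat": 0.2, "up": 0.7},
    ],
    classes,
)

# Two of three forecasters prefer "up".
majority = hard_vote(
    [
        {"down": 0.7, "flat": 0.2, "up": 0.1},
        {"down": 0.1, "flat": 0.2, "up": 0.7},
        {"down": 0.3, "flat": 0.3, "up": 0.4},
    ],
    classes,
)
```

The tie resolves to "down" because it sorts before "up", while the three-forecaster case returns "up". Note that weights play no role here, consistent with the statement that they are ignored when method="hard".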

get_metadata_routing()

Get metadata routing configuration.

Returns
Type Description
MetadataRouter

Router with mappings for all base forecasters.

Source Code
def get_metadata_routing(self) -> MetadataRouter:
    """Get metadata routing configuration.

    Returns
    -------
    MetadataRouter
        Router with mappings for all base forecasters.

    """
    router = MetadataRouter(owner=self.__class__.__name__)

    for name, forecaster in self.forecasters:
        router.add(
            **{name: forecaster},
            method_mapping=MethodMapping()
            .add(caller="fit", callee="fit")
            .add(caller="predict", callee="predict")
            .add(caller="predict_class_proba", callee="predict_class_proba"),
        )

    return router

Tutorials

The following example notebooks use this component:

  • How to Combine Classification Forecasters


    Forecasting-Models

    Build classification ensembles with VotingClassProbaForecaster using soft and hard voting strategies.
