
BaseClassProbaForecaster

yohou.class_proba.base.BaseClassProbaForecaster

Bases: BaseForecaster

Base class for class-probability forecasters.

Class-probability forecasters produce per-class probability distributions for categorical time series at each forecast step. The primary output method is predict_class_proba; predict returns the argmax class.
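The relationship between the two outputs can be illustrated with a minimal sketch. It uses plain Python dictionaries in place of polars DataFrame rows, and the `weather` target, its class labels, and the `argmax_labels` helper are all hypothetical names chosen for illustration; only the `{target}_proba_{class_label}` column naming comes from this page.

```python
# Hypothetical fitted state: one target column "weather" with three classes.
classes = {"weather": ["cloudy", "rainy", "sunny"]}

# Hypothetical probability forecast, one dict per forecast step, using the
# {target}_proba_{class_label} column naming described on this page.
proba_rows = [
    {
        "time": "2024-01-01",
        "weather_proba_cloudy": 0.2,
        "weather_proba_rainy": 0.1,
        "weather_proba_sunny": 0.7,
    },
    {
        "time": "2024-01-02",
        "weather_proba_cloudy": 0.5,
        "weather_proba_rainy": 0.3,
        "weather_proba_sunny": 0.2,
    },
]


def argmax_labels(rows, classes):
    """Return one row per step holding the most probable label per target."""
    result = []
    for row in rows:
        out = {"time": row["time"]}
        for target, labels in classes.items():
            # Pick the label whose probability column holds the largest value.
            out[target] = max(labels, key=lambda label: row[f"{target}_proba_{label}"])
        result.append(out)
    return result


point_rows = argmax_labels(proba_rows, classes)
```

Here `point_rows` plays the role of the `predict` output, while `proba_rows` plays the role of the `predict_class_proba` output.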

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `target_transformer` | instance of `BaseTransformer` or None | Transformer used to transform the target time series into the new target. | `None` |
| `feature_transformer` | instance of `BaseTransformer` or None | Transformer used to transform the target time series into features. | `None` |
| `target_as_feature` | `{"transformed", "raw"}` or None | Controls whether the target is included as a feature. `"transformed"` includes the transformed target, `"raw"` includes the raw target, and `None` uses only exogenous features. | `"transformed"` |
| `panel_strategy` | `{"global", "multivariate"}` | How to handle panel data. See `BaseForecaster` for details. | `"global"` |

Notes

Subclasses must implement _predict_class_proba_one to produce probability forecasts for a single forecast step. The forecaster_type tag is set to CLASS_PROBA.
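The division of labour between the base class and its subclasses can be sketched as a template-method pattern. The classes below are hypothetical stand-ins, not part of yohou: the real `_predict_class_proba_one` takes `groups` and `**params` and returns a polars DataFrame, whereas this simplified version takes no arguments and returns a plain dictionary.

```python
import abc


class ToyProbaBase(abc.ABC):
    """Hypothetical stand-in for a class-probability base class."""

    def __init__(self, classes):
        # Mapping of target name to class labels, e.g. {"weather": ["rainy", "sunny"]}.
        self.classes = classes

    @abc.abstractmethod
    def _predict_class_proba_one(self):
        """Return {column_name: probability} for a single forecast step."""

    def predict_class_proba(self, forecasting_horizon=1):
        # The base class drives the loop; subclasses only supply one step.
        return [self._predict_class_proba_one() for _ in range(forecasting_horizon)]


class UniformToyForecaster(ToyProbaBase):
    """Toy subclass that assigns equal probability to every class."""

    def _predict_class_proba_one(self):
        return {
            f"{target}_proba_{label}": 1.0 / len(labels)
            for target, labels in self.classes.items()
            for label in labels
        }


forecaster = UniformToyForecaster({"weather": ["rainy", "sunny"]})
steps = forecaster.predict_class_proba(forecasting_horizon=2)
```

Because `_predict_class_proba_one` is abstract, instantiating `ToyProbaBase` directly raises `TypeError`, which mirrors the requirement that concrete forecasters provide the method.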

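See Also

- `ClassProbaReductionForecaster` (yohou.class_proba.reduction.ClassProbaReductionForecaster): ML-based class-probability forecaster.
- `BasePointForecaster` (yohou.point.base.BasePointForecaster): Base class for point forecasters.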

Source Code

class BaseClassProbaForecaster(BaseForecaster, metaclass=abc.ABCMeta):
    """Base class for class-probability forecasters.

    Class-probability forecasters produce per-class probability distributions
    for categorical time series at each forecast step. The primary output
    method is ``predict_class_proba``; ``predict`` returns the argmax class.

    Parameters
    ----------
    target_transformer : instance of `BaseTransformer` or None, default=None
        Transformer used to transform the target time series into the new target.
    feature_transformer : instance of `BaseTransformer` or None, default=None
        Transformer used to transform the target time series into features.
    target_as_feature : {"transformed", "raw"} or None, default="transformed"
        Controls whether the target is included as a feature.
        ``"transformed"`` includes the transformed target, ``"raw"``
        includes the raw target, and ``None`` uses only exogenous features.
    panel_strategy : {"global", "multivariate"}, default="global"
        How to handle panel data. See `BaseForecaster` for details.

    Notes
    -----
    Subclasses must implement ``_predict_class_proba_one`` to produce
    probability forecasts for a single forecast step. The ``forecaster_type``
    tag is set to ``CLASS_PROBA``.

    See Also
    --------
    - [`ClassProbaReductionForecaster`][yohou.class_proba.reduction.ClassProbaReductionForecaster] : ML-based class-probability forecaster.
    - [`BasePointForecaster`][yohou.point.base.BasePointForecaster] : Base class for point forecasters.

    """

    classes_: dict[str, list[str]]
    n_classes_: dict[str, int]
    label_to_code_: dict[str, dict[str, float]]

    def __sklearn_tags__(self) -> Tags:
        """Get estimator tags.

        Returns
        -------
        Tags
            Estimator tags with yohou-specific attributes.

        """
        tags = super().__sklearn_tags__()
        assert tags.forecaster_tags is not None
        tags.forecaster_tags.forecaster_type = CLASS_PROBA
        return tags

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(
        self,
        y: pl.DataFrame,
        X_actual: pl.DataFrame | None = None,
        forecasting_horizon: StrictInt = 1,
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
        **params,
    ) -> "BaseClassProbaForecaster":
        """Fit the forecaster to historical data.

        Parameters
        ----------
        y : pl.DataFrame
            Target time series with a ``"time"`` column (datetime) and one
            or more categorical value columns.
        X_actual : pl.DataFrame or None, default=None
            Actual feature observations with a ``"time"`` column aligned
            with ``y``. Processed by the feature transformer to produce
            lags, rolling statistics, and other derived features. If
            ``None``, only target-derived features are used.
        forecasting_horizon : int, default=1
            Number of time steps to forecast into the future.
        X_future : pl.DataFrame or None, default=None
            Known future features with a ``"time"`` column. Deterministic
            values available for past and future dates. Bypasses the
            feature transformer.
        X_forecast : pl.DataFrame or None, default=None
            External forecasts with ``"vintage_time"`` and ``"time"``
            columns. Bypasses the feature transformer.
        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        self
            The fitted forecaster instance.

        Raises
        ------
        ValueError
            If ``forecasting_horizon`` < 1, or if ``y`` / ``X_actual`` have invalid
            structure (e.g., missing ``"time"`` column).

        """
        forecasting_horizon = self._validate_fit_params(forecasting_horizon)

        y_t, X_t = self._pre_fit(
            y=y,
            X_actual=X_actual,
            forecasting_horizon=forecasting_horizon,
            X_future=X_future,
            X_forecast=X_forecast,
        )

        self._fit(y_t, X_t, forecasting_horizon)

        return self

    def _validate_predict_params(self, forecasting_horizon: StrictInt | None) -> StrictInt:
        """Validate and return predict parameters.

        Parameters
        ----------
        forecasting_horizon : int or None
            Forecasting horizon to validate. If None, uses fit_forecasting_horizon_.

        Returns
        -------
        int
            Validated forecasting horizon.

        Raises
        ------
        ValueError
            If forecasting_horizon < 1.

        """
        if forecasting_horizon is None:
            forecasting_horizon = self.fit_forecasting_horizon_
        return self._validate_fit_params(forecasting_horizon)

    @abc.abstractmethod
    def _predict_class_proba_one(
        self,
        groups: list[str],
        **params,
    ) -> pl.DataFrame:
        """Produce probability forecasts for one fit-horizon block.

        Must be implemented by subclasses. Returns a DataFrame where
        each target column is expanded into ``n_classes`` columns named
        ``{target}_proba_{class_label}``.

        Parameters
        ----------
        groups : list of str
            Panel group names to predict for.
        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        pl.DataFrame
            Probability predictions with ``"vintage_time"``, ``"time"``,
            and columns ``{target}_proba_{class_label}`` for each class.

        """

    def predict_class_proba(
        self,
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
        forecasting_horizon: StrictInt | None = None,
        groups: list[str] | None = None,
        **params,
    ) -> pl.DataFrame:
        """Generate class-probability forecasts.

        Parameters
        ----------
        X_future : pl.DataFrame or None, default=None
            Known future features override. Re-derives step columns
            without mutating forecaster state.
        X_forecast : pl.DataFrame or None, default=None
            External forecast override with ``"vintage_time"`` and
            ``"time"`` columns. Re-derives step columns without mutating
            forecaster state.
        forecasting_horizon : int or None, default=None
            Number of time steps to forecast into the future. If ``None``,
            uses the horizon specified at fit time.
        groups : list of str or None, default=None
            Panel group prefixes to operate on. If ``None``, all groups
            are used. Ignored when the forecaster was not fitted on panel
            data.
        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        pl.DataFrame
            Probability predictions with ``"vintage_time"``, ``"time"``,
            and columns ``{target}_proba_{class_label}`` for each class.

        Raises
        ------
        sklearn.exceptions.NotFittedError
            If the forecaster has not been fitted yet.
        ValueError
            If ``groups`` contains names not seen during fit.

        """
        check_is_fitted(
            self,
            ["local_y_schema_", "local_X_actual_schema_", "shared_X_actual_schema_", "groups_"],
        )

        _, _, groups = validate_forecaster_data(
            self,
            y=None,
            X_actual=None,
            reset=False,
            groups=groups,
            X_future=X_future,
            X_forecast=X_forecast,
        )

        forecasting_horizon = self._validate_predict_params(forecasting_horizon)

        def step_fn(forecaster, groups):
            """Produce one class-probability prediction block."""
            y_pred_step = forecaster._predict_class_proba_one(
                groups=groups,
                **params,
            )
            return y_pred_step, y_pred_step

        def derive_observation_fn(forecaster, y_pred_step):
            """Derive observation via argmax and re-encoding."""
            y_obs = self._argmax_from_proba(y_pred_step)
            y_obs = self._encode_observation(y_obs)
            return y_obs

        def predict_fn():
            return self._recursive_predict(
                forecasting_horizon=forecasting_horizon,
                groups=groups,
                step_fn=step_fn,
                derive_observation_fn=derive_observation_fn,
            )

        return self._predict_with_step_override(
            X_future=X_future,
            X_forecast=X_forecast,
            predict_fn=predict_fn,
        )

    def predict(
        self,
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
        forecasting_horizon: StrictInt | None = None,
        groups: list[str] | None = None,
        **params,
    ) -> pl.DataFrame:
        """Generate argmax class forecasts from class probabilities.

        Convenience method that calls ``predict_class_proba`` and returns
        the most-likely class for each time step and target column.

        Parameters
        ----------
        X_future : pl.DataFrame or None, default=None
            Known future features override.
        X_forecast : pl.DataFrame or None, default=None
            External forecast override.
        forecasting_horizon : int or None, default=None
            Number of time steps to forecast into the future. If ``None``,
            uses the horizon specified at fit time.
        groups : list of str or None, default=None
            Panel group prefixes to operate on. If ``None``, all groups
            are used. Ignored when the forecaster was not fitted on panel
            data.
        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        pl.DataFrame
            Point predictions with ``"vintage_time"``, ``"time"``, and one
            column per target variable containing the most-likely class.

        Raises
        ------
        sklearn.exceptions.NotFittedError
            If the forecaster has not been fitted yet.

        """
        y_proba = self.predict_class_proba(
            X_future=X_future,
            X_forecast=X_forecast,
            forecasting_horizon=forecasting_horizon,
            groups=groups,
            **params,
        )
        return self._argmax_from_proba(y_proba)

    def _argmax_from_proba(self, y_proba: pl.DataFrame) -> pl.DataFrame:
        """Convert probability DataFrame to argmax class DataFrame.

        Takes the probability output (columns named ``{target}_proba_{class}``)
        and returns the class with highest probability for each target
        and time step.

        Parameters
        ----------
        y_proba : pl.DataFrame
            Probability predictions from ``predict_class_proba``.

        Returns
        -------
        pl.DataFrame
            DataFrame with the time columns present in ``y_proba``
            (``"vintage_time"`` and/or ``"time"``) and one column per
            original target containing the class label with highest
            probability.

        """
        check_is_fitted(self, ["classes_"])

        time_cols = [c for c in ("vintage_time", "time") if c in y_proba.columns]
        result = y_proba.select(time_cols)

        for target_col, class_labels in self.classes_.items():
            proba_cols = [f"{target_col}_proba_{label}" for label in class_labels]
            # For each row, find the index of the max probability column
            # then map that index to the class label.
            argmax_series = y_proba.select(pl.concat_list(proba_cols).list.arg_max().alias("_idx"))["_idx"]
            label_series = pl.Series(values=class_labels)
            result = result.with_columns(
                argmax_series.map_elements(
                    lambda idx, _labels=label_series: _labels[idx],
                    return_dtype=pl.String,
                ).alias(target_col),
            )

        return result

    def _encode_observation(self, y_obs: pl.DataFrame) -> pl.DataFrame:
        """Encode argmax string labels back to float codes for observation.

        Used during recursive prediction to convert argmax class labels
        back to the float-coded format expected by ``observe()``.

        Parameters
        ----------
        y_obs : pl.DataFrame
            Observation with string class labels.

        Returns
        -------
        pl.DataFrame
            Observation with float-coded class labels matching the fit schema.

        """
        check_is_fitted(self, ["label_to_code_"])

        exprs = []
        for col in y_obs.columns:
            if col in ("vintage_time", "time"):
                continue
            mapping = self.label_to_code_[col]
            exprs.append(pl.col(col).cast(pl.String).replace_strict(mapping, return_dtype=pl.Float64).alias(col))
        return y_obs.with_columns(exprs)

    def _encode_y_input(self, y: pl.DataFrame) -> pl.DataFrame:
        """Encode user-facing categorical y to float codes for internal use.

        Handles both panel (``{group}__{target}``) and non-panel column
        names by looking up the base target name in ``label_to_code_``.

        Parameters
        ----------
        y : pl.DataFrame
            Target data with string or already-encoded columns.

        Returns
        -------
        pl.DataFrame
            Target data with float-coded columns matching ``local_y_schema_``.

        """
        check_is_fitted(self, ["label_to_code_"])

        exprs = []
        for col in y.columns:
            if col == "time":
                continue
            # Skip columns that are already numeric (already encoded)
            if y[col].dtype.is_numeric():
                continue
            # For panel columns like "group_0__weather", extract "weather"
            base_col = col.split("__")[-1] if "__" in col else col
            if base_col in self.label_to_code_:
                mapping = self.label_to_code_[base_col]
                exprs.append(pl.col(col).replace_strict(mapping, return_dtype=pl.Float64).alias(col))
        if exprs:
            return y.with_columns(exprs)
        return y

    def observe(
        self,
        y: pl.DataFrame,
        X_actual: pl.DataFrame | None = None,
        groups: list[str] | None = None,
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
    ) -> "BaseClassProbaForecaster":
        """Observe new data, encoding categorical targets before validation.

        Overrides ``BaseForecaster.observe`` to encode string target columns
        to float codes before schema validation.

        Parameters
        ----------
        y : pl.DataFrame
            Target time series with a ``"time"`` column (datetime) and one
            or more categorical value columns.
        X_actual : pl.DataFrame or None, default=None
            New actual feature observations with a ``"time"`` column
            aligned with ``y``. Passed through the feature transformer
            to update the internal observation state.
        groups : list of str or None, default=None
            Panel group prefixes to operate on. If ``None``, all groups
            are used.
        X_future : pl.DataFrame or None, default=None
            Known future features with a ``"time"`` column.
        X_forecast : pl.DataFrame or None, default=None
            External forecasts with ``"vintage_time"`` and ``"time"``
            columns.

        Returns
        -------
        self
            The forecaster with updated observation buffers.

        """
        y = self._encode_y_input(y)
        return super().observe(y, X_actual, groups=groups, X_future=X_future, X_forecast=X_forecast)  # ty: ignore[invalid-return-type]

    def rewind(
        self,
        y: pl.DataFrame,
        X_actual: pl.DataFrame | None = None,
        groups: list[str] | None = None,
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
    ) -> "BaseClassProbaForecaster":
        """Rewind memory, encoding categorical targets before validation.

        Overrides ``BaseForecaster.rewind`` to encode string target columns
        to float codes before schema validation.

        Parameters
        ----------
        y : pl.DataFrame
            Target time series with a ``"time"`` column (datetime) and one
            or more categorical value columns.
        X_actual : pl.DataFrame or None, default=None
            Actual feature observations to restore the observation
            state to. Must align with ``y``.
        groups : list of str or None, default=None
            Panel group prefixes to operate on. If ``None``, all groups
            are used.
        X_future : pl.DataFrame or None, default=None
            Known future features with a ``"time"`` column.
        X_forecast : pl.DataFrame or None, default=None
            External forecasts with ``"vintage_time"`` and ``"time"``
            columns.

        Returns
        -------
        self
            The forecaster with rewound observation buffers.

        """
        y = self._encode_y_input(y)
        return super().rewind(y, X_actual, groups=groups, X_future=X_future, X_forecast=X_forecast)  # ty: ignore[invalid-return-type]

    def observe_predict_class_proba(
        self,
        y: pl.DataFrame,
        X_actual: pl.DataFrame | None = None,
        forecasting_horizon: StrictInt | None = None,
        groups: list[str] | None = None,
        stride: StrictInt | None = None,
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
        **params,
    ) -> pl.DataFrame:
        """Alternate recursive predict_class_proba and observe.

        Equivalent to calling ``observe(y, X_actual)`` then
        ``predict_class_proba()``. Returns probability predictions.

        Parameters
        ----------
        y : pl.DataFrame
            Target time series with a ``"time"`` column (datetime) and one
            or more categorical value columns.
        X_actual : pl.DataFrame or None, default=None
            Actual feature observations with a ``"time"`` column aligned
            with ``y``. Sliced and observed incrementally at each step
            of the rolling loop.
        forecasting_horizon : int or None, default=None
            Number of time steps to forecast into the future. If ``None``,
            uses the horizon specified at fit time.
        groups : list of str or None, default=None
            Panel group prefixes to operate on. If ``None``, all groups
            are used. Ignored when the forecaster was not fitted on panel
            data.
        stride : int or None, default=None
            Step size for rolling update-predict. If ``None``, defaults to
            the forecasting horizon specified at fit time.
        X_future : pl.DataFrame or None, default=None
            Known future features with a ``"time"`` column.
        X_forecast : pl.DataFrame or None, default=None
            External forecasts with ``"vintage_time"`` and ``"time"``
            columns.
        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        pl.DataFrame
            Probability predictions with ``"vintage_time"``, ``"time"``,
            and columns ``{target}_proba_{class_label}`` for each class.

        Raises
        ------
        sklearn.exceptions.NotFittedError
            If the forecaster has not been fitted yet.
        ValueError
            If ``y`` / ``X_actual`` have invalid structure or ``groups``
            contains names not seen during fit.

        """
        check_is_fitted(
            self,
            ["local_y_schema_", "local_X_actual_schema_", "shared_X_actual_schema_", "groups_"],
        )

        y = self._encode_y_input(y)

        y, X_actual, groups = validate_forecaster_data(
            self,
            y=y,
            X_actual=X_actual,
            reset=False,
            groups=groups,
            X_future=X_future,
            X_forecast=X_forecast,
        )

        forecasting_horizon = self._validate_predict_params(forecasting_horizon)
        if stride is None:
            stride = self.fit_forecasting_horizon_

        return self._observe_predict_loop(
            predict_fn=self.predict_class_proba,
            y=y,
            X_actual=X_actual,
            X_future=X_future,
            X_forecast=X_forecast,
            groups=groups,
            stride=stride,
            forecasting_horizon=forecasting_horizon,
            **params,
        )

    def observe_predict(
        self,
        y: pl.DataFrame,
        X_actual: pl.DataFrame | None = None,
        forecasting_horizon: StrictInt | None = None,
        groups: list[str] | None = None,
        stride: StrictInt | None = None,
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
        **params,
    ) -> pl.DataFrame:
        """Alternate recursive predict and observe.

        Equivalent to calling ``observe(y, X_actual)`` then ``predict()``.
        Returns argmax class predictions.

        Parameters
        ----------
        y : pl.DataFrame
            Target time series with a ``"time"`` column (datetime) and one
            or more categorical value columns.
        X_actual : pl.DataFrame or None, default=None
            Actual feature observations with a ``"time"`` column aligned
            with ``y``. Sliced and observed incrementally at each step
            of the rolling loop.
        forecasting_horizon : int or None, default=None
            Number of time steps to forecast into the future. If ``None``,
            uses the horizon specified at fit time.
        groups : list of str or None, default=None
            Panel group prefixes to operate on. If ``None``, all groups
            are used. Ignored when the forecaster was not fitted on panel
            data.
        stride : int or None, default=None
            Step size for rolling update-predict. If ``None``, defaults to
            the forecasting horizon specified at fit time.
        X_future : pl.DataFrame or None, default=None
            Known future features with a ``"time"`` column.
        X_forecast : pl.DataFrame or None, default=None
            External forecasts with ``"vintage_time"`` and ``"time"``
            columns.
        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        pl.DataFrame
            Point predictions with ``"vintage_time"``, ``"time"``, and one
            column per target variable containing the most-likely class.

        Raises
        ------
        sklearn.exceptions.NotFittedError
            If the forecaster has not been fitted yet.
        ValueError
            If ``y`` / ``X_actual`` have invalid structure or ``groups``
            contains names not seen during fit.

        """
        check_is_fitted(
            self,
            ["local_y_schema_", "local_X_actual_schema_", "shared_X_actual_schema_", "groups_"],
        )

        y = self._encode_y_input(y)

        y, X_actual, groups = validate_forecaster_data(
            self,
            y=y,
            X_actual=X_actual,
            reset=False,
            groups=groups,
            X_future=X_future,
            X_forecast=X_forecast,
        )

        forecasting_horizon = self._validate_predict_params(forecasting_horizon)
        if stride is None:
            stride = self.fit_forecasting_horizon_

        return self._observe_predict_loop(
            predict_fn=self.predict,
            y=y,
            X_actual=X_actual,
            X_future=X_future,
            X_forecast=X_forecast,
            groups=groups,
            stride=stride,
            forecasting_horizon=forecasting_horizon,
            **params,
        )

Methods

__sklearn_tags__()

Get estimator tags.

Returns

| Type | Description |
| --- | --- |
| `Tags` | Estimator tags with yohou-specific attributes. |

Source Code
def __sklearn_tags__(self) -> Tags:
    """Get estimator tags.

    Returns
    -------
    Tags
        Estimator tags with yohou-specific attributes.

    """
    tags = super().__sklearn_tags__()
    assert tags.forecaster_tags is not None
    tags.forecaster_tags.forecaster_type = CLASS_PROBA
    return tags

fit(y, X_actual=None, forecasting_horizon=1, X_future=None, X_forecast=None, **params)

Fit the forecaster to historical data.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `y` | DataFrame | Target time series with a "time" column (datetime) and one or more categorical value columns. | required |
| `X_actual` | DataFrame or None | Actual feature observations with a "time" column aligned with `y`. Processed by the feature transformer to produce lags, rolling statistics, and other derived features. If None, only target-derived features are used. | `None` |
| `forecasting_horizon` | int | Number of time steps to forecast into the future. | `1` |
| `X_future` | DataFrame or None | Known future features with a "time" column. Deterministic values available for past and future dates. Bypasses the feature transformer. | `None` |
| `X_forecast` | DataFrame or None | External forecasts with "vintage_time" and "time" columns. Bypasses the feature transformer. | `None` |
| `**params` | dict | Metadata to route to nested estimators. | `{}` |

Returns

| Type | Description |
| --- | --- |
| self | The fitted forecaster instance. |

Raises

| Type | Description |
| --- | --- |
| ValueError | If `forecasting_horizon` < 1, or if `y` / `X_actual` have invalid structure (e.g., missing "time" column). |
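`fit` accepts categorical value columns, and other methods in the class source convert string labels to float codes through the `label_to_code_` attribute. How those mappings are built during fitting is not shown on this page, so the following is only a sketch under a stated assumption: labels are sorted and numbered from 0.0. The `build_label_codes` helper and the sample data are hypothetical.

```python
def build_label_codes(columns):
    """Derive class lists and label-to-float-code maps per target column.

    Assumption (not confirmed by the source): codes follow sorted label order.
    """
    classes = {name: sorted(set(values)) for name, values in columns.items()}
    label_to_code = {
        name: {label: float(code) for code, label in enumerate(labels)}
        for name, labels in classes.items()
    }
    return classes, label_to_code


# Hypothetical categorical target values, in time order.
y_columns = {"weather": ["sunny", "rainy", "sunny", "cloudy"]}

classes, label_to_code = build_label_codes(y_columns)

# Encode the string labels to float codes using the derived mapping.
encoded = [label_to_code["weather"][label] for label in y_columns["weather"]]
```

The resulting structures have the same shapes as the `classes_` (`dict[str, list[str]]`) and `label_to_code_` (`dict[str, dict[str, float]]`) annotations in the class source.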

Source Code
@_fit_context(prefer_skip_nested_validation=True)
def fit(
    self,
    y: pl.DataFrame,
    X_actual: pl.DataFrame | None = None,
    forecasting_horizon: StrictInt = 1,
    X_future: pl.DataFrame | None = None,
    X_forecast: pl.DataFrame | None = None,
    **params,
) -> "BaseClassProbaForecaster":
    """Fit the forecaster to historical data.

    Parameters
    ----------
    y : pl.DataFrame
        Target time series with a ``"time"`` column (datetime) and one
        or more categorical value columns.
    X_actual : pl.DataFrame or None, default=None
        Actual feature observations with a ``"time"`` column aligned
        with ``y``. Processed by the feature transformer to produce
        lags, rolling statistics, and other derived features. If
        ``None``, only target-derived features are used.
    forecasting_horizon : int, default=1
        Number of time steps to forecast into the future.
    X_future : pl.DataFrame or None, default=None
        Known future features with a ``"time"`` column. Deterministic
        values available for past and future dates. Bypasses the
        feature transformer.
    X_forecast : pl.DataFrame or None, default=None
        External forecasts with ``"vintage_time"`` and ``"time"``
        columns. Bypasses the feature transformer.
    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    self
        The fitted forecaster instance.

    Raises
    ------
    ValueError
        If ``forecasting_horizon`` < 1, or if ``y`` / ``X_actual`` have invalid
        structure (e.g., missing ``"time"`` column).

    """
    forecasting_horizon = self._validate_fit_params(forecasting_horizon)

    y_t, X_t = self._pre_fit(
        y=y,
        X_actual=X_actual,
        forecasting_horizon=forecasting_horizon,
        X_future=X_future,
        X_forecast=X_forecast,
    )

    self._fit(y_t, X_t, forecasting_horizon)

    return self

predict_class_proba(X_future=None, X_forecast=None, forecasting_horizon=None, groups=None, **params)

Generate class-probability forecasts.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `X_future` | DataFrame or None | Known future features override. Re-derives step columns without mutating forecaster state. | `None` |
| `X_forecast` | DataFrame or None | External forecast override with "vintage_time" and "time" columns. Re-derives step columns without mutating forecaster state. | `None` |
| `forecasting_horizon` | int or None | Number of time steps to forecast into the future. If None, uses the horizon specified at fit time. | `None` |
| `groups` | list of str or None | Panel group prefixes to operate on. If None, all groups are used. Ignored when the forecaster was not fitted on panel data. | `None` |
| `**params` | dict | Metadata to route to nested estimators. | `{}` |

Returns

| Type | Description |
| --- | --- |
| DataFrame | Probability predictions with "vintage_time", "time", and columns `{target}_proba_{class_label}` for each class. |

Raises

| Type | Description |
| --- | --- |
| NotFittedError | If the forecaster has not been fitted yet. |
| ValueError | If `groups` contains names not seen during fit. |

Source Code
def predict_class_proba(
    self,
    X_future: pl.DataFrame | None = None,
    X_forecast: pl.DataFrame | None = None,
    forecasting_horizon: StrictInt | None = None,
    groups: list[str] | None = None,
    **params,
) -> pl.DataFrame:
    """Generate class-probability forecasts.

    Parameters
    ----------
    X_future : pl.DataFrame or None, default=None
        Known future features override. Re-derives step columns
        without mutating forecaster state.
    X_forecast : pl.DataFrame or None, default=None
        External forecast override with ``"vintage_time"`` and
        ``"time"`` columns. Re-derives step columns without mutating
        forecaster state.
    forecasting_horizon : int or None, default=None
        Number of time steps to forecast into the future. If ``None``,
        uses the horizon specified at fit time.
    groups : list of str or None, default=None
        Panel group prefixes to operate on. If ``None``, all groups
        are used. Ignored when the forecaster was not fitted on panel
        data.
    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    pl.DataFrame
        Probability predictions with ``"vintage_time"``, ``"time"``,
        and columns ``{target}_proba_{class_label}`` for each class.

    Raises
    ------
    sklearn.exceptions.NotFittedError
        If the forecaster has not been fitted yet.
    ValueError
        If ``groups`` contains names not seen during fit.

    """
    check_is_fitted(
        self,
        ["local_y_schema_", "local_X_actual_schema_", "shared_X_actual_schema_", "groups_"],
    )

    _, _, groups = validate_forecaster_data(
        self,
        y=None,
        X_actual=None,
        reset=False,
        groups=groups,
        X_future=X_future,
        X_forecast=X_forecast,
    )

    forecasting_horizon = self._validate_predict_params(forecasting_horizon)

    def step_fn(forecaster, groups):
        """Produce one class-probability prediction block."""
        y_pred_step = forecaster._predict_class_proba_one(
            groups=groups,
            **params,
        )
        return y_pred_step, y_pred_step

    def derive_observation_fn(forecaster, y_pred_step):
        """Derive observation via argmax and re-encoding."""
        y_obs = self._argmax_from_proba(y_pred_step)
        y_obs = self._encode_observation(y_obs)
        return y_obs

    def predict_fn():
        return self._recursive_predict(
            forecasting_horizon=forecasting_horizon,
            groups=groups,
            step_fn=step_fn,
            derive_observation_fn=derive_observation_fn,
        )

    return self._predict_with_step_override(
        X_future=X_future,
        X_forecast=X_forecast,
        predict_fn=predict_fn,
    )
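
The source above feeds each step's argmax class back in as the next observation before predicting the following step. A minimal sketch of that feedback loop in plain Python follows. It is not the library's implementation: `predict_proba_one` is a hypothetical toy model (class frequencies over a short window), `recursive_predict` is an illustrative name, and the sketch keeps string labels where the library re-encodes them to codes.

```python
from collections import Counter


def predict_proba_one(history, classes, window=3):
    """Hypothetical one-step model: class frequencies over the last `window` values.

    Assumes `history` is non-empty.
    """
    recent = history[-window:]
    counts = Counter(recent)
    return {label: counts[label] / len(recent) for label in classes}


def recursive_predict(history, classes, horizon):
    """Predict `horizon` steps, feeding each argmax class back as an observation."""
    buffer = list(history)  # work on a copy so the caller's history is untouched
    steps = []
    for _ in range(horizon):
        proba = predict_proba_one(buffer, classes)
        steps.append(proba)
        # Derive the next "observation" from the probabilities via argmax.
        # Ties go to whichever class appears first in `classes`.
        buffer.append(max(classes, key=lambda label: proba[label]))
    return steps


history = ["sunny", "rainy", "rainy"]
for step, proba in enumerate(recursive_predict(history, ["rainy", "sunny"], horizon=2), start=1):
    print(step, proba)
```

Because the loop works on a copy, calling it repeatedly does not change the stored history, which mirrors the "without mutating forecaster state" wording in the parameter descriptions.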

predict(X_future=None, X_forecast=None, forecasting_horizon=None, groups=None, **params)

Generate argmax class forecasts from class probabilities.

Convenience method that calls predict_class_proba and returns the most-likely class for each time step and target column.

Parameters
Name Type Description Default
X_future DataFrame or None

Known future features override.

None
X_forecast DataFrame or None

External forecast override.

None
forecasting_horizon int or None

Number of time steps to forecast into the future. If None, uses the horizon specified at fit time.

None
groups list of str or None

Panel group prefixes to operate on. If None, all groups are used. Ignored when the forecaster was not fitted on panel data.

None
**params dict

Metadata to route to nested estimators.

{}
Returns
Type Description
DataFrame

Point predictions with "vintage_time", "time", and one column per target variable containing the most-likely class.

Raises
Type Description
NotFittedError

If the forecaster has not been fitted yet.
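
To illustrate how probability columns named {target}_proba_{class_label} collapse to one class per target, here is a minimal sketch in plain Python. It operates on a single row represented as a dict; `argmax_from_proba` here is an illustrative stand-in, not the library's `_argmax_from_proba`, and the tie-breaking rule (first column wins) is an assumption.

```python
def argmax_from_proba(row, targets):
    """Return the most-likely class label for each target in one row.

    `row` maps column names such as "weather_proba_rainy" to probabilities.
    Columns that do not match "{target}_proba_" are ignored.
    """
    result = {}
    for target in targets:
        prefix = f"{target}_proba_"
        # Strip the prefix to recover the class label for each matching column.
        candidates = {
            name[len(prefix):]: value
            for name, value in row.items()
            if name.startswith(prefix)
        }
        # max() returns the first maximal key, so ties favor the earlier column.
        result[target] = max(candidates, key=candidates.get)
    return result


row = {
    "weather_proba_rainy": 0.7,
    "weather_proba_sunny": 0.3,
    "traffic_proba_heavy": 0.2,
    "traffic_proba_light": 0.8,
}
print(argmax_from_proba(row, ["weather", "traffic"]))
```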

Source Code
def predict(
    self,
    X_future: pl.DataFrame | None = None,
    X_forecast: pl.DataFrame | None = None,
    forecasting_horizon: StrictInt | None = None,
    groups: list[str] | None = None,
    **params,
) -> pl.DataFrame:
    """Generate argmax class forecasts from class probabilities.

    Convenience method that calls ``predict_class_proba`` and returns
    the most-likely class for each time step and target column.

    Parameters
    ----------
    X_future : pl.DataFrame or None, default=None
        Known future features override.
    X_forecast : pl.DataFrame or None, default=None
        External forecast override.
    forecasting_horizon : int or None, default=None
        Number of time steps to forecast into the future. If ``None``,
        uses the horizon specified at fit time.
    groups : list of str or None, default=None
        Panel group prefixes to operate on. If ``None``, all groups
        are used. Ignored when the forecaster was not fitted on panel
        data.
    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    pl.DataFrame
        Point predictions with ``"vintage_time"``, ``"time"``, and one
        column per target variable containing the most-likely class.

    Raises
    ------
    sklearn.exceptions.NotFittedError
        If the forecaster has not been fitted yet.

    """
    y_proba = self.predict_class_proba(
        X_future=X_future,
        X_forecast=X_forecast,
        forecasting_horizon=forecasting_horizon,
        groups=groups,
        **params,
    )
    return self._argmax_from_proba(y_proba)

observe(y, X_actual=None, groups=None, X_future=None, X_forecast=None)

Observe new data, encoding categorical targets before validation.

Overrides BaseForecaster.observe to encode string target columns to float codes before schema validation.

Parameters
Name Type Description Default
y DataFrame

Target time series with a "time" column (datetime) and one or more categorical value columns.

required
X_actual DataFrame or None

New actual feature observations with a "time" column aligned with y. Passed through the feature transformer to update the internal observation state.

None
groups list of str or None

Panel group prefixes to operate on. If None, all groups are used.

None
X_future DataFrame or None

Known future features with a "time" column.

None
X_forecast DataFrame or None

External forecasts with "vintage_time" and "time" columns.

None
Returns
Type Description
self

The forecaster with updated observation buffers.
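
The encoding step can be pictured as a label-to-code lookup applied to each target column. The sketch below is a plain-Python illustration under stated assumptions: codes are assigned in sorted label order, and the helper names `build_codes`, `encode`, and `decode` are hypothetical. The library's actual mapping and its handling of unseen labels are not shown in this section.

```python
def build_codes(labels):
    """Assign a float code to each distinct label (assumed: sorted label order)."""
    return {label: float(index) for index, label in enumerate(sorted(set(labels)))}


def encode(values, codes):
    """Replace each string label with its float code."""
    return [codes[value] for value in values]


def decode(encoded, codes):
    """Invert the mapping to recover string labels from float codes."""
    labels_by_code = {code: label for label, code in codes.items()}
    return [labels_by_code[code] for code in encoded]


codes = build_codes(["sunny", "rainy", "sunny"])
encoded = encode(["sunny", "rainy"], codes)
print(codes, encoded, decode(encoded, codes))
```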

Source Code
def observe(
    self,
    y: pl.DataFrame,
    X_actual: pl.DataFrame | None = None,
    groups: list[str] | None = None,
    X_future: pl.DataFrame | None = None,
    X_forecast: pl.DataFrame | None = None,
) -> "BaseClassProbaForecaster":
    """Observe new data, encoding categorical targets before validation.

    Overrides ``BaseForecaster.observe`` to encode string target columns
    to float codes before schema validation.

    Parameters
    ----------
    y : pl.DataFrame
        Target time series with a ``"time"`` column (datetime) and one
        or more categorical value columns.
    X_actual : pl.DataFrame or None, default=None
        New actual feature observations with a ``"time"`` column
        aligned with ``y``. Passed through the feature transformer
        to update the internal observation state.
    groups : list of str or None, default=None
        Panel group prefixes to operate on. If ``None``, all groups
        are used.
    X_future : pl.DataFrame or None, default=None
        Known future features with a ``"time"`` column.
    X_forecast : pl.DataFrame or None, default=None
        External forecasts with ``"vintage_time"`` and ``"time"``
        columns.

    Returns
    -------
    self
        The forecaster with updated observation buffers.

    """
    y = self._encode_y_input(y)
    return super().observe(y, X_actual, groups=groups, X_future=X_future, X_forecast=X_forecast)  # ty: ignore[invalid-return-type]

rewind(y, X_actual=None, groups=None, X_future=None, X_forecast=None)

Rewind memory, encoding categorical targets before validation.

Overrides BaseForecaster.rewind to encode string target columns to float codes before schema validation.

Parameters
Name Type Description Default
y DataFrame

Target time series with a "time" column (datetime) and one or more categorical value columns.

required
X_actual DataFrame or None

Actual feature observations to restore the observation state to. Must align with y.

None
groups list of str or None

Panel group prefixes to operate on. If None, all groups are used.

None
X_future DataFrame or None

Known future features with a "time" column.

None
X_forecast DataFrame or None

External forecasts with "vintage_time" and "time" columns.

None
Returns
Type Description
self

The forecaster with rewound observation buffers.

Source Code
def rewind(
    self,
    y: pl.DataFrame,
    X_actual: pl.DataFrame | None = None,
    groups: list[str] | None = None,
    X_future: pl.DataFrame | None = None,
    X_forecast: pl.DataFrame | None = None,
) -> "BaseClassProbaForecaster":
    """Rewind memory, encoding categorical targets before validation.

    Overrides ``BaseForecaster.rewind`` to encode string target columns
    to float codes before schema validation.

    Parameters
    ----------
    y : pl.DataFrame
        Target time series with a ``"time"`` column (datetime) and one
        or more categorical value columns.
    X_actual : pl.DataFrame or None, default=None
        Actual feature observations to restore the observation
        state to. Must align with ``y``.
    groups : list of str or None, default=None
        Panel group prefixes to operate on. If ``None``, all groups
        are used.
    X_future : pl.DataFrame or None, default=None
        Known future features with a ``"time"`` column.
    X_forecast : pl.DataFrame or None, default=None
        External forecasts with ``"vintage_time"`` and ``"time"``
        columns.

    Returns
    -------
    self
        The forecaster with rewound observation buffers.

    """
    y = self._encode_y_input(y)
    return super().rewind(y, X_actual, groups=groups, X_future=X_future, X_forecast=X_forecast)  # ty: ignore[invalid-return-type]

observe_predict_class_proba(y, X_actual=None, forecasting_horizon=None, groups=None, stride=None, X_future=None, X_forecast=None, **params)

Alternate between recursive predict_class_proba and observe in a rolling loop.

Equivalent to calling observe(y, X_actual) then predict_class_proba(). Returns probability predictions.

Parameters
Name Type Description Default
y DataFrame

Target time series with a "time" column (datetime) and one or more categorical value columns.

required
X_actual DataFrame or None

Actual feature observations with a "time" column aligned with y. Sliced and observed incrementally at each step of the rolling loop.

None
forecasting_horizon int or None

Number of time steps to forecast into the future. If None, uses the horizon specified at fit time.

None
groups list of str or None

Panel group prefixes to operate on. If None, all groups are used. Ignored when the forecaster was not fitted on panel data.

None
stride int or None

Step size for the rolling observe-predict loop. If None, defaults to the horizon specified at fit time (fit_forecasting_horizon_).

None
X_future DataFrame or None

Known future features with a "time" column.

None
X_forecast DataFrame or None

External forecasts with "vintage_time" and "time" columns.

None
**params dict

Metadata to route to nested estimators.

{}
Returns
Type Description
DataFrame

Probability predictions with "vintage_time", "time", and columns {target}_proba_{class_label} for each class.

Raises
Type Description
NotFittedError

If the forecaster has not been fitted yet.

ValueError

If y / X_actual have invalid structure or groups contains names not seen during fit.
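
To make the interaction between stride and forecasting_horizon concrete, the sketch below enumerates which row offsets each prediction would cover. It is an illustration only: `rolling_windows` is a hypothetical helper, and it assumes one prediction is issued before each chunk of `stride` rows is observed, which may differ in detail from the library's internal loop.

```python
def rolling_windows(n_obs, horizon, stride):
    """List (vintage_offset, forecast_offsets) pairs for a rolling loop.

    Assumed loop: predict `horizon` steps ahead, observe the next `stride`
    rows, then repeat until the `n_obs` new rows are used up.
    """
    windows = []
    for start in range(0, n_obs, stride):
        windows.append((start, list(range(start, start + horizon))))
    return windows


# stride == horizon: forecast windows tile the data without overlap.
print(rolling_windows(n_obs=6, horizon=2, stride=2))

# stride < horizon: consecutive windows overlap, so several vintages
# forecast the same time step.
print(rolling_windows(n_obs=4, horizon=3, stride=1))
```

With overlapping windows the same "time" value appears under several "vintage_time" values, which is why the returned frame carries both columns.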

Source Code
def observe_predict_class_proba(
    self,
    y: pl.DataFrame,
    X_actual: pl.DataFrame | None = None,
    forecasting_horizon: StrictInt | None = None,
    groups: list[str] | None = None,
    stride: StrictInt | None = None,
    X_future: pl.DataFrame | None = None,
    X_forecast: pl.DataFrame | None = None,
    **params,
) -> pl.DataFrame:
    """Alternate recursive predict_class_proba and observe.

    Equivalent to calling ``observe(y, X_actual)`` then
    ``predict_class_proba()``. Returns probability predictions.

    Parameters
    ----------
    y : pl.DataFrame
        Target time series with a ``"time"`` column (datetime) and one
        or more categorical value columns.
    X_actual : pl.DataFrame or None, default=None
        Actual feature observations with a ``"time"`` column aligned
        with ``y``. Sliced and observed incrementally at each step
        of the rolling loop.
    forecasting_horizon : int or None, default=None
        Number of time steps to forecast into the future. If ``None``,
        uses the horizon specified at fit time.
    groups : list of str or None, default=None
        Panel group prefixes to operate on. If ``None``, all groups
        are used. Ignored when the forecaster was not fitted on panel
        data.
    stride : int or None, default=None
        Step size for the rolling observe-predict loop. If ``None``,
        defaults to the horizon specified at fit time
        (``fit_forecasting_horizon_``).
    X_future : pl.DataFrame or None, default=None
        Known future features with a ``"time"`` column.
    X_forecast : pl.DataFrame or None, default=None
        External forecasts with ``"vintage_time"`` and ``"time"``
        columns.
    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    pl.DataFrame
        Probability predictions with ``"vintage_time"``, ``"time"``,
        and columns ``{target}_proba_{class_label}`` for each class.

    Raises
    ------
    sklearn.exceptions.NotFittedError
        If the forecaster has not been fitted yet.
    ValueError
        If ``y`` / ``X_actual`` have invalid structure or ``groups``
        contains names not seen during fit.

    """
    check_is_fitted(
        self,
        ["local_y_schema_", "local_X_actual_schema_", "shared_X_actual_schema_", "groups_"],
    )

    y = self._encode_y_input(y)

    y, X_actual, groups = validate_forecaster_data(
        self,
        y=y,
        X_actual=X_actual,
        reset=False,
        groups=groups,
        X_future=X_future,
        X_forecast=X_forecast,
    )

    forecasting_horizon = self._validate_predict_params(forecasting_horizon)
    if stride is None:
        stride = self.fit_forecasting_horizon_

    return self._observe_predict_loop(
        predict_fn=self.predict_class_proba,
        y=y,
        X_actual=X_actual,
        X_future=X_future,
        X_forecast=X_forecast,
        groups=groups,
        stride=stride,
        forecasting_horizon=forecasting_horizon,
        **params,
    )

observe_predict(y, X_actual=None, forecasting_horizon=None, groups=None, stride=None, X_future=None, X_forecast=None, **params)

Alternate between recursive predict and observe in a rolling loop.

Equivalent to calling observe(y, X_actual) then predict(). Returns argmax class predictions.

Parameters
Name Type Description Default
y DataFrame

Target time series with a "time" column (datetime) and one or more categorical value columns.

required
X_actual DataFrame or None

Actual feature observations with a "time" column aligned with y. Sliced and observed incrementally at each step of the rolling loop.

None
forecasting_horizon int or None

Number of time steps to forecast into the future. If None, uses the horizon specified at fit time.

None
groups list of str or None

Panel group prefixes to operate on. If None, all groups are used. Ignored when the forecaster was not fitted on panel data.

None
stride int or None

Step size for the rolling observe-predict loop. If None, defaults to the horizon specified at fit time (fit_forecasting_horizon_).

None
X_future DataFrame or None

Known future features with a "time" column.

None
X_forecast DataFrame or None

External forecasts with "vintage_time" and "time" columns.

None
**params dict

Metadata to route to nested estimators.

{}
Returns
Type Description
DataFrame

Point predictions with "vintage_time", "time", and one column per target variable containing the most-likely class.

Raises
Type Description
NotFittedError

If the forecaster has not been fitted yet.

ValueError

If y / X_actual have invalid structure or groups contains names not seen during fit.

Source Code
def observe_predict(
    self,
    y: pl.DataFrame,
    X_actual: pl.DataFrame | None = None,
    forecasting_horizon: StrictInt | None = None,
    groups: list[str] | None = None,
    stride: StrictInt | None = None,
    X_future: pl.DataFrame | None = None,
    X_forecast: pl.DataFrame | None = None,
    **params,
) -> pl.DataFrame:
    """Alternate recursive predict and observe.

    Equivalent to calling ``observe(y, X_actual)`` then ``predict()``.
    Returns argmax class predictions.

    Parameters
    ----------
    y : pl.DataFrame
        Target time series with a ``"time"`` column (datetime) and one
        or more categorical value columns.
    X_actual : pl.DataFrame or None, default=None
        Actual feature observations with a ``"time"`` column aligned
        with ``y``. Sliced and observed incrementally at each step
        of the rolling loop.
    forecasting_horizon : int or None, default=None
        Number of time steps to forecast into the future. If ``None``,
        uses the horizon specified at fit time.
    groups : list of str or None, default=None
        Panel group prefixes to operate on. If ``None``, all groups
        are used. Ignored when the forecaster was not fitted on panel
        data.
    stride : int or None, default=None
        Step size for the rolling observe-predict loop. If ``None``,
        defaults to the horizon specified at fit time
        (``fit_forecasting_horizon_``).
    X_future : pl.DataFrame or None, default=None
        Known future features with a ``"time"`` column.
    X_forecast : pl.DataFrame or None, default=None
        External forecasts with ``"vintage_time"`` and ``"time"``
        columns.
    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    pl.DataFrame
        Point predictions with ``"vintage_time"``, ``"time"``, and one
        column per target variable containing the most-likely class.

    Raises
    ------
    sklearn.exceptions.NotFittedError
        If the forecaster has not been fitted yet.
    ValueError
        If ``y`` / ``X_actual`` have invalid structure or ``groups``
        contains names not seen during fit.

    """
    check_is_fitted(
        self,
        ["local_y_schema_", "local_X_actual_schema_", "shared_X_actual_schema_", "groups_"],
    )

    y = self._encode_y_input(y)

    y, X_actual, groups = validate_forecaster_data(
        self,
        y=y,
        X_actual=X_actual,
        reset=False,
        groups=groups,
        X_future=X_future,
        X_forecast=X_forecast,
    )

    forecasting_horizon = self._validate_predict_params(forecasting_horizon)
    if stride is None:
        stride = self.fit_forecasting_horizon_

    return self._observe_predict_loop(
        predict_fn=self.predict,
        y=y,
        X_actual=X_actual,
        X_future=X_future,
        X_forecast=X_forecast,
        groups=groups,
        stride=stride,
        forecasting_horizon=forecasting_horizon,
        **params,
    )

Tutorials

The following example notebooks use this component:

  • How to Create a Custom Class-Probability Forecaster (Getting Started)

    Implement a MajorityClassForecaster from scratch, validate it with the check generator, and compare it against ClassProbaReductionForecaster.