ClassProbaReductionForecaster¶

`yohou.class_proba.reduction.ClassProbaReductionForecaster` ¶

Bases: BaseReductionForecaster, BaseClassProbaForecaster

Class-probability forecaster using sklearn classifiers on tabularized time series.

Converts categorical time series forecasting to a tabular classification task. The target is encoded to integer codes before tabularization; predictions use `predict_proba` to return per-class probability distributions.
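
As a rough illustration of the reduction idea, the sketch below turns an already encoded series into a table of lag features with one target column per forecast step. The helper name `tabularize`, the `n_lags` argument, and the window layout are assumptions made for illustration; they are not yohou's internal implementation.

```python
def tabularize(codes: list[int], n_lags: int, horizon: int) -> tuple[list[list[int]], list[list[int]]]:
    """Build (X, Y) rows: X holds the last `n_lags` codes, Y the next `horizon` codes."""
    X, Y = [], []
    # Each window needs n_lags past values followed by horizon future values.
    for t in range(n_lags, len(codes) - horizon + 1):
        X.append(codes[t - n_lags:t])
        Y.append(codes[t:t + horizon])
    return X, Y


# Hypothetical encoded weather series: cloud=0, rain=1, sun=2.
codes = [2, 2, 1, 1, 0, 2, 1, 0]
X, Y = tabularize(codes, n_lags=2, horizon=2)
```

Each row of `X` can then be fed to any classifier, with the columns of `Y` acting as the per-step class targets.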

Parameters¶

Name	Type	Description	Default
`estimator`	`BaseEstimator`	Classifier used to fit the tabularized data. Must implement `fit`, `predict`, and `predict_proba`.	`LogisticRegression()`
`reduction_strategy`	`('direct', 'multi-output')`	Strategy for multi-step forecasting.	`"multi-output"`
`target_transformer`	`BaseTransformer or None`	Transformer for target preprocessing.	`None`
`feature_transformer`	`BaseTransformer or None`	Transformer for feature engineering (typically LagTransformer).	`None`
`target_as_feature`	`('transformed', 'raw') or None`	Whether to include the target variable as a feature for reduction. If `"transformed"`, the transformed target is used. If `"raw"`, the raw target is used. If `None`, the target is not included as a feature.	`"transformed"`
`step_feature_alignment`	`('all', 'matched', 'cumulative')`	Controls which step-indexed feature columns each direct estimator sees. Only affects the `"direct"` strategy. `"all"`: every estimator receives all step columns. `"matched"`: estimator for step h receives only `*_step_h`. `"cumulative"`: estimator for step h receives `*_step_1..h`.	`"all"`
`nan_handling`	`('drop', 'pass')`	How to handle NaN values in tabularized data. `"pass"` leaves NaN in place (suitable for estimators that handle NaN natively, such as tree-based models). `"drop"` removes any training instance where X or y contains NaN before fitting the estimator, and emits a warning with the count of dropped rows. At predict time, returns NaN predictions for any time step whose features contain NaN.	`"pass"`
`panel_strategy`	`('global', 'multivariate')`	How to handle panel data. See `BaseForecaster` for details.	`"global"`
`n_jobs`	`int or None`	Number of jobs to run in parallel for the `"direct"` strategy. `None` means 1. `-1` means using all processors.	`None`
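
The three `step_feature_alignment` modes can be pictured as a column filter applied per direct estimator. The sketch below is only an illustration: the function name `select_step_columns` is hypothetical, the `_step_<h>` suffix is taken from the parameter description above, and keeping columns without a step suffix for every estimator is an assumption.

```python
import re


def select_step_columns(columns: list[str], step: int, alignment: str) -> list[str]:
    """Return the feature columns the estimator for `step` would see."""
    selected = []
    for name in columns:
        match = re.search(r"_step_(\d+)$", name)
        if match is None:
            # Assumption: columns without a step suffix go to every estimator.
            selected.append(name)
            continue
        h = int(match.group(1))
        if (
            alignment == "all"
            or (alignment == "matched" and h == step)
            or (alignment == "cumulative" and h <= step)
        ):
            selected.append(name)
    return selected


columns = ["weather_lag_1", "temp_step_1", "temp_step_2", "temp_step_3"]
matched = select_step_columns(columns, step=2, alignment="matched")
cumulative = select_step_columns(columns, step=2, alignment="cumulative")
```

Here `matched` keeps only `temp_step_2` among the step columns, while `cumulative` keeps `temp_step_1` and `temp_step_2`.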

Attributes¶

Name	Type	Description
`classes_`	`dict[str, list[str]]`	Mapping from target column name to its class labels, discovered at fit time from the unique values in each target column.
`n_classes_`	`dict[str, int]`	Mapping from target column name to the number of classes.
`label_to_code_`	`dict[str, dict[str, int]]`	Mapping from target column name to a dict mapping class labels to integer codes.
`estimator_`	`BaseEstimator or list[BaseEstimator]`	Fitted sklearn classifier(s).
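
To make the three fitted mappings concrete, here is a small plain-Python sketch of how they could be derived. It mirrors the logic visible in `fit` (sorted unique labels, nulls dropped, panel columns such as `group__target` merged under their base name), but the function `discover_classes` and the plain-dict input are illustrative assumptions rather than the library's API.

```python
from __future__ import annotations


def discover_classes(columns: dict[str, list[str | None]]):
    """Collect sorted class labels per base target name, merging panel groups."""
    merged: dict[str, set[str]] = {}
    for col, values in columns.items():
        # "north__weather" and "south__weather" share the base name "weather".
        base = col.split("__")[-1]
        merged.setdefault(base, set()).update(v for v in values if v is not None)

    classes = {base: sorted(labels) for base, labels in merged.items()}
    n_classes = {base: len(labels) for base, labels in classes.items()}
    label_to_code = {
        base: {label: code for code, label in enumerate(labels)}
        for base, labels in classes.items()
    }
    return classes, n_classes, label_to_code


classes, n_classes, label_to_code = discover_classes({
    "north__weather": ["sun", "rain", None],
    "south__weather": ["cloud", "sun"],
})
```

Because both panel columns map to the base name `weather`, the resulting label set is the union of the labels seen in either group.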

Examples¶

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.class_proba import ClassProbaReductionForecaster
>>>
>>> df = pl.DataFrame({
...     "time": pl.datetime_range(
...         start=datetime(2021, 1, 1),
...         end=datetime(2021, 1, 10),
...         interval="1d",
...         eager=True,
...     ),
...     "weather": ["sun", "sun", "rain", "rain", "cloud", "sun", "rain", "cloud", "sun", "rain"],
... })
>>>
>>> train = df[:8]
>>> forecaster = ClassProbaReductionForecaster()
>>> _ = forecaster.fit(y=train, forecasting_horizon=1)
>>>
>>> y_proba = forecaster.predict_class_proba(forecasting_horizon=1)
>>> len(y_proba)
1

Notes¶

The target columns are label-encoded to integer codes before tabularization. The encoding is stored in `classes_` and `label_to_code_` so that `predict_class_proba` can map the classifier's probability output back to the original class labels.
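
The round trip described here can be sketched in a few lines. The helpers `encode` and `name_probabilities` are hypothetical names; the `{target}_proba_{label}` column pattern and the zero fill for missing positions follow the source code shown on this page, while the plain lists stand in for the real polars and NumPy objects.

```python
label_to_code = {"cloud": 0, "rain": 1, "sun": 2}
classes = list(label_to_code)  # labels in code order


def encode(labels: list[str]) -> list[int]:
    """Replace each label by its integer code."""
    return [label_to_code[label] for label in labels]


def name_probabilities(target: str, proba: list[float]) -> dict[str, float]:
    """Attach `{target}_proba_{label}` names to a probability vector."""
    return {
        # Positions the classifier did not return are filled with 0.0.
        f"{target}_proba_{label}": float(proba[i]) if i < len(proba) else 0.0
        for i, label in enumerate(classes)
    }


codes = encode(["sun", "rain", "cloud"])
named = name_probabilities("weather", [0.2, 0.5, 0.3])
short = name_probabilities("weather", [0.4, 0.6])
```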

Source Code¶

class ClassProbaReductionForecaster(BaseReductionForecaster, BaseClassProbaForecaster):
    """Class-probability forecaster using sklearn classifiers on tabularized time series.

    Converts categorical time series forecasting to a tabular classification task.
    The target is encoded to integer codes before tabularization; predictions use
    ``predict_proba`` to return per-class probability distributions.

    Parameters
    ----------
    estimator : BaseEstimator, default=LogisticRegression()
        Classifier used to fit the tabularized data. Must implement
        ``fit``, ``predict``, and ``predict_proba``.
    reduction_strategy : {"direct", "multi-output"}, default="multi-output"
        Strategy for multi-step forecasting.
    target_transformer : BaseTransformer or None, default=None
        Transformer for target preprocessing.
    feature_transformer : BaseTransformer or None, default=None
        Transformer for feature engineering (typically LagTransformer).
    target_as_feature : {"transformed", "raw"} or None, default="transformed"
        Whether to include the target variable as a feature for reduction.
        If ``"transformed"``, the transformed target is used. If ``"raw"``,
        the raw target is used. If ``None``, the target is not included as
        a feature.
    step_feature_alignment : {"all", "matched", "cumulative"}, default="all"
        Controls which step-indexed feature columns each direct estimator
        sees. Only affects the ``"direct"`` strategy.

        - ``"all"``: every estimator receives all step columns.
        - ``"matched"``: estimator for step h receives only ``*_step_h``.
        - ``"cumulative"``: estimator for step h receives ``*_step_1..h``.
    nan_handling : {"drop", "pass"}, default="pass"
        How to handle NaN values in tabularized data.
        ``"pass"`` leaves NaN in place (suitable for estimators that
        handle NaN natively, such as tree-based models). ``"drop"``
        removes any training instance where X or y contains NaN before
        fitting the estimator, and emits a warning with the count of
        dropped rows. At predict time, returns NaN predictions for any
        time step whose features contain NaN.
    panel_strategy : {"global", "multivariate"}, default="global"
        How to handle panel data. See `BaseForecaster` for details.
    n_jobs : int or None, default=None
        Number of jobs to run in parallel for the ``"direct"`` strategy.
        ``None`` means 1. ``-1`` means using all processors.

    Attributes
    ----------
    classes_ : dict[str, list[str]]
        Mapping from target column name to its class labels, discovered at
        fit time from the unique values in each target column.
    n_classes_ : dict[str, int]
        Mapping from target column name to the number of classes.
    label_to_code_ : dict[str, dict[str, int]]
        Mapping from target column name to a dict mapping class labels to
        integer codes.
    estimator_ : BaseEstimator or list[BaseEstimator]
        Fitted sklearn classifier(s).

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.class_proba import ClassProbaReductionForecaster
    >>>
    >>> df = pl.DataFrame({
    ...     "time": pl.datetime_range(
    ...         start=datetime(2021, 1, 1),
    ...         end=datetime(2021, 1, 10),
    ...         interval="1d",
    ...         eager=True,
    ...     ),
    ...     "weather": ["sun", "sun", "rain", "rain", "cloud", "sun", "rain", "cloud", "sun", "rain"],
    ... })
    >>>
    >>> train = df[:8]
    >>> forecaster = ClassProbaReductionForecaster()
    >>> _ = forecaster.fit(y=train, forecasting_horizon=1)
    >>>
    >>> y_proba = forecaster.predict_class_proba(forecasting_horizon=1)
    >>> len(y_proba)
    1

    Notes
    -----
    The target columns are label-encoded to integer codes before
    tabularization. The encoding is stored in ``classes_`` and
    ``label_to_code_`` so that ``predict_class_proba`` can map the
    classifier's probability output back to the original class labels.

    See Also
    --------
    - [`BaseClassProbaForecaster`][yohou.class_proba.base.BaseClassProbaForecaster] : Base class for class-probability forecasters.
    - [`PointReductionForecaster`][yohou.point.reduction.PointReductionForecaster] : ML-based point forecaster.
    - [`BaseReductionForecaster`][yohou.base.reduction.BaseReductionForecaster] : Base class for reduction forecasters.

    """

    _parameter_constraints: dict = {
        **BaseReductionForecaster._parameter_constraints,
        **BaseClassProbaForecaster._parameter_constraints,
        "estimator": [HasMethods(["fit", "predict", "predict_proba"])],
        "reduction_strategy": [StrOptions({"direct", "multi-output"})],
    }

    _supports_panel = True

    def __init__(
        self,
        estimator: BaseEstimator = LogisticRegression(),
        reduction_strategy: Literal["direct", "multi-output"] = "multi-output",
        target_transformer: BaseTransformer | None = None,
        feature_transformer: BaseTransformer | None = None,
        target_as_feature: Literal["transformed", "raw"] | None = "transformed",
        step_feature_alignment: Literal["all", "matched", "cumulative"] = "all",
        nan_handling: Literal["drop", "pass"] = "pass",
        n_jobs: int | None = None,
        panel_strategy: Literal["global", "multivariate"] = "global",
    ) -> None:
        BaseReductionForecaster.__init__(
            self,
            estimator=estimator,
            reduction_strategy=reduction_strategy,
            target_as_feature=target_as_feature,
            step_feature_alignment=step_feature_alignment,
            nan_handling=nan_handling,
            n_jobs=n_jobs,
            panel_strategy=panel_strategy,
        )

        BaseClassProbaForecaster.__init__(
            self,
            target_transformer=target_transformer,
            feature_transformer=feature_transformer,
            target_as_feature=target_as_feature,
            panel_strategy=panel_strategy,
        )

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(
        self,
        y: pl.DataFrame,
        X_actual: pl.DataFrame | None = None,
        forecasting_horizon: StrictInt = 1,
        time_weight: Callable | pl.DataFrame | dict | None = None,
        vintage_weight: Callable | pl.DataFrame | dict | None = None,
        sample_weight_alignment: str = "first_step",
        X_future: pl.DataFrame | None = None,
        X_forecast: pl.DataFrame | None = None,
        **params,
    ) -> ClassProbaReductionForecaster:
        """Fit the forecaster to historical data.

        Encodes categorical targets to integer codes, tabularizes the time
        series, and fits the wrapped sklearn classifier.

        Parameters
        ----------
        y : pl.DataFrame
            Target time series with a ``"time"`` column (datetime) and one
            or more categorical (String, Categorical, or Enum) value columns.
        X_actual : pl.DataFrame or None, default=None
            Actual feature observations with a ``"time"`` column aligned
            with ``y``. Processed by the feature transformer to produce
            lags, rolling statistics, and other derived features. If
            ``None``, only target-derived features are used.
        forecasting_horizon : int, default=1
            Number of time steps to forecast into the future.
        time_weight : callable, pl.DataFrame, dict, or None, default=None
            Per-timestep weights for fitting.  Accepts a callable
            ``f(time_series) -> pl.Series``, a panel-aware callable
            ``f(time_series, group_name) -> pl.Series``, a DataFrame
            with ``"time"`` and ``"weight"`` columns, or a
            ``{datetime_or_str: float}`` dict (``"*"`` key sets default).
        vintage_weight : callable, pl.DataFrame, dict, or None, default=None
            Per-vintage weights for fitting.  Same formats as
            ``time_weight``.  Resolved via direct lookup at observation
            time (no alignment strategy). Combined multiplicatively
            with ``time_weight``.
        sample_weight_alignment : str, default="first_step"
            Strategy for converting ``time_weight`` to sklearn
            ``sample_weight`` across forecast horizons. Does not apply
            to ``vintage_weight`` (which uses direct lookup).
        X_future : pl.DataFrame or None, default=None
            Known future features with a ``"time"`` column. Deterministic
            values available for past and future dates. Bypasses the
            feature transformer.
        X_forecast : pl.DataFrame or None, default=None
            External forecasts with ``"vintage_time"`` and ``"time"``
            columns. Bypasses the feature transformer.
        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        self
            The fitted forecaster instance.

        """
        forecasting_horizon = self._validate_fit_params(forecasting_horizon)

        # Discover classes from y before _pre_fit (which may transform y)
        # Use unprefixed (base) column names so panel groups share class labels.
        self.classes_: dict[str, list[str]] = {}
        self.n_classes_: dict[str, int] = {}
        self.label_to_code_: dict[str, dict[str, int]] = {}
        for col in y.columns:
            if col == "time":
                continue
            base_col = col.split("__")[-1] if "__" in col else col
            unique_vals = sorted(y[col].drop_nulls().unique().cast(pl.String).to_list())
            if base_col in self.classes_:
                merged = sorted(set(self.classes_[base_col]) | set(unique_vals))
                self.classes_[base_col] = merged
            else:
                self.classes_[base_col] = unique_vals
        for base_col, labels in self.classes_.items():
            self.n_classes_[base_col] = len(labels)
            self.label_to_code_[base_col] = {label: i for i, label in enumerate(labels)}

        # Encode target columns to integer codes for tabularization
        y_encoded = self._encode_target(y)

        y_t, X_t = self._pre_fit(
            y=y_encoded,
            X_actual=X_actual,
            forecasting_horizon=forecasting_horizon,
            X_future=X_future,
            X_forecast=X_forecast,
        )

        self.estimator_ = self._estimator_fit_one(
            y_t,
            X_t,
            forecasting_horizon,
            time_weight=time_weight,
            sample_weight_alignment=sample_weight_alignment,
            vintage_weight=vintage_weight,
            estimator_fit_params=params,
        )

        return self

    def _encode_target(self, y: pl.DataFrame) -> pl.DataFrame:
        """Encode categorical target columns to integer codes.

        Parameters
        ----------
        y : pl.DataFrame
            Target data with categorical columns.

        Returns
        -------
        pl.DataFrame
            Target data with categorical columns replaced by integer codes.

        """
        exprs = []
        for col in y.columns:
            if col == "time":
                continue
            base_col = col.split("__")[-1] if "__" in col else col
            mapping = self.label_to_code_[base_col]
            # Cast to String first to handle Categorical/Enum/String uniformly,
            # then replace labels with integer codes.
            exprs.append(pl.col(col).cast(pl.String).replace_strict(mapping, return_dtype=pl.Float64).alias(col))
        return y.with_columns(exprs)

    def _predict_class_proba_one(
        self,
        groups: list[str],
        **params,
    ) -> pl.DataFrame:
        """Produce probability forecasts for one fit-horizon block.

        Parameters
        ----------
        groups : list of str
            Panel group names to predict for.
        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        pl.DataFrame
            Probability predictions with ``"vintage_time"``, ``"time"``,
            and columns ``{target}_proba_{class_label}`` for each class.

        """
        y_proba = self._estimator_predict_proba_one(
            self.estimator_,
            groups=groups,
        )
        y_proba = self._add_time_columns(y_proba)
        return y_proba

    def _estimator_predict_proba_one(
        self,
        estimator: BaseEstimator | list[BaseEstimator],
        groups: list[str],
    ) -> pl.DataFrame:
        """Dispatch estimator probability prediction to the strategy-specific method.

        Parameters
        ----------
        estimator : BaseEstimator or list[BaseEstimator]
            Fitted estimator(s).
        groups : list of str
            Panel group names to predict for.

        Returns
        -------
        pl.DataFrame
            Probability predictions.

        """
        if self.reduction_strategy == "direct":
            assert isinstance(estimator, list)
            return self._estimator_predict_proba_direct(cast(list[BaseEstimator], estimator), groups)
        assert isinstance(estimator, BaseEstimator)
        return self._estimator_predict_proba_multi_output(estimator, groups)

    def _estimator_predict_proba_multi_output(
        self,
        estimator: BaseEstimator,
        groups: list[str],
    ) -> pl.DataFrame:
        """Generate probability predictions using a fitted multi-output estimator.

        Parameters
        ----------
        estimator : BaseEstimator
            Fitted sklearn classifier.
        groups : list of str
            Panel group names to predict for.

        Returns
        -------
        pl.DataFrame
            Probability predictions.

        """
        if self.groups_ is None:
            X_tab = self._get_predict_features()
            return self._predict_proba_and_reshape(estimator, X_tab)

        y_pred_dict: dict[str, pl.DataFrame] = {}
        for panel_group_name in groups:
            X_tab = self._get_predict_features(panel_group_name)
            y_pred_dict[panel_group_name] = self._predict_proba_and_reshape(estimator, X_tab, panel_group_name)
        return pl.concat(list(y_pred_dict.values()), how="horizontal")

    def _estimator_predict_proba_direct(
        self,
        estimators: list[BaseEstimator],
        groups: list[str],
    ) -> pl.DataFrame:
        """Generate probability predictions using H independent direct estimators.

        Parameters
        ----------
        estimators : list[BaseEstimator]
            H fitted estimators, one per horizon step.
        groups : list of str
            Panel group names to predict for.

        Returns
        -------
        pl.DataFrame
            Probability predictions.

        """
        if self.groups_ is None:
            X_tab = self._get_predict_features()
            frames = []
            for estimator in estimators:
                frames.append(self._predict_proba_and_reshape_single_step(estimator, X_tab))
            return pl.concat(frames)

        y_pred_dict: dict[str, list[pl.DataFrame]] = {g: [] for g in groups}
        for panel_group_name in groups:
            X_tab = self._get_predict_features(panel_group_name)
            for estimator in estimators:
                y_pred_dict[panel_group_name].append(
                    self._predict_proba_and_reshape_single_step(estimator, X_tab, panel_group_name)
                )
        return pl.concat(
            [pl.concat(v) for v in y_pred_dict.values()],
            how="horizontal",
        )

    def _predict_proba_and_reshape(
        self,
        estimator: BaseEstimator,
        X_tab: pl.DataFrame,
        panel_group_name: str | None = None,
    ) -> pl.DataFrame:
        """Call predict_proba and reshape to probability DataFrame.

        For multi-output, the estimator predicts all H steps at once.
        Each step has n_targets columns; each target column's integer
        prediction maps to n_classes probability columns.

        Parameters
        ----------
        estimator : BaseEstimator
            Fitted classifier.
        X_tab : pl.DataFrame
            Feature DataFrame of shape ``(1, n_features)``.
        panel_group_name : str or None
            Panel group prefix for column naming.

        Returns
        -------
        pl.DataFrame
            Probability DataFrame with ``{target}_proba_{class}`` columns.

        """
        assert self.local_y_t_schema_ is not None
        y_cols = list(self.local_y_t_schema_.keys())
        fh = self.fit_forecasting_horizon_
        n_targets = len(y_cols)

        # For multi-output with H*n_targets outputs, sklearn wraps in
        # MultiOutputClassifier or similar. We handle both cases.
        proba = estimator.predict_proba(X_tab)  # ty: ignore[unresolved-attribute]

        # Build result row by row (one row per forecast step)
        result_data: dict[str, list[Any]] = {}
        for target_col in y_cols:
            for label in self.classes_[target_col]:
                col_name = f"{target_col}_proba_{label}"
                if panel_group_name is not None:
                    col_name = f"{panel_group_name}__{col_name}"
                result_data[col_name] = []

        if isinstance(proba, list):
            # MultiOutputClassifier returns list of arrays, one per output
            # Each array has shape (1, n_classes_for_that_output)
            # Outputs are ordered: target1_step1, target2_step1, ..., target1_step2, ...
            for step in range(fh):
                for t_idx, target_col in enumerate(y_cols):
                    output_idx = step * n_targets + t_idx
                    step_proba = proba[output_idx][0]  # shape (n_classes,)
                    classes_for_target = self.classes_[target_col]
                    for c_idx, label in enumerate(classes_for_target):
                        col_name = f"{target_col}_proba_{label}"
                        if panel_group_name is not None:
                            col_name = f"{panel_group_name}__{col_name}"
                        if c_idx < len(step_proba):
                            result_data[col_name].append(float(step_proba[c_idx]))
                        else:
                            result_data[col_name].append(0.0)
        else:
            # Single-output classifier or single-target: proba shape (1, n_classes)
            # For multi-step, we need H rows from a single prediction. If
            # multi-output is used, the model predicts H*n_targets columns
            # and predict_proba returns a single array.
            # Fall back: treat as single-step single-target
            assert n_targets == 1
            target_col = y_cols[0]
            classes_for_target = self.classes_[target_col]

            if proba.ndim == 2 and proba.shape[0] == 1:
                # Single row prediction, map to fh=1
                step_proba = proba[0]
                for c_idx, label in enumerate(classes_for_target):
                    col_name = f"{target_col}_proba_{label}"
                    if panel_group_name is not None:
                        col_name = f"{panel_group_name}__{col_name}"
                    result_data[col_name].append(float(step_proba[c_idx]) if c_idx < len(step_proba) else 0.0)
                # If fh > 1, replicate (the recursive loop in predict_class_proba handles stepping)
                # This branch should only be reached with fh=1 in multi-output mode.

        return pl.DataFrame(result_data)

    def _predict_proba_and_reshape_single_step(
        self,
        estimator: BaseEstimator,
        X_tab: pl.DataFrame,
        panel_group_name: str | None = None,
    ) -> pl.DataFrame:
        """Call predict_proba for a single-step direct estimator.

        Parameters
        ----------
        estimator : BaseEstimator
            Fitted single-step classifier.
        X_tab : pl.DataFrame
            Feature DataFrame of shape ``(1, n_features)``.
        panel_group_name : str or None
            Panel group prefix for column naming.

        Returns
        -------
        pl.DataFrame
            Single-row probability DataFrame.

        """
        assert self.local_y_t_schema_ is not None
        y_cols = list(self.local_y_t_schema_.keys())

        proba = estimator.predict_proba(X_tab)  # ty: ignore[unresolved-attribute]

        result_data: dict[str, list[float]] = {}

        if isinstance(proba, list):
            # Multiple targets
            for t_idx, target_col in enumerate(y_cols):
                step_proba = proba[t_idx][0]
                for c_idx, label in enumerate(self.classes_[target_col]):
                    col_name = f"{target_col}_proba_{label}"
                    if panel_group_name is not None:
                        col_name = f"{panel_group_name}__{col_name}"
                    result_data[col_name] = [float(step_proba[c_idx]) if c_idx < len(step_proba) else 0.0]
        else:
            # Single target
            assert len(y_cols) == 1
            target_col = y_cols[0]
            step_proba = proba[0]
            for c_idx, label in enumerate(self.classes_[target_col]):
                col_name = f"{target_col}_proba_{label}"
                if panel_group_name is not None:
                    col_name = f"{panel_group_name}__{col_name}"
                result_data[col_name] = [float(step_proba[c_idx]) if c_idx < len(step_proba) else 0.0]

        return pl.DataFrame(result_data)
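
The multi-output branch above relies on the outputs being ordered step by step, with all targets of one step adjacent, so that the flat index is `step * n_targets + t_idx`. The following sketch isolates that index arithmetic using plain lists. The function name `reshape_multi_output` and the sample numbers are made up for illustration, and each probability vector is given directly rather than as a `(1, n_classes)` array.

```python
def reshape_multi_output(proba_per_output, targets, classes, horizon):
    """Turn a flat list of per-output probability vectors into one row per step."""
    n_targets = len(targets)
    rows = []
    for step in range(horizon):
        row = {}
        for t_idx, target in enumerate(targets):
            # Flat position of (step, target) in the classifier's output list.
            step_proba = proba_per_output[step * n_targets + t_idx]
            for c_idx, label in enumerate(classes[target]):
                row[f"{target}_proba_{label}"] = float(step_proba[c_idx])
        rows.append(row)
    return rows


targets = ["weather", "wind"]
classes = {"weather": ["rain", "sun"], "wind": ["calm", "gusty"]}
proba_per_output = [
    [0.3, 0.7],  # step 1, weather
    [0.9, 0.1],  # step 1, wind
    [0.6, 0.4],  # step 2, weather
    [0.2, 0.8],  # step 2, wind
]
rows = reshape_multi_output(proba_per_output, targets, classes, horizon=2)
```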

Methods¶

`fit(y, X_actual=None, forecasting_horizon=1, time_weight=None, vintage_weight=None, sample_weight_alignment='first_step', X_future=None, X_forecast=None, **params)` ¶

Fit the forecaster to historical data.

Encodes categorical targets to integer codes, tabularizes the time series, and fits the wrapped sklearn classifier.

Parameters¶

Name	Type	Description	Default
`y`	`DataFrame`	Target time series with a `"time"` column (datetime) and one or more categorical (String, Categorical, or Enum) value columns.	required
`X_actual`	`DataFrame or None`	Actual feature observations with a `"time"` column aligned with `y`. Processed by the feature transformer to produce lags, rolling statistics, and other derived features. If `None`, only target-derived features are used.	`None`
`forecasting_horizon`	`int`	Number of time steps to forecast into the future.	`1`
`time_weight`	`callable, pl.DataFrame, dict, or None`	Per-timestep weights for fitting. Accepts a callable `f(time_series) -> pl.Series`, a panel-aware callable `f(time_series, group_name) -> pl.Series`, a DataFrame with `"time"` and `"weight"` columns, or a `{datetime_or_str: float}` dict (`"*"` key sets default).	`None`
`vintage_weight`	`callable, pl.DataFrame, dict, or None`	Per-vintage weights for fitting. Same formats as `time_weight`. Resolved via direct lookup at observation time (no alignment strategy). Combined multiplicatively with `time_weight`.	`None`
`sample_weight_alignment`	`str`	Strategy for converting `time_weight` to sklearn `sample_weight` across forecast horizons. Does not apply to `vintage_weight` (which uses direct lookup).	`"first_step"`
`X_future`	`DataFrame or None`	Known future features with a `"time"` column. Deterministic values available for past and future dates. Bypasses the feature transformer.	`None`
`X_forecast`	`DataFrame or None`	External forecasts with `"vintage_time"` and `"time"` columns. Bypasses the feature transformer.	`None`
`**params`	`dict`	Metadata to route to nested estimators.	`{}`
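
As a simplified picture of how the weight arguments interact, the sketch below resolves a dict or callable into one weight per timestamp and multiplies the time and vintage weights together. The helper `resolve_weights`, the use of ISO date strings as keys, and the fallback of `1.0` when no `"*"` key is given are assumptions; the actual resolver works on polars objects and may behave differently.

```python
def resolve_weights(times, weight):
    """Resolve a dict or callable weight spec to one float per timestamp."""
    if weight is None:
        return [1.0] * len(times)
    if callable(weight):
        return [float(w) for w in weight(times)]
    # Dict form: the "*" key sets the default (assumed 1.0 when absent).
    default = weight.get("*", 1.0)
    return [float(weight.get(t, default)) for t in times]


times = ["2021-01-01", "2021-01-02", "2021-01-03"]
time_weight = {"*": 0.5, "2021-01-03": 1.0}
vintage_weight = {"2021-01-02": 2.0}

# The two weight sources are combined multiplicatively.
combined = [
    tw * vw
    for tw, vw in zip(resolve_weights(times, time_weight), resolve_weights(times, vintage_weight))
]
```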

Returns¶

Type	Description
`self`	The fitted forecaster instance.

Source Code¶

@_fit_context(prefer_skip_nested_validation=True)
def fit(
    self,
    y: pl.DataFrame,
    X_actual: pl.DataFrame | None = None,
    forecasting_horizon: StrictInt = 1,
    time_weight: Callable | pl.DataFrame | dict | None = None,
    vintage_weight: Callable | pl.DataFrame | dict | None = None,
    sample_weight_alignment: str = "first_step",
    X_future: pl.DataFrame | None = None,
    X_forecast: pl.DataFrame | None = None,
    **params,
) -> ClassProbaReductionForecaster:
    """Fit the forecaster to historical data.

    Encodes categorical targets to integer codes, tabularizes the time
    series, and fits the wrapped sklearn classifier.

    Parameters
    ----------
    y : pl.DataFrame
        Target time series with a ``"time"`` column (datetime) and one
        or more categorical (String, Categorical, or Enum) value columns.
    X_actual : pl.DataFrame or None, default=None
        Actual feature observations with a ``"time"`` column aligned
        with ``y``. Processed by the feature transformer to produce
        lags, rolling statistics, and other derived features. If
        ``None``, only target-derived features are used.
    forecasting_horizon : int, default=1
        Number of time steps to forecast into the future.
    time_weight : callable, pl.DataFrame, dict, or None, default=None
        Per-timestep weights for fitting.  Accepts a callable
        ``f(time_series) -> pl.Series``, a panel-aware callable
        ``f(time_series, group_name) -> pl.Series``, a DataFrame
        with ``"time"`` and ``"weight"`` columns, or a
        ``{datetime_or_str: float}`` dict (``"*"`` key sets default).
    vintage_weight : callable, pl.DataFrame, dict, or None, default=None
        Per-vintage weights for fitting.  Same formats as
        ``time_weight``.  Resolved via direct lookup at observation
        time (no alignment strategy). Combined multiplicatively
        with ``time_weight``.
    sample_weight_alignment : str, default="first_step"
        Strategy for converting ``time_weight`` to sklearn
        ``sample_weight`` across forecast horizons. Does not apply
        to ``vintage_weight`` (which uses direct lookup).
    X_future : pl.DataFrame or None, default=None
        Known future features with a ``"time"`` column. Deterministic
        values available for past and future dates. Bypasses the
        feature transformer.
    X_forecast : pl.DataFrame or None, default=None
        External forecasts with ``"vintage_time"`` and ``"time"``
        columns. Bypasses the feature transformer.
    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    self
        The fitted forecaster instance.

    """
    forecasting_horizon = self._validate_fit_params(forecasting_horizon)

    # Discover classes from y before _pre_fit (which may transform y)
    # Use unprefixed (base) column names so panel groups share class labels.
    self.classes_: dict[str, list[str]] = {}
    self.n_classes_: dict[str, int] = {}
    self.label_to_code_: dict[str, dict[str, int]] = {}
    for col in y.columns:
        if col == "time":
            continue
        base_col = col.split("__")[-1] if "__" in col else col
        unique_vals = sorted(y[col].drop_nulls().unique().cast(pl.String).to_list())
        if base_col in self.classes_:
            merged = sorted(set(self.classes_[base_col]) | set(unique_vals))
            self.classes_[base_col] = merged
        else:
            self.classes_[base_col] = unique_vals
    for base_col, labels in self.classes_.items():
        self.n_classes_[base_col] = len(labels)
        self.label_to_code_[base_col] = {label: i for i, label in enumerate(labels)}

    # Encode target columns to integer codes for tabularization
    y_encoded = self._encode_target(y)

    y_t, X_t = self._pre_fit(
        y=y_encoded,
        X_actual=X_actual,
        forecasting_horizon=forecasting_horizon,
        X_future=X_future,
        X_forecast=X_forecast,
    )

    self.estimator_ = self._estimator_fit_one(
        y_t,
        X_t,
        forecasting_horizon,
        time_weight=time_weight,
        sample_weight_alignment=sample_weight_alignment,
        vintage_weight=vintage_weight,
        estimator_fit_params=params,
    )

    return self

Tutorials¶

The following example notebooks use this component:

Tutorial	Category	Description
How to Score Class-Probability Forecasts	Evaluation-Search	Evaluate categorical forecasts with LogLoss, BrierScore, and Accuracy. Covers per-timestep scoring, aggregation modes, and reliability diagrams.
How to Run Hyperparameter Search	Evaluation-Search	Tune forecaster hyperparameters with GridSearchCV and RandomizedSearchCV using temporal cross-validation splitters and result scatter visualisation.
How to Forecast Class Probabilities	Forecasting-Models	Use ClassProbaReductionForecaster to produce calibrated probability forecasts and evaluate them with Brier score, log loss, and accuracy.
How to Combine Classification Forecasters	Forecasting-Models	Build classification ensembles with VotingClassProbaForecaster using soft and hard voting strategies.
Class-Probability Forecasting	Getting-Started	Forecast air quality categories using ClassProbaReductionForecaster, producing a probability distribution over four WHO air quality classes.
How to Create a Custom Class-Probability Forecaster	Getting-Started	Implement a MajorityClassForecaster from scratch, validate it with the check generator, and compare it against ClassProbaReductionForecaster.
Forecast Visualization	Visualization	Visualise point forecasts from single and multiple models, decomposition pipeline components, and time weight decay functions with interactive Plotly.
