plot_score_time_series

`yohou.plotting.evaluation.plot_score_time_series(scorer, y_truth, y_pred, *, time_weight=None, step_weight=None, vintage_weight=None, compare_by='scorer', columns=None, groups=None, facet_by='member', facet_n_cols=2, color_palette=None, show_legend=True, title=None, x_label=None, y_label=None, width=None, height=None, connect_gaps=False, resampler=None, line_width=2.0, line_dash='solid', line_opacity=1.0, show_markers=False)`

Plot scorer values over time for one or more forecasts.

Evaluates forecast quality at each timestep by computing the scorer with componentwise aggregation, then plots the resulting score time series. Useful for identifying periods where forecast performance varies.

Parameters

Name	Type	Description	Default
`scorer`	`BaseScorer or dict[str, BaseScorer]`	Yohou scorer instance (e.g., MeanAbsoluteError, RootMeanSquaredError). Will be cloned and configured with aggregation_method="componentwise". If BaseScorer: single scorer to evaluate. If dict: keys are scorer names, values are scorer instances. When combined with dict `y_pred`, the `compare_by` parameter controls which dimension is faceted vs overlaid.	required
`y_truth`	`DataFrame`	Ground truth values with 'time' column.	required
`y_pred`	`DataFrame or dict[str, DataFrame]`	Predicted values with 'vintage_time' and 'time' columns. If DataFrame: single forecast to plot. If dict: multiple forecasts, with keys as model names.	required
`time_weight`	`callable, pl.DataFrame, dict, or None`	Time weighting function, DataFrame, or dict forwarded to `scorer.score()`. When provided, per-timestep scores are weighted before being plotted.	`None`
`step_weight`	`callable, pl.DataFrame, dict, or None`	Per-step weights forwarded to `scorer.score()`.	`None`
`vintage_weight`	`callable, pl.DataFrame, dict, or None`	Per-vintage weights forwarded to `scorer.score()`.	`None`
`compare_by`	`Literal['scorer', 'model']`	When both `scorer` and `y_pred` are dicts, controls which dimension is overlaid (colored lines) vs faceted (subplots): `"scorer"`: overlay scorers, facet by model. `"model"`: overlay models, facet by scorer. Ignored when either `scorer` or `y_pred` is not a dict.	`"scorer"`
`columns`	`str \| list[str] \| None`	Target column name(s) to include in the score. When groups is set, acts as a member postfix filter (e.g. `"a"` selects `group__a`). When `None`, all score columns are used.	`None`
`groups`	`list[str] \| None`	Panel group prefixes for faceted subplots. When provided, each group gets its own subplot showing the score time series for that group. Groups are resolved via `inspect_panel` against y_truth.	`None`
`facet_by`	`Literal['group', 'member', 'vintage'] \| None`	Faceting axis for panel data or vintage data. `"group"` creates one subplot per group, `"member"` one per member. `"vintage"` creates one subplot per `vintage_time` value found in y_pred, showing how score evolves across forecast origins. `None` disables faceting. `"group"` and `"member"` are ignored for non-panel data.	`"member"`
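`facet_n_cols`	`int`	Number of columns in the facet grid. Used for panel groups, for `facet_by="vintage"`, and for the scorer-by-model grid; in the latter two cases it is capped at the number of facets.	`2`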
`color_palette`	`list[str] \| None`	Custom color palette as hex codes. If None, uses yohou palette.	`None`
`show_legend`	`bool`	Whether to show legend when plotting multiple forecasts.	`True`
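`title`	`str \| None`	Plot title. If None, uses "<scorer class name> Over Time" for a single scorer and "Score Over Time" when several scorers are plotted.	`None`
`x_label`	`str \| None`	X-axis label. If None, uses "Time".	`None`
`y_label`	`str \| None`	Y-axis label. If None, uses the scorer class name for a single scorer and "Score" when several scorers are plotted.	`None`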
`width`	`int \| None`	Plot width in pixels.	`None`
`height`	`int \| None`	Plot height in pixels.	`None`
`connect_gaps`	`bool`	Whether to connect gaps in the data with lines.	`False`
`resampler`	`bool \| Literal['widget'] \| None`	Enable plotly-resampler for large datasets. `True` or `"widget"` creates a `FigureWidgetResampler`; `False` or `None` uses a plain `go.Figure`.	`None`
`line_width`	`float`	Width of score lines.	`2.0`
`line_dash`	`str`	Dash style of score lines.	`"solid"`
`line_opacity`	`float`	Opacity of score lines.	`1.0`
`show_markers`	`bool`	Whether to show markers on the lines.	`False`
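
The interaction between `compare_by` and `facet_n_cols` can be illustrated with a small standalone sketch. It mirrors the label selection and row/column arithmetic visible in the source listing on this page, but `plan_facets` is a hypothetical helper written for illustration and is not part of yohou.

```python
def plan_facets(scorer_names, model_names, compare_by="scorer", facet_n_cols=2):
    """Hypothetical helper: decide what is faceted, what is overlaid, and where.

    Returns (facet_labels, overlay_labels, positions), where positions maps
    each facet label to a 1-based (row, col) subplot cell.
    """
    if compare_by == "model":
        # One subplot per scorer, one colored line per model.
        facet_labels, overlay_labels = list(scorer_names), list(model_names)
    else:
        # One subplot per model, one colored line per scorer.
        facet_labels, overlay_labels = list(model_names), list(scorer_names)

    # The grid never has more columns than facets.
    n_cols = min(facet_n_cols, len(facet_labels))
    positions = {
        label: (idx // n_cols + 1, idx % n_cols + 1)
        for idx, label in enumerate(facet_labels)
    }
    return facet_labels, overlay_labels, positions


facets, overlays, positions = plan_facets(
    ["MAE", "RMSE"], ["Model A", "Model B", "Model C"]
)
print(facets)     # ['Model A', 'Model B', 'Model C']
print(overlays)   # ['MAE', 'RMSE']
print(positions)  # {'Model A': (1, 1), 'Model B': (1, 2), 'Model C': (2, 1)}
```

With the default `compare_by="scorer"`, three models and two scorers give three subplots in a two-column grid, each holding two lines.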

Returns

Type	Description
`Figure`	Plotly figure object.

Raises

Type	Description
`TypeError`	If y_truth or y_pred is not a Polars DataFrame.
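`ValueError`	If DataFrames are empty or missing required columns; if `facet_by="vintage"` is combined with panel groups; if multiple scorers are passed with panel data or with `facet_by="vintage"`; or if `facet_by="vintage"` is used with fewer than two distinct `vintage_time` values.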
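
The argument combinations that the source listing on this page rejects can be summarized as a plain-Python check. This is a sketch only: `check_combination` and its parameters are hypothetical names, and the error messages are shortened rather than copied from the library.

```python
def check_combination(n_scorers, facet_by, is_panel, n_vintages):
    """Hypothetical helper mirroring the ValueError branches in the source.

    is_panel stands for "groups were passed or auto-detected in y_truth".
    """
    multi_scorer = n_scorers > 1
    if is_panel:
        if facet_by == "vintage":
            raise ValueError("facet_by='vintage' cannot be combined with panel groups")
        if multi_scorer:
            raise ValueError("multiple scorers are not supported with panel data")
    elif facet_by == "vintage":
        if multi_scorer:
            raise ValueError("multiple scorers are not supported with facet_by='vintage'")
        if n_vintages < 2:
            raise ValueError("facet_by='vintage' requires at least two vintage_time values")


# A single scorer on non-panel data with the default faceting passes silently.
check_combination(n_scorers=1, facet_by="member", is_panel=False, n_vintages=1)

# A single-vintage forecast cannot be faceted by vintage.
try:
    check_combination(n_scorers=1, facet_by="vintage", is_panel=False, n_vintages=1)
except ValueError as exc:
    print(exc)
```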

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.metrics import MeanAbsoluteError
>>> from yohou.plotting import plot_score_time_series

>>> # Create sample data
>>> y_truth = pl.DataFrame({
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "value": [10.0, 20.0, 30.0],
... })
>>> y_pred = pl.DataFrame({
...     "vintage_time": [datetime(2019, 12, 31)] * 3,
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "value": [12.0, 19.0, 28.0],
... })

>>> # Plot score time series for single forecast
>>> scorer = MeanAbsoluteError()
>>> fig = plot_score_time_series(scorer, y_truth, y_pred)
>>> len(fig.data)
1

>>> # Plot multiple forecasts
>>> y_pred2 = pl.DataFrame({
...     "vintage_time": [datetime(2019, 12, 31)] * 3,
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "value": [11.0, 21.0, 29.0],
... })
>>> fig = plot_score_time_series(scorer, y_truth, {"Model A": y_pred, "Model B": y_pred2})
>>> len(fig.data)
2

Notes

- Scorer is automatically configured with aggregation_method="componentwise".
- For interval scorers, use aggregation_method=["componentwise", "coveragewise"].
- Requires the scorer to support componentwise aggregation.
- All scores are computed independently at each timestep.
- Use facet_by="vintage" to compare score curves across forecast origins (requires y_pred with multiple vintage_time values).
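
To make "computed independently at each timestep" concrete, the sketch below computes a per-timestep absolute error by hand, using the same values as the Examples section. It assumes that, for a single target column, a componentwise absolute-error score reduces to one absolute error per row; it is a standard-library illustration, not yohou's implementation.

```python
from datetime import datetime

times = [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)]
truth = dict(zip(times, [10.0, 20.0, 30.0]))
pred = dict(zip(times, [12.0, 19.0, 28.0]))


def absolute_error_per_timestep(truth, pred):
    """Return {time: |prediction - truth|} for timestamps present in both."""
    return {t: abs(pred[t] - truth[t]) for t in sorted(pred) if t in truth}


scores = absolute_error_per_timestep(truth, pred)
print(list(scores.values()))  # [2.0, 1.0, 2.0]
```

Each value depends only on the truth and prediction at its own timestamp, so a spike in the plotted line points at a specific period rather than at the forecast as a whole.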

Source Code

def plot_score_time_series(
    scorer: BaseScorer | dict[str, BaseScorer],
    y_truth: pl.DataFrame,
    y_pred: pl.DataFrame | dict[str, pl.DataFrame],
    *,
    time_weight: Callable | pl.DataFrame | dict | None = None,
    step_weight: Callable | pl.DataFrame | dict | None = None,
    vintage_weight: Callable | pl.DataFrame | dict | None = None,
    compare_by: Literal["scorer", "model"] = "scorer",
    columns: str | list[str] | None = None,
    groups: list[str] | None = None,
    facet_by: Literal["group", "member", "vintage"] | None = "member",
    facet_n_cols: int = 2,
    color_palette: list[str] | None = None,
    show_legend: bool = True,
    title: str | None = None,
    x_label: str | None = None,
    y_label: str | None = None,
    width: int | None = None,
    height: int | None = None,
    connect_gaps: bool = False,
    resampler: bool | Literal["widget"] | None = None,
    line_width: float = 2.0,
    line_dash: str = "solid",
    line_opacity: float = 1.0,
    show_markers: bool = False,
) -> go.Figure:
    """Plot scorer values over time for one or more forecasts.

    Evaluates forecast quality at each timestep by computing the scorer with
    componentwise aggregation, then plots the resulting score time series.
    Useful for identifying periods where forecast performance varies.

    Parameters
    ----------
    scorer : BaseScorer or dict[str, BaseScorer]
        Yohou scorer instance (e.g., MeanAbsoluteError, RootMeanSquaredError).
        Will be cloned and configured with aggregation_method="componentwise".

        - If BaseScorer: single scorer to evaluate.
        - If dict: keys are scorer names, values are scorer instances.
          When combined with dict ``y_pred``, the ``compare_by`` parameter
          controls which dimension is faceted vs overlaid.
    y_truth : pl.DataFrame
        Ground truth values with 'time' column.
    y_pred : pl.DataFrame or dict[str, pl.DataFrame]
        Predicted values with 'vintage_time' and 'time' columns.
        - If DataFrame: single forecast to plot
        - If dict: multiple forecasts with keys as model names
    time_weight : callable, pl.DataFrame, dict, or None, default=None
        Time weighting function, DataFrame, or dict forwarded to
        ``scorer.score()``.  When provided, per-timestep scores are
        weighted before being plotted.
    step_weight : callable, pl.DataFrame, dict, or None, default=None
        Per-step weights forwarded to ``scorer.score()``.
    vintage_weight : callable, pl.DataFrame, dict, or None, default=None
        Per-vintage weights forwarded to ``scorer.score()``.
    compare_by : str, default="scorer"
        When both ``scorer`` and ``y_pred`` are dicts, controls which
        dimension is overlaid (colored lines) vs faceted (subplots):

        - ``"scorer"``: overlay scorers, facet by model.
        - ``"model"``: overlay models, facet by scorer.

        Ignored when either ``scorer`` or ``y_pred`` is not a dict.
    columns : str | list[str] | None, default=None
        Target column name(s) to include in the score.  When
        *groups* is set, acts as a member postfix filter
        (e.g. ``"a"`` selects ``group__a``).  When ``None``, all score
        columns are used.
    groups : list[str] | None, default=None
        Panel group prefixes for faceted subplots.  When provided, each
        group gets its own subplot showing the score time series for that
        group.  Groups are resolved via ``inspect_panel`` against
        *y_truth*.
    facet_by : Literal["group", "member", "vintage"] | None, default="member"
        Faceting axis for panel data or vintage data.  ``"group"`` creates
        one subplot per group, ``"member"`` one per member. ``"vintage"``
        creates one subplot per ``vintage_time`` value found in *y_pred*,
        showing how score evolves across forecast origins.
        ``None`` disables faceting. ``"group"`` and ``"member"`` are
        ignored for non-panel data.
    facet_n_cols : int, default=2
        Number of columns in the facet grid when *groups* is
        used.
    color_palette : list[str] | None, default=None
        Custom color palette as hex codes. If None, uses yohou palette.
    show_legend : bool, default=True
        Whether to show legend when plotting multiple forecasts.
    title : str | None, default=None
        Plot title. If None, generates title from scorer name.
    x_label : str | None, default=None
        X-axis label. If None, uses "Time".
    y_label : str | None, default=None
        Y-axis label. If None, uses scorer class name.
    width : int | None, default=None
        Plot width in pixels.
    height : int | None, default=None
        Plot height in pixels.
    connect_gaps : bool, default=False
        Whether to connect gaps in the data with lines.
    resampler : bool | Literal["widget"] | None, default=None
        Enable plotly-resampler for large datasets.  ``True`` or
        ``"widget"`` creates a ``FigureWidgetResampler``; ``False`` or
        ``None`` uses a plain ``go.Figure``.
    line_width : float, default=2.0
        Width of score lines.
    line_dash : str, default="solid"
        Dash style of score lines.
    line_opacity : float, default=1.0
        Opacity of score lines.
    show_markers : bool, default=False
        Whether to show markers on the lines.

    Returns
    -------
    go.Figure
        Plotly figure object.

    Raises
    ------
    TypeError
        If y_truth or y_pred is not a Polars DataFrame.
    ValueError
        If DataFrames are empty or missing required columns.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.metrics import MeanAbsoluteError
    >>> from yohou.plotting import plot_score_time_series

    >>> # Create sample data
    >>> y_truth = pl.DataFrame({
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "value": [10.0, 20.0, 30.0],
    ... })
    >>> y_pred = pl.DataFrame({
    ...     "vintage_time": [datetime(2019, 12, 31)] * 3,
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "value": [12.0, 19.0, 28.0],
    ... })

    >>> # Plot score time series for single forecast
    >>> scorer = MeanAbsoluteError()
    >>> fig = plot_score_time_series(scorer, y_truth, y_pred)
    >>> len(fig.data)
    1

    >>> # Plot multiple forecasts
    >>> y_pred2 = pl.DataFrame({
    ...     "vintage_time": [datetime(2019, 12, 31)] * 3,
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "value": [11.0, 21.0, 29.0],
    ... })
    >>> fig = plot_score_time_series(scorer, y_truth, {"Model A": y_pred, "Model B": y_pred2})
    >>> len(fig.data)
    2

    See Also
    --------
    [`plot_residuals`][yohou.plotting.plot_residuals] : Plot residual diagnostics.
    [`plot_forecast`][yohou.plotting.plot_forecast] : Plot forecasts with historical data.

    Notes
    -----
    - Scorer is automatically configured with aggregation_method="componentwise"
    - For interval scorers, use aggregation_method=["componentwise", "coveragewise"]
    - Requires scorer to support componentwise aggregation
    - All scores are computed independently at each timestep
    - Use ``facet_by="vintage"`` to compare score curves across forecast
      origins (requires ``y_pred`` with multiple ``vintage_time`` values)

    """
    # Validate ground truth
    validate_plotting_data(y_truth)
    validate_plotting_params(width=width, height=height)

    # Normalize inputs
    y_pred_dict = _normalize_y_pred(y_pred)
    scorer_dict = _normalize_scorers(scorer)

    # Prepare and fit each scorer for componentwise aggregation
    scorer_cw_dict: dict[str, BaseScorer] = {}
    for s_name, s in scorer_dict.items():
        s_cw = _prepare_scorer_for_componentwise(s)
        s_cw.fit(y_truth)
        scorer_cw_dict[s_name] = s_cw

    n_scorers = len(scorer_cw_dict)
    n_models = len(y_pred_dict)
    multi_scorer = n_scorers > 1

    # Auto-detect panel data
    _, _panel_groups = inspect_panel(y_truth)
    if groups is None and _panel_groups:
        groups = []

    if groups is not None:
        if facet_by == "vintage":
            msg = "facet_by='vintage' cannot be combined with panel groups. Use facet_by='group' or 'member' for panel data."
            raise ValueError(msg)
        if multi_scorer:
            msg = (
                "Multi-scorer is not supported with panel data in plot_score_time_series. Pass a single scorer instead."
            )
            raise ValueError(msg)
        first_key = next(iter(scorer_dict))
        first_scorer = scorer_dict[first_key]
        first_scorer_cw = scorer_cw_dict[first_key]
        colors = resolve_color_palette(color_palette, n_models)
        return _plot_score_time_series_panel(
            scorer_componentwise=first_scorer_cw,
            y_truth=y_truth,
            y_pred_dict=y_pred_dict,
            groups=groups,
            facet_n_cols=facet_n_cols,
            colors=colors,
            scorer=first_scorer,
            show_legend=show_legend,
            title=title,
            x_label=x_label,
            y_label=y_label,
            width=width,
            height=height,
            connect_gaps=connect_gaps,
            time_weight=time_weight,
            step_weight=step_weight,
            vintage_weight=vintage_weight,
            resampler=resampler,
            columns=columns,
            line_width=line_width,
            line_dash=line_dash,
            line_opacity=line_opacity,
            show_markers=show_markers,
        )

    # Case: facet_by="vintage" -> one subplot per vintage_time value
    if facet_by == "vintage":
        if multi_scorer:
            msg = "Multi-scorer is not supported with facet_by='vintage'. Pass a single scorer instead."
            raise ValueError(msg)

        first_key = next(iter(scorer_dict))
        first_scorer = scorer_dict[first_key]
        first_scorer_cw = scorer_cw_dict[first_key]

        # Collect all vintage labels across models
        all_vintages: list = []
        cw_results: dict[str, pl.DataFrame] = {}
        for model_name, y_pred_model in y_pred_dict.items():
            cw_df = _compute_componentwise_scores(
                first_scorer_cw, y_truth, y_pred_model, columns, time_weight, step_weight, vintage_weight
            )
            cw_results[model_name] = cw_df
            if "vintage_time" in cw_df.columns:
                for v in cw_df["vintage_time"].unique().sort().to_list():
                    if v not in all_vintages:
                        all_vintages.append(v)

        if len(all_vintages) < 2:
            msg = (
                "facet_by='vintage' requires predictions with multiple vintages "
                "(multiple vintage_time values). The provided y_pred has a single vintage."
            )
            raise ValueError(msg)

        n_vintages = len(all_vintages)
        n_cols_grid = min(n_vintages, facet_n_cols)
        n_rows = (n_vintages + n_cols_grid - 1) // n_cols_grid
        colors = resolve_color_palette(color_palette, n_models)

        vintage_labels = [str(v) for v in reversed(all_vintages)]
        fig = _create_subplots(
            resampler,
            rows=n_rows,
            cols=n_cols_grid,
            subplot_titles=vintage_labels,
            shared_xaxes=True,
            vertical_spacing=min(0.04, 0.3 / max(n_rows - 1, 1)),
            horizontal_spacing=0.08,
        )

        mode = "lines+markers" if show_markers else "lines"
        legend_tracker = LegendTracker()

        for vintage_idx, vintage_val in enumerate(all_vintages):
            rev_idx = n_vintages - 1 - vintage_idx
            row = rev_idx // n_cols_grid + 1
            col = rev_idx % n_cols_grid + 1

            for model_idx, (model_name, cw_df) in enumerate(cw_results.items()):
                if "vintage_time" not in cw_df.columns:
                    continue
                vintage_df = cw_df.filter(pl.col("vintage_time") == vintage_val)
                if vintage_df.is_empty():
                    continue

                fig.add_trace(
                    go.Scatter(
                        x=vintage_df["time"],
                        y=vintage_df["score"],
                        mode=mode,
                        name=model_name,
                        legendgroup=model_name,
                        showlegend=legend_tracker.should_show(model_name),
                        line={"color": colors[model_idx], "width": line_width, "dash": line_dash},
                        opacity=line_opacity,
                        marker={"size": 6} if show_markers else None,
                        connectgaps=connect_gaps,
                        hovertemplate=_make_hovertemplate(model_name, "Time", "Score", decimals=3),
                    ),
                    row=row,
                    col=col,
                )

        scorer_name = first_scorer.__class__.__name__
        default_title = title or f"{scorer_name} Over Time"
        row_height = 300
        default_height = max(row_height * n_rows, 400)

        fig = apply_default_layout(
            fig,
            title=default_title,
            x_label=x_label or "Time",
            y_label=y_label or scorer_name,
            width=width,
            height=height or default_height,
        )
        fig.update_layout(showlegend=show_legend)
        return fig

    mode = "lines+markers" if show_markers else "lines"

    # Case: multi-scorer AND multi-model -> faceted subplots
    if multi_scorer and n_models > 1:
        _warn_large_grid(n_scorers, n_models)

        if compare_by == "model":
            # Facet by scorer, overlay models
            facet_labels = list(scorer_cw_dict.keys())
            overlay_labels = list(y_pred_dict.keys())
        else:
            # Facet by model, overlay scorers
            facet_labels = list(y_pred_dict.keys())
            overlay_labels = list(scorer_cw_dict.keys())

        n_facets = len(facet_labels)
        n_cols = min(facet_n_cols, n_facets)
        n_rows = (n_facets + n_cols - 1) // n_cols
        colors = resolve_color_palette(color_palette, len(overlay_labels))

        fig = make_subplots(
            rows=n_rows,
            cols=n_cols,
            subplot_titles=facet_labels,
            shared_xaxes=True,
            vertical_spacing=_subplot_spacing(n_rows),
        )

        legend_tracker = LegendTracker()

        for facet_idx, facet_label in enumerate(facet_labels):
            row = facet_idx // n_cols + 1
            col = facet_idx % n_cols + 1

            for overlay_idx, overlay_label in enumerate(overlay_labels):
                if compare_by == "model":
                    s_cw = scorer_cw_dict[facet_label]
                    y_pm = y_pred_dict[overlay_label]
                else:
                    s_cw = scorer_cw_dict[overlay_label]
                    y_pm = y_pred_dict[facet_label]

                cw_df = _compute_componentwise_scores(
                    s_cw, y_truth, y_pm, columns, time_weight, step_weight, vintage_weight
                )

                _add_vintage_traces(
                    fig,
                    cw_df,
                    x_col="time",
                    y_col="score",
                    name=overlay_label,
                    color=colors[overlay_idx],
                    line_width=line_width,
                    line_dash=line_dash,
                    base_opacity=line_opacity,
                    mode=mode,
                    show_markers=show_markers,
                    connect_gaps=connect_gaps,
                    showlegend=legend_tracker.should_show(overlay_label),
                    row=row,
                    col=col,
                )

        default_title = title or "Score Over Time"
        row_height = 300
        default_height = max(row_height * n_rows, 400)

        fig = apply_default_layout(
            fig,
            title=default_title,
            x_label=x_label or "Time",
            y_label=y_label or "Score",
            width=width,
            height=height or default_height,
        )
        fig.update_layout(showlegend=show_legend)
        return fig

    # Case: single figure (single scorer or single model)
    fig = _create_figure(resampler)

    if multi_scorer:
        # Overlay scorers (single model case)
        colors = resolve_color_palette(color_palette, n_scorers)
        y_pred_single = next(iter(y_pred_dict.values()))

        for idx, s_name in enumerate(scorer_cw_dict):
            cw_df = _compute_componentwise_scores(
                scorer_cw_dict[s_name], y_truth, y_pred_single, columns, time_weight, step_weight, vintage_weight
            )
            _add_vintage_traces(
                fig,
                cw_df,
                x_col="time",
                y_col="score",
                name=s_name,
                color=colors[idx],
                line_width=line_width,
                line_dash=line_dash,
                base_opacity=line_opacity,
                mode=mode,
                show_markers=show_markers,
                connect_gaps=connect_gaps,
                showlegend=True,
            )

        default_title = title or "Score Over Time"
        default_y_label = y_label or "Score"
    else:
        # Overlay models (single scorer case - original behavior)
        colors = resolve_color_palette(color_palette, n_models)
        first_scorer_cw = next(iter(scorer_cw_dict.values()))

        for idx, (model_name, y_pred_model) in enumerate(y_pred_dict.items()):
            cw_df = _compute_componentwise_scores(
                first_scorer_cw, y_truth, y_pred_model, columns, time_weight, step_weight, vintage_weight
            )
            _add_vintage_traces(
                fig,
                cw_df,
                x_col="time",
                y_col="score",
                name=model_name,
                color=colors[idx],
                line_width=line_width,
                line_dash=line_dash,
                base_opacity=line_opacity,
                mode=mode,
                show_markers=show_markers,
                connect_gaps=connect_gaps,
                showlegend=True,
            )

        first_scorer = next(iter(scorer_dict.values()))
        scorer_name = first_scorer.__class__.__name__
        default_title = title or f"{scorer_name} Over Time"
        default_y_label = y_label or scorer_name

    fig = apply_default_layout(
        fig,
        title=default_title,
        x_label=x_label or "Time",
        y_label=default_y_label,
        width=width,
        height=height,
    )
    fig.update_layout(showlegend=show_legend)

    return fig
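
The `facet_by="vintage"` branch in the listing above places vintages in reverse order of collection, so the last vintage collected occupies the first subplot. A minimal standalone sketch of that arithmetic follows; `vintage_grid` is a hypothetical helper written for illustration and is not part of yohou.

```python
def vintage_grid(vintages, facet_n_cols=2):
    """Hypothetical helper reproducing the vintage subplot layout arithmetic.

    Returns (n_rows, n_cols, titles, cells), where cells maps each vintage
    label to its 1-based (row, col) subplot position.
    """
    n = len(vintages)
    if n < 2:
        # The source raises in this situation as well.
        raise ValueError("at least two vintages are required")

    n_cols = min(n, facet_n_cols)
    n_rows = (n + n_cols - 1) // n_cols  # ceiling division

    # Subplot titles are listed in reverse collection order.
    titles = [str(v) for v in reversed(vintages)]

    cells = {}
    for idx, vintage in enumerate(vintages):
        rev_idx = n - 1 - idx  # the last vintage gets index 0
        cells[str(vintage)] = (rev_idx // n_cols + 1, rev_idx % n_cols + 1)
    return n_rows, n_cols, titles, cells


n_rows, n_cols, titles, cells = vintage_grid(["2020-01-01", "2020-01-02", "2020-01-03"])
print(n_rows, n_cols)  # 2 2
print(titles)          # ['2020-01-03', '2020-01-02', '2020-01-01']
print(cells)           # {'2020-01-01': (2, 1), '2020-01-02': (1, 2), '2020-01-03': (1, 1)}
```

Because the titles and the cell positions are reversed together, each subplot title still matches the vintage drawn in that cell.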

Tutorials

The following example notebooks use this component:

How to Wrap Functions as Transformers

Data-Features

Wrap arbitrary polars or numpy operations as sklearn transformers with FunctionTransformer, supporting stateful warmup, inverse transforms, and pipelines.

How to Use Scikit-learn Scalers

Data-Features

Wrap sklearn scalers (StandardScaler, MinMaxScaler, RobustScaler, PowerTransformer, PolynomialFeatures) for polars DataFrames with inverse transforms.

How to Score Class-Probability Forecasts

Evaluation-Search

Evaluate categorical forecasts with LogLoss, BrierScore, and Accuracy. Covers per-timestep scoring, aggregation modes, and reliability diagrams.

How to Use Conformity Scorers

Evaluation-Search

Compare Residual, AbsoluteResidual, GammaResidual, and AbsoluteGammaResidual conformity scorers with coverage/width analysis and DistanceSimilarity interaction.

How to Evaluate Interval Forecasts

Evaluation-Search

Evaluate prediction intervals with EmpiricalCoverage, IntervalScore, MeanIntervalWidth, PinballLoss, and CalibrationError across coverage levels.

How to Use Point Forecast Metrics

Evaluation-Search

Compare MAE, MAPE, MASE, RMSE, and other point metrics across multiple forecasters with componentwise and groupwise aggregation.

How to Forecast with CatBoost

Forecasting-Models

Plug CatBoostRegressor into PointReductionForecaster as a drop-in sklearn estimator, compare gradient-boosted versus Ridge linear baseline, and demonstrate the direct reduction strategy with tree-based models.

How to Forecast Class Probabilities

Forecasting-Models

Use ClassProbaReductionForecaster to produce calibrated probability forecasts and evaluate them with Brier score, log loss, and accuracy.

How to Combine Forecasters with VotingPointForecaster

Forecasting-Models

Build point ensembles with VotingPointForecaster using mean, weighted, and median aggregation strategies.

Conformal Prediction Intervals

Getting-Started

Build distribution-free prediction intervals with SplitConformalForecaster using calibration holdouts and configurable conformity scoring functions.

Naive Forecasters

Getting-Started

Baseline forecasting (the first portion of the First Forecast tutorial) with SeasonalNaive using different seasonality periods, the observe/predict streaming workflow, and rolling evaluation patterns.

Quickstart

Quickstart

Comprehensive end-to-end tour of yohou beyond the Getting Started tutorials, covering data loading, baseline forecasting, preprocessing pipelines, decomposition, cross-validation search, and interval prediction.

How to Visualize Forecast Evaluation Results

Visualization

Use plot_calibration, plot_score_per_step, and plot_forecast to diagnose forecast accuracy and interval calibration visually.
