
plot_score_summary

yohou.plotting.evaluation.plot_score_summary(scorer, y_truth, y_pred, *, color_palette=None, show_legend=True, title=None, x_label=None, y_label=None, width=None, height=None, bar_opacity=0.85, sort_ascending=None, text_auto=True)

Plot a grouped bar chart comparing aggregate scores across models and scorers.

For each combination of scorer and model, compute a single aggregate score and display the results as a grouped bar chart. This is useful for quick model comparison without the per-step detail.
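
To make the layout concrete, here is a minimal sketch of how one aggregate score per (model, scorer) pair maps onto a grouped bar chart. It uses plain Python only and does not call yohou. The model names (`naive`, `ridge`) and the score values are hypothetical, chosen purely for illustration.

```python
# Hypothetical aggregate scores: results[model][scorer] -> single float.
results = {
    "naive": {"MAE": 2.4, "RMSE": 3.1},
    "ridge": {"MAE": 1.8, "RMSE": 2.2},
}

# Scorers become the x-axis categories; each model becomes one bar series.
categories = ["MAE", "RMSE"]
series = {
    model: [scores[c] for c in categories]
    for model, scores in results.items()
}

for model, values in series.items():
    # One line per model: the bar heights it contributes to each category.
    print(model, dict(zip(categories, values)))
```

Each category then holds one bar per model, side by side, which is what makes the chart "grouped".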

Parameters

Name Type Description Default
scorer BaseScorer or dict[str, BaseScorer]

Yohou scorer instance.

  • If BaseScorer: single scorer to evaluate.
  • If dict: keys are scorer names, values are scorer instances.
required
y_truth DataFrame

Ground truth with "time" column.

required
y_pred DataFrame or dict[str, DataFrame]

Predictions with "vintage_time" and "time" columns.

  • If DataFrame: single forecast.
  • If dict: keys are model names, values are prediction DataFrames.
required
color_palette list[str] | None

Custom colour palette.

None
show_legend bool

Whether to show the legend.

True
title str | None

Plot title. Defaults to "Model Comparison".

None
x_label str | None

X-axis label. Defaults to "".

None
y_label str | None

Y-axis label. Defaults to "Score".

None
width int | None

Plot width in pixels.

None
height int | None

Plot height in pixels.

None
bar_opacity float

Opacity of bars.

0.85
sort_ascending bool or None

Sort the scorer categories on the x-axis by score, using the first model's scores as the sort key. True for ascending, False for descending, None to keep insertion order.

None
text_auto bool

Annotate bars with their values.

True
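
Going by the source listing on this page, sort_ascending reorders the scorer categories using the first model's scores as the sort key, and every other model's bars follow that same order. The sketch below reproduces that reordering in plain Python. The helper name `reorder_by_first_series` and the sample values are hypothetical and not part of yohou.

```python
def reorder_by_first_series(categories, series_values, ascending):
    """Reorder categories, and every series, by the first series' values."""
    order = sorted(
        range(len(categories)),
        key=lambda k: series_values[0][k],
        reverse=not ascending,
    )
    new_categories = [categories[k] for k in order]
    new_series = [[values[k] for k in order] for values in series_values]
    return new_categories, new_series


# Hypothetical scores: one inner list per model, one entry per scorer.
categories = ["MAE", "RMSE", "MAPE"]
series_values = [[2.4, 3.1, 0.12], [1.8, 2.2, 0.30]]

asc_categories, asc_values = reorder_by_first_series(categories, series_values, True)
desc_categories, _ = reorder_by_first_series(categories, series_values, False)
print(asc_categories)
print(desc_categories)
```

Note that the second model's values are not necessarily monotonic after sorting, because only the first model determines the order.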

Returns

Type Description
Figure

Plotly figure object.

Raises

Type Description
TypeError

If y_truth or y_pred (or any value of a y_pred dict) is not a Polars DataFrame.

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.metrics import MeanAbsoluteError
>>> from yohou.plotting import plot_score_summary
>>> y_truth = pl.DataFrame({
...     "time": [datetime(2020, 1, i) for i in range(1, 6)],
...     "value": [10.0, 20.0, 30.0, 40.0, 50.0],
... })
>>> y_pred = pl.DataFrame({
...     "vintage_time": [datetime(2019, 12, 31)] * 5,
...     "time": [datetime(2020, 1, i) for i in range(1, 6)],
...     "value": [12.0, 19.0, 28.0, 42.0, 48.0],
... })
>>> fig = plot_score_summary(MeanAbsoluteError(), y_truth, y_pred)
>>> len(fig.data) >= 1
True
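
As a sanity check on the example data, the mean absolute error can be worked out by hand. This sketch assumes MeanAbsoluteError follows the standard definition (the mean of the absolute differences between prediction and truth). It does not call yohou, so treat the result as the expected value under that assumption rather than confirmed library output.

```python
# Same values as the "value" columns in the example above.
truth = [10.0, 20.0, 30.0, 40.0, 50.0]
pred = [12.0, 19.0, 28.0, 42.0, 48.0]

# Absolute errors are 2, 1, 2, 2, 2, so the mean is 9 / 5.
mae = sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)
print(mae)  # 1.8
```

Under that assumption, the single bar in the example figure would have a height of 1.8.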

See Also

plot_score_per_step : Per-step score line/bar chart.
plot_score_time_series : Score values over time.

Source Code

def plot_score_summary(
    scorer: BaseScorer | dict[str, BaseScorer],
    y_truth: pl.DataFrame,
    y_pred: pl.DataFrame | dict[str, pl.DataFrame],
    *,
    color_palette: list[str] | None = None,
    show_legend: bool = True,
    title: str | None = None,
    x_label: str | None = None,
    y_label: str | None = None,
    width: int | None = None,
    height: int | None = None,
    bar_opacity: float = 0.85,
    sort_ascending: bool | None = None,
    text_auto: bool = True,
) -> go.Figure:
    """Plot a grouped bar chart comparing aggregate scores across models and scorers.

    For each combination of scorer and model, compute a single aggregate
    score and display the results as a grouped bar chart. This is useful
    for quick model comparison without the per-step detail.

    Parameters
    ----------
    scorer : BaseScorer or dict[str, BaseScorer]
        Yohou scorer instance.

        - If BaseScorer: single scorer to evaluate.
        - If dict: keys are scorer names, values are scorer instances.
    y_truth : pl.DataFrame
        Ground truth with ``"time"`` column.
    y_pred : pl.DataFrame or dict[str, pl.DataFrame]
        Predictions with ``"vintage_time"`` and ``"time"`` columns.

        - If DataFrame: single forecast.
        - If dict: keys are model names, values are prediction DataFrames.
    color_palette : list[str] | None, default=None
        Custom colour palette.
    show_legend : bool, default=True
        Whether to show the legend.
    title : str | None, default=None
        Plot title. Defaults to ``"Model Comparison"``.
    x_label : str | None, default=None
        X-axis label. Defaults to ``""``.
    y_label : str | None, default=None
        Y-axis label. Defaults to ``"Score"``.
    width : int | None, default=None
        Plot width in pixels.
    height : int | None, default=None
        Plot height in pixels.
    bar_opacity : float, default=0.85
        Opacity of bars.
    sort_ascending : bool or None, default=None
        Sort the scorer categories by the first model's scores. ``True``
        for ascending, ``False`` for descending, ``None`` to keep
        insertion order.
    text_auto : bool, default=True
        Annotate bars with their values.

    Returns
    -------
    go.Figure
        Plotly figure object.

    Raises
    ------
    TypeError
        If *y_truth* or *y_pred* is not a Polars DataFrame.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.metrics import MeanAbsoluteError
    >>> from yohou.plotting import plot_score_summary

    >>> y_truth = pl.DataFrame({
    ...     "time": [datetime(2020, 1, i) for i in range(1, 6)],
    ...     "value": [10.0, 20.0, 30.0, 40.0, 50.0],
    ... })
    >>> y_pred = pl.DataFrame({
    ...     "vintage_time": [datetime(2019, 12, 31)] * 5,
    ...     "time": [datetime(2020, 1, i) for i in range(1, 6)],
    ...     "value": [12.0, 19.0, 28.0, 42.0, 48.0],
    ... })

    >>> fig = plot_score_summary(MeanAbsoluteError(), y_truth, y_pred)
    >>> len(fig.data) >= 1
    True

    See Also
    --------
    [`plot_score_per_step`][yohou.plotting.plot_score_per_step] : Per-step score line/bar chart.
    [`plot_score_time_series`][yohou.plotting.plot_score_time_series] : Score values over time.
    """
    validate_plotting_data(y_truth)
    validate_plotting_params(width=width, height=height)

    y_pred_dict: dict[str, pl.DataFrame] = _normalize_y_pred(y_pred)
    scorer_dict = _normalize_scorers(scorer)

    scorer_names = list(scorer_dict.keys())
    model_names = list(y_pred_dict.keys())

    results: dict[str, dict[str, float]] = {}
    for m_name, y_pred_m in y_pred_dict.items():
        validate_plotting_data(y_pred_m)
        model_scores: dict[str, float] = {}
        for s_name, s_orig in scorer_dict.items():
            s_agg = copy.deepcopy(s_orig)
            s_agg.fit(y_truth)
            score_val = s_agg.score(y_truth, y_pred_m)
            if isinstance(score_val, pl.DataFrame):
                score_cols = [c for c in score_val.columns if c not in _SCORER_META_COLS]
                score_val = float(score_val.select(score_cols).mean_horizontal().mean())  # type: ignore
            model_scores[s_name] = float(score_val)  # type: ignore
        results[m_name] = model_scores

    categories = scorer_names
    series_names = model_names
    series_values = [[results[m].get(s, 0.0) for s in categories] for m in series_names]

    if sort_ascending is not None and series_values:
        order = sorted(
            range(len(categories)),
            key=lambda k: series_values[0][k],
            reverse=not sort_ascending,
        )
        categories = [categories[k] for k in order]
        series_values = [[sv[k] for k in order] for sv in series_values]

    colors = resolve_color_palette(color_palette, len(series_names))
    fig = go.Figure()

    for i, (name, values) in enumerate(zip(series_names, series_values, strict=True)):
        bar_kwargs: dict = {
            "name": name,
            "marker_color": colors[i % len(colors)],
            "opacity": bar_opacity,
            "x": categories,
            "y": values,
        }
        if text_auto:
            bar_kwargs["text"] = [f"{v:.3g}" for v in values]
            bar_kwargs["textposition"] = "outside"
        fig.add_trace(go.Bar(**bar_kwargs))

    fig.update_layout(barmode="group")
    fig = apply_default_layout(
        fig,
        title=title or "Model Comparison",
        x_label=x_label or "",
        y_label=y_label or "Score",
        width=width,
        height=height,
    )
    fig.update_layout(showlegend=show_legend)
    return fig
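
When text_auto is enabled, the listing above formats each bar annotation with Python's `.3g` format spec, that is, three significant digits with a switch to scientific notation for large magnitudes. A small sketch of what that produces, using hypothetical score values:

```python
# Hypothetical scores spanning several orders of magnitude.
values = [1.8, 0.012345, 12.345, 1234.5]

# Same format spec as the bar annotations in the source listing.
labels = [f"{v:.3g}" for v in values]
print(labels)  # ['1.8', '0.0123', '12.3', '1.23e+03']
```

Scores of 1000 or more are therefore shown in exponent form, which may be worth keeping in mind when comparing metrics on very different scales in one chart.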

Tutorials

The following example notebooks use this component:

  • How to Use Point Forecast Metrics


    Evaluation-Search

    Compare MAE, MAPE, MASE, RMSE, and other point metrics across multiple forecasters with componentwise and groupwise aggregation.


  • How to Apply Time-Weighted Training


    Forecasting-Models

    Use time_weight and sample_weight_alignment to emphasise recent or seasonal training samples in PointReductionForecaster, with visualisation of weight curves and alignment strategy comparison.


  • How to Combine Forecasters with VotingPointForecaster


    Forecasting-Models

    Build point ensembles with VotingPointForecaster using mean, weighted, and median aggregation strategies.


  • Forecasting Workflow


    Getting-Started

    Evaluate forecasters with cross-validation, search hyperparameters with GridSearchCV, and inspect residuals to diagnose model weaknesses.


  • Direct, Recursive, and MIMO Strategies


    Getting-Started

    Compare direct, recursive, and MIMO reduction strategies across forecasting horizons to understand the trade-offs for your use case.


  • How to Visualize Forecast Evaluation Results


    Visualization

    Use plot_calibration, plot_score_per_step, and plot_forecast to diagnose forecast accuracy and interval calibration visually.
