plot_score_summary

`yohou.plotting.evaluation.plot_score_summary(scorer, y_truth, y_pred, *, color_palette=None, show_legend=True, title=None, x_label=None, y_label=None, width=None, height=None, bar_opacity=0.85, sort_ascending=None, text_auto=True)`

Plot a grouped bar chart comparing aggregate scores across models and scorers.

For each combination of scorer and model, compute a single aggregate score and display the results as a grouped bar chart. This is useful for quick model comparison without the per-step detail.
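
The scorer-by-model grid described above can be sketched in plain Python. This is only an illustration of the idea: the `mae`, `rmse`, and `score_grid` helpers are hypothetical stand-ins written for this sketch, not Yohou scorers or library functions.

```python
import math


def mae(truth, pred):
    """Mean absolute error (hypothetical stand-in for a scorer)."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def rmse(truth, pred):
    """Root mean squared error (hypothetical stand-in for a scorer)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def score_grid(scorers, truth, preds):
    """Return {model_name: {scorer_name: score}}, one aggregate per pair."""
    return {
        model: {name: fn(truth, pred) for name, fn in scorers.items()}
        for model, pred in preds.items()
    }


truth = [10.0, 20.0, 30.0, 40.0, 50.0]
preds = {
    "model_a": [12.0, 19.0, 28.0, 42.0, 48.0],
    "model_b": [10.0, 20.0, 30.0, 40.0, 45.0],
}
grid = score_grid({"mae": mae, "rmse": rmse}, truth, preds)
```

In the chart, each model becomes one bar series and each scorer one category on the x-axis, so `grid` here would yield two groups of two bars.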

Parameters

Name	Type	Description	Default
`scorer`	`BaseScorer or dict[str, BaseScorer]`	Yohou scorer instance. If BaseScorer: single scorer to evaluate. If dict: keys are scorer names, values are scorer instances.	required
`y_truth`	`DataFrame`	Ground truth with a `"time"` column.	required
`y_pred`	`DataFrame or dict[str, DataFrame]`	Predictions with `"vintage_time"` and `"time"` columns. If DataFrame: single forecast. If dict: keys are model names, values are prediction DataFrames.	required
`color_palette`	`list[str] \| None`	Custom colour palette.	`None`
`show_legend`	`bool`	Whether to show the legend.	`True`
`title`	`str \| None`	Plot title. Defaults to `"Model Comparison"`.	`None`
`x_label`	`str \| None`	X-axis label. Defaults to `""`.	`None`
`y_label`	`str \| None`	Y-axis label. Defaults to `"Score"`.	`None`
`width`	`int \| None`	Plot width in pixels.	`None`
`height`	`int \| None`	Plot height in pixels.	`None`
`bar_opacity`	`float`	Opacity of bars.	`0.85`
`sort_ascending`	`bool or None`	Order the scorer categories on the x-axis by the first model's scores. `True` for ascending, `False` for descending, `None` to keep insertion order.	`None`
`text_auto`	`bool`	Annotate bars with their values.	`True`
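
The effect of `sort_ascending` can be illustrated with a small pure-Python sketch that mirrors the ordering step in the source listing on this page. The helper name `order_categories` and the sample numbers are made up for the illustration; the point is that the first model's scores decide the order and every other model is rearranged to match.

```python
def order_categories(categories, series_values, sort_ascending):
    """Reorder categories (and every series) by the first series' values."""
    if sort_ascending is None or not series_values:
        # None keeps the insertion order untouched.
        return categories, series_values
    order = sorted(
        range(len(categories)),
        key=lambda k: series_values[0][k],
        reverse=not sort_ascending,
    )
    return (
        [categories[k] for k in order],
        [[sv[k] for k in order] for sv in series_values],
    )


categories = ["mae", "rmse", "mape"]
series_values = [
    [1.8, 2.4, 0.9],  # first model: drives the ordering
    [1.0, 0.5, 3.0],  # second model: rearranged to match
]
sorted_cats, sorted_vals = order_categories(categories, series_values, True)
```

With these inputs the categories come out as `mape`, `mae`, `rmse`, even though the second model's own scores are not in ascending order.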

Returns

Type	Description
`Figure`	Plotly figure object.

Raises

Type	Description
`TypeError`	If `y_truth`, or any prediction DataFrame passed in `y_pred`, is not a Polars DataFrame.

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.metrics import MeanAbsoluteError
>>> from yohou.plotting import plot_score_summary

>>> y_truth = pl.DataFrame({
...     "time": [datetime(2020, 1, i) for i in range(1, 6)],
...     "value": [10.0, 20.0, 30.0, 40.0, 50.0],
... })
>>> y_pred = pl.DataFrame({
...     "vintage_time": [datetime(2019, 12, 31)] * 5,
...     "time": [datetime(2020, 1, i) for i in range(1, 6)],
...     "value": [12.0, 19.0, 28.0, 42.0, 48.0],
... })

>>> fig = plot_score_summary(MeanAbsoluteError(), y_truth, y_pred)
>>> len(fig.data) >= 1
True

Source Code

def plot_score_summary(
    scorer: BaseScorer | dict[str, BaseScorer],
    y_truth: pl.DataFrame,
    y_pred: pl.DataFrame | dict[str, pl.DataFrame],
    *,
    color_palette: list[str] | None = None,
    show_legend: bool = True,
    title: str | None = None,
    x_label: str | None = None,
    y_label: str | None = None,
    width: int | None = None,
    height: int | None = None,
    bar_opacity: float = 0.85,
    sort_ascending: bool | None = None,
    text_auto: bool = True,
) -> go.Figure:
    """Plot a grouped bar chart comparing aggregate scores across models and scorers.

    For each combination of scorer and model, compute a single aggregate
    score and display the results as a grouped bar chart. This is useful
    for quick model comparison without the per-step detail.

    Parameters
    ----------
    scorer : BaseScorer or dict[str, BaseScorer]
        Yohou scorer instance.

        - If BaseScorer: single scorer to evaluate.
        - If dict: keys are scorer names, values are scorer instances.
    y_truth : pl.DataFrame
        Ground truth with ``"time"`` column.
    y_pred : pl.DataFrame or dict[str, pl.DataFrame]
        Predictions with ``"vintage_time"`` and ``"time"`` columns.

        - If DataFrame: single forecast.
        - If dict: keys are model names, values are prediction DataFrames.
    color_palette : list[str] | None, default=None
        Custom colour palette.
    show_legend : bool, default=True
        Whether to show the legend.
    title : str | None, default=None
        Plot title. Defaults to ``"Model Comparison"``.
    x_label : str | None, default=None
        X-axis label. Defaults to ``""``.
    y_label : str | None, default=None
        Y-axis label. Defaults to ``"Score"``.
    width : int | None, default=None
        Plot width in pixels.
    height : int | None, default=None
        Plot height in pixels.
    bar_opacity : float, default=0.85
        Opacity of bars.
    sort_ascending : bool or None, default=None
        Order the scorer categories by the first model's scores. ``True``
        for ascending, ``False`` for descending, ``None`` to keep
        insertion order.
    text_auto : bool, default=True
        Annotate bars with their values.

    Returns
    -------
    go.Figure
        Plotly figure object.

    Raises
    ------
    TypeError
        If *y_truth* or *y_pred* is not a Polars DataFrame.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.metrics import MeanAbsoluteError
    >>> from yohou.plotting import plot_score_summary

    >>> y_truth = pl.DataFrame({
    ...     "time": [datetime(2020, 1, i) for i in range(1, 6)],
    ...     "value": [10.0, 20.0, 30.0, 40.0, 50.0],
    ... })
    >>> y_pred = pl.DataFrame({
    ...     "vintage_time": [datetime(2019, 12, 31)] * 5,
    ...     "time": [datetime(2020, 1, i) for i in range(1, 6)],
    ...     "value": [12.0, 19.0, 28.0, 42.0, 48.0],
    ... })

    >>> fig = plot_score_summary(MeanAbsoluteError(), y_truth, y_pred)
    >>> len(fig.data) >= 1
    True

    See Also
    --------
    [`plot_score_per_step`][yohou.plotting.plot_score_per_step] : Per-step score line/bar chart.
    [`plot_score_time_series`][yohou.plotting.plot_score_time_series] : Score values over time.
    """
    validate_plotting_data(y_truth)
    validate_plotting_params(width=width, height=height)

    y_pred_dict: dict[str, pl.DataFrame] = _normalize_y_pred(y_pred)
    scorer_dict = _normalize_scorers(scorer)

    scorer_names = list(scorer_dict.keys())
    model_names = list(y_pred_dict.keys())

    results: dict[str, dict[str, float]] = {}
    for m_name, y_pred_m in y_pred_dict.items():
        validate_plotting_data(y_pred_m)
        model_scores: dict[str, float] = {}
        for s_name, s_orig in scorer_dict.items():
            s_agg = copy.deepcopy(s_orig)
            s_agg.fit(y_truth)
            score_val = s_agg.score(y_truth, y_pred_m)
            if isinstance(score_val, pl.DataFrame):
                score_cols = [c for c in score_val.columns if c not in _SCORER_META_COLS]
                score_val = float(score_val.select(score_cols).mean_horizontal().mean())  # type: ignore
            model_scores[s_name] = float(score_val)  # type: ignore
        results[m_name] = model_scores

    categories = scorer_names
    series_names = model_names
    series_values = [[results[m].get(s, 0.0) for s in categories] for m in series_names]

    if sort_ascending is not None and series_values:
        order = sorted(
            range(len(categories)),
            key=lambda k: series_values[0][k],
            reverse=not sort_ascending,
        )
        categories = [categories[k] for k in order]
        series_values = [[sv[k] for k in order] for sv in series_values]

    colors = resolve_color_palette(color_palette, len(series_names))
    fig = go.Figure()

    for i, (name, values) in enumerate(zip(series_names, series_values, strict=True)):
        bar_kwargs: dict = {
            "name": name,
            "marker_color": colors[i % len(colors)],
            "opacity": bar_opacity,
            "x": categories,
            "y": values,
        }
        if text_auto:
            bar_kwargs["text"] = [f"{v:.3g}" for v in values]
            bar_kwargs["textposition"] = "outside"
        fig.add_trace(go.Bar(**bar_kwargs))

    fig.update_layout(barmode="group")
    fig = apply_default_layout(
        fig,
        title=title or "Model Comparison",
        x_label=x_label or "",
        y_label=y_label or "Score",
        width=width,
        height=height,
    )
    fig.update_layout(showlegend=show_legend)
    return fig
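
When a scorer returns a table rather than a single number, the listing above averages the score columns within each row and then averages the row means. A minimal pure-Python sketch of that reduction follows, together with the `.3g` formatting used for the bar labels. The `collapse_scores` helper, the `"step"` metadata column, and the sample rows are assumptions made for illustration; the actual contents of `_SCORER_META_COLS` are not shown on this page.

```python
def collapse_scores(rows, meta_cols):
    """Average score columns within each row, then average the row means."""
    row_means = []
    for row in rows:
        # Skip metadata columns (assumed here to be just "step").
        values = [v for k, v in row.items() if k not in meta_cols]
        row_means.append(sum(values) / len(values))
    return sum(row_means) / len(row_means)


rows = [
    {"step": 1, "value_a": 1.0, "value_b": 3.0},
    {"step": 2, "value_a": 2.0, "value_b": 6.0},
]
score = collapse_scores(rows, meta_cols={"step"})

# Bar labels use three significant digits.
label = f"{score:.3g}"
large_label = f"{1234.5678:.3g}"
```

Because the labels keep only three significant digits, large scores such as `1234.5678` are shown in scientific notation (`1.23e+03`), which is worth keeping in mind when comparing scorers on very different scales.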

Tutorials

The following example notebooks use this component:

How to Use Point Forecast Metrics

Evaluation-Search

Compare MAE, MAPE, MASE, RMSE, and other point metrics across multiple forecasters with componentwise and groupwise aggregation.

How to Apply Time-Weighted Training

Forecasting-Models

Use time_weight and sample_weight_alignment to emphasise recent or seasonal training samples in PointReductionForecaster, with visualisation of weight curves and alignment strategy comparison.

How to Combine Forecasters with VotingPointForecaster

Forecasting-Models

Build point ensembles with VotingPointForecaster using mean, weighted, and median aggregation strategies.

Forecasting Workflow

Getting-Started

Evaluate forecasters with cross-validation, search hyperparameters with GridSearchCV, and inspect residuals to diagnose model weaknesses.

Direct, Recursive, and MIMO Strategies

Getting-Started

Compare direct, recursive, and MIMO reduction strategies across forecasting horizons to understand the trade-offs for your use case.

How to Visualize Forecast Evaluation Results

Visualization

Use plot_calibration, plot_score_per_step, and plot_forecast to diagnose forecast accuracy and interval calibration visually.
