plot_group_scores

`yohou.plotting.evaluation.plot_group_scores(scorer, y_truth, y_pred, *, kind='bar', distribute_by='time', color_palette=None, show_legend=True, title=None, x_label=None, y_label=None, width=None, height=None, bar_opacity=0.85, sort_ascending=None, text_auto=True)`

Plot scores broken down by panel group.

Requires panel data in which `y_truth` has group columns named `<group>__<member>` (e.g. `region__east`). The scorer is computed for each group independently and the results are visualised per group.
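The `group__member` naming convention can be illustrated without the library. The following is a minimal sketch, assuming the group name is the prefix before the first double underscore; `split_panel_columns` is a hypothetical helper written for this illustration, not yohou's own panel inspection routine.

```python
from collections import defaultdict


def split_panel_columns(columns, time_cols=("time", "vintage_time")):
    """Map each group prefix to its member columns (hypothetical helper)."""
    groups = defaultdict(list)
    for col in columns:
        if col in time_cols or "__" not in col:
            continue  # time columns and non-panel columns carry no group
        group, _member = col.split("__", 1)
        groups[group].append(col)
    return dict(groups)


columns = ["time", "region__east", "region__west", "store__a"]
panel = split_panel_columns(columns)
print(panel)
```

Under this convention, a frame whose value columns have no `__` separator yields no groups, which is the situation in which `plot_group_scores` raises `ValueError`.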

Parameters

Name	Type	Description	Default
`scorer`	`BaseScorer or dict[str, BaseScorer]`	Yohou scorer instance. If BaseScorer: single scorer to evaluate per group. If dict: keys are scorer names, values are scorer instances.	required
`y_truth`	`DataFrame`	Ground truth panel data with `"time"` and group columns.	required
`y_pred`	`DataFrame or dict[str, DataFrame]`	Predictions. If DataFrame: single forecast. If dict: keys are model names, values are prediction DataFrames.	required
`kind`	`str`	Plot kind: `"bar"` for an aggregate bar chart, `"box"` for a box plot of the per-record score distribution, `"heatmap"` for a 2D heatmap of groups vs. models/scorers. With `kind="box"`, only the first scorer and the first set of predictions are plotted.	`"bar"`
`distribute_by`	`str`	For `kind="box"`, the dimension whose variability the box plot shows: `"time"`: per-timestep score variability. `"vintage"`: per-vintage score variability. `"step"`: per-step score variability. Ignored for `kind="bar"` and `kind="heatmap"`.	`"time"`
`color_palette`	`list[str] \| None`	Custom colour palette.	`None`
`show_legend`	`bool`	Whether to show the legend. Has no effect for `kind="heatmap"`, which always hides it.	`True`
`title`	`str \| None`	Plot title.	`None`
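`x_label`	`str \| None`	X-axis label. Defaults to `"Group"` for `kind="bar"` and `kind="box"`, and to `"Model"`, `"Scorer"`, or `"Model / Scorer"` for `kind="heatmap"`.	`None`
`y_label`	`str \| None`	Y-axis label. Defaults to `"Group"` for `kind="heatmap"`; otherwise to the scorer's class name, or to `"Score"` when `kind="bar"` is given several scorers.	`None`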
`width`	`int \| None`	Plot width in pixels.	`None`
`height`	`int \| None`	Plot height in pixels.	`None`
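`bar_opacity`	`float`	Opacity of the bars (only for `kind="bar"`).	`0.85`
`sort_ascending`	`bool \| None`	Sort groups by the score of the first scorer/model series (only for `kind="bar"`). `True` for ascending, `False` for descending, `None` to keep the default alphabetical group order.	`None`
`text_auto`	`bool`	Annotate bars or heatmap cells with their values (ignored for `kind="box"`).	`True`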
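How `scorer`, `y_pred`, and `sort_ascending` interact for `kind="bar"` can be sketched in plain Python. The sketch follows the labelling and ordering logic in the source listing on this page, but `series_labels` and `order_groups` are hypothetical names introduced for illustration, and the scores are illustrative inputs rather than real results.

```python
def series_labels(scorer_names, model_names):
    """One label per bar series: scorers in the outer loop, models in the inner."""
    multi_scorer = len(scorer_names) > 1
    if multi_scorer and len(model_names) > 1:
        return [f"{m} / {s}" for s in scorer_names for m in model_names]
    if multi_scorer:
        return list(scorer_names)
    return list(model_names)


def order_groups(first_series_scores, sort_ascending=None):
    """Alphabetical by default; otherwise sorted by the first series' score."""
    groups = sorted(first_series_scores)
    if sort_ascending is None:
        return groups
    return sorted(groups, key=lambda g: first_series_scores[g], reverse=not sort_ascending)


labels = series_labels(["MAE", "RMSE"], ["naive", "ets"])
# Illustrative scores for the first scorer/model series only.
scores = {"region": 1.5, "channel": 0.7, "store": 2.1}
print(labels)
print(order_groups(scores), order_groups(scores, False))
```

Because only the first series determines the order, groups can appear unsorted with respect to any other scorer or model shown in the same chart.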

Returns

Type	Description
`Figure`	Plotly figure object.

Raises

Type	Description
`ValueError`	If `y_truth` does not contain panel group columns.

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.metrics import MeanAbsoluteError
>>> from yohou.plotting import plot_group_scores

>>> y_truth = pl.DataFrame({
...     "time": [datetime(2020, 1, i) for i in range(1, 4)],
...     "region__east": [10.0, 20.0, 30.0],
...     "region__west": [15.0, 25.0, 35.0],
... })
>>> y_pred = pl.DataFrame({
...     "vintage_time": [datetime(2019, 12, 31)] * 3,
...     "time": [datetime(2020, 1, i) for i in range(1, 4)],
...     "region__east": [12.0, 19.0, 28.0],
...     "region__west": [14.0, 26.0, 33.0],
... })

>>> fig = plot_group_scores(MeanAbsoluteError(), y_truth, y_pred)
>>> len(fig.data) >= 1
True
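As a sanity check on the example above, the score for its single group `region` (both columns share that prefix) can be worked out by hand. This sketch assumes the mean absolute error is averaged uniformly over both members and all three timesteps; yohou's scorer may aggregate differently depending on its settings.

```python
# Same values as the example frames, without the time columns.
y_truth = {"region__east": [10.0, 20.0, 30.0], "region__west": [15.0, 25.0, 35.0]}
y_pred = {"region__east": [12.0, 19.0, 28.0], "region__west": [14.0, 26.0, 33.0]}

# Absolute error for every (member, timestep) pair in the group.
abs_errors = [
    abs(truth - pred)
    for col in y_truth
    for truth, pred in zip(y_truth[col], y_pred[col])
]
region_mae = sum(abs_errors) / len(abs_errors)
print(region_mae)  # 1.5
```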

Source Code

def plot_group_scores(
    scorer: BaseScorer | dict[str, BaseScorer],
    y_truth: pl.DataFrame,
    y_pred: pl.DataFrame | dict[str, pl.DataFrame],
    *,
    kind: Literal["bar", "box", "heatmap"] = "bar",
    distribute_by: Literal["time", "vintage", "step"] = "time",
    color_palette: list[str] | None = None,
    show_legend: bool = True,
    title: str | None = None,
    x_label: str | None = None,
    y_label: str | None = None,
    width: int | None = None,
    height: int | None = None,
    bar_opacity: float = 0.85,
    sort_ascending: bool | None = None,
    text_auto: bool = True,
) -> go.Figure:
    """Plot scores broken down by panel group.

    Requires panel data where ``y_truth`` has group columns
    (e.g. ``group__member``). Computes the scorer for each group
    independently and visualises the results.

    Parameters
    ----------
    scorer : BaseScorer or dict[str, BaseScorer]
        Yohou scorer instance.

        - If BaseScorer: single scorer to evaluate per group.
        - If dict: keys are scorer names, values are scorer instances.
    y_truth : pl.DataFrame
        Ground truth panel data with ``"time"`` and group columns.
    y_pred : pl.DataFrame or dict[str, pl.DataFrame]
        Predictions.

        - If DataFrame: single forecast.
        - If dict: keys are model names, values are prediction DataFrames.
    kind : str, default="bar"
        Plot kind: ``"bar"`` for aggregate bar chart, ``"box"`` for
        box-plot distribution of per-record scores, ``"heatmap"`` for
        a 2D heatmap of groups vs models/scorers.
    distribute_by : str, default="time"
        For ``kind="box"``, the dimension whose variability the
        box plot shows:

        - ``"time"``: per-timestep score variability.
        - ``"vintage"``: per-vintage score variability.
        - ``"step"``: per-step score variability.

        Ignored for ``kind="bar"`` and ``kind="heatmap"``.
    color_palette : list[str] | None, default=None
        Custom colour palette.
    show_legend : bool, default=True
        Whether to show the legend.
    title : str | None, default=None
        Plot title.
    x_label : str | None, default=None
        X-axis label. Defaults to ``"Group"``.
    y_label : str | None, default=None
        Y-axis label.
    width : int | None, default=None
        Plot width in pixels.
    height : int | None, default=None
        Plot height in pixels.
    bar_opacity : float, default=0.85
        Opacity of bars.
    sort_ascending : bool or None, default=None
        Sort groups by the score of the first scorer/model series
        (only for ``kind="bar"``). ``True`` for ascending, ``False`` for
        descending, ``None`` to keep alphabetical group order.
    text_auto : bool, default=True
        Annotate bars or heatmap cells with their values (ignored for
        ``kind="box"``).

    Returns
    -------
    go.Figure
        Plotly figure object.

    Raises
    ------
    ValueError
        If ``y_truth`` does not contain panel group columns.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.metrics import MeanAbsoluteError
    >>> from yohou.plotting import plot_group_scores

    >>> y_truth = pl.DataFrame({
    ...     "time": [datetime(2020, 1, i) for i in range(1, 4)],
    ...     "region__east": [10.0, 20.0, 30.0],
    ...     "region__west": [15.0, 25.0, 35.0],
    ... })
    >>> y_pred = pl.DataFrame({
    ...     "vintage_time": [datetime(2019, 12, 31)] * 3,
    ...     "time": [datetime(2020, 1, i) for i in range(1, 4)],
    ...     "region__east": [12.0, 19.0, 28.0],
    ...     "region__west": [14.0, 26.0, 33.0],
    ... })

    >>> fig = plot_group_scores(MeanAbsoluteError(), y_truth, y_pred)
    >>> len(fig.data) >= 1
    True

    See Also
    --------
    [`plot_score_time_series`][yohou.plotting.plot_score_time_series] : Score over time.
    [`plot_score_per_step`][yohou.plotting.plot_score_per_step] : Score by horizon step.
    """
    validate_plotting_data(y_truth)
    validate_plotting_params(kind=kind, valid_kinds={"bar", "box", "heatmap"}, width=width, height=height)

    y_pred_dict = _normalize_y_pred(y_pred)
    scorer_dict = _normalize_scorers(scorer)

    # Detect panel groups
    _, panel_groups = inspect_panel(y_truth)
    if not panel_groups:
        msg = (
            "y_truth does not contain panel group columns (e.g. 'group__member'). "
            "plot_group_scores requires panel data."
        )
        raise ValueError(msg)

    group_names = sorted(panel_groups)
    n_models = len(y_pred_dict)

    if kind == "bar":
        # Compute aggregate score per group per model per scorer
        # For bar chart: x-axis = groups, one bar per model (for single scorer)
        #                or grouped by scorer x model

        all_entries: list[tuple[str, str, str, float]] = []  # (scorer, model, group, score)

        for s_name, s_orig in scorer_dict.items():
            for m_name, y_pm in y_pred_dict.items():
                validate_plotting_data(y_pm)
                for gname in group_names:
                    g_truth_cols = ["time"] + [c for c in y_truth.columns if c.startswith(f"{gname}__")]
                    g_pred_cols = [
                        c for c in y_pm.columns if c in ("time", "vintage_time") or c.startswith(f"{gname}__")
                    ]
                    y_truth_g = y_truth.select(g_truth_cols)
                    y_pm_g = y_pm.select(g_pred_cols)

                    s_agg = copy.deepcopy(s_orig)
                    s_agg.fit(y_truth_g)
                    score_val = s_agg.score(y_truth_g, y_pm_g)

                    if isinstance(score_val, pl.DataFrame):
                        score_cols = [
                            c for c in score_val.columns if c not in ("time", "vintage_time", "forecasting_step")
                        ]
                        group_score = float(score_val.select(score_cols).mean_horizontal().mean())  # type: ignore
                    else:
                        group_score = float(score_val)  # type: ignore
                    all_entries.append((s_name, m_name, gname, group_score))

        # Build grouped bar chart
        n_scorers = len(scorer_dict)
        multi_scorer = n_scorers > 1

        if multi_scorer and n_models > 1:
            # Series = model x scorer combination
            series_labels = [f"{m} / {s}" for s in scorer_dict for m in y_pred_dict]
        elif multi_scorer:
            series_labels = list(scorer_dict.keys())
        else:
            series_labels = list(y_pred_dict.keys())

        colors = resolve_color_palette(color_palette, len(series_labels))

        # Optional sorting by first series
        if sort_ascending is not None and all_entries:
            first_series_scores = {
                e[2]: e[3]
                for e in all_entries
                if (e[0] == list(scorer_dict.keys())[0] and e[1] == list(y_pred_dict.keys())[0])
            }
            group_names = sorted(
                group_names,
                key=lambda g: first_series_scores.get(g, 0.0),
                reverse=not sort_ascending,
            )

        fig = go.Figure()
        series_idx = 0

        for s_name in scorer_dict:
            for m_name in y_pred_dict:
                values = []
                for gn in group_names:
                    match = [e[3] for e in all_entries if e[0] == s_name and e[1] == m_name and e[2] == gn]
                    values.append(match[0] if match else 0.0)

                if multi_scorer and n_models > 1:
                    label = f"{m_name} / {s_name}"
                elif multi_scorer:
                    label = s_name
                else:
                    label = m_name

                bar_kwargs: dict = {
                    "x": group_names,
                    "y": values,
                    "name": label,
                    "marker_color": colors[series_idx % len(colors)],
                    "opacity": bar_opacity,
                }
                if text_auto:
                    bar_kwargs["text"] = [f"{v:.3g}" for v in values]
                    bar_kwargs["textposition"] = "outside"

                fig.add_trace(go.Bar(**bar_kwargs))
                series_idx += 1

        fig.update_layout(barmode="group")

        if multi_scorer:
            default_title = title or "Group Scores"
            default_y = y_label or "Score"
        else:
            first_scorer = next(iter(scorer_dict.values()))
            scorer_name = first_scorer.__class__.__name__
            default_title = title or f"{scorer_name} by Group"
            default_y = y_label or scorer_name

        fig = apply_default_layout(
            fig,
            title=default_title,
            x_label=x_label or "Group",
            y_label=default_y,
            width=width,
            height=height,
        )
        fig.update_layout(showlegend=show_legend)
        return fig

    if kind == "heatmap":
        # Heatmap: groups on y-axis, models/scorers on x-axis, score as color
        all_entries: list[tuple[str, str, str, float]] = []

        for s_name, s_orig in scorer_dict.items():
            for m_name, y_pm in y_pred_dict.items():
                validate_plotting_data(y_pm)
                for gname in group_names:
                    g_truth_cols = ["time"] + [c for c in y_truth.columns if c.startswith(f"{gname}__")]
                    g_pred_cols = [
                        c for c in y_pm.columns if c in ("time", "vintage_time") or c.startswith(f"{gname}__")
                    ]
                    y_truth_g = y_truth.select(g_truth_cols)
                    y_pm_g = y_pm.select(g_pred_cols)

                    s_agg = copy.deepcopy(s_orig)
                    s_agg.fit(y_truth_g)
                    score_val = s_agg.score(y_truth_g, y_pm_g)

                    if isinstance(score_val, pl.DataFrame):
                        score_cols = [
                            c for c in score_val.columns if c not in ("time", "vintage_time", "forecasting_step")
                        ]
                        group_score = float(score_val.select(score_cols).mean_horizontal().mean())  # type: ignore
                    else:
                        group_score = float(score_val)  # type: ignore
                    all_entries.append((s_name, m_name, gname, group_score))

        n_scorers = len(scorer_dict)
        multi_scorer = n_scorers > 1

        if multi_scorer and n_models > 1:
            series_labels = [f"{m} / {s}" for s in scorer_dict for m in y_pred_dict]
        elif multi_scorer:
            series_labels = list(scorer_dict.keys())
        else:
            series_labels = list(y_pred_dict.keys())

        # Build z-matrix: rows = groups, cols = series
        z_matrix: list[list[float]] = []
        for gname in group_names:
            row: list[float] = []
            for s_name in scorer_dict:
                for m_name in y_pred_dict:
                    match = [e[3] for e in all_entries if e[0] == s_name and e[1] == m_name and e[2] == gname]
                    row.append(match[0] if match else 0.0)
            z_matrix.append(row)

        # Auto-select colorscale direction
        first_scorer = next(iter(scorer_dict.values()))
        lower_better = getattr(first_scorer, "_lower_is_better", True)
        colorscale = "Blues" if lower_better else "Blues_r"

        text_vals = [[f"{v:.3g}" for v in row] for row in z_matrix] if text_auto else None

        fig = go.Figure(
            data=go.Heatmap(
                z=z_matrix,
                x=series_labels,
                y=group_names,
                colorscale=colorscale,
                text=text_vals,
                texttemplate="%{text}" if text_auto else "",
                hovertemplate="Group: %{y}<br>%{x}: %{z:.3g}<extra></extra>",
            )
        )

        scorer_name = first_scorer.__class__.__name__
        default_title = title or ("Group Scores" if multi_scorer else f"{scorer_name} by Group")

        fig = apply_default_layout(
            fig,
            title=default_title,
            x_label=x_label
            or (
                "Model / Scorer"
                if multi_scorer and n_models > 1
                else "Model"
                if n_models > 1
                else "Scorer"
                if multi_scorer
                else "Model"
            ),
            y_label=y_label or "Group",
            width=width,
            height=height,
        )
        fig.update_layout(showlegend=False)
        return fig

    # kind="box": distribution of per-record scores across groups
    # Use componentwise or stepwise/vintagewise aggregation depending on distribute_by
    agg_map = {
        "time": "componentwise",
        "vintage": "stepwise",
        "step": "vintagewise",
    }
    agg_method = agg_map[distribute_by]

    first_scorer = next(iter(scorer_dict.values()))
    first_y_pred = next(iter(y_pred_dict.values()))

    s_cw = copy.deepcopy(first_scorer)
    if isinstance(s_cw, BaseIntervalScorer):
        s_cw.set_params(aggregation_method=[agg_method, "coveragewise"])
    else:
        s_cw.set_params(aggregation_method=agg_method)
    s_cw.fit(y_truth)

    scores_df = s_cw.score(y_truth, first_y_pred)
    if not isinstance(scores_df, pl.DataFrame):
        msg = f"Scorer must return DataFrame for {agg_method} aggregation, got {type(scores_df).__name__}"
        raise TypeError(msg)

    colors = resolve_color_palette(color_palette, len(group_names))
    fig = go.Figure()

    for g_idx, gname in enumerate(group_names):
        g_cols = [c for c in scores_df.columns if c.startswith(f"{gname}__")]
        if not g_cols:
            continue
        if len(g_cols) == 1:
            g_scores = scores_df[g_cols[0]].drop_nulls().to_numpy()
        else:
            g_scores = scores_df.select(g_cols).to_numpy().flatten()
            g_scores = g_scores[~np.isnan(g_scores)]

        fig.add_trace(
            go.Box(
                y=g_scores,
                name=gname,
                marker_color=colors[g_idx % len(colors)],
                boxmean=True,
            )
        )

    scorer_name = first_scorer.__class__.__name__
    fig = apply_default_layout(
        fig,
        title=title or f"{scorer_name} Distribution by Group",
        x_label=x_label or "Group",
        y_label=y_label or scorer_name,
        width=width,
        height=height,
    )
    fig.update_layout(showlegend=show_legend)
    return fig

Tutorials

The following example notebooks use this component:

How to Visualize Forecast Evaluation Results

Visualization

Use plot_calibration, plot_score_per_step, and plot_forecast to diagnose forecast accuracy and interval calibration visually.