BrierScore

yohou.metrics.class_proba.BrierScore

Bases: BaseClassProbaScorer

Multi-class Brier score for class-probability forecasts.

Measures the mean squared difference between predicted probabilities and one-hot encoded true class labels. Equivalent to the Brier score generalized to multiple classes.

The multi-class Brier score is:

\[\text{BS} = \frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K}(\hat{p}_{ik} - o_{ik})^2\]

where \(n\) is the number of scored forecasts, \(\hat{p}_{ik}\) is the predicted probability of class \(k\) for forecast \(i\), \(o_{ik}\) is 1 if class \(k\) is the true class of forecast \(i\) and 0 otherwise, and \(K\) is the number of classes.
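As a rough illustration of the formula, the sketch below computes the score in plain Python and checks one consequence of summing over all classes: in the binary case the result is twice the classic single-probability Brier score, because the squared error is counted once for each of the two classes. The function name `multiclass_brier` and the dict-per-row input layout are assumptions made for this example; they are not part of yohou's API.

```python
def multiclass_brier(probs, true_labels):
    """Mean over rows of the summed squared gap to the one-hot truth.

    probs: list of dicts mapping class label -> predicted probability
           (hypothetical layout, chosen for readability).
    true_labels: list of observed class labels, one per row.
    """
    row_scores = []
    for row, truth in zip(probs, true_labels):
        row_scores.append(
            sum((p - (1.0 if label == truth else 0.0)) ** 2 for label, p in row.items())
        )
    return sum(row_scores) / len(row_scores)


# Binary case: predicted probability of rain and whether it rained.
p_rain = [0.9, 0.3, 0.6]
rained = [True, False, False]

# Expand to the two-class form used by the formula above.
probs = [{"rain": p, "dry": 1.0 - p} for p in p_rain]
labels = ["rain" if r else "dry" for r in rained]

summed = multiclass_brier(probs, labels)

# Classic binary Brier score: squared error on the positive class only.
classic = sum((p - float(r)) ** 2 for p, r in zip(p_rain, rained)) / len(p_rain)

# The summed (multi-class) form is exactly twice the classic binary form.
assert abs(summed - 2 * classic) < 1e-12
```

Keep this factor of two in mind when comparing values against tools that report the single-probability binary form.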

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| aggregation_method | list of str or str | Dimensions to aggregate over. See BaseClassProbaScorer. | "all" |
| groups | list of str, dict of str to float, or None | Panel group filter (list) or filter with weights (dict). See BaseClassProbaScorer. | None |
| components | list of str, dict of str to float, or None | Component filter (list) or filter with weights (dict). See BaseClassProbaScorer. | None |

Attributes

| Name | Type | Description |
|---|---|---|
| lower_is_better | bool | Always True for BrierScore. |

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.metrics import BrierScore
>>> y_true = pl.DataFrame({
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather": ["sunny", "rainy", "cloudy"],
... })
>>> y_pred = pl.DataFrame({
...     "vintage_time": [datetime(2019, 12, 31)] * 3,
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather_proba_sunny": [0.7, 0.1, 0.2],
...     "weather_proba_rainy": [0.2, 0.8, 0.1],
...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
... })
>>> scorer = BrierScore()
>>> _ = scorer.fit(y_true)
>>> scorer.score(y_true, y_pred)
0.113...
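
The reported value can be checked by hand. The sketch below repeats the arithmetic in plain Python, without yohou or polars, using the same three forecasts; the variable names are chosen for this illustration only.

```python
# Class order used for each row of predicted probabilities.
classes = ["sunny", "rainy", "cloudy"]

# Observed class on each of the three days.
truth = ["sunny", "rainy", "cloudy"]

# Predicted probabilities per day, in the order given by `classes`.
predicted = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
]

# Per-row score: squared gap to the one-hot truth, summed over classes.
per_row = [
    sum((p - (1.0 if c == t else 0.0)) ** 2 for c, p in zip(classes, row))
    for row, t in zip(predicted, truth)
]

# Averaging the per-row scores gives the overall score.
mean_score = sum(per_row) / len(per_row)

print([round(s, 2) for s in per_row])  # [0.14, 0.06, 0.14]
print(round(mean_score, 4))  # 0.1133
```

The first row, for instance, contributes (0.7 - 1)^2 + 0.2^2 + 0.1^2 = 0.14.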

Notes

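  • Ranges from 0 (perfect) to 2 (all probability mass placed on a wrong class). The upper bound is 2 for any number of classes, not only for binary targets.
  • Unlike accuracy, which looks only at the argmax class, it is sensitive to how well calibrated the predicted probabilities are.
  • Strictly proper scoring rule: the expected score is minimized only by forecasting the true probability distribution.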
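
The bounds and the proper-scoring property in the notes above can be illustrated numerically. This is a minimal sketch in plain Python; `row_brier`, `expected_brier`, and the distribution `true_dist` are hypothetical names and values chosen for the example, not part of yohou.

```python
def row_brier(probs, true_index):
    """Summed squared gap between one forecast and the one-hot truth."""
    return sum(
        (p - (1.0 if k == true_index else 0.0)) ** 2 for k, p in enumerate(probs)
    )


# Bounds: a certain, correct forecast scores 0; a certain, wrong one scores 2.
best = row_brier([1.0, 0.0, 0.0], 0)
worst_3_classes = row_brier([0.0, 1.0, 0.0], 0)
worst_4_classes = row_brier([0.0, 1.0, 0.0, 0.0], 0)


def expected_brier(forecast, true_dist):
    """Expected score when the outcome is drawn from true_dist."""
    return sum(q * row_brier(forecast, k) for k, q in enumerate(true_dist))


# Assumed true distribution over three classes (illustrative values).
true_dist = [0.6, 0.3, 0.1]

honest = expected_brier(true_dist, true_dist)               # forecast equals truth
overconfident = expected_brier([1.0, 0.0, 0.0], true_dist)  # all mass on the mode
uniform = expected_brier([1 / 3, 1 / 3, 1 / 3], true_dist)  # ignores the data

# The honest forecast has the lowest expected score of the three.
assert honest < overconfident and honest < uniform
```

Here the honest forecast has an expected score of 0.54, against 0.80 for the overconfident forecast and about 0.67 for the uniform one.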

See Also

  • LogLoss : Logarithmic loss (cross-entropy).
  • Accuracy : Classification accuracy from argmax.

Source Code

class BrierScore(BaseClassProbaScorer):
    r"""Multi-class Brier score for class-probability forecasts.

    Measures the mean squared difference between predicted probabilities and
    one-hot encoded true class labels. Equivalent to the Brier score
    generalized to multiple classes.

    The multi-class Brier score is:

    $$\text{BS} = \frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K}(\hat{p}_{ik} - o_{ik})^2$$

    where $\hat{p}_{ik}$ is the predicted probability for class $k$,
    $o_{ik}$ is 1 if class $k$ is the true class and 0 otherwise,
    and $K$ is the number of classes.

    Parameters
    ----------
    aggregation_method : list of str or str, default="all"
        Dimensions to aggregate over. See `BaseClassProbaScorer`.
    groups : list of str, dict of str to float, or None, default=None
        Panel group filter (list) or filter with weights (dict). See `BaseClassProbaScorer`.
    components : list of str, dict of str to float, or None, default=None
        Component filter (list) or filter with weights (dict). See `BaseClassProbaScorer`.

    Attributes
    ----------
    lower_is_better : bool
        Always True for BrierScore.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.metrics import BrierScore
    >>> y_true = pl.DataFrame({
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather": ["sunny", "rainy", "cloudy"],
    ... })
    >>> y_pred = pl.DataFrame({
    ...     "vintage_time": [datetime(2019, 12, 31)] * 3,
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather_proba_sunny": [0.7, 0.1, 0.2],
    ...     "weather_proba_rainy": [0.2, 0.8, 0.1],
    ...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
    ... })
    >>> scorer = BrierScore()
    >>> _ = scorer.fit(y_true)
    >>> scorer.score(y_true, y_pred)  # doctest: +ELLIPSIS
    0.113...

    Notes
    -----
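    - Ranges from 0 (perfect) to 2 (all probability mass placed on a wrong
      class). The upper bound is 2 for any number of classes.
    - Unlike accuracy, which looks only at the argmax class, it is sensitive
      to how well calibrated the predicted probabilities are.
    - Strictly proper scoring rule: the expected score is minimized only by
      forecasting the true probability distribution.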

    See Also
    --------
    - [`LogLoss`][yohou.metrics.class_proba.LogLoss] : Logarithmic loss (cross-entropy).
    - [`Accuracy`][yohou.metrics.classification.Accuracy] : Classification accuracy from argmax.

    """

    _parameter_constraints: dict = {
        **BaseClassProbaScorer._parameter_constraints,
    }

    _metric_name = "brier_score"

    def __init__(
        self,
        aggregation_method: list[str] | str = "all",
        groups: list[str] | dict[str, float] | None = None,
        components: list[str] | dict[str, float] | None = None,
    ) -> None:
        super().__init__(
            aggregation_method=aggregation_method,
            groups=groups,
            components=components,
        )

    def _compute_raw_errors(self, y_truth, y_pred):
        """Compute per-row Brier score values."""
        target_cols = self._extract_target_columns(y_truth)
        scores_dict: dict[str, list[float]] = {}

        for target_col in target_cols:
            proba_cols, class_labels = self._extract_class_proba_columns(y_pred, target_col)
            true_labels = y_truth[target_col].cast(pl.String)

            per_row_scores = []
            for row_idx in range(len(y_truth)):
                true_label = true_labels[row_idx]
                row_score = 0.0
                for k, label in enumerate(class_labels):
                    prob = float(y_pred[proba_cols[k]][row_idx])
                    indicator = 1.0 if label == true_label else 0.0
                    row_score += (prob - indicator) ** 2
                per_row_scores.append(row_score)

            scores_dict[target_col] = per_row_scores

        return pl.DataFrame(scores_dict)

Tutorials

The following example notebooks use this component:

  • How to Score Class-Probability Forecasts (Evaluation-Search): Evaluate categorical forecasts with LogLoss, BrierScore, and Accuracy. Covers per-timestep scoring, aggregation modes, and reliability diagrams.
  • How to Forecast Class Probabilities (Forecasting-Models): Use ClassProbaReductionForecaster to produce calibrated probability forecasts and evaluate them with Brier score, log loss, and accuracy.
  • How to Combine Classification Forecasters (Forecasting-Models): Build classification ensembles with VotingClassProbaForecaster using soft and hard voting strategies.
  • Class-Probability Forecasting (Getting-Started): Forecast air quality categories using ClassProbaReductionForecaster, producing a probability distribution over four WHO air quality classes.