BrierScore

`yohou.metrics.class_proba.BrierScore`

Bases: BaseClassProbaScorer

Multi-class Brier score for class-probability forecasts.

Measures the squared difference between the predicted class probabilities and the one-hot encoded true class label, summed over classes and averaged over rows. This is the Brier score generalized to multiple classes.

The multi-class Brier score is:

\[\text{BS} = \frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K}(\hat{p}_{ik} - o_{ik})^2\]

where \(\hat{p}_{ik}\) is the predicted probability of class \(k\) for row \(i\), \(o_{ik}\) is 1 if class \(k\) is the true class of row \(i\) and 0 otherwise, \(n\) is the number of rows, and \(K\) is the number of classes.
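A direct translation of the formula into plain Python can help when checking numbers by hand. This is a minimal sketch, not yohou's implementation: the function name `brier_score` and the input layout (one dict of class probabilities per row, plus a list of true labels) are assumptions made for illustration.

```python
def brier_score(probas: list[dict[str, float]], truths: list[str]) -> float:
    """Multi-class Brier score (hypothetical helper, not part of yohou).

    probas[i] maps each class label to its predicted probability for row i;
    truths[i] is the true class label of row i.
    """
    if not truths or len(probas) != len(truths):
        raise ValueError("need one probability dict per true label, and at least one row")
    total = 0.0
    for p_row, true_label in zip(probas, truths):
        # Inner sum over classes k of (p_hat_ik - o_ik)^2, with o_ik the one-hot indicator
        total += sum(
            (p - (1.0 if label == true_label else 0.0)) ** 2
            for label, p in p_row.items()
        )
    # Outer average over the n rows
    return total / len(truths)


# Same probabilities and labels as in the Examples section
probas = [
    {"sunny": 0.7, "rainy": 0.2, "cloudy": 0.1},
    {"sunny": 0.1, "rainy": 0.8, "cloudy": 0.1},
    {"sunny": 0.2, "rainy": 0.1, "cloudy": 0.7},
]
truths = ["sunny", "rainy", "cloudy"]
score = brier_score(probas, truths)
print(f"{score:.4f}")
```

On these inputs the sketch returns 0.34 / 3, printed as 0.1133, which agrees with the doctest value.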

Parameters

Name	Type	Description	Default
`aggregation_method`	`list of str or str`	Dimensions to aggregate over. See `BaseClassProbaScorer`.	`"all"`
`groups`	`list of str, dict of str to float, or None`	Panel group filter (list) or filter with weights (dict). See `BaseClassProbaScorer`.	`None`
`components`	`list of str, dict of str to float, or None`	Component filter (list) or filter with weights (dict). See `BaseClassProbaScorer`.	`None`

Attributes

Name	Type	Description
`lower_is_better`	`bool`	Always True for BrierScore.

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.metrics import BrierScore
>>> y_true = pl.DataFrame({
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather": ["sunny", "rainy", "cloudy"],
... })
>>> y_pred = pl.DataFrame({
...     "vintage_time": [datetime(2019, 12, 31)] * 3,
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather_proba_sunny": [0.7, 0.1, 0.2],
...     "weather_proba_rainy": [0.2, 0.8, 0.1],
...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
... })
>>> scorer = BrierScore()
>>> _ = scorer.fit(y_true)
>>> scorer.score(y_true, y_pred)
0.113...
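The doctest value can be reproduced by hand from the formula, assuming the default `aggregation_method="all"` reduces to a plain mean over the three rows. Columns are taken in the order sunny, rainy, cloudy:

\[
\begin{aligned}
\text{sunny row: } & (0.7-1)^2 + (0.2-0)^2 + (0.1-0)^2 = 0.09 + 0.04 + 0.01 = 0.14 \\
\text{rainy row: } & (0.1-0)^2 + (0.8-1)^2 + (0.1-0)^2 = 0.01 + 0.04 + 0.01 = 0.06 \\
\text{cloudy row: } & (0.2-0)^2 + (0.1-0)^2 + (0.7-1)^2 = 0.04 + 0.01 + 0.09 = 0.14 \\
\text{BS} & = \tfrac{1}{3}(0.14 + 0.06 + 0.14) = \tfrac{0.34}{3} \approx 0.1133
\end{aligned}
\]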

Notes

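- Ranges from 0 (perfect) to 2 (worst possible: all probability on a single wrong class) for any number of classes. With two classes, this sum-over-classes form is twice the single-probability binary Brier score, which ranges from 0 to 1.
- Unlike accuracy, which only checks the argmax class, it is sensitive to how well the predicted probabilities are calibrated.
- Strictly proper scoring rule: its expected value is minimized only by forecasting the true class distribution.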
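The range and propriety notes can be checked numerically. The sketch below is plain Python written for illustration; the helpers `row_brier` and `expected_brier` are hypothetical, not yohou functions. It relies on the fact that the expected score of a forecast `q`, when the true class is drawn from a distribution `p`, is the `p`-weighted average of the per-row scores.

```python
def row_brier(q: list[float], true_k: int) -> float:
    # Squared distance between forecast q and the one-hot vector for class true_k
    return sum((qk - (1.0 if k == true_k else 0.0)) ** 2 for k, qk in enumerate(q))


def expected_brier(q: list[float], p: list[float]) -> float:
    # Expected per-row score of forecast q when the true class is drawn from p
    return sum(pj * row_brier(q, j) for j, pj in enumerate(p))


# Worst case: all mass on a wrong class contributes 1 (true class) + 1 (wrong class) = 2
worst = row_brier([0.0, 1.0, 0.0, 0.0], true_k=0)

# Propriety: forecasting the true distribution p beats other forecasts in expectation
p = [0.5, 0.3, 0.2]
honest = expected_brier(p, p)
uniform = expected_brier([1 / 3, 1 / 3, 1 / 3], p)
overconfident = expected_brier([1.0, 0.0, 0.0], p)

# Binary case: the class-sum form doubles the single-probability form (p - o)^2
binary_sum = row_brier([0.7, 0.3], true_k=0)
binary_single = (0.7 - 1.0) ** 2

print(f"worst={worst} honest={honest:.3f} uniform={uniform:.3f} overconfident={overconfident:.3f}")
```

With `p = (0.5, 0.3, 0.2)`, the honest forecast scores 0.62 in expectation, below the uniform forecast (2/3) and the overconfident one-hot forecast (1.0).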

Source Code

class BrierScore(BaseClassProbaScorer):
    r"""Multi-class Brier score for class-probability forecasts.

    Measures the squared difference between the predicted class probabilities
    and the one-hot encoded true class label, summed over classes and averaged
    over rows. This is the Brier score generalized to multiple classes.

    The multi-class Brier score is:

    $$\text{BS} = \frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K}(\hat{p}_{ik} - o_{ik})^2$$

    where $\hat{p}_{ik}$ is the predicted probability of class $k$ for row $i$,
    $o_{ik}$ is 1 if class $k$ is the true class of row $i$ and 0 otherwise,
    $n$ is the number of rows, and $K$ is the number of classes.

    Parameters
    ----------
    aggregation_method : list of str or str, default="all"
        Dimensions to aggregate over. See `BaseClassProbaScorer`.
    groups : list of str, dict of str to float, or None, default=None
        Panel group filter (list) or filter with weights (dict). See `BaseClassProbaScorer`.
    components : list of str, dict of str to float, or None, default=None
        Component filter (list) or filter with weights (dict). See `BaseClassProbaScorer`.

    Attributes
    ----------
    lower_is_better : bool
        Always True for BrierScore.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.metrics import BrierScore
    >>> y_true = pl.DataFrame({
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather": ["sunny", "rainy", "cloudy"],
    ... })
    >>> y_pred = pl.DataFrame({
    ...     "vintage_time": [datetime(2019, 12, 31)] * 3,
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather_proba_sunny": [0.7, 0.1, 0.2],
    ...     "weather_proba_rainy": [0.2, 0.8, 0.1],
    ...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
    ... })
    >>> scorer = BrierScore()
    >>> _ = scorer.fit(y_true)
    >>> scorer.score(y_true, y_pred)  # doctest: +ELLIPSIS
    0.113...

    Notes
    -----
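    - Ranges from 0 (perfect) to 2 (worst possible: all probability on a single
      wrong class) for any number of classes. With two classes, this
      sum-over-classes form is twice the single-probability binary Brier score,
      which ranges from 0 to 1.
    - Unlike accuracy, which only checks the argmax class, it is sensitive to
      how well the predicted probabilities are calibrated.
    - Strictly proper scoring rule: its expected value is minimized only by
      forecasting the true class distribution.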

    See Also
    --------
    - [`LogLoss`][yohou.metrics.class_proba.LogLoss] : Logarithmic loss (cross-entropy).
    - [`Accuracy`][yohou.metrics.classification.Accuracy] : Classification accuracy from argmax.

    """

    _parameter_constraints: dict = {
        **BaseClassProbaScorer._parameter_constraints,
    }

    _metric_name = "brier_score"

    def __init__(
        self,
        aggregation_method: list[str] | str = "all",
        groups: list[str] | dict[str, float] | None = None,
        components: list[str] | dict[str, float] | None = None,
    ) -> None:
        super().__init__(
            aggregation_method=aggregation_method,
            groups=groups,
            components=components,
        )

    def _compute_raw_errors(self, y_truth, y_pred):
        """Compute per-row Brier score values."""
        target_cols = self._extract_target_columns(y_truth)
        scores_dict: dict[str, list[float]] = {}

        for target_col in target_cols:
            proba_cols, class_labels = self._extract_class_proba_columns(y_pred, target_col)
            true_labels = y_truth[target_col].cast(pl.String)

            per_row_scores = []
            for row_idx in range(len(y_truth)):
                true_label = true_labels[row_idx]
                row_score = 0.0
                for k, label in enumerate(class_labels):
                    prob = float(y_pred[proba_cols[k]][row_idx])
                    indicator = 1.0 if label == true_label else 0.0
                    row_score += (prob - indicator) ** 2
                per_row_scores.append(row_score)

            scores_dict[target_col] = per_row_scores

        return pl.DataFrame(scores_dict)

Tutorials

The following example notebooks use this component:

How to Score Class-Probability Forecasts

Evaluation-Search

Evaluate categorical forecasts with LogLoss, BrierScore, and Accuracy. Covers per-timestep scoring, aggregation modes, and reliability diagrams.

How to Forecast Class Probabilities

Forecasting-Models

Use ClassProbaReductionForecaster to produce calibrated probability forecasts and evaluate them with Brier score, log loss, and accuracy.

How to Combine Classification Forecasters

Forecasting-Models

Build classification ensembles with VotingClassProbaForecaster using soft and hard voting strategies.

Class-Probability Forecasting

Getting-Started

Forecast air quality categories using ClassProbaReductionForecaster, producing a probability distribution over four WHO air quality classes.
