Accuracy

`yohou.metrics.classification.Accuracy`

Bases: BaseHardLabelScorer

Categorical accuracy from class-probability forecasts.

Computes the fraction of time steps where the predicted class (argmax of probabilities) matches the true class.

\[\text{Accuracy} = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}[\hat{y}_i = y_i]\]

where \(\hat{y}_i = \arg\max_k \hat{p}_{ik}\) is the predicted class.
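
The formula can be sketched in plain Python, independent of yohou. The helper name `argmax_accuracy` and the dict-of-lists input layout are assumptions made for illustration, not part of the library's API, and the tie-breaking rule (first class wins) is likewise an assumption.

```python
def argmax_accuracy(y_true, proba):
    """Fraction of steps where the highest-probability class equals the true class.

    y_true: list of class labels, one per time step.
    proba:  dict mapping class label -> list of probabilities, same length as y_true.
    Hypothetical helper for illustration only.
    """
    n = len(y_true)
    if n == 0:
        raise ValueError("y_true must not be empty")
    classes = list(proba)
    correct = 0
    for i, truth in enumerate(y_true):
        # y_hat_i = argmax_k p_hat_ik; max() keeps the first class on ties
        predicted = max(classes, key=lambda k: proba[k][i])
        correct += predicted == truth
    return correct / n


y_true = ["sunny", "rainy", "cloudy"]
proba = {
    "sunny": [0.7, 0.1, 0.2],
    "rainy": [0.2, 0.8, 0.1],
    "cloudy": [0.1, 0.1, 0.7],
}
score = argmax_accuracy(y_true, proba)
print(score)  # 1.0
```

The data mirrors the weather example in the Examples section, where every argmax matches the true class.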

Parameters

Name	Type	Description	Default
`aggregation_method`	`list of str or str`	Dimensions to aggregate over. See `BaseClassProbaScorer`.	`"all"`
`groups`	`list of str, dict of str to float, or None`	Panel group filter or filter with weights.	`None`
`components`	`list of str, dict of str to float, or None`	Component filter or filter with weights.	`None`

Attributes

Name	Type	Description
`lower_is_better`	`bool`	Always False for accuracy (higher is better).

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.metrics.classification import Accuracy
>>> y_true = pl.DataFrame({
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather": ["sunny", "rainy", "cloudy"],
... })
>>> y_pred = pl.DataFrame({
...     "vintage_time": [datetime(2019, 12, 31)] * 3,
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather_proba_sunny": [0.7, 0.1, 0.2],
...     "weather_proba_rainy": [0.2, 0.8, 0.1],
...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
... })
>>> scorer = Accuracy()
>>> _ = scorer.fit(y_true)
>>> scorer.score(y_true, y_pred)
1.0

Notes

Accuracy is computed internally as micro-averaged precision, TP / (TP + FP). With single-label multiclass predictions, each prediction counts as exactly one TP (if correct) or one FP for the predicted class (if wrong), so TP + FP summed over classes equals the total sample count N and TP / (TP + FP) = correct / N.
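
The identity in this note can be checked by counting TP and FP per class and summing them. This is a sketch in plain Python; the helper `micro_counts` is hypothetical and does not reflect how yohou builds its confusion counts.

```python
from collections import Counter


def micro_counts(y_true, y_pred):
    """Per-class TP/FP counts for single-label predictions (hypothetical helper)."""
    tp, fp = Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == truth:
            tp[pred] += 1  # correct: one TP for the predicted class
        else:
            fp[pred] += 1  # wrong: one FP for the predicted class
    return tp, fp


y_true = ["sunny", "rainy", "cloudy", "rainy"]
y_pred = ["sunny", "rainy", "rainy", "cloudy"]

tp, fp = micro_counts(y_true, y_pred)
total_tp = sum(tp.values())
total_fp = sum(fp.values())

# Every prediction lands in exactly one bucket, so TP + FP == N.
micro_precision = total_tp / (total_tp + total_fp)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(micro_precision, plain_accuracy)  # 0.5 0.5
```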

Source Code

class Accuracy(BaseHardLabelScorer):
    r"""Categorical accuracy from class-probability forecasts.

    Computes the fraction of time steps where the predicted class (argmax
    of probabilities) matches the true class.

    $$\text{Accuracy} = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}[\hat{y}_i = y_i]$$

    where $\hat{y}_i = \arg\max_k \hat{p}_{ik}$ is the predicted class.

    Parameters
    ----------
    aggregation_method : list of str or str, default="all"
        Dimensions to aggregate over. See `BaseClassProbaScorer`.
    groups : list of str, dict of str to float, or None, default=None
        Panel group filter or filter with weights.
    components : list of str, dict of str to float, or None, default=None
        Component filter or filter with weights.

    Attributes
    ----------
    lower_is_better : bool
        Always False for accuracy (higher is better).

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.metrics.classification import Accuracy
    >>> y_true = pl.DataFrame({
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather": ["sunny", "rainy", "cloudy"],
    ... })
    >>> y_pred = pl.DataFrame({
    ...     "vintage_time": [datetime(2019, 12, 31)] * 3,
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather_proba_sunny": [0.7, 0.1, 0.2],
    ...     "weather_proba_rainy": [0.2, 0.8, 0.1],
    ...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
    ... })
    >>> scorer = Accuracy()
    >>> _ = scorer.fit(y_true)
    >>> scorer.score(y_true, y_pred)
    1.0

    Notes
    -----
    Accuracy is computed internally as micro-averaged precision,
    TP / (TP + FP). With single-label multiclass predictions, each
    prediction counts as exactly one TP (if correct) or one FP for the
    predicted class (if wrong), so TP + FP summed over classes equals
    the total sample count N and TP / (TP + FP) = correct / N.

    See Also
    --------
    - [`LogLoss`][yohou.metrics.class_proba.LogLoss] : Logarithmic loss (cross-entropy).
    - [`BrierScore`][yohou.metrics.class_proba.BrierScore] : Multi-class Brier score.
    - [`Precision`][yohou.metrics.classification.Precision] : Precision (positive predictive value).

    """

    # Spread BaseClassProbaScorer constraints directly, skipping
    # BaseHardLabelScorer's average and zero_division additions.
    _parameter_constraints: dict = {
        **BaseClassProbaScorer._parameter_constraints,
    }

    _metric_name = "accuracy"
    _lower_is_better = False

    def __init__(
        self,
        aggregation_method: list[str] | str = "all",
        groups: list[str] | dict[str, float] | None = None,
        components: list[str] | dict[str, float] | None = None,
    ) -> None:
        super().__init__(
            average="micro",
            zero_division=0.0,
            aggregation_method=aggregation_method,
            groups=groups,
            components=components,
        )

    def _compute_metric_from_counts(self, counts: pl.DataFrame) -> pl.DataFrame:
        """Compute accuracy from confusion counts."""
        return counts.select(
            pl
            .when(pl.col("tp") + pl.col("fp") == 0)
            .then(self.zero_division)
            .otherwise(pl.col("tp") / (pl.col("tp") + pl.col("fp")))
            .alias("value")
        )
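
The zero-division guard in `_compute_metric_from_counts` can be illustrated on scalar counts. This is a simplified plain-Python sketch, not the library's code path: the function name `accuracy_from_counts` is hypothetical, and the real method operates on a polars DataFrame of counts.

```python
def accuracy_from_counts(tp, fp, zero_division=0.0):
    """Return tp / (tp + fp), or zero_division when there are no predictions."""
    total = tp + fp
    if total == 0:
        # No predictions at all: fall back instead of dividing by zero.
        return zero_division
    return tp / total


with_predictions = accuracy_from_counts(3, 1)
without_predictions = accuracy_from_counts(0, 0)
print(with_predictions, without_predictions)  # 0.75 0.0
```

Because the constructor passes `zero_division=0.0`, a slice with no predictions scores 0.0 in this sketch rather than raising or producing NaN.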

Tutorials

The following example notebooks use this component:

How to Score Class-Probability Forecasts

Evaluation-Search

Evaluate categorical forecasts with LogLoss, BrierScore, and Accuracy. Covers per-timestep scoring, aggregation modes, and reliability diagrams.

How to Use Point Forecast Metrics

Evaluation-Search

Compare MAE, MAPE, MASE, RMSE, and other point metrics across multiple forecasters with componentwise and groupwise aggregation.

How to Forecast Class Probabilities

Forecasting-Models

Use ClassProbaReductionForecaster to produce calibrated probability forecasts and evaluate them with Brier score, log loss, and accuracy.

How to Combine Classification Forecasters

Forecasting-Models

Build classification ensembles with VotingClassProbaForecaster using soft and hard voting strategies.

Class-Probability Forecasting

Getting-Started

Forecast air quality categories using ClassProbaReductionForecaster, producing a probability distribution over four WHO air quality classes.
