LogLoss

`yohou.metrics.class_proba.LogLoss`

Bases: BaseClassProbaScorer

Logarithmic loss (cross-entropy) for class-probability forecasts.

Measures the quality of predicted probability distributions by computing the negative log-likelihood of the true class under the predicted distribution.

The log loss averaged over \(n\) observations is:

\[\text{LogLoss} = -\frac{1}{n}\sum_{i=1}^{n}\log(\hat{p}_{i,y_i})\]

where \(\hat{p}_{i,y_i}\) is the predicted probability assigned to the true class \(y_i\) for observation \(i\).
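
As a worked instance of the formula, the following minimal sketch recomputes the score by hand using only the standard library. It reuses the true classes and probabilities from the weather example on this page; the plain lists and dicts are an illustrative stand-in, not the data structures the scorer itself operates on.

```python
import math

# True classes and predicted distributions from the weather example.
y_true = ["sunny", "rainy", "cloudy"]
y_proba = [
    {"sunny": 0.7, "rainy": 0.2, "cloudy": 0.1},
    {"sunny": 0.1, "rainy": 0.8, "cloudy": 0.1},
    {"sunny": 0.2, "rainy": 0.1, "cloudy": 0.7},
]

# Negative log of the probability assigned to the true class, per observation.
per_row = [-math.log(proba[label]) for label, proba in zip(y_true, y_proba)]

# Average over the n observations.
log_loss = sum(per_row) / len(per_row)
print(round(log_loss, 4))  # 0.3122
```

The per-observation terms are about 0.357, 0.223, and 0.357, so the mean agrees with the `0.312...` reported in the Examples section.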

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `aggregation_method` | `list of str or str` | Dimensions to aggregate over. See `BaseClassProbaScorer`. | `"all"` |
| `groups` | `list of str, dict of str to float, or None` | Panel group filter (list) or filter with weights (dict). See `BaseClassProbaScorer`. | `None` |
| `components` | `list of str, dict of str to float, or None` | Component filter (list) or filter with weights (dict). See `BaseClassProbaScorer`. | `None` |

Attributes

| Name | Type | Description |
| --- | --- | --- |
| `lower_is_better` | `bool` | Always `True` for `LogLoss`. |

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.metrics import LogLoss
>>> y_true = pl.DataFrame({
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather": ["sunny", "rainy", "cloudy"],
... })
>>> y_pred = pl.DataFrame({
...     "vintage_time": [datetime(2019, 12, 31)] * 3,
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather_proba_sunny": [0.7, 0.1, 0.2],
...     "weather_proba_rainy": [0.2, 0.8, 0.1],
...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
... })
>>> scorer = LogLoss()
>>> _ = scorer.fit(y_true)
>>> scorer.score(y_true, y_pred)
0.312...

Notes

- Lower values indicate better-calibrated probability estimates.
- Confident wrong predictions (assigning near-zero probability to the true class) are penalized heavily.
- Probabilities are clipped to `[1e-15, 1 - 1e-15]` to avoid numerical issues with `log(0)`.
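
To make the penalty and the clipping concrete, here is a small sketch of the per-observation term under the clipping bounds stated above. The helper name `clipped_neg_log` is hypothetical and exists only for this illustration; it is not part of the library.

```python
import math

EPS = 1e-15  # clipping bound stated in the notes


def clipped_neg_log(prob: float, eps: float = EPS) -> float:
    """Hypothetical helper: clip prob to [eps, 1 - eps], then return -log(prob)."""
    clipped = min(max(prob, eps), 1.0 - eps)
    return -math.log(clipped)


# The loss grows sharply as the probability given to the true class shrinks.
for prob in (0.9, 0.5, 0.01, 0.0):
    print(prob, round(clipped_neg_log(prob), 3))
```

With these bounds, a probability of exactly 0 for the true class yields a finite loss of about 34.54 (that is, `-log(1e-15)`) rather than infinity.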

Source Code

class LogLoss(BaseClassProbaScorer):
    r"""Logarithmic loss (cross-entropy) for class-probability forecasts.

    Measures the quality of predicted probability distributions by computing
    the negative log-likelihood of the true class under the predicted
    distribution.

    The log loss averaged over $n$ observations is:

    $$\\text{LogLoss} = -\\frac{1}{n}\\sum_{i=1}^{n}\\log(\\hat{p}_{i,y_i})$$

    where $\\hat{p}_{i,y_i}$ is the predicted probability assigned to the
    true class $y_i$ for observation $i$.

    Parameters
    ----------
    aggregation_method : list of str or str, default="all"
        Dimensions to aggregate over. See `BaseClassProbaScorer`.
    groups : list of str, dict of str to float, or None, default=None
        Panel group filter (list) or filter with weights (dict). See `BaseClassProbaScorer`.
    components : list of str, dict of str to float, or None, default=None
        Component filter (list) or filter with weights (dict). See `BaseClassProbaScorer`.

    Attributes
    ----------
    lower_is_better : bool
        Always True for LogLoss.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.metrics import LogLoss
    >>> y_true = pl.DataFrame({
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather": ["sunny", "rainy", "cloudy"],
    ... })
    >>> y_pred = pl.DataFrame({
    ...     "vintage_time": [datetime(2019, 12, 31)] * 3,
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather_proba_sunny": [0.7, 0.1, 0.2],
    ...     "weather_proba_rainy": [0.2, 0.8, 0.1],
    ...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
    ... })
    >>> scorer = LogLoss()
    >>> _ = scorer.fit(y_true)
    >>> scorer.score(y_true, y_pred)  # doctest: +ELLIPSIS
    0.312...

    Notes
    -----
    - Lower values indicate better calibrated probability estimates.
    - Heavily penalizes confident wrong predictions (assigning near-zero
      probability to the true class).
    - Probabilities are clipped to ``[1e-15, 1 - 1e-15]`` to avoid
      numerical issues with ``log(0)``.

    See Also
    --------
    - [`BrierScore`][yohou.metrics.class_proba.BrierScore] : Multi-class Brier score.
    - [`Accuracy`][yohou.metrics.classification.Accuracy] : Classification accuracy from argmax.

    """

    _parameter_constraints: dict = {
        **BaseClassProbaScorer._parameter_constraints,
    }

    _metric_name = "log_loss"

    def __init__(
        self,
        aggregation_method: list[str] | str = "all",
        groups: list[str] | dict[str, float] | None = None,
        components: list[str] | dict[str, float] | None = None,
    ) -> None:
        super().__init__(
            aggregation_method=aggregation_method,
            groups=groups,
            components=components,
        )

    def _compute_raw_errors(self, y_truth, y_pred):
        """Compute per-row log loss values."""
        target_cols = self._extract_target_columns(y_truth)
        scores_dict: dict[str, list[float]] = {}

        for target_col in target_cols:
            proba_cols, class_labels = self._extract_class_proba_columns(y_pred, target_col)
            true_labels = y_truth[target_col].cast(pl.String)

            per_row_scores = []
            for row_idx in range(len(y_truth)):
                true_label = true_labels[row_idx]
                label_idx = class_labels.index(true_label) if true_label in class_labels else None
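                if label_idx is not None:
                    # Clip the true-class probability so that log() stays finite.
                    prob = float(y_pred[proba_cols[label_idx]][row_idx])
                    prob = np.clip(prob, 1e-15, 1 - 1e-15)
                    per_row_scores.append(-np.log(prob))
                else:
                    # True label has no probability column: apply the maximum penalty.
                    per_row_scores.append(-np.log(1e-15))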

            scores_dict[target_col] = per_row_scores

        return pl.DataFrame(scores_dict)
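
The per-row logic in `_compute_raw_errors` can be sketched outside the class on plain Python lists and dicts. This is an illustration under stated assumptions, not the library's implementation: the function name `per_row_log_loss` is hypothetical, and matching probability columns by the `<target>_proba_<class>` prefix is an assumption based on the column names in the example, since the behavior of `_extract_class_proba_columns` is not shown here.

```python
import math

EPS = 1e-15  # same clipping bound as in the source above


def per_row_log_loss(true_labels, proba_columns, target, eps=EPS):
    """Hypothetical stand-in for the per-row logic, on plain lists and dicts."""
    prefix = f"{target}_proba_"
    # Map each class label to its list of predicted probabilities
    # (assumed naming convention: "<target>_proba_<class>").
    by_class = {
        name[len(prefix):]: values
        for name, values in proba_columns.items()
        if name.startswith(prefix)
    }
    scores = []
    for row_idx, label in enumerate(true_labels):
        if label in by_class:
            # Clip the true-class probability so that log() stays finite.
            prob = min(max(by_class[label][row_idx], eps), 1.0 - eps)
        else:
            # No probability column for this class: use the maximum penalty.
            prob = eps
        scores.append(-math.log(prob))
    return scores


scores = per_row_log_loss(
    ["sunny", "snowy"],
    {"weather_proba_sunny": [0.7, 0.1], "weather_proba_rainy": [0.3, 0.9]},
    target="weather",
)
print([round(s, 3) for s in scores])
```

The second row uses a true label ("snowy") with no matching probability column, so it receives the same penalty as a true-class probability clipped to `1e-15`, mirroring the `else` branch in the source.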

Tutorials

The following example notebooks use this component:

How to Score Class-Probability Forecasts

Evaluation-Search

Evaluate categorical forecasts with LogLoss, BrierScore, and Accuracy. Covers per-timestep scoring, aggregation modes, and reliability diagrams.

How to Forecast Class Probabilities

Forecasting-Models

Use ClassProbaReductionForecaster to produce calibrated probability forecasts and evaluate them with Brier score, log loss, and accuracy.

How to Combine Classification Forecasters

Forecasting-Models

Build classification ensembles with VotingClassProbaForecaster using soft and hard voting strategies.
