
LogLoss

yohou.metrics.class_proba.LogLoss

Bases: BaseClassProbaScorer

Logarithmic loss (cross-entropy) for class-probability forecasts.

Measures the quality of predicted probability distributions by computing the negative log-likelihood of the true class under the predicted distribution.

The log loss averaged over \(n\) observations is:

\[\text{LogLoss} = -\frac{1}{n}\sum_{i=1}^{n}\log(\hat{p}_{i,y_i})\]

where \(\hat{p}_{i,y_i}\) is the predicted probability assigned to the true class \(y_i\) for observation \(i\).
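
As a worked instance of the formula, the sketch below evaluates it with plain NumPy on three hypothetical observations whose true-class probabilities are 0.7, 0.8, and 0.7. The variable names are illustrative and are not part of yohou.

```python
import numpy as np

# Predicted probability assigned to the true class for each observation
# (hypothetical values chosen for illustration).
p_true = np.array([0.7, 0.8, 0.7])

# Negative log-likelihood per observation, then the mean over the n observations.
per_obs = -np.log(p_true)
log_loss = per_obs.mean()

print(round(float(log_loss), 4))  # 0.3122
```

Each term depends only on the probability given to the true class; the probabilities assigned to the other classes do not enter the sum.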

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| aggregation_method | list of str or str | Dimensions to aggregate over. See BaseClassProbaScorer. | "all" |
| groups | list of str, dict of str to float, or None | Panel group filter (list) or filter with weights (dict). See BaseClassProbaScorer. | None |
| components | list of str, dict of str to float, or None | Component filter (list) or filter with weights (dict). See BaseClassProbaScorer. | None |

Attributes

| Name | Type | Description |
|---|---|---|
| lower_is_better | bool | Always True for LogLoss. |

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.metrics import LogLoss
>>> y_true = pl.DataFrame({
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather": ["sunny", "rainy", "cloudy"],
... })
>>> y_pred = pl.DataFrame({
...     "vintage_time": [datetime(2019, 12, 31)] * 3,
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather_proba_sunny": [0.7, 0.1, 0.2],
...     "weather_proba_rainy": [0.2, 0.8, 0.1],
...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
... })
>>> scorer = LogLoss()
>>> _ = scorer.fit(y_true)
>>> scorer.score(y_true, y_pred)
0.312...

Notes

  • Lower values indicate better calibrated probability estimates.
  • Heavily penalizes confident wrong predictions (assigning near-zero probability to the true class).
  • Probabilities are clipped to [1e-15, 1 - 1e-15] to avoid numerical issues with log(0).
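
The effect of the clipping described above can be sketched with NumPy alone. The helper name `clipped_nll` and the constant `EPS` are hypothetical and are used only for illustration; the bound 1e-15 is the one quoted in the notes.

```python
import numpy as np

EPS = 1e-15  # clipping bound quoted in the notes


def clipped_nll(prob: float) -> float:
    """Negative log of a probability after clipping to [EPS, 1 - EPS]."""
    return float(-np.log(np.clip(prob, EPS, 1 - EPS)))


print(clipped_nll(0.0))   # finite (about 34.54) instead of infinity
print(clipped_nll(0.01))  # about 4.61: a confident wrong prediction
print(clipped_nll(0.9))   # about 0.105: a confident correct prediction
```

Without clipping, a single observation whose true class received probability 0 would make the whole score infinite; with it, the per-observation penalty is capped near 34.54.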

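See Also

  • BrierScore (yohou.metrics.class_proba.BrierScore): Multi-class Brier score.
  • Accuracy (yohou.metrics.classification.Accuracy): Classification accuracy from argmax.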

Source Code

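import numpy as np
import polars as pl

# BaseClassProbaScorer is defined elsewhere in yohou; its import is not shown in this excerpt.


class LogLoss(BaseClassProbaScorer):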
    r"""Logarithmic loss (cross-entropy) for class-probability forecasts.

    Measures the quality of predicted probability distributions by computing
    the negative log-likelihood of the true class under the predicted
    distribution.

    The log loss averaged over $n$ observations is:

    $$\text{LogLoss} = -\frac{1}{n}\sum_{i=1}^{n}\log(\hat{p}_{i,y_i})$$

    where $\hat{p}_{i,y_i}$ is the predicted probability assigned to the
    true class $y_i$ for observation $i$.

    Parameters
    ----------
    aggregation_method : list of str or str, default="all"
        Dimensions to aggregate over. See `BaseClassProbaScorer`.
    groups : list of str, dict of str to float, or None, default=None
        Panel group filter (list) or filter with weights (dict). See `BaseClassProbaScorer`.
    components : list of str, dict of str to float, or None, default=None
        Component filter (list) or filter with weights (dict). See `BaseClassProbaScorer`.

    Attributes
    ----------
    lower_is_better : bool
        Always True for LogLoss.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.metrics import LogLoss
    >>> y_true = pl.DataFrame({
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather": ["sunny", "rainy", "cloudy"],
    ... })
    >>> y_pred = pl.DataFrame({
    ...     "vintage_time": [datetime(2019, 12, 31)] * 3,
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather_proba_sunny": [0.7, 0.1, 0.2],
    ...     "weather_proba_rainy": [0.2, 0.8, 0.1],
    ...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
    ... })
    >>> scorer = LogLoss()
    >>> _ = scorer.fit(y_true)
    >>> scorer.score(y_true, y_pred)  # doctest: +ELLIPSIS
    0.312...

    Notes
    -----
    - Lower values indicate better calibrated probability estimates.
    - Heavily penalizes confident wrong predictions (assigning near-zero
      probability to the true class).
    - Probabilities are clipped to ``[1e-15, 1 - 1e-15]`` to avoid
      numerical issues with ``log(0)``.

    See Also
    --------
    - [`BrierScore`][yohou.metrics.class_proba.BrierScore] : Multi-class Brier score.
    - [`Accuracy`][yohou.metrics.classification.Accuracy] : Classification accuracy from argmax.

    """

    _parameter_constraints: dict = {
        **BaseClassProbaScorer._parameter_constraints,
    }

    _metric_name = "log_loss"

    def __init__(
        self,
        aggregation_method: list[str] | str = "all",
        groups: list[str] | dict[str, float] | None = None,
        components: list[str] | dict[str, float] | None = None,
    ) -> None:
        super().__init__(
            aggregation_method=aggregation_method,
            groups=groups,
            components=components,
        )

    def _compute_raw_errors(self, y_truth, y_pred):
        """Compute per-row log loss values, one output column per target."""
        target_cols = self._extract_target_columns(y_truth)
        scores_dict: dict[str, list[float]] = {}

        for target_col in target_cols:
            # Probability columns and their class labels, in matching order.
            proba_cols, class_labels = self._extract_class_proba_columns(y_pred, target_col)
            # Cast to strings for comparison with class_labels.
            true_labels = y_truth[target_col].cast(pl.String)

            per_row_scores = []
            for row_idx in range(len(y_truth)):
                true_label = true_labels[row_idx]
                if true_label in class_labels:
                    label_idx = class_labels.index(true_label)
                    prob = float(y_pred[proba_cols[label_idx]][row_idx])
                    # Clip to avoid log(0) and keep the loss finite.
                    prob = np.clip(prob, 1e-15, 1 - 1e-15)
                    per_row_scores.append(-np.log(prob))
                else:
                    # The true label has no probability column: apply the maximum penalty.
                    per_row_scores.append(-np.log(1e-15))

            scores_dict[target_col] = per_row_scores

        return pl.DataFrame(scores_dict)
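
The per-row logic above can also be sketched in vectorized form with NumPy alone. This is an illustrative sketch, not the library's implementation: the function `per_row_log_loss` and its arguments are hypothetical, and it assumes the columns of `proba` are ordered like `class_labels`.

```python
import numpy as np


def per_row_log_loss(true_labels, class_labels, proba, eps=1e-15):
    """Per-row log loss for one target column (hypothetical helper).

    true_labels: sequence of n labels
    class_labels: sequence of k class names, ordered like the columns of proba
    proba: array-like of shape (n, k)
    """
    proba = np.asarray(proba, dtype=float)
    index_of = {label: i for i, label in enumerate(class_labels)}
    # -1 marks a true label that has no probability column.
    col = np.array([index_of.get(label, -1) for label in true_labels], dtype=int)
    known = col >= 0
    # Pick the true-class probability per row; unknown labels get 0,
    # which clipping turns into eps (the maximum penalty).
    safe_col = np.where(known, col, 0)
    picked = np.where(known, proba[np.arange(len(col)), safe_col], 0.0)
    return -np.log(np.clip(picked, eps, 1 - eps))


scores = per_row_log_loss(
    ["sunny", "rainy", "snowy"],      # "snowy" has no probability column
    ["sunny", "rainy", "cloudy"],
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]],
)
print(scores)  # roughly [0.357, 0.223, 34.539]
```

The third row shows the fallback branch: a true label absent from the predicted classes receives the same penalty as a true-class probability of 0.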

Tutorials

The following example notebooks use this component:

  • How to Score Class-Probability Forecasts


    Evaluation-Search

    Evaluate categorical forecasts with LogLoss, BrierScore, and Accuracy. Covers per-timestep scoring, aggregation modes, and reliability diagrams.


  • How to Forecast Class Probabilities


    Forecasting-Models

    Use ClassProbaReductionForecaster to produce calibrated probability forecasts and evaluate them with Brier score, log loss, and accuracy.


  • How to Combine Classification Forecasters


    Forecasting-Models

    Build classification ensembles with VotingClassProbaForecaster using soft and hard voting strategies.
