
RankedProbabilityScore

yohou.metrics.class_proba.RankedProbabilityScore

Bases: BaseClassProbaScorer

Ranked Probability Score for class-probability forecasts.

Measures the quality of predicted probability distributions for ordered (ordinal) classes by comparing cumulative probability distributions. Generalizes the Brier score to ordinal multi-class settings by penalizing predictions that place probability mass far from the true class.

The RPS for a single observation is:

\[\text{RPS} = \frac{1}{K-1}\sum_{k=1}^{K-1}\left(\sum_{j=1}^{k}\hat{p}_{j} - \sum_{j=1}^{k}o_{j}\right)^2\]

where \(\hat{p}_j\) is the predicted probability for class \(j\), \(o_j\) is 1 if the true class is \(j\) and 0 otherwise, and \(K\) is the number of classes. Dividing by \(K-1\), the usual convention in forecast verification, keeps the score in \([0, 1]\) for any number of classes, with 0 being a perfect forecast.
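
As a minimal sketch of the formula above, assuming only NumPy, the score for a single observation could be computed as follows. The helper name `rps_single` is hypothetical and is not part of yohou.

```python
import numpy as np


def rps_single(probs, true_index):
    """RPS for one observation; `probs` must follow the ordinal class order."""
    probs = np.asarray(probs, dtype=float)
    k = probs.size
    # One-hot encoding of the observed class (o_j in the formula).
    observed = np.zeros(k)
    observed[true_index] = 1.0
    # Cumulative sums up to class K-1. The K-th term is dropped because both
    # cumulative distributions equal 1 there when the probabilities sum to 1.
    cum_pred = np.cumsum(probs)[:-1]
    cum_obs = np.cumsum(observed)[:-1]
    return float(np.sum((cum_pred - cum_obs) ** 2) / max(k - 1, 1))


# Three ordered classes; the true class is the first one.
score = rps_single([0.7, 0.2, 0.1], true_index=0)
print(round(score, 4))  # 0.05
```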

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `class_order` | list of str or None | Explicit ordering of class labels for the cumulative sum. When None, classes are ordered by their column order in `y_pred` (i.e. the `{target}_proba_{class}` column order). | `None` |
| `aggregation_method` | list of str or str | Dimensions to aggregate over. See `BaseClassProbaScorer`. | `"all"` |
| `groups` | list of str, dict of str to float, or None | Panel group filter (list) or filter with weights (dict). | `None` |
| `components` | list of str, dict of str to float, or None | Component filter (list) or filter with weights (dict). | `None` |
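
Because RPS works on cumulative sums, the class order changes the score. The following is a rough standalone illustration using NumPy only; `rps_with_order` is a hypothetical helper, not a yohou function, and it only mimics the column reordering that `class_order` triggers.

```python
import numpy as np


def rps_with_order(proba_by_label, true_label, class_order):
    """Hypothetical helper: RPS for one row under an explicit class order."""
    # Arrange probabilities and the one-hot truth in the requested order.
    probs = np.array([proba_by_label[label] for label in class_order])
    observed = np.array([1.0 if label == true_label else 0.0 for label in class_order])
    cum_pred = np.cumsum(probs)[:-1]
    cum_obs = np.cumsum(observed)[:-1]
    return float(np.sum((cum_pred - cum_obs) ** 2) / max(len(class_order) - 1, 1))


forecast = {"low": 0.6, "medium": 0.3, "high": 0.1}

# Same forecast and same true class, scored under two different orderings.
ordinal = rps_with_order(forecast, "medium", ["low", "medium", "high"])
shuffled = rps_with_order(forecast, "medium", ["medium", "low", "high"])
print(round(ordinal, 3), round(shuffled, 3))  # 0.185 0.25
```

The two values differ even though the forecast is identical, so an ordering that does not reflect the real ordinal structure gives a score that is hard to interpret.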

Attributes

| Name | Type | Description |
|---|---|---|
| `lower_is_better` | bool | Always True for RPS. |

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.metrics import RankedProbabilityScore
>>> y_true = pl.DataFrame({
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather": ["sunny", "rainy", "cloudy"],
... })
>>> y_pred = pl.DataFrame({
...     "vintage_time": [datetime(2019, 12, 31)] * 3,
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather_proba_sunny": [0.7, 0.1, 0.2],
...     "weather_proba_rainy": [0.2, 0.8, 0.1],
...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
... })
>>> scorer = RankedProbabilityScore()
>>> _ = scorer.fit(y_true)
>>> scorer.score(y_true, y_pred)
0.041...
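
The displayed value can be checked by hand. With the default column order (sunny, rainy, cloudy), \(K = 3\), and assuming the default aggregation is a plain mean over rows (which is consistent with the output above), the per-row scores are:

\[\begin{aligned}
\text{RPS}_1 &= \tfrac{1}{2}\left[(0.7-1)^2 + (0.9-1)^2\right] = 0.05 \\
\text{RPS}_2 &= \tfrac{1}{2}\left[(0.1-0)^2 + (0.9-1)^2\right] = 0.01 \\
\text{RPS}_3 &= \tfrac{1}{2}\left[(0.2-0)^2 + (0.3-0)^2\right] = 0.065 \\
\overline{\text{RPS}} &= \tfrac{1}{3}\left(0.05 + 0.01 + 0.065\right) \approx 0.0417
\end{aligned}\]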

Notes

  • RPS is a proper scoring rule for ordinal outcomes.
  • For K=2, RPS reduces to \((\hat{p}_1 - o_1)^2\), the binary Brier score; the sum-over-classes form of the Brier score is exactly twice this value.
  • Sensitive to how far the predicted probability mass lies from the true class in the ordinal ranking, unlike the Brier score, which ignores class order.
  • The class_order parameter lets you specify a meaningful ordering for ordinal variables (e.g. ["low", "medium", "high"]).
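
The notes above can be illustrated with a small NumPy sketch. Both functions are hypothetical helpers written for this example; `brier_sum` uses the sum-over-classes form of the Brier score, which may be normalized differently from yohou's `BrierScore`.

```python
import numpy as np


def rps(probs, true_index):
    """Per-observation RPS with the 1/(K-1) normalization."""
    probs = np.asarray(probs, dtype=float)
    observed = np.eye(probs.size)[true_index]
    diff = np.cumsum(probs)[:-1] - np.cumsum(observed)[:-1]
    return float(np.sum(diff**2) / max(probs.size - 1, 1))


def brier_sum(probs, true_index):
    """Multi-class Brier score in its sum-over-classes form."""
    probs = np.asarray(probs, dtype=float)
    observed = np.eye(probs.size)[true_index]
    return float(np.sum((probs - observed) ** 2))


# The true class is index 0 of three ordered classes.
near_miss = [0.1, 0.8, 0.1]  # most mass on the adjacent class
far_miss = [0.1, 0.1, 0.8]  # most mass two classes away

# Brier cannot tell the two forecasts apart; RPS penalizes the far miss more.
print(round(brier_sum(near_miss, 0), 3), round(brier_sum(far_miss, 0), 3))  # 1.46 1.46
print(round(rps(near_miss, 0), 3), round(rps(far_miss, 0), 3))  # 0.41 0.725

# K = 2: RPS is half of the sum-form Brier score.
binary = [0.3, 0.7]
print(round(rps(binary, 1), 3), round(brier_sum(binary, 1), 3))  # 0.09 0.18
```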

See Also

  • BrierScore : Brier score (unordered multi-class).
  • LogLoss : Logarithmic loss (cross-entropy).

Source Code

# Imports used by this excerpt; BaseClassProbaScorer is defined elsewhere in yohou.
import numpy as np
import polars as pl


class RankedProbabilityScore(BaseClassProbaScorer):
    r"""Ranked Probability Score for class-probability forecasts.

    Measures the quality of predicted probability distributions for ordered
    (ordinal) classes by comparing cumulative probability distributions.
    Generalizes the Brier score to ordinal multi-class settings by penalizing
    predictions that place probability mass far from the true class.

    The RPS for a single observation is:

    $$\text{RPS} = \frac{1}{K-1}\sum_{k=1}^{K-1}\left(\sum_{j=1}^{k}\hat{p}_{j} - \sum_{j=1}^{k}o_{j}\right)^2$$

    where $\hat{p}_j$ is the predicted probability for class $j$, $o_j$ is
    1 if the true class is $j$ and 0 otherwise, and $K$ is the number of
    classes. Dividing by $K-1$, the usual convention in forecast
    verification, keeps the score in $[0, 1]$ for any number of classes,
    with 0 being a perfect forecast.

    Parameters
    ----------
    class_order : list of str or None, default=None
        Explicit ordering of class labels for the cumulative sum. When
        None, classes are ordered by their column order in ``y_pred``
        (i.e. the ``{target}_proba_{class}`` column order).
    aggregation_method : list of str or str, default="all"
        Dimensions to aggregate over. See `BaseClassProbaScorer`.
    groups : list of str, dict of str to float, or None, default=None
        Panel group filter (list) or filter with weights (dict).
    components : list of str, dict of str to float, or None, default=None
        Component filter (list) or filter with weights (dict).

    Attributes
    ----------
    lower_is_better : bool
        Always True for RPS.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.metrics import RankedProbabilityScore
    >>> y_true = pl.DataFrame({
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather": ["sunny", "rainy", "cloudy"],
    ... })
    >>> y_pred = pl.DataFrame({
    ...     "vintage_time": [datetime(2019, 12, 31)] * 3,
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather_proba_sunny": [0.7, 0.1, 0.2],
    ...     "weather_proba_rainy": [0.2, 0.8, 0.1],
    ...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
    ... })
    >>> scorer = RankedProbabilityScore()
    >>> _ = scorer.fit(y_true)
    >>> scorer.score(y_true, y_pred)  # doctest: +ELLIPSIS
    0.041...

    Notes
    -----
    - RPS is a proper scoring rule for ordinal outcomes.
    - For K=2, RPS reduces to $(\hat{p}_1 - o_1)^2$, the binary Brier score;
      the sum-over-classes form of the Brier score is exactly twice this
      value.
    - Sensitive to how far the predicted probability mass lies from the true
      class in the ordinal ranking, unlike the Brier score, which ignores
      class order.
    - The ``class_order`` parameter lets you specify a meaningful ordering
      for ordinal variables (e.g. ``["low", "medium", "high"]``).

    See Also
    --------
    - [`BrierScore`][yohou.metrics.class_proba.BrierScore] : Brier score (unordered multi-class).
    - [`LogLoss`][yohou.metrics.class_proba.LogLoss] : Logarithmic loss (cross-entropy).

    """

    _parameter_constraints: dict = {
        **BaseClassProbaScorer._parameter_constraints,
        "class_order": [list, None],
    }

    _metric_name = "rps"

    def __init__(
        self,
        class_order: list[str] | None = None,
        aggregation_method: list[str] | str = "all",
        groups: list[str] | dict[str, float] | None = None,
        components: list[str] | dict[str, float] | None = None,
    ) -> None:
        super().__init__(
            aggregation_method=aggregation_method,
            groups=groups,
            components=components,
        )
        self.class_order = class_order

    def _compute_raw_errors(self, y_truth, y_pred):
        """Compute per-row RPS values."""
        target_cols = self._extract_target_columns(y_truth)
        scores_dict: dict[str, list[float]] = {}

        for target_col in target_cols:
            proba_cols, class_labels = self._extract_class_proba_columns(y_pred, target_col)
            true_labels = y_truth[target_col].cast(pl.String)

            # Determine class order
            if self.class_order is not None:
                order = self.class_order
                # Reorder proba_cols to match class_order
                label_to_col = dict(zip(class_labels, proba_cols, strict=True))
                ordered_cols = [label_to_col[label] for label in order]
                ordered_labels = order
            else:
                ordered_cols = proba_cols
                ordered_labels = class_labels

            k = len(ordered_labels)
            norm = max(k - 1, 1)  # Avoid division by zero for K=1

            # Vectorized computation
            proba_arr = y_pred.select(ordered_cols).to_numpy()  # (n, K)
            true_arr = true_labels.to_numpy().astype(str)
            labels_arr = np.array(ordered_labels)
            one_hot = (true_arr[:, None] == labels_arr[None, :]).astype(np.float64)  # (n, K)

            cum_pred = np.cumsum(proba_arr, axis=1)[:, :-1]  # (n, K-1)
            cum_true = np.cumsum(one_hot, axis=1)[:, :-1]

            scores_dict[target_col] = (np.sum((cum_pred - cum_true) ** 2, axis=1) / norm).tolist()

        return pl.DataFrame(scores_dict)
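

In the listing above, `label_to_col[label]` raises a bare `KeyError` when `class_order` names a class with no probability column, and a `class_order` that omits a predicted class leaves that column out of the cumulative sums. Whether yohou validates this elsewhere (for example during `fit`) is not shown here. A minimal sketch of an explicit check, using the hypothetical helper `validate_class_order`, could look like this:

```python
def validate_class_order(class_order, class_labels):
    """Hypothetical check: class_order must be a permutation of class_labels."""
    if len(set(class_order)) != len(class_order):
        raise ValueError(f"class_order contains duplicates: {class_order}")
    unknown = sorted(set(class_order) - set(class_labels))
    missing = sorted(set(class_labels) - set(class_order))
    if unknown or missing:
        raise ValueError(
            "class_order must match the predicted classes; "
            f"unknown={unknown}, missing={missing}"
        )
    return list(class_order)


labels = ["sunny", "rainy", "cloudy"]

# A valid permutation is returned unchanged.
print(validate_class_order(["cloudy", "rainy", "sunny"], labels))

# A label without a probability column is reported with a readable message.
try:
    validate_class_order(["cloudy", "rainy", "snowy"], labels)
except ValueError as exc:
    print(exc)
```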