RankedProbabilityScore

`yohou.metrics.class_proba.RankedProbabilityScore`

Bases: BaseClassProbaScorer

Ranked Probability Score for class-probability forecasts.

Measures the quality of predicted probability distributions for ordered (ordinal) classes by comparing cumulative probability distributions. Generalizes the Brier score to ordinal multi-class settings by penalizing predictions that place probability mass far from the true class.

The RPS for a single observation is:

\[\text{RPS} = \frac{1}{K-1}\sum_{k=1}^{K-1}\left(\sum_{j=1}^{k}\hat{p}_{j} - \sum_{j=1}^{k}o_{j}\right)^2\]

where \(\hat{p}_j\) is the predicted probability for class \(j\), \(o_j\) is 1 if the true class is \(j\) and 0 otherwise, and \(K\) is the number of classes. The normalization by \(K-1\) follows the standard forecasting convention.
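To make the formula concrete, the sketch below transcribes it term by term in plain Python. The function name `rps` and its arguments are illustrative and not part of yohou. It is checked on the three rows from the Examples section, with classes in their column order (sunny, rainy, cloudy).

```python
def rps(probs: list[float], true_index: int) -> float:
    """RPS for one observation, transcribed term by term from the formula.

    probs: predicted probabilities p_1..p_K in the chosen class order.
    true_index: 0-based position of the observed class in that order.
    """
    k = len(probs)
    if k < 2:
        return 0.0  # a single class leaves no cumulative terms to compare
    total = 0.0
    cum_pred = 0.0  # running sum of predicted probabilities
    cum_obs = 0.0   # running sum of the one-hot observation
    for j in range(k - 1):  # k = 1..K-1 in the formula
        cum_pred += probs[j]
        cum_obs += 1.0 if j == true_index else 0.0
        total += (cum_pred - cum_obs) ** 2
    return total / (k - 1)


# Rows of the Examples section, classes ordered (sunny, rainy, cloudy).
rows = [([0.7, 0.2, 0.1], 0), ([0.1, 0.8, 0.1], 1), ([0.2, 0.1, 0.7], 2)]
scores = [rps(p, t) for p, t in rows]
print([round(s, 4) for s in scores])  # [0.05, 0.01, 0.065]
```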

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `class_order` | `list of str or None` | Explicit ordering of class labels for the cumulative sum. When None, classes are ordered by their column order in `y_pred` (i.e. the `{target}_proba_{class}` column order). | `None` |
| `aggregation_method` | `list of str or str` | Dimensions to aggregate over. See `BaseClassProbaScorer`. | `"all"` |
| `groups` | `list of str, dict of str to float, or None` | Panel group filter (list) or filter with weights (dict). | `None` |
| `components` | `list of str, dict of str to float, or None` | Component filter (list) or filter with weights (dict). | `None` |
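Because RPS is built from cumulative sums, the order chosen through `class_order` changes the score. The NumPy sketch below mirrors the idea of reordering the probability columns by label. The helper `mean_rps` and the dict standing in for the `{target}_proba_{class}` columns are hypothetical, and it assumes the default aggregation averages the per-row scores (which is consistent with the value printed in the Examples section).

```python
import numpy as np


def mean_rps(proba_by_class: dict[str, list[float]], truth: list[str], order: list[str]) -> float:
    """Mean RPS over rows after arranging probability columns in `order`.

    Hypothetical helper: `proba_by_class` stands in for the
    `{target}_proba_{class}` columns, keyed by class label.
    """
    proba = np.column_stack([proba_by_class[label] for label in order])  # (n, K)
    # Broadcast the observed labels against the ordered labels to one-hot encode them.
    one_hot = (np.asarray(truth)[:, None] == np.asarray(order)[None, :]).astype(float)
    diff = np.cumsum(proba, axis=1)[:, :-1] - np.cumsum(one_hot, axis=1)[:, :-1]
    norm = max(len(order) - 1, 1)
    return float(np.mean(np.sum(diff**2, axis=1) / norm))


proba = {
    "sunny": [0.7, 0.1, 0.2],
    "rainy": [0.2, 0.8, 0.1],
    "cloudy": [0.1, 0.1, 0.7],
}
truth = ["sunny", "rainy", "cloudy"]

print(round(mean_rps(proba, truth, ["sunny", "rainy", "cloudy"]), 4))  # 0.0417
print(round(mean_rps(proba, truth, ["sunny", "cloudy", "rainy"]), 4))  # 0.0383
```

The same forecasts score differently under the two orders, which is why `class_order` should reflect the real ordinal structure rather than an arbitrary column layout.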

Attributes

| Name | Type | Description |
|------|------|-------------|
| `lower_is_better` | `bool` | Always True for RPS. |

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.metrics import RankedProbabilityScore
>>> y_true = pl.DataFrame({
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather": ["sunny", "rainy", "cloudy"],
... })
>>> y_pred = pl.DataFrame({
...     "vintage_time": [datetime(2019, 12, 31)] * 3,
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "weather_proba_sunny": [0.7, 0.1, 0.2],
...     "weather_proba_rainy": [0.2, 0.8, 0.1],
...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
... })
>>> scorer = RankedProbabilityScore()
>>> _ = scorer.fit(y_true)
>>> scorer.score(y_true, y_pred)
0.041...
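With the default column order (sunny, rainy, cloudy), \(K = 3\), so each row contributes two cumulative terms divided by \(K-1 = 2\). Assuming the default `aggregation_method="all"` averages over rows, the printed value can be reproduced by hand:

\[
\begin{aligned}
\text{RPS}_1 &= \tfrac{1}{2}\left[(0.7-1)^2 + (0.9-1)^2\right] = 0.05 \\
\text{RPS}_2 &= \tfrac{1}{2}\left[(0.1-0)^2 + (0.9-1)^2\right] = 0.01 \\
\text{RPS}_3 &= \tfrac{1}{2}\left[(0.2-0)^2 + (0.3-0)^2\right] = 0.065 \\
\overline{\text{RPS}} &= \tfrac{1}{3}\left(0.05 + 0.01 + 0.065\right) \approx 0.0417
\end{aligned}
\]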

Notes

- RPS is a proper scoring rule for ordinal outcomes.
- For K=2, RPS reduces to \((\hat{p}_1 - o_1)^2\), which matches the Brier score up to a normalization constant.
- RPS is sensitive to the distance between the predicted and true class in the ordinal ranking, unlike the Brier score, which treats all misclassifications equally.
- The `class_order` parameter lets you specify a meaningful ordering for ordinal variables (e.g. `["low", "medium", "high"]`).
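The distance and K=2 notes can be checked numerically. In this NumPy sketch, `brier` is the common multi-class form (sum of squared errors over all K classes); that definition is an assumption here, and yohou's `BrierScore` may normalize differently. Both helper names are illustrative.

```python
import numpy as np


def rps(proba: np.ndarray, true_idx: int) -> float:
    """RPS for one forecast vector (classes already in ordinal order)."""
    one_hot = np.zeros_like(proba)
    one_hot[true_idx] = 1.0
    diff = np.cumsum(proba)[:-1] - np.cumsum(one_hot)[:-1]
    return float(np.sum(diff**2) / max(len(proba) - 1, 1))


def brier(proba: np.ndarray, true_idx: int) -> float:
    """Multi-class Brier score: sum of squared errors over all K classes."""
    one_hot = np.zeros_like(proba)
    one_hot[true_idx] = 1.0
    return float(np.sum((proba - one_hot) ** 2))


# Ordinal classes ["low", "medium", "high"]; the observed class is "low" (index 0).
near_miss = np.array([0.0, 1.0, 0.0])  # all mass one step away
far_miss = np.array([0.0, 0.0, 1.0])   # all mass two steps away

print(brier(near_miss, 0), brier(far_miss, 0))  # 2.0 2.0
print(rps(near_miss, 0), rps(far_miss, 0))      # 0.5 1.0

# K = 2: RPS reduces to (p_1 - o_1)^2, half of the two-class Brier sum.
binary = np.array([0.7, 0.3])
print(round(rps(binary, 0), 4), round(brier(binary, 0), 4))  # 0.09 0.18
```

The Brier score cannot tell the near miss from the far miss, while RPS doubles its penalty when the mass sits two ranks away.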

Source Code


import numpy as np
import polars as pl


class RankedProbabilityScore(BaseClassProbaScorer):
    r"""Ranked Probability Score for class-probability forecasts.

    Measures the quality of predicted probability distributions for ordered
    (ordinal) classes by comparing cumulative probability distributions.
    Generalizes the Brier score to ordinal multi-class settings by penalizing
    predictions that place probability mass far from the true class.

    The RPS for a single observation is:

    $$\text{RPS} = \frac{1}{K-1}\sum_{k=1}^{K-1}\left(\sum_{j=1}^{k}\hat{p}_{j} - \sum_{j=1}^{k}o_{j}\right)^2$$

    where $\hat{p}_j$ is the predicted probability for class $j$, $o_j$ is
    1 if the true class is $j$ and 0 otherwise, and $K$ is the number of
    classes. The normalization by $K-1$ follows the standard forecasting
    convention.

    Parameters
    ----------
    class_order : list of str or None, default=None
        Explicit ordering of class labels for the cumulative sum. When
        None, classes are ordered by their column order in ``y_pred``
        (i.e. the ``{target}_proba_{class}`` column order).
    aggregation_method : list of str or str, default="all"
        Dimensions to aggregate over. See `BaseClassProbaScorer`.
    groups : list of str, dict of str to float, or None, default=None
        Panel group filter (list) or filter with weights (dict).
    components : list of str, dict of str to float, or None, default=None
        Component filter (list) or filter with weights (dict).

    Attributes
    ----------
    lower_is_better : bool
        Always True for RPS.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.metrics import RankedProbabilityScore
    >>> y_true = pl.DataFrame({
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather": ["sunny", "rainy", "cloudy"],
    ... })
    >>> y_pred = pl.DataFrame({
    ...     "vintage_time": [datetime(2019, 12, 31)] * 3,
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "weather_proba_sunny": [0.7, 0.1, 0.2],
    ...     "weather_proba_rainy": [0.2, 0.8, 0.1],
    ...     "weather_proba_cloudy": [0.1, 0.1, 0.7],
    ... })
    >>> scorer = RankedProbabilityScore()
    >>> _ = scorer.fit(y_true)
    >>> scorer.score(y_true, y_pred)  # doctest: +ELLIPSIS
    0.041...

    Notes
    -----
    - RPS is a proper scoring rule for ordinal outcomes.
    - For K=2, RPS reduces to $(\hat{p}_1 - o_1)^2$, which matches the Brier
      score up to a normalization constant.
    - Sensitive to the distance between predicted and true class in the
      ordinal ranking, unlike the Brier score, which treats all
      misclassifications equally.
    - The ``class_order`` parameter lets you specify a meaningful ordering
      for ordinal variables (e.g. ``["low", "medium", "high"]``).

    See Also
    --------
    - [`BrierScore`][yohou.metrics.class_proba.BrierScore] : Brier score (unordered multi-class).
    - [`LogLoss`][yohou.metrics.class_proba.LogLoss] : Logarithmic loss (cross-entropy).

    """

    _parameter_constraints: dict = {
        **BaseClassProbaScorer._parameter_constraints,
        "class_order": [list, None],
    }

    _metric_name = "rps"

    def __init__(
        self,
        class_order: list[str] | None = None,
        aggregation_method: list[str] | str = "all",
        groups: list[str] | dict[str, float] | None = None,
        components: list[str] | dict[str, float] | None = None,
    ) -> None:
        super().__init__(
            aggregation_method=aggregation_method,
            groups=groups,
            components=components,
        )
        self.class_order = class_order

    def _compute_raw_errors(self, y_truth, y_pred):
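        """Compute per-row RPS values.

        Returns a DataFrame with one column per target column of ``y_truth``,
        holding the RPS of each row.
        """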
        target_cols = self._extract_target_columns(y_truth)
        scores_dict: dict[str, list[float]] = {}

        for target_col in target_cols:
            proba_cols, class_labels = self._extract_class_proba_columns(y_pred, target_col)
            true_labels = y_truth[target_col].cast(pl.String)

            # Determine class order
            if self.class_order is not None:
                order = self.class_order
                # Reorder proba_cols to match class_order
                label_to_col = dict(zip(class_labels, proba_cols, strict=True))
                ordered_cols = [label_to_col[label] for label in order]
                ordered_labels = order
            else:
                ordered_cols = proba_cols
                ordered_labels = class_labels

            k = len(ordered_labels)
            norm = max(k - 1, 1)  # Avoid division by zero for K=1

            # Vectorized computation
            proba_arr = y_pred.select(ordered_cols).to_numpy()  # (n, K)
            true_arr = true_labels.to_numpy().astype(str)
            labels_arr = np.array(ordered_labels)
            one_hot = (true_arr[:, None] == labels_arr[None, :]).astype(np.float64)  # (n, K)

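            # Cumulative sums over classes. The last column is dropped because the
            # formula sums k = 1..K-1 (for a valid distribution both sums reach 1 at k = K).
            cum_pred = np.cumsum(proba_arr, axis=1)[:, :-1]  # (n, K-1)
            cum_true = np.cumsum(one_hot, axis=1)[:, :-1]  # (n, K-1)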

            scores_dict[target_col] = (np.sum((cum_pred - cum_true) ** 2, axis=1) / norm).tolist()

        return pl.DataFrame(scores_dict)
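One consequence of the vectorized one-hot step is worth knowing. Based on reading the code, an observed label that matches none of the ordered labels yields an all-zero one-hot row. After the last cumulative column is dropped, that row scores exactly as if the last class in the order had been observed. Whether yohou validates labels elsewhere (for example in `fit` or `_extract_class_proba_columns`) is not shown here. The NumPy sketch below reproduces the steps outside the class; `rps_rows` is a hypothetical stand-in, not a yohou function.

```python
import numpy as np


def rps_rows(proba: np.ndarray, truth: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-row RPS using the one-hot, cumsum and normalization steps of the source."""
    one_hot = (truth[:, None] == labels[None, :]).astype(np.float64)  # (n, K)
    cum_pred = np.cumsum(proba, axis=1)[:, :-1]  # (n, K-1)
    cum_true = np.cumsum(one_hot, axis=1)[:, :-1]  # (n, K-1)
    return np.sum((cum_pred - cum_true) ** 2, axis=1) / max(len(labels) - 1, 1)


labels = np.array(["low", "medium", "high"])
proba = np.array([[0.2, 0.5, 0.3], [0.2, 0.5, 0.3]])  # identical forecasts
truth = np.array(["high", "unknown"])  # "unknown" matches no probability column

scores = rps_rows(proba, truth, labels)
# The unmatched label produces an all-zero one-hot row; once the last cumulative
# column is dropped, that row is indistinguishable from observing "high".
print(scores)
```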