
EmpiricalCoverage

yohou.metrics.interval.EmpiricalCoverage

Bases: BaseIntervalScorer

Empirical coverage rate for prediction intervals.

Measures the proportion of true values falling within the predicted intervals. A well-calibrated forecaster should achieve coverage close to the nominal rate (e.g., 90% for α=0.9).

The empirical coverage for a given coverage rate α is:

\[\text{Coverage}(\alpha) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}(y_i \in [L_i(\alpha), U_i(\alpha)])\]

where \(L_i(\alpha)\) and \(U_i(\alpha)\) are the lower and upper bounds at coverage \(\alpha\), \(\mathbb{1}(\cdot)\) is the indicator function, and \(n\) is the number of observations.
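The formula can be checked by hand with a few lines of plain Python. This is a minimal sketch rather than the library's implementation: the helper name `empirical_coverage` and the sample numbers are illustrative assumptions, and the bounds are treated as inclusive, matching the closed interval in the formula.

```python
def empirical_coverage(y_true, lower, upper):
    """Fraction of observations with lower <= y <= upper (bounds inclusive)."""
    if not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("y_true, lower, and upper must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    # Indicator 1(y_i in [L_i, U_i]) for every observation.
    hits = [lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)]
    return sum(hits) / len(hits)


# Hypothetical data: the third observation falls below its interval.
y_true = [10.0, 20.0, 30.0, 40.0]
lower = [8.0, 18.0, 31.0, 38.0]
upper = [12.0, 22.0, 33.0, 42.0]

coverage = empirical_coverage(y_true, lower, upper)
print(coverage)  # 3 of 4 observations are covered
```

With three of four observations inside their intervals, the empirical coverage is 0.75, which would fall short of a nominal rate of 0.9.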

Parameters

aggregation_method : list of str or str, default="all"
    Dimensions to collapse when aggregating scores. Orthogonal modes:

      • "stepwise": Collapse the forecasting-step dimension.
      • "vintagewise": Collapse the vintage/observed-time dimension.
      • "componentwise": Collapse components, return per-timestep scores.
      • "groupwise": Collapse panel groups (panel data only).
      • "coveragewise": Collapse coverage rates (return average coverage).
      • "all": Collapse all dimensions (returns scalar).

coverage_rates : list of float, dict of float to float, or None, default=None
    Coverage rate filter (list) or filter with weights (dict).

groups : list of str, dict of str to float, or None, default=None
    Panel group filter (list) or filter with weights (dict).

components : list of str, dict of str to float, or None, default=None
    Component filter (list) or filter with weights (dict).
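The list-versus-dict distinction for `coverage_rates`, `groups`, and `components` can be pictured with a small sketch. It shows one plausible reading only: a list selects entries and weights them equally, while a dict selects its keys and weights them by its values, normalized to sum to one. The helper `weighted_average` and the per-rate numbers are hypothetical, and the library may combine weights differently.

```python
def weighted_average(per_rate_coverage, coverage_rates=None):
    """Combine per-rate coverage values into one number.

    coverage_rates=None -> use every rate with equal weight
    coverage_rates=list -> keep only the listed rates, equal weight
    coverage_rates=dict -> keep only the keys, weighted by the values
    """
    if coverage_rates is None:
        weights = {rate: 1.0 for rate in per_rate_coverage}
    elif isinstance(coverage_rates, dict):
        weights = dict(coverage_rates)
    else:
        weights = {rate: 1.0 for rate in coverage_rates}

    missing = [rate for rate in weights if rate not in per_rate_coverage]
    if missing:
        raise KeyError(f"no coverage computed for rates: {missing}")

    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalizing by the total is an assumption of this sketch.
    return sum(per_rate_coverage[r] * w for r, w in weights.items()) / total


# Hypothetical empirical coverage already computed for three nominal rates.
per_rate = {0.5: 0.5, 0.9: 0.75, 0.95: 1.0}

all_rates = weighted_average(per_rate)                        # every rate, equal weight
filtered = weighted_average(per_rate, [0.9, 0.95])            # list acts as a filter
weighted = weighted_average(per_rate, {0.9: 3.0, 0.95: 1.0})  # dict filters and weights
```

Under these assumptions the three calls give 0.75, 0.875, and 0.8125 respectively.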

Attributes

lower_is_better : bool
    False for coverage (deviations from nominal rate in either direction are bad).

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.metrics import EmpiricalCoverage
>>> y_true = pl.DataFrame({
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "value": [10.0, 20.0, 30.0],
... })
>>> y_pred = pl.DataFrame({
...     "vintage_time": [datetime(2019, 12, 31)] * 3,
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
...     "value_lower_0.9": [8.0, 18.0, 28.0],
...     "value_upper_0.9": [12.0, 22.0, 32.0],
... })
>>> coverage = EmpiricalCoverage()
>>> _ = coverage.fit(y_true)
>>> coverage.score(y_true, y_pred)
1.0

Notes

  • Ideal coverage equals the nominal rate (e.g., 0.9 for 90% intervals)
  • Over-coverage (> nominal) indicates conservative intervals that are wider than necessary
  • Under-coverage (< nominal) indicates overconfident intervals that are too narrow
  • Missing values are excluded from computation
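The notes above can be turned into a quick diagnostic. The sketch below is an illustration in plain Python, not the library's code: it assumes missing values appear as `None`, drops any row with a missing truth or bound, and uses a hypothetical `tolerance` of 0.05 to decide when coverage counts as close to nominal.

```python
def coverage_excluding_missing(y_true, lower, upper):
    """Empirical coverage over rows where truth and both bounds are present."""
    rows = [
        (y, lo, hi)
        for y, lo, hi in zip(y_true, lower, upper)
        if y is not None and lo is not None and hi is not None
    ]
    if not rows:
        raise ValueError("no complete rows to score")
    return sum(lo <= y <= hi for y, lo, hi in rows) / len(rows)


def diagnose(empirical, nominal, tolerance=0.05):
    """Label the gap between empirical and nominal coverage."""
    gap = empirical - nominal
    if abs(gap) <= tolerance:
        return "close to nominal"
    return "over-coverage" if gap > 0 else "under-coverage"


# Hypothetical data: the second truth value is missing and is skipped.
y_true = [10.0, None, 30.0, 40.0, 50.0]
lower = [8.0, 18.0, 28.0, 41.0, 48.0]
upper = [12.0, 22.0, 32.0, 45.0, 52.0]

empirical = coverage_excluding_missing(y_true, lower, upper)  # 3 of 4 complete rows
label = diagnose(empirical, nominal=0.9)
```

Here three of the four complete rows are covered, so the empirical coverage is 0.75 and the label is "under-coverage" relative to a nominal rate of 0.9.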

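See Also

  • MeanIntervalWidth: Evaluates interval sharpness
  • IntervalScore: Combined coverage and sharpness metric
  • CalibrationError: Aggregate miscalibration metric
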
Source Code

class EmpiricalCoverage(BaseIntervalScorer):
    r"""Empirical coverage rate for prediction intervals.

    Measures the proportion of true values falling within the predicted
    intervals. A well-calibrated forecaster should achieve coverage close
    to the nominal rate (e.g., 90% for α=0.9).

    The empirical coverage for a given coverage rate α is:

    $$\text{Coverage}(\alpha) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}(y_i \in [L_i(\alpha), U_i(\alpha)])$$

    where $L_i(\alpha)$ and $U_i(\alpha)$ are the lower and upper bounds at coverage $\alpha$,
    $\mathbb{1}(\cdot)$ is the indicator function, and $n$ is the number of observations.

    Parameters
    ----------
    aggregation_method : list of str or str, default="all"
        Dimensions to collapse when aggregating scores. Orthogonal modes:

        - "stepwise": Collapse the forecasting-step dimension.
        - "vintagewise": Collapse the vintage/observed-time dimension.
        - "componentwise": Collapse components, return per-timestep scores.
        - "groupwise": Collapse panel groups (panel data only).
        - "coveragewise": Collapse coverage rates (return average coverage).

        - "all": Collapse all dimensions (returns scalar).
    coverage_rates : list of float, dict of float to float, or None, default=None
        Coverage rate filter (list) or filter with weights (dict).
    groups : list of str, dict of str to float, or None, default=None
        Panel group filter (list) or filter with weights (dict).
    components : list of str, dict of str to float, or None, default=None
        Component filter (list) or filter with weights (dict).

    Attributes
    ----------
    lower_is_better : bool
        False for coverage (deviations from nominal rate in either direction are bad).

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.metrics import EmpiricalCoverage
    >>> y_true = pl.DataFrame({
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "value": [10.0, 20.0, 30.0],
    ... })
    >>> y_pred = pl.DataFrame({
    ...     "vintage_time": [datetime(2019, 12, 31)] * 3,
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    ...     "value_lower_0.9": [8.0, 18.0, 28.0],
    ...     "value_upper_0.9": [12.0, 22.0, 32.0],
    ... })
    >>> coverage = EmpiricalCoverage()
    >>> _ = coverage.fit(y_true)
    >>> coverage.score(y_true, y_pred)
    1.0

    Notes
    -----
    - Ideal coverage equals the nominal rate (e.g., 0.9 for 90% intervals)
    - Over-coverage (> nominal) indicates conservative intervals that are wider than necessary
    - Under-coverage (< nominal) indicates overconfident intervals that are too narrow
    - Missing values are excluded from computation

    See Also
    --------
    - [`MeanIntervalWidth`][yohou.metrics.interval.MeanIntervalWidth] : Evaluates interval sharpness
    - [`IntervalScore`][yohou.metrics.interval.IntervalScore] : Combined coverage and sharpness metric
    - [`CalibrationError`][yohou.metrics.interval.CalibrationError] : Aggregate miscalibration metric

    """

    _parameter_constraints: dict = {
        **BaseIntervalScorer._parameter_constraints,
    }

    _metric_name = "coverage"
    _lower_is_better = False

    def __init__(
        self,
        aggregation_method: list[str] | str = "all",
        coverage_rates: list[float] | dict[float, float] | None = None,
        groups: list[str] | dict[str, float] | None = None,
        components: list[str] | dict[str, float] | None = None,
    ) -> None:
        agg_list = aggregation_method
        if aggregation_method == "all":
            agg_list = ["stepwise", "vintagewise", "componentwise", "groupwise", "coveragewise"]

        super().__init__(
            aggregation_method=agg_list,
            coverage_rates=coverage_rates,
            groups=groups,
            components=components,
        )

    def _compute_raw_scores(self, y_truth, y_pred, coverage_rates, target_columns):
        """Compute per-row empirical coverage indicators."""
        frames = []
        for rate in coverage_rates:
            rate_data = {}
            for col in target_columns:
                lower_col = f"{col}_lower_{rate}"
                upper_col = f"{col}_upper_{rate}"
                if lower_col in y_pred.columns and upper_col in y_pred.columns:
                    in_interval = (y_truth[col] >= y_pred[lower_col]) & (y_truth[col] <= y_pred[upper_col])
                    rate_data[col] = in_interval.cast(pl.Float64)
            frames.append(pl.DataFrame(rate_data).with_columns(pl.lit(rate).alias("coverage_rate")))
        return pl.concat(frames)
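
The per-row logic of `_compute_raw_scores` can be followed without polars. The sketch below mirrors it with plain dicts of lists: it looks up the `{col}_lower_{rate}` and `{col}_upper_{rate}` columns, skips a target whose bounds are absent, and emits one 0.0/1.0 indicator per row tagged with its coverage rate. The function name `raw_coverage_rows` and the sample data are hypothetical, and the later aggregation step is reduced to a simple mean per rate.

```python
def raw_coverage_rows(y_true, y_pred, coverage_rates, target_columns):
    """Return one dict per (rate, row) holding 0.0/1.0 indicators per target."""
    n_rows = len(y_true[target_columns[0]])
    rows = []
    for rate in coverage_rates:
        for i in range(n_rows):
            row = {"coverage_rate": rate}
            for col in target_columns:
                lower_col = f"{col}_lower_{rate}"
                upper_col = f"{col}_upper_{rate}"
                # Targets without both bound columns are silently skipped.
                if lower_col in y_pred and upper_col in y_pred:
                    inside = y_pred[lower_col][i] <= y_true[col][i] <= y_pred[upper_col][i]
                    row[col] = 1.0 if inside else 0.0
            rows.append(row)
    return rows


# Hypothetical data with intervals at two nominal rates.
y_true = {"value": [10.0, 20.0, 30.0]}
y_pred = {
    "value_lower_0.5": [9.5, 21.0, 29.0],
    "value_upper_0.5": [10.5, 23.0, 31.0],
    "value_lower_0.9": [8.0, 18.0, 28.0],
    "value_upper_0.9": [12.0, 22.0, 32.0],
}

rows = raw_coverage_rows(y_true, y_pred, [0.5, 0.9], ["value"])

# Simplified aggregation: mean indicator per coverage rate.
per_rate = {}
for rate in (0.5, 0.9):
    hits = [r["value"] for r in rows if r["coverage_rate"] == rate]
    per_rate[rate] = sum(hits) / len(hits)
```

Because the column names are built with an f-string, the rate is formatted the way Python prints the float: a rate written as `0.90` looks for `value_lower_0.9`, so prediction columns need to follow that spelling.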

Tutorials

The following example notebooks use this component:

  • How to Use Conformity Scorers


    Evaluation-Search

    Compare Residual, AbsoluteResidual, GammaResidual, and AbsoluteGammaResidual conformity scorers with coverage/width analysis and DistanceSimilarity interaction.


  • How to Evaluate Interval Forecasts


    Evaluation-Search

    Evaluate prediction intervals with EmpiricalCoverage, IntervalScore, MeanIntervalWidth, PinballLoss, and CalibrationError across coverage levels.


  • How to Search Interval Forecaster Hyperparameters


    Evaluation-Search

    Tune interval forecaster parameters directly with interval metrics in GridSearchCV, including mixed point+interval multimetric search.


  • How to Forecast Intervals with CatBoost Multiquantile


    Forecasting-Models

    Use IntervalReductionForecaster with CatBoost's native multiquantile objective for simultaneous lower and upper bound estimation.


  • How to Build Interval Forecasts with Reduction


    Forecasting-Models

    Wrap any quantile-capable sklearn estimator with IntervalReductionForecaster to produce calibrated prediction intervals across multiple horizons.


  • Conformal Prediction Intervals


    Getting-Started

    Build distribution-free prediction intervals with SplitConformalForecaster using calibration holdouts and configurable conformity scoring functions.


  • Interval Forecasting


    Getting-Started

    Wrap a point forecaster with SplitConformalForecaster to produce 95% prediction intervals with statistical coverage guarantees.
