BaseIntervalScorer¶

`yohou.metrics.base.BaseIntervalScorer` ¶

Bases: BaseScorer

Base class for interval forecast metrics.

Interval forecasters produce prediction intervals. Metrics derived from this class evaluate coverage and width trade-offs.
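
The coverage versus width trade-off mentioned above can be illustrated with a minimal, library-independent sketch. The helper names `empirical_coverage` and `mean_width` and the toy data are assumptions made for illustration; they are not part of the yohou API and use plain lists rather than DataFrames.

```python
def empirical_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    hits = [lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)]
    return sum(hits) / len(hits)


def mean_width(lower, upper):
    """Average width of the prediction intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


y_true = [10.0, 12.0, 9.0, 15.0]
# Two hypothetical interval forecasts given as (lower bounds, upper bounds).
narrow = ([9.5, 11.5, 9.5, 13.0], [10.5, 12.5, 10.5, 14.0])
wide = ([8.0, 10.0, 7.0, 12.0], [12.0, 14.0, 11.0, 16.0])

narrow_cov = empirical_coverage(y_true, *narrow)  # 0.5
wide_cov = empirical_coverage(y_true, *wide)      # 1.0
narrow_w = mean_width(*narrow)                    # 1.0
wide_w = mean_width(*wide)                        # 4.0
```

The wide intervals cover every observation but are four times as wide; an interval metric has to weigh these two effects against each other.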

Note: The `_response_method` attribute indicates which forecaster method produces the predictions that this scorer expects (`"predict_interval"` for this class).

Parameters¶

Name	Type	Description	Default
`aggregation_method`	`list of str or str`	Dimensions to collapse when aggregating scores. The modes are orthogonal: "stepwise" collapses the forecasting-step dimension (average across steps); "vintagewise" collapses the vintage/observed-time dimension; "componentwise" collapses components and returns per-timestep scores; "groupwise" collapses panel groups (panel data only); "coveragewise" collapses coverage rates (average across rates); "all" collapses all dimensions and returns a scalar, which is the same as `["stepwise", "vintagewise", "componentwise", "groupwise", "coveragewise"]`.	`"all"`
`coverage_rates`	`list of float, dict of float to float, or None`	Coverage rate filter (list) or filter with weights (dict). If None, all coverage rates are included with equal weight.	`None`
`groups`	`list of str, dict of str to float, or None`	Panel group filter (list) or filter with weights (dict). If None, all panel groups are included with equal weight.	`None`
`components`	`list of str, dict of str to float, or None`	Component filter (list) or filter with weights (dict). If None, all components are included with equal weight.	`None`
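
The parameter semantics in the table above can be sketched without the library. Both helpers below (`normalize_aggregation`, `resolve_filter`) are hypothetical and only mirror the documented behavior: `"all"` expands to the five modes, a list acts as a filter with equal weights, a dict acts as a filter with explicit weights, and `None` keeps everything. Accepting `"all"` inside a list and raising on unknown names are assumptions of this sketch.

```python
ALL_MODES = ("stepwise", "vintagewise", "componentwise", "groupwise", "coveragewise")


def normalize_aggregation(aggregation_method):
    """Expand ``aggregation_method`` into the set of dimensions to collapse."""
    if isinstance(aggregation_method, str):
        aggregation_method = [aggregation_method]
    modes = set()
    for mode in aggregation_method:
        if mode == "all":
            # Assumption: "all" is shorthand for every mode, wherever it appears.
            modes.update(ALL_MODES)
        elif mode in ALL_MODES:
            modes.add(mode)
        else:
            raise ValueError(f"Unknown aggregation mode: {mode!r}")
    return modes


def resolve_filter(spec, available):
    """Map a list/dict/None filter to ``{name: weight}`` over ``available``."""
    if spec is None:
        # No filter: keep everything with equal weight.
        return {name: 1.0 for name in available}
    weights = dict(spec) if isinstance(spec, dict) else {name: 1.0 for name in spec}
    missing = [name for name in weights if name not in available]
    if missing:
        raise ValueError(f"Not found: {missing}")
    return weights


modes = normalize_aggregation(["stepwise", "coveragewise"])
rate_weights = resolve_filter({0.9: 2.0}, available=[0.5, 0.9])
```

Here `modes` contains only the two requested dimensions, and `rate_weights` keeps the 0.9 coverage rate with weight 2.0 while dropping 0.5.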

Source Code¶

class BaseIntervalScorer(BaseScorer, metaclass=abc.ABCMeta):
    """Base class for interval forecast metrics.

    Interval forecasters produce prediction intervals. Metrics derived from this
    class evaluate coverage and width trade-offs.

    .. note:: The ``_response_method`` attribute indicates which forecaster
       method produces the predictions that this scorer expects.

    Parameters
    ----------
    aggregation_method : list of str or str, default="all"
        Dimensions to collapse when aggregating scores. Orthogonal modes:

        - "stepwise": Collapse the forecasting-step dimension (average across steps).
        - "vintagewise": Collapse the vintage/observed-time dimension.
        - "componentwise": Collapse components, return per-timestep scores.
        - "groupwise": Collapse panel groups (panel data only).
        - "coveragewise": Collapse coverage rates (return average across rates).
        - "all": Collapse all dimensions (returns scalar). Same as
          ``["stepwise", "vintagewise", "componentwise", "groupwise", "coveragewise"]``.
    coverage_rates : list of float, dict of float to float, or None, default=None
        Coverage rate filter (list) or filter with weights (dict). If None,
        all coverage rates are included with equal weight.
    groups : list of str, dict of str to float, or None, default=None
        Panel group filter (list) or filter with weights (dict). If None,
        all panel groups are included with equal weight.
    components : list of str, dict of str to float, or None, default=None
        Component filter (list) or filter with weights (dict). If None,
        all components are included with equal weight.

    See Also
    --------
    - [`IntervalScore`][yohou.metrics.interval.IntervalScore] : Concrete interval scorer implementation.
    - [`EmpiricalCoverage`][yohou.metrics.interval.EmpiricalCoverage] : Concrete interval scorer implementation.
    - [`BaseIntervalForecaster`][yohou.interval.base.BaseIntervalForecaster] : Produces interval forecasts.

    """

    _response_method: str = "predict_interval"

    _parameter_constraints: dict = {
        **BaseScorer._parameter_constraints,
        "aggregation_method": [
            list,
            StrOptions({"all", "stepwise", "vintagewise", "componentwise", "groupwise", "coveragewise"}),
        ],
        "coverage_rates": [list, dict, None],
    }

    def __init__(
        self,
        aggregation_method: list[str] | str = "all",
        coverage_rates: list[float] | dict[float, float] | None = None,
        groups: list[str] | dict[str, float] | None = None,
        components: list[str] | dict[str, float] | None = None,
    ):
        super().__init__(
            groups=groups,
            components=components,
        )
        self.aggregation_method = aggregation_method
        self.coverage_rates = coverage_rates

    def _validate_coverage_rates(self) -> None:
        """Validate coverage parameter.

        Raises
        ------
        ValueError
            If coverage validation fails.
        TypeError
            If coverage contains non-hashable types.

        """
        coverage_filter = self._filter_keys(self.coverage_rates)
        if coverage_filter is not None:
            if len(coverage_filter) == 0:
                raise ValueError("coverage_rates cannot be empty")

            # Check for hashable types (catch lists, dicts, etc.)
            for i, rate in enumerate(coverage_filter):
                try:
                    hash(rate)
                except TypeError:
                    raise TypeError(
                        f"coverage_rates[{i}] is not hashable (got {type(rate).__name__}). "
                        f"All elements must be numeric (int or float)."
                    ) from None

            # Check all elements are numeric
            if not all(isinstance(rate, int | float) for rate in coverage_filter):
                raise ValueError(
                    f"All elements in coverage_rates must be numeric (int or float), "
                    f"got types: {[type(r).__name__ for r in coverage_filter]}"
                )

            # Check range
            for rate in coverage_filter:
                if not 0 <= rate <= 1:
                    raise ValueError(f"All coverage rates must be between 0 and 1 (inclusive), got {rate}")

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(self, y_train: pl.DataFrame, *, forecaster=None, **params) -> BaseIntervalScorer:
        """Fit the scorer on training data.

        Validates ``coverage_rates``, ``aggregation_method``,
        ``groups``, and ``components``.

        Parameters
        ----------
        y_train : pl.DataFrame
            Training target time series with a ``"time"`` column and one or
            more numeric value columns.
        forecaster : BaseForecaster or None, default=None
            If provided, metadata is extracted directly from the fitted
            forecaster instead of being re-inferred from ``y_train``.
        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        self
            The fitted scorer instance.

        Raises
        ------
        ValueError
            If ``coverage_rates`` are invalid, ``aggregation_method`` contains
            invalid values, or if ``groups`` / ``components``
            are not found in ``y_train``.

        """
        # Validate coverage_rates
        self._validate_coverage_rates()

        # Validate interval-specific parameters (aggregation_method with coveragewise)
        valid_methods = {"stepwise", "vintagewise", "componentwise", "groupwise", "coveragewise"}
        self._validate_parameters(
            y_train=y_train,
            aggregation_method=self.aggregation_method,
            valid_aggregation_methods=valid_methods,
        )

        return super().fit(y_train, forecaster=forecaster, **params)

    def __sklearn_tags__(self) -> Tags:
        """Get estimator tags.

        Returns
        -------
        Tags
            Estimator tags with scorer-specific attributes.

        """
        tags = super().__sklearn_tags__()
        assert tags.scorer_tags is not None
        tags.scorer_tags.prediction_type = "interval"
        return tags

    @abc.abstractmethod
    def _compute_raw_scores(
        self,
        y_truth: pl.DataFrame,
        y_pred: pl.DataFrame,
        coverage_rates: list[float],
        target_columns: list[str],
    ) -> pl.DataFrame:
        """Compute per-timestep per-component raw scores for each coverage rate.

        Subclasses implement only this method.

        Parameters
        ----------
        y_truth : pl.DataFrame
            Ground truth values (time column already removed).
        y_pred : pl.DataFrame
            Predicted intervals (time column already removed).
        coverage_rates : list of float
            Coverage rates extracted from prediction columns.
        target_columns : list of str
            Target column base names from ground truth.

        Returns
        -------
        pl.DataFrame
            Flat DataFrame with component columns plus a ``coverage_rate`` column.
            Rows = n_timesteps * n_rates.

        """

    def score(
        self,
        y_truth: pl.DataFrame,
        y_pred: pl.DataFrame,
        /,
        time_weight: Callable | pl.DataFrame | dict[datetime | str, float] | None = None,
        step_weight: Callable | pl.DataFrame | dict[int | str, float] | None = None,
        vintage_weight: Callable | pl.DataFrame | dict[datetime | str, float] | None = None,
        **params,
    ) -> float | pl.DataFrame:
        """Compute the interval metric score.

        Template method: validate -> pre-filter zeros -> extract rates/columns
        -> compute raw scores -> apply weights -> aggregate -> rename.

        Parameters
        ----------
        y_truth : pl.DataFrame
            True values with ``"time"`` column.
        y_pred : pl.DataFrame
            Predicted intervals with ``"time"`` column.
        time_weight : callable, pl.DataFrame, dict, or None, default=None
            Time-based evaluation weights. Accepts a callable
            ``f(time_series) -> pl.Series``, a panel-aware callable
            ``f(time_series, group_name) -> pl.Series``, a DataFrame
            with ``"time"`` and ``"weight"`` columns, or a
            ``{datetime_or_str: float}`` dict (``"*"`` key sets default).
        step_weight : callable, pl.DataFrame, dict, or None, default=None
            Per-step weights. Same formats as ``time_weight`` but keyed on
            ``"forecasting_step"``.
        vintage_weight : callable, pl.DataFrame, dict, or None, default=None
            Per-vintage weights. Same formats as ``time_weight`` but keyed
            on ``"vintage_time"``.
        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        float or pl.DataFrame
            Aggregated metric score.

        """
        check_is_fitted(self, ["_is_fitted"])

        y_truth, y_pred, context = validate_scorer_data(
            self,
            y_truth,
            y_pred,
        )

        # 0. Resolve weights and pre-filter zero-weight rows
        y_truth, y_pred, context, tw, sw, _ = self._pre_filter_zero_weights(
            y_truth,
            y_pred,
            context,
            time_weight,
            step_weight,
            vintage_weight,
        )

        coverage_rates = self._extract_coverage_rates(y_pred)
        target_columns = self._extract_target_columns(y_truth)

        # 1. Compute raw per-timestep per-component per-rate scores (flat DataFrame)
        raw_scores = self._compute_raw_scores(y_truth, y_pred, coverage_rates, target_columns)

        # 2. Apply weights (two-stage, tiled across coverage rates)
        n_rates = raw_scores["coverage_rate"].n_unique() if "coverage_rate" in raw_scores.columns else 1
        raw_scores = self._apply_weights(raw_scores, tw, sw, n_rates=n_rates)

        # 3. Aggregate (includes transform + rename via _aggregate_per_vintage_scores)
        return self._aggregate_scores(raw_scores, context=context)

    def _extract_coverage_rates(self, y_pred: pl.DataFrame) -> list[float]:
        """Extract unique coverage rates from interval prediction columns.

        Parses column names like "value_lower_0.95", "sales__store_1_upper_0.5"
        to extract all unique coverage rates present in the DataFrame.

        Parameters
        ----------
        y_pred : pl.DataFrame
            Interval predictions with columns following pattern
            "{col}_lower_{rate}" or "{col}_upper_{rate}".

        Returns
        -------
        list of float
            Sorted list of unique coverage rates.

        """
        rates = set()
        # Match both global (value_lower_0.95) and panel (sales__store_1_lower_0.95) patterns
        pattern = re.compile(r"^(.+)_(lower|upper)_(\d+\.?\d*)$")

        for col in y_pred.columns:
            match = pattern.match(col)
            if match:
                rate_str = match.group(3)
                rates.add(float(rate_str))

        return sorted(rates)

    def _extract_target_columns(self, y_truth: pl.DataFrame) -> list[str]:
        """Extract target column base names from ground truth.

        Returns non-time column names from the ground truth DataFrame.
        For global data: ["value", "sales"]
        For panel data: ["sales__store_1", "sales__store_2"]

        Parameters
        ----------
        y_truth : pl.DataFrame
            Ground truth with target columns (time columns already removed).

        Returns
        -------
        list of str
            Target column names.

        """
        # After _validate_inputs, time columns are already removed
        return y_truth.columns
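
The column-name convention parsed by `_extract_coverage_rates` in the listing above can be exercised on its own. This is a minimal sketch that reuses the regular expression from the source; the standalone function name and the example column list are assumptions for illustration.

```python
import re

# Pattern copied from ``_extract_coverage_rates`` in the listing above.
PATTERN = re.compile(r"^(.+)_(lower|upper)_(\d+\.?\d*)$")


def extract_coverage_rates(columns):
    """Return the sorted unique coverage rates encoded in interval column names."""
    rates = set()
    for col in columns:
        match = PATTERN.match(col)
        if match:
            # Group 3 holds the numeric suffix, e.g. "0.95".
            rates.add(float(match.group(3)))
    return sorted(rates)


columns = [
    "time",
    "value_lower_0.95",
    "value_upper_0.95",
    "sales__store_1_lower_0.5",
    "sales__store_1_upper_0.5",
]
rates = extract_coverage_rates(columns)  # [0.5, 0.95]
```

Columns that do not follow the `{col}_lower_{rate}` / `{col}_upper_{rate}` pattern, such as `"time"`, are ignored.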

Methods¶

`fit(y_train, *, forecaster=None, **params)` ¶

Fit the scorer on training data.

Validates `coverage_rates`, `aggregation_method`, `groups`, and `components`.

Parameters¶

Name	Type	Description	Default
`y_train`	`DataFrame`	Training target time series with a `"time"` column and one or more numeric value columns.	required
`forecaster`	`BaseForecaster or None`	If provided, metadata is extracted directly from the fitted forecaster instead of being re-inferred from `y_train`.	`None`
`**params`	`dict`	Metadata to route to nested estimators.	`{}`

Returns¶

Type	Description
`self`	The fitted scorer instance.

Raises¶

Type	Description
`ValueError`	If `coverage_rates` are invalid, `aggregation_method` contains invalid values, or if `groups` / `components` are not found in `y_train`.
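
To make "invalid `coverage_rates`" concrete, here is a simplified standalone sketch of the checks performed during `fit`. The function names are hypothetical. It assumes that for a dict the keys are the rates and the values are the weights, and it omits the separate `TypeError` branch for unhashable elements.

```python
def check_coverage_rates(coverage_rates):
    """Raise ValueError if ``coverage_rates`` would be rejected; None is accepted."""
    if coverage_rates is None:
        return
    # Assumption: iterating a dict yields its keys, which are the rates.
    rates = list(coverage_rates)
    if not rates:
        raise ValueError("coverage_rates cannot be empty")
    for rate in rates:
        if not isinstance(rate, (int, float)):
            raise ValueError(f"coverage rate must be numeric, got {type(rate).__name__}")
        if not 0 <= rate <= 1:
            raise ValueError(f"coverage rate must be in [0, 1], got {rate}")


def is_valid(coverage_rates):
    """Convenience wrapper that turns the check into a boolean."""
    try:
        check_coverage_rates(coverage_rates)
    except ValueError:
        return False
    return True
```

Under these assumptions, `[0.5, 0.9]` and `{0.5: 1.0, 0.9: 2.0}` pass, while `[]`, `[1.5]`, and `["0.9"]` are rejected.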

Source Code¶

@_fit_context(prefer_skip_nested_validation=True)
def fit(self, y_train: pl.DataFrame, *, forecaster=None, **params) -> BaseIntervalScorer:
    """Fit the scorer on training data.

    Validates ``coverage_rates``, ``aggregation_method``,
    ``groups``, and ``components``.

    Parameters
    ----------
    y_train : pl.DataFrame
        Training target time series with a ``"time"`` column and one or
        more numeric value columns.
    forecaster : BaseForecaster or None, default=None
        If provided, metadata is extracted directly from the fitted
        forecaster instead of being re-inferred from ``y_train``.
    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    self
        The fitted scorer instance.

    Raises
    ------
    ValueError
        If ``coverage_rates`` are invalid, ``aggregation_method`` contains
        invalid values, or if ``groups`` / ``components``
        are not found in ``y_train``.

    """
    # Validate coverage_rates
    self._validate_coverage_rates()

    # Validate interval-specific parameters (aggregation_method with coveragewise)
    valid_methods = {"stepwise", "vintagewise", "componentwise", "groupwise", "coveragewise"}
    self._validate_parameters(
        y_train=y_train,
        aggregation_method=self.aggregation_method,
        valid_aggregation_methods=valid_methods,
    )

    return super().fit(y_train, forecaster=forecaster, **params)

`__sklearn_tags__()` ¶

Get estimator tags.

Returns¶

Type	Description
`Tags`	Estimator tags with scorer-specific attributes.

Source Code¶

def __sklearn_tags__(self) -> Tags:
    """Get estimator tags.

    Returns
    -------
    Tags
        Estimator tags with scorer-specific attributes.

    """
    tags = super().__sklearn_tags__()
    assert tags.scorer_tags is not None
    tags.scorer_tags.prediction_type = "interval"
    return tags

`score(y_truth, y_pred, /, time_weight=None, step_weight=None, vintage_weight=None, **params)` ¶

Compute the interval metric score.

Template method: validate -> pre-filter zeros -> extract rates/columns -> compute raw scores -> apply weights -> aggregate -> rename.
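
The template-method flow described above can be sketched with a toy base class. Everything here is hypothetical and heavily simplified: it works on plain lists, handles a single weight vector, and aggregates with a weighted mean, which is an assumption rather than the library's documented aggregation. The point is only that subclasses supply the raw-score hook while the base class owns the pipeline.

```python
from abc import ABC, abstractmethod


class ToyIntervalScorer(ABC):
    """Simplified stand-in for the validate -> filter -> score -> aggregate flow."""

    def score(self, y_truth, lower, upper, weights=None):
        # 1. Validate inputs.
        if not (len(y_truth) == len(lower) == len(upper)):
            raise ValueError("inputs must have the same length")
        if weights is None:
            weights = [1.0] * len(y_truth)
        # 2. Pre-filter rows whose weight is zero.
        kept = [
            (y, lo, hi, w)
            for y, lo, hi, w in zip(y_truth, lower, upper, weights)
            if w != 0
        ]
        # 3. Compute raw scores through the subclass hook.
        raw = [self._compute_raw_score(y, lo, hi) for y, lo, hi, _ in kept]
        # 4. Apply weights and aggregate (weighted mean, an assumption).
        kept_weights = [w for _, _, _, w in kept]
        return sum(r * w for r, w in zip(raw, kept_weights)) / sum(kept_weights)

    @abstractmethod
    def _compute_raw_score(self, y, lo, hi):
        """Return the raw score for a single observation."""


class ToyCoverage(ToyIntervalScorer):
    def _compute_raw_score(self, y, lo, hi):
        return 1.0 if lo <= y <= hi else 0.0


class ToyWidth(ToyIntervalScorer):
    def _compute_raw_score(self, y, lo, hi):
        return hi - lo


y_truth = [10.0, 12.0, 9.0, 15.0]
lower = [9.0, 11.0, 9.5, 13.0]
upper = [11.0, 13.0, 10.5, 14.0]

coverage = ToyCoverage().score(y_truth, lower, upper)                         # 0.5
coverage_first_two = ToyCoverage().score(y_truth, lower, upper, [1, 1, 0, 0])  # 1.0
width = ToyWidth().score(y_truth, lower, upper)                                # 1.5
```

Zero weights remove the two uncovered observations before scoring, which is why the weighted coverage rises to 1.0.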

Parameters¶

Name	Type	Description	Default
`y_truth`	`DataFrame`	True values with `"time"` column.	required
`y_pred`	`DataFrame`	Predicted intervals with `"time"` column.	required
`time_weight`	`callable, pl.DataFrame, dict, or None`	Time-based evaluation weights. Accepts a callable `f(time_series) -> pl.Series`, a panel-aware callable `f(time_series, group_name) -> pl.Series`, a DataFrame with `"time"` and `"weight"` columns, or a `{datetime_or_str: float}` dict (`"*"` key sets default).	`None`
`step_weight`	`callable, pl.DataFrame, dict, or None`	Per-step weights. Same formats as `time_weight` but keyed on `"forecasting_step"`.	`None`
`vintage_weight`	`callable, pl.DataFrame, dict, or None`	Per-vintage weights. Same formats as `time_weight` but keyed on `"vintage_time"`.	`None`
`**params`	`dict`	Metadata to route to nested estimators.	`{}`
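
The callable and dict weight formats in the table above can be illustrated with a small, library-independent sketch. `resolve_weights` is a hypothetical helper operating on plain lists; the DataFrame form is omitted, and falling back to 1.0 when a dict has no `"*"` key is an assumption, not documented behavior.

```python
def resolve_weights(keys, spec):
    """Return one weight per key from a None, callable, or dict specification."""
    if spec is None:
        # No weighting: every key counts equally.
        return [1.0] * len(keys)
    if callable(spec):
        # Callable form: receives all keys and returns one weight per key.
        return list(spec(keys))
    # Dict form: explicit weights, with "*" as the default for missing keys.
    # Assumption: without a "*" entry, missing keys get weight 1.0.
    default = spec.get("*", 1.0)
    return [spec.get(key, default) for key in keys]


steps = [1, 2, 3, 4]

uniform = resolve_weights(steps, None)
decaying = resolve_weights(steps, lambda ks: [1.0 / k for k in ks])
first_two_only = resolve_weights(steps, {1: 2.0, 2: 1.0, "*": 0.0})

# Rows with zero weight are the ones a pre-filter step would drop before scoring.
kept_steps = [s for s, w in zip(steps, first_two_only) if w != 0]
```

With the `"*"` default set to 0.0, only forecasting steps 1 and 2 remain in `kept_steps`.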

Returns¶

Type	Description
`float or DataFrame`	Aggregated metric score.

Source Code¶

def score(
    self,
    y_truth: pl.DataFrame,
    y_pred: pl.DataFrame,
    /,
    time_weight: Callable | pl.DataFrame | dict[datetime | str, float] | None = None,
    step_weight: Callable | pl.DataFrame | dict[int | str, float] | None = None,
    vintage_weight: Callable | pl.DataFrame | dict[datetime | str, float] | None = None,
    **params,
) -> float | pl.DataFrame:
    """Compute the interval metric score.

    Template method: validate -> pre-filter zeros -> extract rates/columns
    -> compute raw scores -> apply weights -> aggregate -> rename.

    Parameters
    ----------
    y_truth : pl.DataFrame
        True values with ``"time"`` column.
    y_pred : pl.DataFrame
        Predicted intervals with ``"time"`` column.
    time_weight : callable, pl.DataFrame, dict, or None, default=None
        Time-based evaluation weights. Accepts a callable
        ``f(time_series) -> pl.Series``, a panel-aware callable
        ``f(time_series, group_name) -> pl.Series``, a DataFrame
        with ``"time"`` and ``"weight"`` columns, or a
        ``{datetime_or_str: float}`` dict (``"*"`` key sets default).
    step_weight : callable, pl.DataFrame, dict, or None, default=None
        Per-step weights. Same formats as ``time_weight`` but keyed on
        ``"forecasting_step"``.
    vintage_weight : callable, pl.DataFrame, dict, or None, default=None
        Per-vintage weights. Same formats as ``time_weight`` but keyed
        on ``"vintage_time"``.
    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    float or pl.DataFrame
        Aggregated metric score.

    """
    check_is_fitted(self, ["_is_fitted"])

    y_truth, y_pred, context = validate_scorer_data(
        self,
        y_truth,
        y_pred,
    )

    # 0. Resolve weights and pre-filter zero-weight rows
    y_truth, y_pred, context, tw, sw, _ = self._pre_filter_zero_weights(
        y_truth,
        y_pred,
        context,
        time_weight,
        step_weight,
        vintage_weight,
    )

    coverage_rates = self._extract_coverage_rates(y_pred)
    target_columns = self._extract_target_columns(y_truth)

    # 1. Compute raw per-timestep per-component per-rate scores (flat DataFrame)
    raw_scores = self._compute_raw_scores(y_truth, y_pred, coverage_rates, target_columns)

    # 2. Apply weights (two-stage, tiled across coverage rates)
    n_rates = raw_scores["coverage_rate"].n_unique() if "coverage_rate" in raw_scores.columns else 1
    raw_scores = self._apply_weights(raw_scores, tw, sw, n_rates=n_rates)

    # 3. Aggregate (includes transform + rename via _aggregate_per_vintage_scores)
    return self._aggregate_scores(raw_scores, context=context)
