Interval Forecasting

A point forecast produces a single predicted value ("sales will be 150 units tomorrow"). An interval forecast produces a range ("sales will be between 120 and 180 units tomorrow, with 90% probability"). The range is more honest: it acknowledges uncertainty and gives decision-makers the information they need to plan for risk. How wide should you set inventory buffers? How much capacity should you reserve? These questions require knowing not just the expected outcome, but how wrong the forecast might be.

Yohou provides two approaches to interval forecasting: conformal prediction (distribution-free, wraps any point forecaster) and quantile regression (learns interval bounds directly). Both produce prediction intervals at user-specified coverage rates and integrate with yohou's standard fit/predict_interval/observe lifecycle.

Conformal Prediction

Traditional statistical methods (ARIMA, exponential smoothing) produce prediction intervals by assuming the forecast errors follow a known distribution, typically Gaussian. Under this assumption, interval width is derived analytically from the estimated error variance. The advantage is simplicity; the risk is that the assumption is wrong. When the errors are heavy-tailed, skewed, or heteroscedastic, the resulting intervals are often too narrow and undercover in practice.

Conformal prediction sidesteps the distributional assumption. It is a distribution-free framework that constructs prediction intervals from calibration data alone. The core idea: if you have a collection of past prediction errors, the quantile of those errors tells you how wide the interval needs to be. If the 90th percentile of past absolute errors is 15, then adding and subtracting 15 from a point prediction gives an interval that would have covered 90% of past observations.
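
The quantile rule above can be sketched in a few lines of plain Python. This is an illustration only, not yohou's implementation: the helper name conformal_quantile and the error values are made up, and the ceil((n + 1) * coverage) rank is the finite-sample rule from the split conformal literature (Lei et al., 2018), assumed here rather than taken from yohou.

```python
import math


def conformal_quantile(scores, coverage):
    """Return the ceil((n + 1) * coverage)-th smallest score (hypothetical helper)."""
    n = len(scores)
    rank = math.ceil((n + 1) * coverage)
    if rank > n:
        return math.inf  # too few scores to certify this coverage
    return sorted(scores)[rank - 1]


# Made-up absolute errors from 19 past forecasts.
past_abs_errors = [3, 8, 1, 12, 5, 15, 7, 2, 9, 4, 6, 10, 11, 13, 14, 20, 0, 16, 18]

half_width = conformal_quantile(past_abs_errors, coverage=0.9)  # 18
point_forecast = 150
interval = (point_forecast - half_width, point_forecast + half_width)  # (132, 168)
```

With more calibration scores the rank rule approaches the plain 90th percentile; with very few scores it returns an infinite width rather than an interval it cannot justify.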

The formal guarantee is marginal coverage: averaged over random draws of the calibration data and the new observation, the intervals contain the true value at least as often as the stated coverage rate. This guarantee holds regardless of the underlying data distribution or the forecasting model. The only assumption is exchangeability of the calibration residuals with the future residual, roughly meaning that the calibration errors are representative of future errors.

In practice, this assumption can be violated when the data distribution shifts over time. Yohou's observe mechanism helps here: as new data arrives, the calibration set is updated incrementally, keeping the conformity scores aligned with recent behavior.

Yohou focuses on distribution-free approaches because they pair naturally with the reduction framework: any sklearn regressor can serve as the base model, and the interval construction does not depend on the regressor's internal assumptions.

Split Conformal Forecasting

SplitConformalForecaster implements the split conformal approach. It divides the training data into two portions: a training set and a calibration set. The wrapped point forecaster trains on the first portion, then generates predictions on the held-out calibration portion. The differences between those predictions and the actual calibration values become the conformity scores that determine interval width.

The process works as follows:

  1. Split: The last calibration_size observations form the calibration set; the remainder is the training set.
  2. Train: The point forecaster fits on the training set.
  3. Calibrate: The fitted forecaster predicts across the calibration set using a rolling observe-predict loop (stride of 1), producing one set of conformity scores per forecast horizon step.
  4. Predict: At inference time, the quantile of the calibration scores at the desired coverage rate sets the interval bounds around the point prediction.

from yohou.point import SeasonalNaive
from yohou.interval import SplitConformalForecaster

forecaster = SplitConformalForecaster(
    point_forecaster=SeasonalNaive(seasonality=7),
    calibration_size=100,
)
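# `train` is the training series, prepared beforehand (not shown here).
forecaster.fit(y=train, forecasting_horizon=7, coverage_rates=[0.9])
# Lower and upper bounds for each of the 7 horizon steps at 90% coverage.
intervals = forecaster.predict_interval(coverage_rates=[0.9])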

Because the calibration scores are computed per horizon step, step 1 and step 7 predictions can have different interval widths. This reflects the natural behavior that uncertainty grows with the forecast horizon.
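
To make the four steps concrete, here is a from-scratch sketch that does not use yohou at all. Everything in it is an assumption made for illustration: a last-value forecaster stands in for the wrapped point forecaster (so the "train" step is trivial), absolute errors stand in for the conformity scorer, the series is a toy, and the function names are hypothetical.

```python
import math


def naive_forecast(history, horizon):
    """Stand-in point forecaster: repeat the last observed value."""
    return [history[-1]] * horizon


def calibrate(y, calibration_size, horizon):
    """Collect absolute-error scores per horizon step with a stride-1 rolling origin."""
    split = len(y) - calibration_size  # observations before `split` are the training set
    scores = [[] for _ in range(horizon)]
    for origin in range(split, len(y) - horizon + 1):
        preds = naive_forecast(y[:origin], horizon)
        for h in range(horizon):
            scores[h].append(abs(y[origin + h] - preds[h]))
    return scores


def conformal_quantile(scores, coverage):
    """ceil((n + 1) * coverage)-th smallest score, or infinity if n is too small."""
    rank = math.ceil((len(scores) + 1) * coverage)
    return math.inf if rank > len(scores) else sorted(scores)[rank - 1]


# Toy series: linear trend plus an alternating bump.
y = [2.0 * t + (t % 2) for t in range(120)]

scores = calibrate(y, calibration_size=50, horizon=3)
widths = [conformal_quantile(s, 0.9) for s in scores]  # [3.0, 4.0, 7.0]

preds = naive_forecast(y, 3)
intervals = [(p - w, p + w) for p, w in zip(preds, widths)]
```

Each horizon step keeps its own list of scores, so each step gets its own half-width. On this toy series the half-widths happen to grow with the step, but nothing in the procedure forces that.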

An important nuance: the coverage guarantee is marginal, meaning it holds on average over many predictions, not conditionally on any particular situation. It does not guarantee that any specific individual prediction interval will contain the true value. In regions where the model performs poorly, the actual coverage can be lower; in regions where the model is accurate, it can be higher.

Updating with New Observations

After initial fitting, calling observe() with new data updates the conformity scores before updating the underlying point forecaster. This ordering matters: the new observations are scored against forecasts made before the forecaster has seen them, so the added conformity scores stay out-of-sample. The next predict_interval() call then reflects both the refreshed calibration set and the updated model. As new observations arrive, the conformity score distribution evolves to reflect recent forecast accuracy, which helps the intervals stay calibrated as the data changes.

The rewind() method reverses post-fit observations, removing their conformity scores from the calibration set. It will not remove data that was part of the original fit.
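
The bookkeeping behind observe and rewind can be pictured with a toy score store. This is a hypothetical class written for this page, not yohou's internals: the name ScoreStore, the absolute-error score, and the n argument to rewind are all assumptions.

```python
class ScoreStore:
    """Toy calibration store: fit-time scores are permanent, observed ones can be rewound."""

    def __init__(self, fit_scores):
        self._scores = list(fit_scores)
        self._n_fit = len(self._scores)  # boundary that rewind must not cross

    def observe(self, y_true, y_pred):
        # Score the new observation against the forecast made before it arrived.
        self._scores.append(abs(y_true - y_pred))

    def rewind(self, n=1):
        # Drop up to n post-fit scores; never cut into the fit-time scores.
        removable = len(self._scores) - self._n_fit
        drop = min(n, removable)
        if drop:
            del self._scores[-drop:]
        return drop

    @property
    def scores(self):
        return tuple(self._scores)


store = ScoreStore([1.0, 2.0])
store.observe(y_true=10.0, y_pred=7.0)  # adds a score of 3.0
store.observe(y_true=5.0, y_pred=6.0)   # adds a score of 1.0
after_observe = store.scores            # (1.0, 2.0, 3.0, 1.0)
dropped = store.rewind(n=5)             # only the 2 post-fit scores can be removed
after_rewind = store.scores             # (1.0, 2.0)
```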

Conformity Scorers

The choice of conformity scorer controls how the calibration residuals translate into interval bounds. Different scorers produce intervals with different geometric properties.

Signed residuals: Residual computes \(s = y - \hat{y}\). Because positive and negative errors are preserved separately, the lower and upper quantiles can differ. This produces asymmetric intervals where the point prediction is not necessarily at the center. Asymmetric intervals are appropriate when the error distribution is skewed (for example, when demand forecasts tend to underpredict more than they overpredict). This is the default scorer.

Absolute residuals: AbsoluteResidual computes \(s = |y - \hat{y}|\). A single quantile is added and subtracted from the point prediction, producing symmetric intervals centered on the forecast. This works well when errors are roughly symmetric around zero.

Gamma (relative) residuals: GammaResidual computes \(s = (y - \hat{y}) / (\hat{y} + \epsilon)\). By normalizing the error by the prediction magnitude, this scorer produces intervals that scale with the level of the series. When the predicted value is large, the interval is wide; when it is small, the interval is narrow. This is well suited to data with multiplicative seasonality or heteroscedastic variance that grows proportionally with the signal. AbsoluteGammaResidual is the symmetric variant.

Switching scorers requires no changes to the forecaster; pass a different conformity_scorer to the SplitConformalForecaster constructor.
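
The geometric difference between the scorers is easiest to see on a small set of skewed residuals. The sketch below is plain Python with invented numbers and hypothetical helper names; it uses a simple rank-based quantile and inverts the gamma formula above by hand, which may differ in detail from what the yohou scorers do.

```python
import math


def empirical_quantile(values, level):
    """Smallest value with at least `level` of the sample at or below it."""
    ordered = sorted(values)
    rank = min(len(ordered), max(1, math.ceil(level * len(ordered))))
    return ordered[rank - 1]


residuals = [-1, 0, 1, 2, 3, 4, 5, 6, 8, 10]  # y - y_hat, skewed towards underprediction
point = 100.0

# Signed residuals: separate lower (10%) and upper (90%) quantiles, 80% coverage.
signed_interval = (
    point + empirical_quantile(residuals, 0.1),
    point + empirical_quantile(residuals, 0.9),
)  # (99.0, 108.0): the point forecast is not centered

# Absolute residuals: one 80% quantile, added and subtracted.
half = empirical_quantile([abs(r) for r in residuals], 0.8)
absolute_interval = (point - half, point + half)  # (94.0, 106.0)


def gamma_interval(y_hat, q_low, q_high, eps=1e-8):
    """Invert s = (y - y_hat) / (y_hat + eps) for given score quantiles."""
    scale = y_hat + eps
    return (y_hat + q_low * scale, y_hat + q_high * scale)


# Same relative score quantiles, two prediction levels.
small = gamma_interval(100.0, -0.1, 0.2)    # roughly (90, 120)
large = gamma_interval(1000.0, -0.1, 0.2)   # roughly (900, 1200)
```

The signed interval leans upward because the residuals do; the absolute interval spends the same coverage symmetrically; the gamma interval is ten times wider at a prediction that is ten times larger.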

Adaptive Intervals

Standard conformal prediction uses the same set of calibration scores for every prediction, regardless of context. This means the interval width at a given horizon step is constant: a prediction during a holiday peak gets the same interval as a prediction on a quiet Tuesday. In many applications, the uncertainty genuinely varies across different conditions.

Similarity-based weighting addresses this. Instead of treating all calibration residuals equally when computing quantiles, it assigns higher weights to calibration points that are "similar" to the current prediction context. The weighted quantile then produces intervals that adapt to local conditions.

Distance Similarity

DistanceSimilarity computes distances between the current prediction context and each calibration point in feature space, then converts distances to weights using a softmax of negative distances:

\[w_{ji} = \frac{\exp(-d(x_j, x_i))}{\sum_k \exp(-d(x_j, x_k))}\]

Calibration points close to the current prediction get exponentially higher weights than distant ones. The distance metric is configurable: euclidean, cityblock, cosine, or any metric supported by scipy.spatial.distance.cdist.

from yohou.interval import SplitConformalForecaster, DistanceSimilarity

forecaster = SplitConformalForecaster(
    similarity=DistanceSimilarity(metric="euclidean"),
    calibration_size=100,
)
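
The weighting formula and the weighted quantile it feeds can be sketched without any library. This is an illustration under assumptions, not yohou's code: the function names are hypothetical, the metric is fixed to Euclidean, and the weighted quantile uses a simple cumulative-weight rule.

```python
import math


def softmax_weights(query, calibration_points):
    """Softmax of negative Euclidean distances from one query to each calibration point."""
    dists = [math.dist(query, x) for x in calibration_points]
    shift = min(dists)  # shifting by the smallest distance keeps exp() well-behaved
    expd = [math.exp(-(d - shift)) for d in dists]
    total = sum(expd)
    return [e / total for e in expd]


def weighted_quantile(scores, weights, level):
    """Smallest score whose cumulative weight reaches `level`."""
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= level:
            return score
    return max(scores)  # guard against rounding leaving the total just below `level`


# Two calm calibration points near x = 0 and two volatile ones near x = 5.
calibration_points = [(0.0,), (0.1,), (5.0,), (5.1,)]
scores = [1.0, 1.2, 10.0, 11.0]

w_calm = softmax_weights((0.0,), calibration_points)
w_volatile = softmax_weights((5.0,), calibration_points)

q_calm = weighted_quantile(scores, w_calm, 0.9)          # 1.2
q_volatile = weighted_quantile(scores, w_volatile, 0.9)  # 11.0
```

The same four calibration scores yield a narrow interval for a query in the calm region and a wide one in the volatile region, which is the adaptivity the weighting is meant to provide.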

Temporal Similarity

TemporalSimilarity captures seasonal patterns by extracting Fourier features (sine and cosine components) from timestamps at specified seasonal periods. Predictions at similar seasonal positions (for example, all Mondays, or all January observations) receive higher calibration weights.

from yohou.interval import SplitConformalForecaster, TemporalSimilarity

forecaster = SplitConformalForecaster(
    similarity=TemporalSimilarity(seasonalities=[7.0, 365.25]),
    calibration_size=100,
)

The seasonalities parameter accepts a list of period lengths. A weekly seasonality of 7.0 groups similar days of the week; an annual seasonality of 365.25 groups similar times of year. The harmonics parameter controls how many sine/cosine pairs are generated per seasonality, allowing finer or coarser seasonal grouping.
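
A rough sketch of the Fourier encoding shows why equal seasonal positions end up close together. It assumes an integer day index in place of real timestamps, and the function name fourier_features is hypothetical; how yohou derives the features from timestamps is not shown here.

```python
import math


def fourier_features(t, seasonalities, harmonics=1):
    """Sine/cosine pairs for time index t, one pair per harmonic per period."""
    feats = []
    for period in seasonalities:
        for k in range(1, harmonics + 1):
            angle = 2.0 * math.pi * k * t / period
            feats.extend([math.sin(angle), math.cos(angle)])
    return feats


day0 = fourier_features(0, [7.0])
day3 = fourier_features(3, [7.0])
day7 = fourier_features(7, [7.0])

same_weekday = math.dist(day0, day7)   # essentially zero: one full weekly cycle apart
other_weekday = math.dist(day0, day3)  # close to 2, the largest possible distance here

two_harmonics = fourier_features(3, [7.0, 365.25], harmonics=2)  # 2 periods * 2 harmonics * 2 = 8 values
```

Feeding these feature vectors into a distance-based weighting gives calibration points from the same weekday a much larger weight than points from mid-week.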

Composite Similarity

CompositeSimilarity combines multiple similarity measures into a single weighting scheme. This is useful when both feature-space proximity and temporal proximity matter.

from yohou.interval import (
    SplitConformalForecaster,
    CompositeSimilarity,
    DistanceSimilarity,
    TemporalSimilarity,
)

forecaster = SplitConformalForecaster(
    similarity=CompositeSimilarity(
        similarities=[
            DistanceSimilarity(metric="euclidean"),
            TemporalSimilarity(seasonalities=[7.0]),
        ],
        combination="multiply",
    ),
    calibration_size=100,
)

The combination parameter controls how weight matrices are merged: "multiply" takes the element-wise product (both similarities must agree for a calibration point to receive high weight), while "mean" takes the weighted average. An optional weights list assigns relative importance to each sub-similarity.
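
The two combination rules can be illustrated on a single row of weights. The function below is a hypothetical stand-in, and the final renormalization is an assumption made so the result can be used directly in a weighted quantile.

```python
def combine(weight_rows, combination="multiply", weights=None):
    """Merge several weight vectors over the same calibration points, then renormalize."""
    n = len(weight_rows[0])
    if combination == "multiply":
        merged = [1.0] * n
        for row in weight_rows:
            merged = [m * w for m, w in zip(merged, row)]
    elif combination == "mean":
        weights = weights or [1.0] * len(weight_rows)
        total_w = sum(weights)
        merged = [
            sum(a * row[i] for a, row in zip(weights, weight_rows)) / total_w
            for i in range(n)
        ]
    else:
        raise ValueError(f"unknown combination: {combination!r}")
    total = sum(merged)
    if total == 0:
        raise ValueError("combined weights are all zero")
    return [m / total for m in merged]


feature_w = [0.5, 0.5, 0.0]   # points 0 and 1 are close in feature space
temporal_w = [0.5, 0.0, 0.5]  # points 0 and 2 share the seasonal position

product = combine([feature_w, temporal_w], "multiply")  # [1.0, 0.0, 0.0]
average = combine([feature_w, temporal_w], "mean")      # [0.5, 0.25, 0.25]
```

Only the point that both similarities favor survives the product, while the mean still gives some weight to points favored by just one of them.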

Tradeoffs

The tradeoff with all similarity measures is effective sample size. Heavily weighting a few nearby calibration points reduces bias, because the scores come from conditions like the current one, but increases the variance of the quantile estimate, because fewer points effectively contribute. Larger calibration sets help by providing more data points in each local region.
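
Effective sample size can be put into a number with Kish's formula, \((\sum_i w_i)^2 / \sum_i w_i^2\). This is a common diagnostic for weighted samples, offered here as a sketch; it is not presented as something yohou computes or exposes.

```python
import math


def effective_sample_size(weights):
    """Kish's effective sample size for a vector of non-negative weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


uniform = [1.0 / 100] * 100               # every calibration point counts equally
peaked = [0.45, 0.45] + [0.1 / 98] * 98   # two points carry 90% of the weight

ess_uniform = effective_sample_size(uniform)  # about 100
ess_peaked = effective_sample_size(peaked)    # about 2.5
```

A low value signals that the weighted quantile rests on only a handful of calibration points, however large the calibration set nominally is.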

Quantile Reduction Intervals

IntervalReductionForecaster takes a fundamentally different approach. Instead of wrapping a point forecaster and calibrating intervals after the fact, it trains quantile regression models that directly predict the interval bounds. For a coverage rate \(\alpha\), it fits two models, one for the lower quantile and one for the upper quantile:

\[\hat{y}^{\text{lower}} = f_{(1-\alpha)/2}(\mathbf{x}_t), \quad \hat{y}^{\text{upper}} = f_{(1+\alpha)/2}(\mathbf{x}_t)\]

For a 90% coverage rate, this means one model at the 5th percentile (lower bound) and one at the 95th percentile (upper bound).
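
Quantile regression models are typically trained by minimizing the pinball (quantile) loss. The sketch below is a generic illustration in plain Python, not yohou code: it searches constant predictions on a toy sample and shows that the loss minimizers at levels 0.05 and 0.95 are the 5th and 95th percentiles, which form the bounds of a 90% interval.

```python
def pinball_loss(y_true, y_pred, level):
    """Mean pinball loss: under-prediction costs `level`, over-prediction costs 1 - level."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += level * diff if diff >= 0 else (level - 1.0) * diff
    return total / len(y_true)


def best_constant(y_true, level):
    """Constant prediction (searched over observed values) with the lowest pinball loss."""
    return min(y_true, key=lambda c: pinball_loss(y_true, [c] * len(y_true), level))


y = list(range(1, 100))  # toy sample: 1, 2, ..., 99

coverage = 0.9
lower = best_constant(y, (1 - coverage) / 2)  # 5
upper = best_constant(y, (1 + coverage) / 2)  # 95
```

A real quantile model replaces the constant with a function of the features, which is what lets the bounds move with the inputs.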

from yohou.interval import IntervalReductionForecaster

forecaster = IntervalReductionForecaster()
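# `train` is the training series, prepared beforehand (not shown here).
forecaster.fit(y=train, forecasting_horizon=7, coverage_rates=[0.9])
# Lower and upper quantile predictions for each of the 7 horizon steps.
intervals = forecaster.predict_interval(coverage_rates=[0.9])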

Reduction Strategies

Like PointReductionForecaster, the interval variant supports three reduction_strategy options:

  • Multi-output ("multi-output", the default): a single model predicts all horizon steps simultaneously per quantile.
  • Direct ("direct"): fits independent models per horizon step per quantile. With n_jobs, these can run in parallel.
  • Dir-rec ("dir-rec"): sequential models where later steps see predictions from earlier steps.

The same tradeoffs apply: multi-output is fastest, direct avoids error accumulation, and dir-rec captures inter-step dependencies. See Reduction Strategies for the full treatment of these approaches.

Multi-Quantile Estimators

By default, IntervalReductionForecaster uses QuantileRegressor from sklearn and fits separate models for each quantile (two per coverage rate). Some gradient boosting libraries support predicting multiple quantiles in a single model, which is significantly faster.

The forecaster automatically detects multi-quantile capability in two cases:

  • CatBoost: when loss_function starts with "MultiQuantile", a single model is trained for all requested quantiles simultaneously.
  • LightGBM: when objective="quantile" with an alpha parameter.

When a multi-quantile estimator is detected, all quantiles for all coverage rates are combined into a single training pass. For example, coverage rates [0.9, 0.95] produce quantiles [0.025, 0.05, 0.95, 0.975] in one model rather than four separate models.
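
The mapping from coverage rates to interval-bound quantile levels follows directly from the \((1-\alpha)/2\) and \((1+\alpha)/2\) formula. The helper below is a hypothetical sketch of that arithmetic only; any additional levels an implementation might request are outside its scope.

```python
def quantile_levels(coverage_rates):
    """Sorted, de-duplicated lower and upper quantile levels for the given coverage rates."""
    levels = set()
    for coverage in coverage_rates:
        # Rounding removes floating-point noise such as 0.04999999999999999.
        levels.add(round((1.0 - coverage) / 2.0, 10))
        levels.add(round((1.0 + coverage) / 2.0, 10))
    return sorted(levels)


levels = quantile_levels([0.9, 0.95])  # [0.025, 0.05, 0.95, 0.975]
```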

Comparison with Conformal Prediction

Because the quantile models learn the conditional distribution directly from features, their intervals naturally adapt to heteroscedastic data without needing explicit similarity weighting. The disadvantage is that quantile regression does not carry the same finite-sample coverage guarantee as conformal prediction, and its accuracy depends entirely on how well the model captures the conditional quantiles.

Panel Data

Both SplitConformalForecaster and IntervalReductionForecaster support panel data through the panel_strategy parameter:

  • "global" (default): treats each group (entity) independently, fitting separate calibration sets or quantile models per group.
  • "multivariate": pools data across groups, sharing calibration scores or quantile models. This is useful when individual groups have limited data and borrowing strength across entities improves interval quality.

The choice mirrors the panel strategies available in point forecasting. See Panel Data for the full treatment.

Coverage Rates

Coverage rates are specified as floats in the range (0, 1]. Multiple rates can be requested in a single call. Higher coverage rates produce wider intervals: a 95% interval must be at least as wide as a 90% interval to capture more of the distribution.

Coverage rates are first specified at fit() time (defaulting to [0.95] if omitted). At predict_interval() time, you can request different coverage rates without re-fitting. For SplitConformalForecaster, this works because the conformity scores are stored and the quantile computation is applied at prediction time; no re-calibration is needed. For IntervalReductionForecaster, new coverage rates require that the underlying estimator can produce predictions at the corresponding quantile levels.

The right coverage rate reflects the cost asymmetry of the decision. Safety-critical applications (capacity planning, risk management) warrant high coverage because the cost of being caught outside the interval is severe. Low-stakes decisions can tolerate narrower intervals that are more actionable even if they miss more often.

References

  • Vovk, V., Gammerman, A., & Shafer, G. (2005). Algorithmic Learning in a Random World. Springer. (foundational conformal prediction framework)
  • Lei, J., G'Sell, M., Rinaldo, A., Tibshirani, R.J., & Wasserman, L. (2018). Distribution-free predictive inference for regression. Journal of the American Statistical Association, 113(523), 1094-1111. DOI:10.1080/01621459.2017.1307116
  • Barber, R.F., Candes, E.J., Ramdas, A., & Tibshirani, R.J. (2023). Conformal prediction beyond exchangeability. Annals of Statistics, 51(2), 816-845. DOI:10.1214/23-AOS2276
  • Hyndman, R.J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice, 3rd edition, Chapter 5.5 (prediction intervals).

Connections

Interval forecasting builds on yohou's Reduction Forecasting foundation. Every interval method either wraps a point forecaster or extends the same reduction machinery. The observe/predict_interval lifecycle mirrors the point forecasting API, so switching between point and interval forecasts requires minimal code changes.

For evaluating interval forecasts, see the interval metrics in Forecast Accuracy. Coverage rate and interval width metrics help diagnose whether intervals are well-calibrated: too narrow means the stated coverage is not achieved, too wide means the intervals are uninformative.
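
The two diagnostics mentioned above are simple to compute by hand. The functions below are a plain-Python sketch with invented numbers, not the metrics shipped in Forecast Accuracy, whose names and signatures are not shown on this page.

```python
def empirical_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside their interval."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


def mean_width(lower, upper):
    """Average distance between upper and lower bounds."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


y_true = [10, 12, 9, 15, 11]
lower = [8, 10, 10, 12, 9]
upper = [12, 14, 13, 16, 13]

coverage = empirical_coverage(y_true, lower, upper)  # 0.8 (the third value is missed)
width = mean_width(lower, upper)                     # 3.8
```

Comparing the empirical coverage with the requested rate shows undercoverage, while the mean width shows how much sharpness was paid for it.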

For cross-validation with interval forecasters, the Model Selection tools work with predict_interval in the same way they work with predict.

VotingIntervalForecaster provides an ensemble approach to combining prediction intervals from multiple models. It supports three aggregation methods: averaging bounds, taking medians, or taking the envelope (minimum of lower bounds, maximum of upper bounds for the most conservative intervals). See Ensemble Forecasting for details.
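
The three aggregation rules can be illustrated on the intervals that three models produce for a single time step. The function and its method names ("mean", "median", "envelope") are hypothetical labels for this sketch and may not match the options VotingIntervalForecaster accepts.

```python
from statistics import mean, median


def aggregate(intervals, method="mean"):
    """Combine (lower, upper) pairs from several models into one interval."""
    lowers = [lo for lo, _ in intervals]
    uppers = [hi for _, hi in intervals]
    if method == "mean":
        return (mean(lowers), mean(uppers))
    if method == "median":
        return (median(lowers), median(uppers))
    if method == "envelope":
        return (min(lowers), max(uppers))  # most conservative choice
    raise ValueError(f"unknown method: {method!r}")


model_intervals = [(100, 140), (110, 150), (60, 190)]

averaged = aggregate(model_intervals, "mean")       # (90, 160)
middle = aggregate(model_intervals, "median")       # (100, 150)
envelope = aggregate(model_intervals, "envelope")   # (60, 190)
```

The median is unaffected by the one unusually wide model, the mean is pulled towards it, and the envelope adopts its bounds outright.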

For practical recipes, see How to Forecast with Prediction Intervals. For a hands-on introduction, see the Interval Forecasting Tutorial.

Interactive examples: Conformal Forecasting, Conformity Scorers, and Distance Similarity.