Forecaster Composition

Yohou provides four classes that compose forecasters into larger forecasting structures. Each component is itself a full forecaster with the complete fit/predict/observe/rewind lifecycle, not a transformer or preprocessing step. They address situations where a single forecaster cannot handle the full problem: additive components in the data, target columns with different dynamics, features that must be forecast before the target, or panel groups with fundamentally different patterns.

All four classes support panel data, integrate with hyperparameter search and cross-validation, and can be nested inside each other or wrapped by ensemble voters.

For composing transformers (feature pipelines, scaling chains, lag features), see Feature Pipelines.

DecompositionPipeline

DecompositionPipeline decomposes a time series into additive components by fitting forecasters in sequence. Each forecaster models the residuals left by all previous forecasters, and the final prediction is the sum of all component predictions:

\[\hat{y}_t = \hat{f}_1(t) + \hat{f}_2(t) + \cdots + \hat{f}_k(t)\]

The forecasters parameter takes a list of (name, forecaster) tuples. Every entry must be a point forecaster. Interval and class-probability forecasters are not supported, because residuals from probabilistic outputs are not well defined. A two-component pipeline looks like this:

from yohou.compose import DecompositionPipeline
from yohou.stationarity import PolynomialTrendForecaster
from yohou.point import SeasonalNaive

pipeline = DecompositionPipeline(forecasters=[
    ("trend", PolynomialTrendForecaster(degree=1)),
    ("seasonality", SeasonalNaive(seasonality=12)),
])

The first forecaster fits the raw data and produces a trend forecast. The second receives the residuals (original minus trend) and models what remains. Ordering matters: placing a trend model first, then a seasonal model, then a residual model follows the classical decompose-forecast-recompose pattern.
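
The residual chaining described above can be sketched in plain Python. This is a toy illustration only, not Yohou's implementation: the helper names fit_linear_trend and fit_seasonal_naive are hypothetical, and the series is synthetic. It fits a least-squares line, models the leftover residuals with a seasonal-naive rule, and sums the two components to forecast.

```python
# Toy illustration of sequential residual fitting; not Yohou's implementation.
# fit_linear_trend and fit_seasonal_naive are hypothetical helper names.

def fit_linear_trend(y):
    """Least-squares line through (0, y[0]), (1, y[1]), ...; returns f(t)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    return lambda t: intercept + slope * t


def fit_seasonal_naive(residuals, period):
    """Repeat the last full season of residuals; returns f(t)."""
    n = len(residuals)
    last_season = residuals[-period:]
    # Map any time index t to the matching phase of the last observed season.
    return lambda t: last_season[(t - (n - period)) % period]


# Synthetic series: linear trend plus a period-4 seasonal pattern.
pattern = [1, -1, 2, -2]
y = [10 + 2 * t + pattern[t % 4] for t in range(12)]

# Stage 1: the first component fits the raw data.
trend = fit_linear_trend(y)

# Stage 2: the second component fits what the first one left behind.
residuals = [v - trend(t) for t, v in enumerate(y)]
season = fit_seasonal_naive(residuals, period=4)

# Final prediction: the sum of all component predictions.
horizon = range(len(y), len(y) + 4)
forecast = [trend(t) + season(t) for t in horizon]
```

Swapping the two stages would hand the seasonal-naive rule a series that still contains the trend, which is why the ordering matters.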

Multiplicative decomposition

For multiplicative relationships, pass target_transformer=LogTransformer(). This transforms the target into log-space where multiplication becomes addition, applies the additive pipeline, and back-transforms the result:

\[y_t = f_1(t) \cdot f_2(t) \cdot \varepsilon_t \quad\Rightarrow\quad \log y_t = \log f_1(t) + \log f_2(t) + \log \varepsilon_t\]
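
As a quick numeric check of the identity above, the following sketch uses only Python's math module and made-up component values (it does not call LogTransformer or any Yohou class). It confirms that the log of a product equals the sum of the logs and that exponentiating recovers the original scale. Note that the logarithm is only defined for strictly positive values.

```python
import math

# Made-up multiplicative components (illustrative values only).
trend = [100.0, 110.0, 121.0, 133.1]
seasonal = [1.2, 0.8, 1.2, 0.8]

# Multiplicative series: y_t = f_1(t) * f_2(t).
y = [a * b for a, b in zip(trend, seasonal)]

# In log-space the product becomes a sum of component logs.
log_y = [math.log(v) for v in y]
log_sum = [math.log(a) + math.log(b) for a, b in zip(trend, seasonal)]

# Back-transforming the additive result returns to the original scale.
recovered = [math.exp(v) for v in log_sum]
```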

Feature transformation

The optional feature_transformer parameter applies a transformer to exogenous features once at the pipeline level before any forecaster receives them. All component forecasters share the same transformed features, so feature preprocessing does not need to be duplicated inside each component.

Diagnostic residuals

Setting store_residuals=True saves the intermediate residuals after each component in pipeline.residuals_, a dictionary mapping forecaster name to a Polars DataFrame. This is useful for inspecting whether a component successfully captured its intended pattern or whether signal remains for downstream components to model.

ColumnForecaster

ColumnForecaster assigns different forecasters to different target columns, then concatenates predictions horizontally. Each entry in the forecasters list is a (name, forecaster, columns) tuple, where columns is a string or list of strings identifying which target columns that forecaster is responsible for.

This is useful when target columns have fundamentally different characteristics. A slow-moving trend variable might work best with a linear model while a volatile signal needs gradient boosting. Forcing a single model to handle both can produce mediocre predictions for each.

Remainder handling

Columns not claimed by any forecaster are handled by the remainder parameter:

  • "drop" (default): unclaimed columns are excluded from predictions.
  • "passthrough": unclaimed columns are passed through unchanged.
  • A forecaster instance: unclaimed columns are forecast by that model.

Each column can be claimed by at most one forecaster. Overlapping assignments raise an error, and any column left unclaimed falls under the remainder setting.
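
The assignment rules can be illustrated with a small standalone function. This is a sketch of the logic only, under the assumption that overlap is detected per column name. The function resolve_columns and its error message are hypothetical and are not part of the Yohou API.

```python
# Sketch of the column-assignment rules; resolve_columns is a hypothetical helper.

def resolve_columns(assignments, all_columns):
    """Return (owner, unclaimed) for a list of (name, columns) assignments.

    columns may be a single string or a list of strings.
    """
    owner = {}
    for name, columns in assignments:
        if isinstance(columns, str):
            columns = [columns]
        for col in columns:
            if col in owner:
                # A column claimed twice is an overlapping assignment.
                raise ValueError(
                    f"column {col!r} is claimed by both {owner[col]!r} and {name!r}"
                )
            owner[col] = name
    # Whatever is left over would be handled by the remainder setting.
    unclaimed = [c for c in all_columns if c not in owner]
    return owner, unclaimed


owner, unclaimed = resolve_columns(
    [("sales_model", ["revenue", "units"]), ("cost_model", "cost")],
    ["revenue", "units", "cost", "returns"],
)
```

Here returns is the only unclaimed column, so it is the one that "drop", "passthrough", or a remainder forecaster would apply to.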

Exogenous features and forecaster types

All forecasters receive the full exogenous data (X_actual, X_future, X_forecast), but each sees only its assigned target columns in y. This means a feature that is relevant to multiple targets only needs to appear once.

Because ColumnForecaster wraps arbitrary forecasters, it supports point predictions, interval predictions, and class probability predictions. The available methods depend on the capabilities of the inner forecasters. Setting n_jobs enables parallel fitting across column groups.

When verbose_feature_names_out=True, output columns are prefixed with the forecaster name (for example, sales_model__revenue), which avoids ambiguity when multiple forecasters produce columns with the same name.

ForecastedFeatureForecaster

ForecastedFeatureForecaster is a two-stage forecaster for scenarios where exogenous features (X_actual) are available during training but not at prediction time. It chains a feature_forecaster that predicts future feature values with a target_forecaster that uses those predicted features to forecast y. The class requires X_actual at fit time and raises a ValueError if it is not provided.

graph LR
  subgraph fit
    direction TB
    A["X_actual"] --> B["feature_fcstr"]
    B -->|strategy| C["target_fcstr"]
    D["y"] --> C
  end

  subgraph predict
    direction TB
    E["target_fcstr"] --> F["ŷ_pred"]
  end

  fit ~~~ predict

The distribution shift problem

The core challenge is a training/prediction mismatch. At prediction time the target forecaster receives forecasted (imperfect) feature values, but during training the real feature values are available. Training on real features and predicting with forecasted ones can degrade accuracy. The strategy parameter controls how this is handled.

Training strategies

"actual" (default) fits the feature forecaster on the full X_actual, then fits the target forecaster on the full y with the real X_actual. This is the simplest approach but creates a distribution mismatch: the target forecaster trains on perfect features and predicts with imperfect ones.

"predicted" splits the training data at position int(len(y) * split_ratio). The feature forecaster trains on the first portion and predicts features for the second. The target forecaster then trains on the second portion using those predicted (imperfect) features. This avoids the distribution shift but sacrifices some training data. The split_ratio parameter (default 0.5) controls the split point; setting it lower gives the target forecaster more training data at the cost of a less accurate feature forecaster.

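The split arithmetic for the "predicted" strategy can be worked through directly. The sketch below only reproduces the int(len(y) * split_ratio) expression quoted above. The function name split_point and the series length are illustrative assumptions.

```python
# Worked example of the split position; split_point is a hypothetical helper.

def split_point(n_obs, split_ratio=0.5):
    """Index at which the training data is divided."""
    return int(n_obs * split_ratio)


n_obs = 120
portions = {}
for ratio in (0.5, 0.25):
    cut = split_point(n_obs, ratio)
    # First portion trains the feature forecaster,
    # second portion trains the target forecaster.
    portions[ratio] = (cut, n_obs - cut)
```

With 120 observations, a ratio of 0.5 leaves 60 rows for each stage, while 0.25 leaves 30 rows for the feature forecaster and 90 for the target forecaster.
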
"rewind" fits the feature forecaster on all data, rewinds it to the observation horizon, then predicts features from the rewind point onward. The target forecaster trains on those predicted features. This approach uses all data for feature learning while still exposing the target forecaster to imperfect features, balancing data efficiency with distribution alignment.

Prediction capabilities

ForecastedFeatureForecaster delegates all prediction calls to the target forecaster. If the target forecaster supports interval predictions or class probability predictions, those methods become available on the composite. The feature forecaster always produces point predictions, regardless of what the target forecaster supports.

For the data-shaping perspective on exogenous features (the three types X_actual, X_future, X_forecast, and step-indexed columns), see Exogenous Features.

LocalPanelForecaster

LocalPanelForecaster fits a separate clone of a forecaster per panel group rather than a single global model. The input must be panel data (columns with the group__column naming convention). Each clone sees unprefixed, single-series data: a group named store_a with column store_a__sales receives a DataFrame with a plain sales column.

This is appropriate when groups have fundamentally different dynamics (for example, products with unrelated demand patterns) and a global model would blur the distinctions. The trade-off is that each group trains on only its own data, which can be a problem for groups with short histories. Global models share information across groups at the cost of missing group-specific patterns.

Exogenous feature routing

Exogenous features can be panel-specific (prefixed, like store_a__temperature) or global (unprefixed, like holiday_flag). LocalPanelForecaster extracts each group's prefixed columns, strips the prefixes, and combines them with any global columns. Each clone therefore receives a clean, unprefixed feature set tailored to its group.
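
The routing rule can be sketched on column names alone. This is an illustration of the naming convention described above, not the library's code: route_columns is a hypothetical helper, and it assumes the double underscore is the only group separator.

```python
# Sketch of per-group column routing; route_columns is a hypothetical helper.

def route_columns(columns, group, sep="__"):
    """Map input column names to the unprefixed names one group's clone would see."""
    routed = {}
    for col in columns:
        if sep in col:
            prefix, name = col.split(sep, 1)
            if prefix == group:
                # Panel-specific column for this group: strip the prefix.
                routed[col] = name
            # Columns belonging to other groups are skipped.
        else:
            # Global column: passed to every group unchanged.
            routed[col] = col
    return routed


columns = ["store_a__temperature", "store_b__temperature", "holiday_flag"]
store_a_view = route_columns(columns, "store_a")
store_b_view = route_columns(columns, "store_b")
```

Both groups end up with the same two feature names, temperature and holiday_flag, but each temperature column comes from that group's own prefixed source.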

Parallel fitting and prediction types

Setting n_jobs enables parallel fitting across groups, which is helpful when the number of groups is large. The class supports whatever prediction types the wrapped forecaster supports: point predictions are always available, and interval predictions are available if the inner forecaster provides them.

After fitting, the per-group clones are accessible through the forecasters_ attribute, a dictionary mapping group names to fitted forecaster instances.

State Propagation Through Composite Forecasters

When you call observe() on a composite forecaster, the new data flows through to each sub-component in a pattern that mirrors fit and predict. Understanding this flow helps predict what will happen when new data arrives in a production observe/predict loop.

DecompositionPipeline processes observations in the same order as training. Each forecaster in the chain predicts its component, subtracts it from the incoming data, and passes the residual to the next. This preserves the additive decomposition: calling observe() then predict() produces the same result as re-fitting on the extended data, as long as the components remain stable. The observe_predict() method handles this residual decomposition internally, ensuring rolling evaluation produces correct multi-component predictions.

ColumnForecaster routes each target column to its assigned forecaster. All forecasters receive the full exogenous data, but each observes only its own target columns. Calling observe() independently updates each column's model without cross-contamination.

ForecastedFeatureForecaster chains observations in two stages. The feature forecaster observes X_actual columns as its target. The target forecaster then observes y together with the actual feature values. This maintains the two-stage contract at observation time, not just during initial fitting.

LocalPanelForecaster dispatches observe() to each group's clone with only the rows belonging to that group. Each group maintains independent state, so observing new data for one group does not affect others.

Calling rewind() reverses these operations across all composite types, restoring each sub-component to its previous observation window. This is useful for what-if analysis: observe new data, predict, rewind, try different data, predict again.

For the metadata routing infrastructure that enables these operations to flow through search and cross-validation objects, see Metadata Routing.

Connections

Feature Pipelines covers composing transformers rather than forecasters. Exogenous Features explains the three exogenous parameter types and step-indexed columns that ForecastedFeatureForecaster is designed around. Ensemble Forecasting describes combining forecasters by voting rather than by decomposition or column assignment. For how parameters like time_weight flow through pipelines and search objects, see Metadata Routing.

For practical recipes, see How to Compose Feature Pipelines and How to Combine Forecasters with Ensembles. The compose API is documented in the yohou.compose reference.