How to Handle Long Series

This guide shows you how to limit look-back history, down-weight old observations during evaluation, and resample when the data frequency does not match the forecast requirement.

Prerequisites

Familiarity with core concepts such as observation_horizon (Core Concepts)
Familiarity with time weighting (Use Time Weighting)

Limit History with observation_horizon

When concept drift makes historical patterns less predictive, limiting history avoids diluting the model with stale signal. Each stateful transformer exposes a read-only observation_horizon property that reports how many past timesteps it retains. The value is derived from the constructor parameters: for LagTransformer it equals the maximum lag, and for SeasonalDifferencing it equals the seasonality period. Older observations are dropped as new ones arrive via observe.

To keep only recent history, choose parameter values that match the look-back you need:

from yohou.preprocessing import LagTransformer
from yohou.stationarity import SeasonalDifferencing

# observation_horizon = 7 (max lag)
lag_transformer = LagTransformer(lag=[1, 7])

# observation_horizon = 7 (seasonality period)
seasonal_diff = SeasonalDifferencing(seasonality=7)
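
The retention behaviour described above can be pictured as a fixed-size buffer. The following is a minimal sketch in plain Python, not yohou's implementation: the BoundedHistory class and its methods are hypothetical names introduced only for illustration, and the horizon is assumed to equal the maximum lag, as stated for LagTransformer.

```python
from collections import deque


class BoundedHistory:
    """Hypothetical buffer that keeps only the last `horizon` observations."""

    def __init__(self, horizon):
        self.horizon = horizon
        # A deque with maxlen silently discards the oldest items when full.
        self._buffer = deque(maxlen=horizon)

    def observe(self, values):
        """Append new observations; older ones fall out of the buffer."""
        self._buffer.extend(values)

    def history(self):
        """Return the retained observations, oldest first."""
        return list(self._buffer)


lags = [1, 7]
buffer = BoundedHistory(horizon=max(lags))  # horizon = 7, the maximum lag
buffer.observe(range(10))  # observations 0..9 arrive
buffer.observe([10, 11])   # two more arrive later
retained = buffer.history()

# Values needed to build lag features for the next timestep (t = 12).
lag_values = {lag: retained[-lag] for lag in lags}
```

After both calls, only the seven most recent observations remain, which is exactly enough to look up the lag-1 and lag-7 values for the next timestep.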

In a FeaturePipeline, the pipeline's observation_horizon is the sum of all its steps' horizons. Inspect it after fitting to verify the total look-back is what you expect:

from yohou.compose import FeaturePipeline

pipeline = FeaturePipeline([
    ("deseason", SeasonalDifferencing(seasonality=7)),
    ("lags", LagTransformer(lag=[1, 7])),
])
pipeline.fit(y)
pipeline.observation_horizon  # 14
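
To see why the horizons add up rather than overlap, it can help to trace the two steps by hand. The sketch below uses plain Python lists and two hypothetical helper functions (seasonal_difference and lag_features are illustrative names, not yohou functions). It assumes each step marks a timestep as unavailable until its full look-back exists.

```python
def seasonal_difference(values, period):
    """Return y[t] - y[t - period]; None where the look-back is unavailable."""
    return [
        None if t < period else values[t] - values[t - period]
        for t in range(len(values))
    ]


def lag_features(values, lags):
    """Return one row per timestep; a row is None until every lag is defined."""
    rows = []
    for t in range(len(values)):
        row = [values[t - lag] if t - lag >= 0 else None for lag in lags]
        rows.append(None if any(v is None for v in row) else row)
    return rows


y_demo = list(range(30))  # toy series: 0, 1, 2, ...
differenced = seasonal_difference(y_demo, period=7)
features = lag_features(differenced, lags=[1, 7])

# Index of the first timestep whose feature row is fully defined.
first_complete = next(t for t, row in enumerate(features) if row is not None)
```

The differenced series is only defined from index 7 onward, and the lag-7 feature needs seven differenced values on top of that, so the first complete row appears at index 14: seven observations for the differencing step plus seven for the lag step.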

Weighting Recent Observations in Evaluation

Limiting history is a binary cutoff. Time weighting is a softer alternative that keeps all data but gives more importance to recent errors during model selection. Construct an ExponentialDecayWeighter or LinearDecayWeighter and pass it to the time_weighter slot of scorers and forecasters:

from yohou.weighting import ExponentialDecayWeighter

weighter = ExponentialDecayWeighter(half_life=365)

See Use Time Weighting for full recipes on configuring weighters on scorers and forecasters and tuning them via search.
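
As a rough illustration of what a half-life means for scoring, the sketch below computes exponential decay weights and a weighted mean absolute error in plain Python. It assumes the common definition in which a weight halves every half_life steps back from the most recent observation. How ExponentialDecayWeighter normalizes weights or measures age is not specified here, so treat both functions as hypothetical stand-ins rather than the library's behaviour.

```python
def exponential_decay_weights(n_steps, half_life):
    """Weights for observations ordered oldest to newest.

    The newest observation gets weight 1.0, and the weight halves
    for every `half_life` steps of additional age.
    """
    return [0.5 ** ((n_steps - 1 - i) / half_life) for i in range(n_steps)]


def weighted_mae(actual, predicted, weights):
    """Mean absolute error where each error is scaled by its weight."""
    total = sum(w * abs(a - p) for a, p, w in zip(actual, predicted, weights))
    return total / sum(weights)


weights = exponential_decay_weights(n_steps=5, half_life=2)

actual = [10.0] * 5
old_miss = [14.0, 10.0, 10.0, 10.0, 10.0]     # error of 4 on the oldest step
recent_miss = [10.0, 10.0, 10.0, 10.0, 14.0]  # error of 4 on the newest step

error_old = weighted_mae(actual, old_miss, weights)
error_recent = weighted_mae(actual, recent_miss, weights)
```

With a half-life of 2 steps, the oldest of the five observations is two half-lives old and carries a quarter of the newest one's weight, so the same absolute miss costs four times as much when it happens on the most recent step.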

Downsampling to a Lower Frequency

When data arrives at a higher frequency than the forecast requirement, downsampling reduces computational cost and can improve model quality by removing noise that is irrelevant to the forecasting horizon.

Downsampler aggregates observations to a lower frequency. Place it at the start of the pipeline, before any stateful transformers:

from yohou.preprocessing.resampling import Downsampler

# Aggregate hourly data to daily, summing within each day
downsampler = Downsampler(interval="1d", aggregation="sum")
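
Conceptually, downsampling groups observations into coarser time bins and reduces each bin to a single value. The sketch below shows that idea with the standard library only. The downsample_daily helper is a hypothetical name, and binning by calendar date is an assumption about how a "1d" interval is anchored.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Two days of hourly observations, each with value 1.0.
start = datetime(2024, 1, 1)
hourly = [(start + timedelta(hours=h), 1.0) for h in range(48)]


def downsample_daily(observations, aggregate):
    """Group (timestamp, value) pairs by calendar date and aggregate each group."""
    buckets = defaultdict(list)
    for timestamp, value in observations:
        buckets[timestamp.date()].append(value)
    return {day: aggregate(values) for day, values in sorted(buckets.items())}


daily_sum = downsample_daily(hourly, sum)
daily_mean = downsample_daily(hourly, mean)
```

The 48 hourly points collapse into two daily values. The choice of aggregation changes their scale (24.0 per day for the sum, 1.0 for the mean), so pick the one that matches how the quantity accumulates: totals for counts or volumes, means for levels or rates.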

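Available aggregation options: "mean" (default), "sum", "min", "max", "first", "last", and "median". If boundary handling matters (for example, whether an observation stamped exactly at midnight counts toward the day that is ending or the day that is starting, and whether each daily bin is labeled by its start or its end), adjust closed and label: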

downsampler = Downsampler(
    interval="1d",
    aggregation="sum",
    closed="right",
    label="right",
)
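
The effect of these two options is easiest to see on an observation that falls exactly on a bin edge. The sketch below assumes the convention used by common dataframe resampling APIs: closed selects which edge of each bin is inclusive, and label selects which edge names the bin. Whether Downsampler follows this convention exactly is an assumption, and assign_daily_bin is a hypothetical helper written only for this illustration.

```python
from datetime import datetime, timedelta


def assign_daily_bin(timestamp, closed="left", label="left"):
    """Return the label timestamp of the daily bin that contains `timestamp`."""
    midnight = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    day = timedelta(days=1)
    if closed == "left":
        # Bins are [midnight, midnight + 1 day): midnight opens a new bin.
        bin_start = midnight
    else:
        # Bins are (midnight, midnight + 1 day]: an observation stamped
        # exactly at midnight closes the previous bin.
        bin_start = midnight - day if timestamp == midnight else midnight
    return bin_start if label == "left" else bin_start + day


boundary = datetime(2024, 1, 2, 0, 0)  # exactly midnight
morning = datetime(2024, 1, 2, 6, 0)   # inside the day

left_left = assign_daily_bin(boundary, closed="left", label="left")
right_left = assign_daily_bin(boundary, closed="right", label="left")
right_right = assign_daily_bin(boundary, closed="right", label="right")
morning_right = assign_daily_bin(morning, closed="right", label="right")
```

Under this convention, the midnight observation on 2 January belongs to the 2 January bin when closed="left" but to the bin that started on 1 January when closed="right". With label="right", that same bin is reported under its end timestamp, 2 January.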

Upsampling to a Higher Frequency

Upsampler increases frequency by interpolating between existing observations. Use it only when obtaining higher-frequency input data is not possible, because interpolation creates artificial data points without genuine information content:

from yohou.preprocessing.resampling import Upsampler

upsampler = Upsampler(interval="1h", interpolation="linear")

Available interpolation options: "linear" (default), "nearest", "forward", and "backward".
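
To make the "artificial data points" concrete, the sketch below compares linear interpolation with forward filling on a tiny series. Both functions are hypothetical plain-Python stand-ins rather than Upsampler's implementation, and the integer factor parameter (new points per original interval) is an assumption made for brevity.

```python
def upsample_linear(values, factor):
    """Place factor - 1 evenly spaced points between consecutive values."""
    result = []
    for left, right in zip(values, values[1:]):
        for step in range(factor):
            result.append(left + (right - left) * step / factor)
    result.append(values[-1])  # keep the final original observation
    return result


def upsample_forward(values, factor):
    """Repeat each value until the next original observation arrives."""
    result = []
    for value in values[:-1]:
        result.extend([value] * factor)
    result.append(values[-1])
    return result


coarse = [0.0, 4.0, 2.0]
linear = upsample_linear(coarse, factor=4)
forward = upsample_forward(coarse, factor=4)
```

Both outputs contain nine points, but only three of them were ever observed. The linear version assumes a straight path between observations and the forward version assumes the value stays constant, so any model trained on the result inherits that assumption.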

Combining Techniques

A common recipe for long series: downsample high-frequency data, limit look-back via transformer parameters, and evaluate with time-weighted scoring.

from sklearn.linear_model import Ridge
from yohou.preprocessing.resampling import Downsampler
from yohou.stationarity import SeasonalDifferencing
from yohou.preprocessing import LagTransformer
from yohou.compose import FeaturePipeline
from yohou.point import PointReductionForecaster
from yohou.model_selection import train_test_split
from yohou.weighting import ExponentialDecayWeighter
from yohou.metrics import MeanAbsoluteError

# 1. Downsample hourly data to daily
downsampler = Downsampler(interval="1d", aggregation="mean")
y_daily = downsampler.fit_transform(y)

# 2. Build a feature pipeline with bounded look-back
pipeline = FeaturePipeline([
    ("deseason", SeasonalDifferencing(seasonality=7)),
    ("lags", LagTransformer(lag=[1, 7, 14])),
])

# 3. Fit a forecaster on the downsampled series
y_train, y_test = train_test_split(y_daily, test_size=14)
forecaster = PointReductionForecaster(
    estimator=Ridge(),
    actual_transformer=pipeline,
)
forecaster.fit(y_train, forecasting_horizon=14)
y_pred = forecaster.predict()

# 4. Evaluate with exponential decay (half-life of one year)
scorer = MeanAbsoluteError(time_weighter=ExponentialDecayWeighter(half_life=365))
scorer.fit(y_train)
score = scorer.score(y_test, y_pred)