How to Handle Long Series

This guide shows you how to limit look-back history, down-weight old observations during evaluation, and resample when the data frequency does not match the forecast requirement.

Prerequisites

Limit History with observation_horizon

Each stateful transformer exposes a read-only observation_horizon property that reports how many past timesteps it retains. The value is derived from the constructor parameters: for LagTransformer it equals the maximum lag, and for SeasonalDifferencing it equals the seasonality period. Older observations are dropped as new ones arrive via observe.

To keep only recent history, choose parameter values that match the look-back you need:

from yohou.preprocessing import LagTransformer
from yohou.stationarity import SeasonalDifferencing

# observation_horizon = 7 (max lag)
lag_transformer = LagTransformer(lag=[1, 7])

# observation_horizon = 7 (seasonality period)
seasonal_diff = SeasonalDifferencing(seasonality=7)
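
To make the retention behaviour concrete, here is a minimal sketch of a bounded history buffer. It is not yohou code: `BoundedHistory` and its methods are hypothetical names, and the sketch only assumes that a transformer keeps the last `observation_horizon` values and discards the rest.

```python
from collections import deque


class BoundedHistory:
    """Hypothetical stand-in for a stateful transformer's memory (not yohou code)."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        # deque(maxlen=...) discards the oldest item once the buffer is full
        self._buffer = deque(maxlen=horizon)

    def observe(self, values) -> None:
        """Append new observations; anything older than `horizon` steps is dropped."""
        self._buffer.extend(values)

    def retained(self) -> list:
        """Return the observations currently held, oldest first."""
        return list(self._buffer)

    def lagged(self, lag: int):
        """Return the value `lag` steps before the next (unseen) timestep."""
        return self._buffer[-lag]


history = BoundedHistory(horizon=7)  # mirrors max(lag) for lag=[1, 7]
history.observe(range(10))           # feed ten observations: 0..9
retained = history.retained()        # only the last seven remain
lag_1 = history.lagged(1)            # most recent observation
lag_7 = history.lagged(7)            # oldest observation still needed
```

A horizon of 7 is exactly enough to serve the largest lag, which is why a smaller maximum lag translates directly into less retained history.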

In a FeaturePipeline, the pipeline's observation_horizon is the sum of all its steps' horizons. Inspect it after fitting to verify the total look-back is what you expect:

from yohou.compose import FeaturePipeline

pipeline = FeaturePipeline([
    ("deseason", SeasonalDifferencing(seasonality=7)),
    ("lags", LagTransformer(lag=[1, 7])),
])
pipeline.fit(y)  # y is the target time series, loaded beforehand
pipeline.observation_horizon  # 14 (7 from deseason + 7 from lags)
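
The reason the horizons add up can be seen in a plain-Python sketch that does not use yohou at all. The two helper functions below are hypothetical illustrations of seasonal differencing and lagging: each step needs its own warm-up rows, and the second step's warm-up is taken from the output of the first.

```python
def seasonal_difference(values, period):
    """y[t] - y[t - period]; the first `period` rows have no partner and are lost."""
    return [values[t] - values[t - period] for t in range(period, len(values))]


def lag_features(values, lags):
    """Rows of lagged values; the first max(lags) rows cannot be built and are lost."""
    start = max(lags)
    return [[values[t - lag] for lag in lags] for t in range(start, len(values))]


series = list(range(30))                      # 30 raw observations
differenced = seasonal_difference(series, 7)  # 30 - 7 = 23 rows remain
features = lag_features(differenced, [1, 7])  # 23 - 7 = 16 rows remain
rows_consumed = len(series) - len(features)   # total warm-up: 7 + 7
```

Here `rows_consumed` is 14, matching the summed horizon reported for the two-step pipeline.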

Weighting Recent Observations in Evaluation

Limiting history is a hard cutoff: observations older than the horizon are ignored entirely. Time weighting is a softer alternative that keeps all data but gives more importance to recent errors during model selection. Pass an exponential_decay_weight or linear_decay_weight function to scorers and forecasters:

from yohou.utils.weighting import exponential_decay_weight

weight_fn = exponential_decay_weight(half_life=365)

See Use Time Weighting for full recipes on passing weights to scorers, forecasters, and search objects.
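
As a rough illustration of what a half-life means for scoring, the sketch below builds decay weights by hand and applies them to a mean absolute error. It assumes the common definition in which a weight halves for every `half_life` steps of age; the exact formula yohou uses is not shown in this guide, and `decay_weights` and `weighted_mae` are hypothetical helpers.

```python
def decay_weights(n_obs, half_life):
    """Weight 1.0 for the newest observation, halving every `half_life` steps back."""
    return [0.5 ** ((n_obs - 1 - i) / half_life) for i in range(n_obs)]


def weighted_mae(y_true, y_pred, weights):
    """Mean absolute error where each error counts in proportion to its weight."""
    total = sum(w * abs(t - p) for w, t, p in zip(weights, y_true, y_pred))
    return total / sum(weights)


n_obs = 731  # two years of daily steps plus one
weights = decay_weights(n_obs, half_life=365)  # ordered oldest to newest

# The same single error of size 10, placed at the oldest vs. the newest timestep
y_true = [0.0] * n_obs
error_at_oldest = [10.0] + [0.0] * (n_obs - 1)
error_at_newest = [0.0] * (n_obs - 1) + [10.0]
mae_old = weighted_mae(y_true, error_at_oldest, weights)
mae_new = weighted_mae(y_true, error_at_newest, weights)
```

Under this assumed definition, an error made two half-lives ago counts a quarter as much as the same error made at the most recent step, so no observation is discarded but old mistakes influence model selection far less.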

Downsampling to a Lower Frequency

When data arrives at a higher frequency than the forecast requirement, downsampling reduces computational cost and can improve model quality by removing noise that is irrelevant to the forecasting horizon.

Downsampler aggregates observations to a lower frequency. Place it at the start of the pipeline, before any stateful transformers:

from yohou.preprocessing.resampling import Downsampler

# Aggregate hourly data to daily, summing within each day
downsampler = Downsampler(interval="1d", aggregation="sum")

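Available aggregation options: "mean" (default), "sum", "min", "max", "first", "last", and "median". If the boundary alignment matters (for example, whether an observation stamped exactly at midnight counts toward the day that is ending or the day that is starting, and which edge's timestamp labels the bin), adjust closed and label: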

downsampler = Downsampler(
    interval="1d",
    aggregation="sum",
    closed="right",
    label="right",
)
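
The effect of the closed side can be sketched with the standard library alone. The function below is a hypothetical illustration, not the Downsampler implementation: it sums hourly values into daily bins and assumes that closed="right" assigns a timestamp that falls exactly on a bin edge to the bin that ends there. For simplicity, bins are always keyed by their left edge, so the sketch does not model label.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def downsample_daily_sum(timestamps, values, closed="left"):
    """Sum values into daily bins keyed by the date each bin starts on.

    closed="left":  bin covers [00:00, next 00:00), so midnight opens a new day.
    closed="right": bin covers (00:00, next 00:00], so midnight closes the old day.
    """
    bins = defaultdict(float)
    for ts, value in zip(timestamps, values):
        day = ts.date()
        at_midnight = ts == datetime.combine(day, datetime.min.time())
        if closed == "right" and at_midnight:
            day -= timedelta(days=1)  # edge value belongs to the day that just ended
        bins[day] += value
    return dict(sorted(bins.items()))


start = datetime(2024, 1, 1)
hours = [start + timedelta(hours=h) for h in range(49)]  # Jan 1 00:00 .. Jan 3 00:00
ones = [1.0] * len(hours)

left = downsample_daily_sum(hours, ones, closed="left")
right = downsample_daily_sum(hours, ones, closed="right")
```

With one observation per hour, the left-closed bins hold 24, 24, and 1 values, while the right-closed bins hold 1, 24, and 24: the totals are the same, but every midnight reading moves to the preceding day.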

Upsampling to a Higher Frequency

Upsampler increases frequency by interpolating between existing observations. Use it only when obtaining higher-frequency input data is not possible, because interpolation creates artificial data points without genuine information content:

from yohou.preprocessing.resampling import Upsampler

upsampler = Upsampler(interval="1h", interpolation="linear")

Available interpolation options: "linear" (default), "nearest", "forward", and "backward".
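
To show why interpolated points carry no new information, here is a minimal standard-library sketch of linear upsampling. `upsample_linear` is a hypothetical helper, not the Upsampler implementation, and it assumes each gap between observations is a whole multiple of the target step.

```python
from datetime import datetime, timedelta


def upsample_linear(timestamps, values, step):
    """Insert evenly spaced points between observations, interpolating linearly."""
    out_times, out_values = [], []
    for i in range(len(timestamps) - 1):
        t0, t1 = timestamps[i], timestamps[i + 1]
        v0, v1 = values[i], values[i + 1]
        n_steps = int((t1 - t0) / step)  # number of target intervals in this gap
        for k in range(n_steps):
            out_times.append(t0 + k * step)
            out_values.append(v0 + (k / n_steps) * (v1 - v0))
    out_times.append(timestamps[-1])  # keep the final observation
    out_values.append(values[-1])
    return out_times, out_values


days = [datetime(2024, 1, 1) + timedelta(days=d) for d in range(3)]
daily = [0.0, 24.0, 0.0]
times, hourly = upsample_linear(days, daily, step=timedelta(hours=1))
```

Every hourly value is fully determined by the two daily observations around it, so the upsampled series has 49 points but still only three measurements' worth of information.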

Combining Techniques

A common recipe for long series: downsample high-frequency data, limit look-back via transformer parameters, and evaluate with time-weighted scoring.

from yohou.preprocessing.resampling import Downsampler
from yohou.stationarity import SeasonalDifferencing
from yohou.preprocessing import LagTransformer
from yohou.compose import FeaturePipeline
from yohou.utils.weighting import exponential_decay_weight
from yohou.metrics import MeanAbsoluteError

# 1. Downsample hourly data to daily
downsampler = Downsampler(interval="1d", aggregation="mean")

# 2. Build a feature pipeline with bounded look-back
pipeline = FeaturePipeline([
    ("deseason", SeasonalDifferencing(seasonality=7)),
    ("lags", LagTransformer(lag=[1, 7, 14])),
])

# 3. Evaluate with exponential decay (half-life of one year)
# y_train, y_test, and y_pred are assumed to be defined elsewhere
scorer = MeanAbsoluteError()
scorer.fit(y_train)
score = scorer.score(
    y_test, y_pred,
    time_weight=exponential_decay_weight(half_life=365),
)

See Also