
How to Use Preprocessing Transformers

This guide shows you how to prepare features for a forecasting model using Yohou's preprocessing transformers. Use these when you need to create lag features, compute rolling statistics, scale values, wrap custom logic, or apply different transformations to different columns.

Prerequisites

Try it interactively

How to Use ColumnTransformer

Route columns through distinct transformers with ColumnTransformer, including remainder handling and automatic panel-aware column detection.

How to Wrap Functions as Transformers

Wrap arbitrary polars or numpy operations as sklearn transformers with FunctionTransformer, supporting stateful warmup, inverse transforms, and pipelines.

How to Use Scikit-learn Scalers

Wrap sklearn scalers and feature transformers (StandardScaler, MinMaxScaler, RobustScaler, PowerTransformer, PolynomialFeatures) for polars DataFrames, with inverse transforms where the underlying transformer supports them.

How to Apply Window Transformations

Feature engineering with LagTransformer, RollingStatisticsTransformer, SlidingWindowFunctionTransformer, and ExponentialMovingAverage on time series data.


Create Lag Features with LagTransformer

LagTransformer creates lagged copies of each value column, producing autoregressive inputs for a forecaster. Output columns follow the pattern {col}_lag_{k}:

from yohou.preprocessing import LagTransformer

lags = LagTransformer(lag=[1, 3, 6, 12])
lags.fit(y_train)
y_lagged = lags.transform(y_train)

The transformer's observation_horizon equals the largest lag, since that many past rows are needed to produce a complete output:

print(lags.observation_horizon)  # 12
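To make the column naming and the observation_horizon concrete, here is a plain-Python sketch of what lag features contain. It is not Yohou's implementation: a dict of lists stands in for a polars DataFrame so the sketch runs without dependencies, the helper make_lags is hypothetical, and the sketch assumes that leading rows without a full history are dropped (this guide does not say whether LagTransformer drops them or fills them with nulls).

```python
# Illustrative sketch only: mimics what lag features contain, not Yohou's code.
# A series is represented as a plain dict of column name -> list of values.


def make_lags(columns, lags):
    """Return {col}_lag_{k} columns, dropping rows without a full history."""
    horizon = max(lags)  # rows needed before the first complete output row
    n_rows = len(next(iter(columns.values())))
    out = {}
    for name, values in columns.items():
        if name == "time":
            # Keep the timestamps aligned with the rows that survive.
            out["time"] = values[horizon:]
            continue
        for k in lags:
            # Row t of {col}_lag_{k} holds the value observed k steps earlier.
            out[f"{name}_lag_{k}"] = [values[t - k] for t in range(horizon, n_rows)]
    return out, horizon


y = {"time": list(range(15)), "value": [10 * i for i in range(15)]}
features, horizon = make_lags(y, [1, 3, 6, 12])
print(horizon)                  # 12
print(features["value_lag_1"])  # [110, 120, 130]
```

With 15 rows and a largest lag of 12, only the last 3 rows have a complete set of lagged values, which is why the horizon equals the largest lag.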

If your series has a strong seasonal pattern, MeanLagTransformer averages the values found at several multiples of a base seasonal lag:

from yohou.preprocessing import MeanLagTransformer

# Average lags 12, 24, 36 (3 yearly cycles for monthly data)
mean_lags = MeanLagTransformer(lag=12, n_lags=3)
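A sketch of the averaged quantity, assuming MeanLagTransformer takes the plain arithmetic mean of the values lag, 2 * lag, ..., n_lags * lag steps back, and therefore needs n_lags * lag rows of history. The function mean_seasonal_lag is hypothetical and works on a plain list rather than a DataFrame.

```python
# Illustrative sketch only: the mean of seasonal lags, not Yohou's implementation.


def mean_seasonal_lag(values, lag, n_lags):
    """Average values[t - lag], values[t - 2*lag], ..., values[t - n_lags*lag]."""
    offsets = [lag * m for m in range(1, n_lags + 1)]  # e.g. [12, 24, 36]
    horizon = max(offsets)  # history needed for the first complete row
    return [
        sum(values[t - k] for k in offsets) / n_lags
        for t in range(horizon, len(values))
    ]


monthly = [float(i) for i in range(40)]  # 40 months of a toy series
print(mean_seasonal_lag(monthly, lag=12, n_lags=3))  # [12.0, 13.0, 14.0, 15.0]
```

For t = 36 the sketch averages the values at months 24, 12, and 0, so on real monthly data each output blends the same calendar month from the three previous years.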

Compute Rolling Statistics

RollingStatisticsTransformer computes rolling aggregates over a sliding window. Available statistics: mean, std, min, max, median, sum, var, q25, q75:

from yohou.preprocessing import RollingStatisticsTransformer

rolling = RollingStatisticsTransformer(
    window_size=12, statistics=["mean", "std"]
)
rolling.fit(y_train)
y_rolled = rolling.transform(y_train)

Output columns follow the pattern {col}_{statistic} (e.g., value_mean, value_std). The first window_size - 1 rows are dropped because they contain incomplete windows:

print(rolling.observation_horizon)  # 11  (window_size - 1)
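The row-dropping rule can be checked with a small plain-Python sketch of a trailing-window mean and standard deviation. It assumes a sample standard deviation (ddof=1), which is also the default of polars' rolling_std; this guide does not state which convention RollingStatisticsTransformer uses. The helper rolling_stats is hypothetical.

```python
import statistics

# Illustrative sketch only: trailing-window mean and std, not Yohou's code.


def rolling_stats(values, window_size):
    """Return value_mean / value_std for each complete trailing window."""
    means, stds = [], []
    # The first complete window ends at index window_size - 1, so the
    # window_size - 1 rows before it produce no output.
    for end in range(window_size - 1, len(values)):
        window = values[end - window_size + 1 : end + 1]
        means.append(statistics.mean(window))
        stds.append(statistics.stdev(window))  # sample std (ddof=1)
    return {"value_mean": means, "value_std": stds}


out = rolling_stats([1.0, 2.0, 3.0, 4.0, 5.0], window_size=3)
print(out["value_mean"])  # [2.0, 3.0, 4.0]
print(out["value_std"])   # [1.0, 1.0, 1.0]
```

Five input rows with window_size=3 give three output rows: the two leading rows are lost, matching an observation_horizon of window_size - 1.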

For custom aggregation logic, use SlidingWindowFunctionTransformer with any callable:

import numpy as np
from yohou.preprocessing import SlidingWindowFunctionTransformer

# Coefficient of variation over a 7-step window
cv = SlidingWindowFunctionTransformer(
    func=lambda x: np.std(x) / np.mean(x), window_size=7
)
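Conceptually, the callable is applied to each complete trailing window in turn. The sketch below shows that pattern with the coefficient of variation from the example above, assuming the callable receives one window at a time as a sequence of numbers. Because np.std defaults to the population standard deviation (ddof=0), the sketch uses statistics.pstdev to match. sliding_apply and coef_of_variation are hypothetical helpers.

```python
import statistics

# Illustrative sketch only: apply a user callable to each trailing window.


def sliding_apply(values, func, window_size):
    """Call func on every complete window of length window_size."""
    return [
        func(values[end - window_size + 1 : end + 1])
        for end in range(window_size - 1, len(values))
    ]


def coef_of_variation(window):
    # Population std / mean, matching np.std(x) / np.mean(x).
    return statistics.pstdev(window) / statistics.mean(window)


cvs = sliding_apply([2.0, 4.0, 2.0, 4.0], coef_of_variation, window_size=2)
print(cvs)  # three windows, each with CV = 1/3
```

Any function that maps a window to a single number fits this pattern, which is what makes the transformer useful for aggregations not covered by RollingStatisticsTransformer.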

Scale and Normalize Values

Yohou provides native scaler wrappers that work directly with polars DataFrames, preserving the "time" column automatically:

from yohou.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(y_train)
y_scaled = scaler.transform(y_train)

Other built-in scalers: MinMaxScaler, RobustScaler, MaxAbsScaler. All support inverse_transform for reversing the scaling during prediction.
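Standard scaling learns each column's mean and standard deviation during fit, and inverse_transform multiplies and adds them back. The toy class below sketches that round trip on a dict of lists. ToyStandardScaler is hypothetical and assumes the population standard deviation (ddof=0), which is what scikit-learn's StandardScaler uses. Yohou's wrapper may differ in details such as the handling of zero-variance columns.

```python
import statistics

# Illustrative sketch only: what standard scaling and its inverse compute.


class ToyStandardScaler:
    """Hypothetical stand-in: learns mean and std per column, skips "time"."""

    def fit(self, columns):
        self.params_ = {
            name: (statistics.mean(vals), statistics.pstdev(vals))
            for name, vals in columns.items()
            if name != "time"
        }
        return self

    def transform(self, columns):
        out = {"time": columns["time"]}  # timestamps pass through unchanged
        for name, (mu, sigma) in self.params_.items():
            out[name] = [(v - mu) / sigma for v in columns[name]]
        return out

    def inverse_transform(self, columns):
        out = {"time": columns["time"]}
        for name, (mu, sigma) in self.params_.items():
            out[name] = [v * sigma + mu for v in columns[name]]
        return out


y = {"time": [1, 2, 3, 4], "value": [2.0, 4.0, 6.0, 8.0]}
scaler = ToyStandardScaler().fit(y)
scaled = scaler.transform(y)                 # mean 0, std 1
restored = scaler.inverse_transform(scaled)  # back to the original values
```

Storing the fitted parameters is what makes the inverse possible: predictions made on the scaled values can be mapped back with the same mean and standard deviation.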

If you need an sklearn transformer that has no native wrapper (for example KBinsDiscretizer, or a custom encoder), use SklearnTransformer to adapt it, passing the transformer class together with its constructor parameters:

from sklearn.preprocessing import KBinsDiscretizer
from yohou.preprocessing import SklearnTransformer

discretizer = SklearnTransformer(
    transformer=KBinsDiscretizer, n_bins=5, encode="ordinal"
)
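The adapter idea can be sketched as three steps: build the estimator from a class plus keyword arguments, hand it the non-"time" columns as rows, and re-attach "time" to the result. Everything below is hypothetical: ToyAdapter is not SklearnTransformer, and ToyBinner is an equal-width stand-in for KBinsDiscretizer (whose default strategy in scikit-learn is "quantile", not equal-width).

```python
# Illustrative sketch only: the general adapter idea, not SklearnTransformer itself.


class ToyAdapter:
    """Hypothetical wrapper: class + kwargs in, "time" column preserved."""

    def __init__(self, transformer, **params):
        self.transformer = transformer
        self.params = params

    def _rows(self, columns):
        names = [n for n in columns if n != "time"]
        return names, [list(r) for r in zip(*(columns[n] for n in names))]

    def fit(self, columns):
        _, rows = self._rows(columns)
        self.estimator_ = self.transformer(**self.params).fit(rows)
        return self

    def transform(self, columns):
        names, rows = self._rows(columns)
        result = self.estimator_.transform(rows)
        out = {"time": columns["time"]}
        for j, name in enumerate(names):
            out[name] = [row[j] for row in result]
        return out


class ToyBinner:
    """Hypothetical estimator: equal-width ordinal bins, fit per column."""

    def __init__(self, n_bins=5):
        self.n_bins = n_bins

    def fit(self, rows):
        cols = list(zip(*rows))
        self.lo_ = [min(c) for c in cols]
        # Fall back to width 1.0 for a constant column.
        self.width_ = [(max(c) - min(c)) / self.n_bins or 1.0 for c in cols]
        return self

    def transform(self, rows):
        return [
            [min(int((v - lo) / w), self.n_bins - 1)
             for v, lo, w in zip(row, self.lo_, self.width_)]
            for row in rows
        ]


y = {"time": [1, 2, 3, 4, 5], "value": [0.0, 2.5, 5.0, 7.5, 10.0]}
binned = ToyAdapter(ToyBinner, n_bins=5).fit(y).transform(y)
```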

Wrap Custom Functions with FunctionTransformer

FunctionTransformer wraps a plain Python function into a transformer that works inside a pipeline:

import polars as pl
from yohou.preprocessing import FunctionTransformer

def log_transform(df):
    return df.with_columns(pl.all().exclude("time").log())

def exp_transform(df):
    return df.with_columns(pl.all().exclude("time").exp())

transformer = FunctionTransformer(func=log_transform, inverse_func=exp_transform)
transformer.fit(y_train)
y_log = transformer.transform(y_train)

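Providing inverse_func lets a target transformer map predictions back to the original scale. If the function has no inverse, omit it. Note that log_transform assumes strictly positive values, because the logarithm is undefined for zero and negative inputs.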
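Because a target transformer relies on inverse_func to map predictions back, it is worth checking that the pair really undo each other. This plain-Python sketch mirrors log_transform and exp_transform on a dict of lists (the names log_values and exp_values are hypothetical) and verifies the round trip on sample data.

```python
import math

# Illustrative sketch only: func and inverse_func must undo each other.


def log_values(columns):
    # Natural log of every column except "time".
    return {
        name: vals if name == "time" else [math.log(x) for x in vals]
        for name, vals in columns.items()
    }


def exp_values(columns):
    # Inverse of log_values: exponentiate every column except "time".
    return {
        name: vals if name == "time" else [math.exp(x) for x in vals]
        for name, vals in columns.items()
    }


y = {"time": [1, 2, 3], "value": [1.0, 10.0, 100.0]}
round_trip = exp_values(log_values(y))  # equal to y up to float rounding
```

Running the same kind of check on your own training data before fitting a pipeline catches mismatched pairs early.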

Select Columns with ColumnTransformer

ColumnTransformer applies different transformers to different column subsets. Use this when a multivariate series needs distinct treatment per column:

from yohou.compose import ColumnTransformer
from yohou.preprocessing import LagTransformer, RollingStatisticsTransformer

ct = ColumnTransformer(
    transformers=[
        ("lags", LagTransformer(lag=[1, 2, 3]), ["temperature"]),
        ("rolling", RollingStatisticsTransformer(window_size=7), ["humidity"]),
    ],
    remainder="drop",
)

ct.fit(y_train)
y_features = ct.transform(y_train)

Set remainder="passthrough" to keep columns not assigned to any transformer in the output.
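The routing and remainder rules can be sketched in plain Python. route_columns and double are hypothetical, and the sketch assumes the "time" column is always carried through. Because double keeps every row, the sketch sidesteps the row-alignment question that arises when, as in the example above, the lag and rolling transformers need different amounts of history; how Yohou aligns those outputs is not covered here.

```python
# Illustrative sketch only: column routing and remainder handling,
# not yohou.compose.ColumnTransformer.


def route_columns(columns, transformers, remainder="drop"):
    """transformers: list of (name, func, [column names]); func maps dict -> dict."""
    used = set()
    out = {"time": columns["time"]}
    for _name, func, cols in transformers:
        subset = {c: columns[c] for c in cols}  # each transformer sees only its columns
        out.update(func(subset))
        used.update(cols)
    if remainder == "passthrough":
        # Columns not claimed by any transformer are copied through unchanged.
        for c, v in columns.items():
            if c != "time" and c not in used:
                out[c] = v
    return out


def double(sub):
    return {f"{c}_x2": [2 * x for x in v] for c, v in sub.items()}


y = {"time": [1, 2], "temperature": [20, 21], "humidity": [50, 55], "wind": [3, 4]}
steps = [("dbl", double, ["temperature"])]
dropped = route_columns(y, steps, remainder="drop")       # humidity, wind discarded
kept = route_columns(y, steps, remainder="passthrough")   # humidity, wind retained
```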

See Also