How to Use Preprocessing Transformers
This guide shows you how to prepare features for a forecasting model using Yohou's preprocessing transformers. Use these when you need to create lag features, compute rolling statistics, scale values, wrap custom logic, or apply different transformations to different columns.
Prerequisites
- Familiarity with the fit/predict API (Getting Started)
- Understanding of feature pipelines (Feature Pipelines)
Try it interactively
- Route columns through distinct transformers with ColumnTransformer, including remainder handling and automatic panel-aware column detection.
- Wrap arbitrary polars or numpy operations as sklearn transformers with FunctionTransformer, supporting stateful warmup, inverse transforms, and pipelines.
- Wrap sklearn scalers (StandardScaler, MinMaxScaler, RobustScaler, PowerTransformer, PolynomialFeatures) for polars DataFrames with inverse transforms.
- Feature engineering with LagTransformer, RollingStatisticsTransformer, SlidingWindowFunctionTransformer, and ExponentialMovingAverage on time series data.
Create Lag Features with LagTransformer
LagTransformer creates lagged copies of each value column, producing autoregressive inputs for a forecaster. Output columns follow the pattern {col}_lag_{k}:
from yohou.preprocessing import LagTransformer
lags = LagTransformer(lag=[1, 3, 6, 12])
lags.fit(y_train)
y_lagged = lags.transform(y_train)
The transformer's observation_horizon equals the largest lag (12 in the example above), since that many past rows are needed to produce a complete output.
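To make the link between the largest lag and the required history concrete, here is a minimal plain-Python sketch of lag features. It is not Yohou's implementation: make_lags is a hypothetical helper, and dropping the incomplete leading rows (rather than filling them with nulls) is an assumption made for illustration.

```python
# Plain-Python sketch of lag features; make_lags is a hypothetical helper,
# not part of Yohou.
def make_lags(values, lags, col="value"):
    """Build lagged columns named {col}_lag_{k}, keeping only complete rows."""
    horizon = max(lags)  # rows of history needed before the first complete row
    out = {}
    for k in lags:
        # Row t of the output holds the value observed k steps before t.
        out[f"{col}_lag_{k}"] = [values[t - k] for t in range(horizon, len(values))]
    return horizon, out


series = [10, 11, 12, 13, 14, 15, 16, 17]
horizon, lagged = make_lags(series, lags=[1, 3])

print(horizon)                # 3, the largest lag
print(lagged["value_lag_1"])  # [12, 13, 14, 15, 16]
print(lagged["value_lag_3"])  # [10, 11, 12, 13, 14]
```

With eight input rows and a largest lag of 3, only five rows have every lag available, which is why the horizon matches the largest lag.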
If your series has a strong seasonal pattern, MeanLagTransformer averages the values found at several integer multiples of a base lag:
from yohou.preprocessing import MeanLagTransformer
# Average lags 12, 24, 36 (3 yearly cycles for monthly data)
mean_lags = MeanLagTransformer(lag=12, n_lags=3)
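The averaging described above can be sketched in plain Python. This is an illustration of the idea only: mean_of_seasonal_lags is a hypothetical helper, and the assumption that the offsets are lag, 2 * lag, ..., n_lags * lag is taken from the comment in the example, not from Yohou's source.

```python
# Plain-Python sketch of averaging seasonal lags; not Yohou's implementation.
def mean_of_seasonal_lags(values, lag, n_lags):
    """Average the values at offsets lag, 2*lag, ..., n_lags*lag behind each row."""
    offsets = [lag * i for i in range(1, n_lags + 1)]
    start = max(offsets)  # first row for which every offset exists
    return [
        sum(values[t - k] for k in offsets) / n_lags
        for t in range(start, len(values))
    ]


series = list(range(1, 11))  # 1, 2, ..., 10
# A base lag of 2 with 3 multiples averages offsets 2, 4, and 6.
averaged = mean_of_seasonal_lags(series, lag=2, n_lags=3)
print(averaged)  # [3.0, 4.0, 5.0, 6.0]
```

Averaging several seasonal cycles smooths out noise in any single cycle, at the cost of needing lag * n_lags rows of history.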
Compute Rolling Statistics
RollingStatisticsTransformer computes rolling aggregates over a sliding window. Available statistics: mean, std, min, max, median, sum, var, q25, q75:
from yohou.preprocessing import RollingStatisticsTransformer
rolling = RollingStatisticsTransformer(
    window_size=12, statistics=["mean", "std"]
)
rolling.fit(y_train)
y_rolled = rolling.transform(y_train)
Output columns follow the pattern {col}_{statistic} (e.g., value_mean, value_std). The first window_size - 1 rows are dropped because they contain incomplete windows.
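The row count after a rolling transform can be checked with a short plain-Python sketch. rolling_stats is a hypothetical helper rather than Yohou code, and using the sample standard deviation is an assumption; the library's convention may differ.

```python
import statistics


# Plain-Python sketch of rolling mean and std; not Yohou's implementation.
def rolling_stats(values, window_size, col="value"):
    """Compute statistics over every complete window of length window_size."""
    means, stds = [], []
    for end in range(window_size, len(values) + 1):
        window = values[end - window_size:end]
        means.append(statistics.mean(window))
        # Sample standard deviation (ddof=1) is an assumption of this sketch.
        stds.append(statistics.stdev(window))
    return {f"{col}_mean": means, f"{col}_std": stds}


series = [2, 4, 6, 8, 10]
rolled = rolling_stats(series, window_size=3)

# 5 input rows with a window of 3 leave 5 - (3 - 1) = 3 complete windows.
print(len(rolled["value_mean"]))
print(rolled["value_mean"])
```

When you combine rolling features with other columns, remember that the output is shorter than the input by window_size - 1 rows.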
For custom aggregation logic, use SlidingWindowFunctionTransformer with any callable:
import numpy as np
from yohou.preprocessing import SlidingWindowFunctionTransformer
# Coefficient of variation over a 7-step window
cv = SlidingWindowFunctionTransformer(
    func=lambda x: np.std(x) / np.mean(x), window_size=7
)
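One caveat with a coefficient of variation is that it divides by the window mean, which can be zero. The sketch below shows a guarded version of the same calculation in plain Python. Both helpers are hypothetical and only illustrate the idea; the sketch uses the population standard deviation to match the default of np.std.

```python
import math
import statistics


# Hypothetical guarded version of the callable; not part of Yohou.
def coefficient_of_variation(window):
    """Population std divided by the mean, or NaN when the mean is zero."""
    mean = statistics.fmean(window)
    if mean == 0:
        return float("nan")  # avoid dividing by zero
    return statistics.pstdev(window) / mean


# Hypothetical stand-in for applying a callable over complete windows.
def apply_sliding(values, func, window_size):
    return [
        func(values[end - window_size:end])
        for end in range(window_size, len(values) + 1)
    ]


flat = apply_sliding([1.0, 1.0, 1.0, 1.0], coefficient_of_variation, window_size=2)
centered = apply_sliding([-1.0, 1.0], coefficient_of_variation, window_size=2)

print(flat)                     # [0.0, 0.0, 0.0]
print(math.isnan(centered[0]))  # True
```

A named function like this is also easier to test and reuse than an inline lambda.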
Scale and Normalize Values
Yohou provides native scaler wrappers that work directly with polars DataFrames, preserving the "time" column automatically:
from yohou.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(y_train)
y_scaled = scaler.transform(y_train)
Other built-in scalers: MinMaxScaler, RobustScaler, MaxAbsScaler. All support inverse_transform for reversing the scaling during prediction.
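To show what fitting, transforming, and inverting a scaler involves, here is a minimal plain-Python sketch of standardization. TinyStandardScaler is a hypothetical class written for this example, not Yohou's wrapper; it uses the population standard deviation, as sklearn's StandardScaler does.

```python
import statistics


# Hypothetical minimal scaler for illustration; not Yohou's StandardScaler.
class TinyStandardScaler:
    def fit(self, values):
        # Statistics are learned once, from the training data only.
        self.mean_ = statistics.fmean(values)
        self.scale_ = statistics.pstdev(values)
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.scale_ for v in values]

    def inverse_transform(self, values):
        return [v * self.scale_ + self.mean_ for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [6.0, 7.0]

scaler = TinyStandardScaler().fit(train)
scaled_test = scaler.transform(test)  # reuses the training mean and scale
restored = scaler.inverse_transform(scaled_test)

print(scaled_test)
print(restored)  # approximately [6.0, 7.0]
```

Fitting on the training window and reusing those statistics on later data avoids leaking future information into the features.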
If you need an sklearn transformer that doesn't have a native wrapper (e.g., a custom encoder), use SklearnTransformer to adapt it:
from sklearn.preprocessing import KBinsDiscretizer
from yohou.preprocessing import SklearnTransformer
discretizer = SklearnTransformer(
    transformer=KBinsDiscretizer, n_bins=5, encode="ordinal"
)
Wrap Custom Functions with FunctionTransformer
FunctionTransformer wraps a plain Python function into a transformer that works inside a pipeline:
import polars as pl
from yohou.preprocessing import FunctionTransformer
def log_transform(df):
    # Natural log of every value column; "time" is left untouched
    return df.with_columns(pl.all().exclude("time").log())

def exp_transform(df):
    # Inverse of log_transform
    return df.with_columns(pl.all().exclude("time").exp())
transformer = FunctionTransformer(func=log_transform, inverse_func=exp_transform)
transformer.fit(y_train)
y_log = transformer.transform(y_train)
Providing inverse_func lets target transformers reverse the operation during prediction. If the function is not invertible, omit it.
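Before wiring a function pair into a transformer, it can help to confirm that the two functions really undo each other on your data. The sketch below does this with the standard math module instead of polars. It is only an illustration; the log1p/expm1 pair is a common alternative for data containing zeros and is not something this guide prescribes.

```python
import math

values = [1.0, 10.0, 100.0]

# Forward then inverse should return the original values (up to rounding).
logged = [math.log(v) for v in values]
restored = [math.exp(v) for v in logged]
round_trip_ok = all(math.isclose(a, b) for a, b in zip(values, restored))
print(round_trip_ok)  # True

# The logarithm is undefined at zero, so math.log raises there.
try:
    math.log(0.0)
    log_of_zero_failed = False
except ValueError:
    log_of_zero_failed = True
print(log_of_zero_failed)  # True

# log1p / expm1 form an invertible pair that tolerates zeros.
counts = [0.0, 3.0, 99.0]
restored_counts = [math.expm1(math.log1p(c)) for c in counts]
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(counts, restored_counts)))
```

If the round trip fails on part of the input range, the pair is not a safe choice for transforming a target.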
Select Columns with ColumnTransformer
ColumnTransformer applies different transformers to different column subsets. Use this when a multivariate series needs distinct treatment per column:
from yohou.compose import ColumnTransformer
from yohou.preprocessing import LagTransformer, RollingStatisticsTransformer
ct = ColumnTransformer(
    transformers=[
        ("lags", LagTransformer(lag=[1, 2, 3]), ["temperature"]),
        ("rolling", RollingStatisticsTransformer(window_size=7), ["humidity"]),
    ],
    remainder="drop",
)
ct.fit(y_train)
y_features = ct.transform(y_train)
Set remainder="passthrough" to keep columns not assigned to any transformer in the output.
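The effect of the remainder setting can be illustrated with a small plain-Python sketch that routes dictionary columns through per-column functions. Everything here is hypothetical (route_columns, lag_one, running_max, and the output column names) and only mimics the routing idea; it ignores the "time" column and row alignment that the real transformer has to handle.

```python
from itertools import accumulate


# Hypothetical per-column functions; each returns a dict of new columns.
def lag_one(col, values):
    return {f"{col}_lag_1": [None] + values[:-1]}


def running_max(col, values):
    return {f"{col}_running_max": list(accumulate(values, max))}


# Hypothetical router mimicking the idea of a column transformer.
def route_columns(table, routes, remainder="drop"):
    out, used = {}, set()
    for _name, func, cols in routes:
        for col in cols:
            used.add(col)
            out.update(func(col, table[col]))
    if remainder == "passthrough":
        # Keep columns that no route claimed, unchanged.
        for col, values in table.items():
            if col not in used:
                out[col] = values
    return out


table = {
    "temperature": [20, 21, 19],
    "humidity": [50, 55, 53],
    "pressure": [1010, 1012, 1011],
}
routes = [
    ("lags", lag_one, ["temperature"]),
    ("rolling", running_max, ["humidity"]),
]

dropped = route_columns(table, routes, remainder="drop")
kept = route_columns(table, routes, remainder="passthrough")

print(list(dropped))  # ['temperature_lag_1', 'humidity_running_max']
print(list(kept))     # ['temperature_lag_1', 'humidity_running_max', 'pressure']
```

In this sketch the unassigned pressure column only survives with remainder="passthrough", which mirrors the behavior described above.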
See Also
- How to Compose Feature Pipelines for chaining transformers sequentially and in parallel
- How to Clean and Resample Time Series for data preparation before feature engineering
- Preprocessing API Reference for full parameter documentation