LagTransformer¶

`yohou.preprocessing.window.LagTransformer` ¶

Bases: BaseTransformer

Create lagged features from time series data.

Creates lagged versions of each feature column, where each lag shifts the data by a specified number of time steps. This is essential for time series forecasting using supervised learning approaches.

Parameters¶

Name	Type	Description	Default
`lag`	`int >= 1 or list of ints >= 1`	Lag(s) to create. Can be a single integer or a list of integers. Each lag value must be >= 1.	`1`

Attributes¶

Name	Type	Description
`lags_`	`list of int`	Effective list of lags used for transformation.

Examples¶

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.preprocessing import LagTransformer

>>> # Create sample data
>>> X = pl.DataFrame({
...     "time": [datetime(2020, 1, i) for i in range(1, 11)],
...     "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
... })

>>> # Create lag-1 and lag-2 features
>>> transformer = LagTransformer(lag=[1, 2])
>>> transformer.fit(X)
LagTransformer(...)
>>> X_lagged = transformer.transform(X)
>>> X_lagged.columns
['time', 'value_lag_1', 'value_lag_2']
>>> len(X_lagged)  # First 2 rows dropped (max(lag) = 2)
8

Notes¶

Lag features are created with `yohou.utils.tabularization.tabularize`, which shifts each numeric column by the specified number of time steps. The first `max(lags)` rows are dropped because their lag values are incomplete, which is why `observation_horizon` equals `max(lags)`.
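The shifting and row dropping described above can be illustrated without the library. The sketch below is a hand-rolled approximation in plain Python, not the `tabularize` implementation: `lag_table` is a hypothetical helper, and it assumes the usual lag convention in which the row at step `t` carries the value observed `lag` steps earlier.

```python
from datetime import datetime


def lag_table(times, values, lags, name="value"):
    """Build lag columns by hand and drop the first max(lags) rows.

    Hypothetical helper for illustration only; it is not part of yohou.
    """
    horizon = max(lags)
    # Timestamps of the rows that survive the drop.
    table = {"time": list(times[horizon:])}
    for lag in lags:
        # Row t of the output holds the value observed `lag` steps earlier.
        table[f"{name}_lag_{lag}"] = [
            values[t - lag] for t in range(horizon, len(values))
        ]
    return table


times = [datetime(2020, 1, day) for day in range(1, 11)]
values = [float(v) for v in range(1, 11)]
table = lag_table(times, values, lags=[1, 2])

print(list(table))          # ['time', 'value_lag_1', 'value_lag_2']
print(len(table["time"]))   # 8
print(table["value_lag_1"][0], table["value_lag_2"][0])  # 2.0 1.0
```

With ten input rows and `lags=[1, 2]`, eight rows remain, which agrees with the row count in the Examples section.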

When used inside a pipeline with `observe`/`rewind`, the observation buffer retains enough history to produce lag features without data loss on subsequent `transform` calls.
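To see why a history buffer avoids data loss, consider the following sketch. It is only a conceptual model under stated assumptions: `LagBuffer` and its methods are hypothetical names, not yohou's API, and the real buffer may be organised quite differently. The idea is that keeping the last `max(lags)` observations is enough to compute lag features for every newly arriving row.

```python
from collections import deque


class LagBuffer:
    """Hypothetical stand-in for an observation buffer; not yohou's API."""

    def __init__(self, lags):
        self.lags = list(lags)
        # Keep only as much history as the largest lag requires.
        self.history = deque(maxlen=max(self.lags))

    def observe(self, values):
        """Store new observations, discarding those older than the horizon."""
        self.history.extend(values)

    def transform(self, values):
        """Return one row of lag features per new value that has full history."""
        combined = list(self.history) + list(values)
        offset = len(self.history)
        rows = []
        for t in range(offset, len(combined)):
            # Skip rows whose lags would reach before the start of the data.
            if all(t - lag >= 0 for lag in self.lags):
                rows.append([combined[t - lag] for lag in self.lags])
        return rows


buffered = LagBuffer([1, 2])
buffered.observe([float(v) for v in range(1, 9)])  # history keeps 7.0 and 8.0
print(buffered.transform([9.0, 10.0]))  # [[8.0, 7.0], [9.0, 8.0]]

empty = LagBuffer([1, 2])
print(empty.transform([9.0, 10.0]))     # []
```

With the buffer primed, both new rows receive complete lag features. Without any history, the same two rows are dropped because neither has a lag-2 value available.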

Source Code¶

class LagTransformer(BaseTransformer):
    """Create lagged features from time series data.

    Creates lagged versions of each feature column, where each lag shifts the
    data by a specified number of time steps. This is essential for time series
    forecasting using supervised learning approaches.

    Parameters
    ----------
    lag : int >= 1 or list of ints >= 1, default=1
        Lag(s) to create. Can be a single integer or a list of integers.
        Each lag value must be >= 1.

    Attributes
    ----------
    lags_ : list of int
        Effective list of lags used for transformation.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.preprocessing import LagTransformer

    >>> # Create sample data
    >>> X = pl.DataFrame({
    ...     "time": [datetime(2020, 1, i) for i in range(1, 11)],
    ...     "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    ... })

    >>> # Create lag-1 and lag-2 features
    >>> transformer = LagTransformer(lag=[1, 2])
    >>> transformer.fit(X)  # doctest: +ELLIPSIS
    LagTransformer(...)
    >>> X_lagged = transformer.transform(X)
    >>> X_lagged.columns
    ['time', 'value_lag_1', 'value_lag_2']
    >>> len(X_lagged)  # First 2 rows dropped (max(lag) = 2)
    8

    See Also
    --------
    `MeanLagTransformer` : Averages multiple lag multiples into a single feature.
    `RollingStatisticsTransformer` : Compute rolling statistics (mean, std, etc.).
    `SlidingWindowFunctionTransformer` : Apply custom functions to sliding windows.
    `tabularize` : Underlying tabularization function.

    Notes
    -----
    Lag features are created using ``yohou.utils.tabularization.tabularize``,
    which shifts each numeric column by the specified number of time steps.
    The first ``max(lags)`` rows are dropped because they contain incomplete
    lag values, setting ``observation_horizon = max(lags)``.

    When used inside a pipeline with ``observe``/``rewind``, the observation
    buffer retains enough history to produce lag features without data loss
    on subsequent ``transform`` calls.

    """

    _parameter_constraints: dict = {
        "lag": [Interval(numbers.Integral, 1, None, closed="left"), list],
    }

    _tags = {"stateful": True}

    def __init__(self, lag: StrictInt | list[StrictInt] = 1):
        self.lag = lag

    @property
    def observation_horizon(self) -> int:  # noqa: D102
        """Return the number of past observations needed."""
        lags = self.lag if isinstance(self.lag, list) else [self.lag]
        return max(lags)

    def _fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None) -> None:
        """Fit the internal model."""
        self.lags_: list[int] = self.lag if isinstance(self.lag, list) else [self.lag]

    def _transform(self, X: pl.DataFrame) -> pl.DataFrame:
        """Transform the input time series.

        Parameters
        ----------
        X : pl.DataFrame
            Validated input time series.

        Returns
        -------
        pl.DataFrame
            Transformed time series with a ``"time"`` column and transformed
            value columns.

        """
        X_t = tabularize(X, self.lags_)

        return X_t

    def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
        """Get output feature names for transformation.

        Parameters
        ----------
        input_features : array-like of str or None, default=None
            Column names of the input features.  If ``None``, uses the
            feature names seen during ``fit``.

        Returns
        -------
        list of str
            Output feature names after transformation.

        """
        input_features = _check_feature_names_in(self, input_features)
        feature_names = [f"{col}_lag_{lag}" for col in input_features for lag in self.lags_]

        arr: list[str] = np.asarray(feature_names, dtype=object).tolist()
        return arr

Methods¶

`observation_horizon` `property` ¶

Return the number of past observations needed.
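As a minimal sketch, the logic of this property can be reproduced as a standalone function. The function below mirrors the property body shown in the class source above. The free-function form is only for illustration.

```python
def observation_horizon(lag):
    """Standalone copy of the property logic, for illustration only."""
    # A scalar lag is treated as a one-element list.
    lags = lag if isinstance(lag, list) else [lag]
    # The largest lag determines how much history is needed.
    return max(lags)


print(observation_horizon(3))          # 3
print(observation_horizon([1, 2, 7]))  # 7
```

Because the property reads the `lag` parameter rather than the fitted `lags_` attribute, the source suggests it can be queried before `fit` is called.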

`get_feature_names_out(input_features=None)` ¶

Get output feature names for transformation.

Parameters¶

Name	Type	Description	Default
`input_features`	`array-like of str or None`	Column names of the input features. If `None`, uses the feature names seen during `fit`.	`None`

Returns¶

Type	Description
`list of str`	Output feature names after transformation.
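The naming scheme can be sketched in isolation. The helper below reuses the list comprehension from the method source and omits the input validation step. `lag_feature_names` and the column names are hypothetical and chosen only for illustration.

```python
def lag_feature_names(input_features, lags):
    """Name each lag column as '<column>_lag_<lag>', grouped by input column."""
    return [f"{col}_lag_{lag}" for col in input_features for lag in lags]


names = lag_feature_names(["temperature", "load"], [1, 24])
print(names)
# ['temperature_lag_1', 'temperature_lag_24', 'load_lag_1', 'load_lag_24']
```

Note the ordering: all lags of the first column come before any lag of the second column.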

Source Code¶

def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
    """Get output feature names for transformation.

    Parameters
    ----------
    input_features : array-like of str or None, default=None
        Column names of the input features.  If ``None``, uses the
        feature names seen during ``fit``.

    Returns
    -------
    list of str
        Output feature names after transformation.

    """
    input_features = _check_feature_names_in(self, input_features)
    feature_names = [f"{col}_lag_{lag}" for col in input_features for lag in self.lags_]

    arr: list[str] = np.asarray(feature_names, dtype=object).tolist()
    return arr

Tutorials¶

The following example notebooks use this component:

How to Tune Fourier Seasonality Terms

Data-Features

Explore how Fourier harmonic count affects seasonal fit quality, compare Fourier vs Pattern seasonality, and tune harmonics jointly with GridSearchCV.

How to Handle Long Series

Data-Features

Limit history with observation_horizon, weight recent errors with exponential decay, and downsample high-frequency data.

How to Aggregate Scorer Results

Evaluation-Search

Demonstrate all scorer aggregation strategies (stepwise, vintagewise, componentwise, groupwise, coveragewise, all) on panel data with weighted group aggregation.

How to Forecast with CatBoost

Forecasting-Models

Plug CatBoostRegressor into PointReductionForecaster as a drop-in sklearn estimator, compare gradient-boosted versus Ridge linear baseline, and demonstrate the direct reduction strategy with tree-based models.

How to Choose a Decomposition Strategy

Forecasting-Models

Build 2- and 3-component DecompositionPipeline forecasters chaining trend, seasonality, and residual models with target pre-transformation.

How to Use Lagged Forecasts as Features

Forecasting-Models

Compare ForecastedFeatureForecaster strategies (actual, predicted, rewind) and split ratio tuning for chaining feature and target forecasters.

How to Configure LocalPanelForecaster

Panel-Data

Wrap any forecaster with LocalPanelForecaster for fully independent per-group clones, parallel fitting via n_jobs, and selective group operations.

How to Run Panel Cross-Validation

Panel-Data

Time series cross-validation on panel data with GridSearchCV, selective group observation, rewind operations, and groupwise performance comparison.

How to Forecast Panel Prediction Intervals

Panel-Data

Combine conformal and quantile regression intervals on panel data with per-group coverage analysis, calibration plots, and groupwise interval scoring.

How to Apply Stationarity to Panel Data

Panel-Data

Apply per-group stationarity transforms on panel data with SeasonalDifferencing, DecompositionPipeline (polynomial trend + pattern seasonality), and residuals.

