LagTransformer

yohou.preprocessing.window.LagTransformer

Bases: BaseTransformer

Create lagged features from time series data.

Creates lagged versions of each feature column, where each lag shifts the data by a specified number of time steps. This turns forecasting into a supervised learning problem in which past values serve as input features.
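To make the idea concrete, the following is a minimal plain-Python sketch of what "lagging" means when forecasting is framed as supervised learning. It does not use yohou or polars; `make_lagged_rows` is a hypothetical helper written for illustration, and it pairs the lagged values with the current value as a target, whereas LagTransformer itself only produces the lagged columns.

```python
def make_lagged_rows(values, lags):
    """Pair each value with its lagged predecessors (illustrative helper, not yohou API)."""
    horizon = max(lags)
    rows = []
    # Start at `horizon` so that every requested lag has a value to look back to.
    for t in range(horizon, len(values)):
        features = [values[t - lag] for lag in lags]
        rows.append((features, values[t]))
    return rows


series = [1.0, 2.0, 3.0, 4.0, 5.0]
rows = make_lagged_rows(series, lags=[1, 2])
for features, target in rows:
    print(features, "->", target)
```

The first `max(lags)` positions produce no row because at least one lag would point before the start of the series, which matches the row-dropping behaviour described in the Notes.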

Parameters

Name | Type | Description | Default
lag | int >= 1 or list of ints >= 1 | Lag(s) to create. Can be a single integer or a list of integers. Each lag value must be >= 1. | 1

Attributes

Name | Type | Description
lags_ | list of int | Effective list of lags used for transformation.
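As a rough sketch of how a scalar-or-list `lag` value could be turned into the effective list stored in `lags_`, consider the helper below. The wrapping of a single integer into a one-element list follows the source listing further down; the per-element check is an assumption added for illustration, and `normalize_lags` is a hypothetical name rather than part of yohou.

```python
import numbers


def normalize_lags(lag):
    """Return `lag` as a list of positive integers (illustrative helper, not yohou API)."""
    candidates = list(lag) if isinstance(lag, list) else [lag]
    for value in candidates:
        # Assumption: every entry must be an integer >= 1; bool is rejected explicitly
        # because it is a subclass of int in Python.
        is_int = isinstance(value, numbers.Integral) and not isinstance(value, bool)
        if not is_int or value < 1:
            raise ValueError(f"each lag must be an integer >= 1, got {value!r}")
    return candidates


print(normalize_lags(3))
print(normalize_lags([1, 2]))
```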

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.preprocessing import LagTransformer
>>> # Create sample data
>>> X = pl.DataFrame({
...     "time": [datetime(2020, 1, i) for i in range(1, 11)],
...     "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
... })
>>> # Create lag-1 and lag-2 features
>>> transformer = LagTransformer(lag=[1, 2])
>>> transformer.fit(X)
LagTransformer(...)
>>> X_lagged = transformer.transform(X)
>>> X_lagged.columns
['time', 'value_lag_1', 'value_lag_2']
>>> len(X_lagged)  # First 2 rows dropped (max(lag) = 2)
8

See Also

MeanLagTransformer : Averages multiple lag multiples into a single feature.
RollingStatisticsTransformer : Compute rolling statistics (mean, std, etc.).
SlidingWindowFunctionTransformer : Apply custom functions to sliding windows.
tabularize : Underlying tabularization function.

Notes

Lag features are created using yohou.utils.tabularization.tabularize, which shifts each numeric column by the specified number of time steps. The first max(lags) rows are dropped because their lag values are incomplete, which is why observation_horizon equals max(lags).

When used inside a pipeline with observe/rewind, the observation buffer retains enough history to produce lag features without data loss on subsequent transform calls.
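The buffering idea can be sketched in plain Python as follows. This is only an illustration of why retaining the last `max(lags)` observations is enough to avoid losing rows; the class `LagBuffer` and its methods `remember` and `lag_rows` are hypothetical and do not reflect how yohou implements `observe` or `rewind`.

```python
from collections import deque


class LagBuffer:
    """Keep the last max(lags) values so new observations can be lagged immediately."""

    def __init__(self, lags):
        self.lags = list(lags)
        self.horizon = max(self.lags)
        # A bounded deque silently discards values older than the horizon.
        self.history = deque(maxlen=self.horizon)

    def remember(self, values):
        """Store already-seen values as context for later batches."""
        self.history.extend(values)

    def lag_rows(self, new_values):
        """Return one row of lagged values per new observation that has full history."""
        context = list(self.history)
        combined = context + list(new_values)
        rows = []
        for t in range(len(context), len(combined)):
            if t < self.horizon:
                continue  # not enough history yet, so this row would be incomplete
            rows.append([combined[t - lag] for lag in self.lags])
        return rows


buffer = LagBuffer([1, 2])
buffer.remember([1.0, 2.0, 3.0])
print(buffer.lag_rows([4.0, 5.0]))
```

With the buffer primed, both new observations yield a complete row. Without any remembered history, the same call returns no rows, which corresponds to the initial `max(lags)` rows being dropped.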

Source Code

class LagTransformer(BaseTransformer):
    """Create lagged features from time series data.

    Creates lagged versions of each feature column, where each lag shifts the
    data by a specified number of time steps. This is essential for time series
    forecasting using supervised learning approaches.

    Parameters
    ----------
    lag : int >= 1 or list of ints >= 1, default=1
        Lag(s) to create. Can be a single integer or a list of integers.
        Each lag value must be >= 1.

    Attributes
    ----------
    lags_ : list of int
        Effective list of lags used for transformation.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.preprocessing import LagTransformer

    >>> # Create sample data
    >>> X = pl.DataFrame({
    ...     "time": [datetime(2020, 1, i) for i in range(1, 11)],
    ...     "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    ... })

    >>> # Create lag-1 and lag-2 features
    >>> transformer = LagTransformer(lag=[1, 2])
    >>> transformer.fit(X)  # doctest: +ELLIPSIS
    LagTransformer(...)
    >>> X_lagged = transformer.transform(X)
    >>> X_lagged.columns
    ['time', 'value_lag_1', 'value_lag_2']
    >>> len(X_lagged)  # First 2 rows dropped (max(lag) = 2)
    8

    See Also
    --------
    `MeanLagTransformer` : Averages multiple lag multiples into a single feature.
    `RollingStatisticsTransformer` : Compute rolling statistics (mean, std, etc.).
    `SlidingWindowFunctionTransformer` : Apply custom functions to sliding windows.
    `tabularize` : Underlying tabularization function.

    Notes
    -----
    Lag features are created using ``yohou.utils.tabularization.tabularize``,
    which shifts each numeric column by the specified number of time steps.
    The first ``max(lags)`` rows are dropped because they contain incomplete
    lag values, setting ``observation_horizon = max(lags)``.

    When used inside a pipeline with ``observe``/``rewind``, the observation
    buffer retains enough history to produce lag features without data loss
    on subsequent ``transform`` calls.

    """

    _parameter_constraints: dict = {
        "lag": [Interval(numbers.Integral, 1, None, closed="left"), list],
    }

    _tags = {"stateful": True}

    def __init__(self, lag: StrictInt | list[StrictInt] = 1):
        self.lag = lag

    @property
    def observation_horizon(self) -> int:  # noqa: D102
        """Return the number of past observations needed."""
        lags = self.lag if isinstance(self.lag, list) else [self.lag]
        return max(lags)

    def _fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None) -> None:
        """Fit the internal model."""
        self.lags_: list[int] = self.lag if isinstance(self.lag, list) else [self.lag]

    def _transform(self, X: pl.DataFrame) -> pl.DataFrame:
        """Transform the input time series.

        Parameters
        ----------
        X : pl.DataFrame
            Validated input time series.

        Returns
        -------
        pl.DataFrame
            Transformed time series with a ``"time"`` column and transformed
            value columns.

        """
        X_t = tabularize(X, self.lags_)

        return X_t

    def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
        """Get output feature names for transformation.

        Parameters
        ----------
        input_features : array-like of str or None, default=None
            Column names of the input features.  If ``None``, uses the
            feature names seen during ``fit``.

        Returns
        -------
        list of str
            Output feature names after transformation.

        """
        input_features = _check_feature_names_in(self, input_features)
        feature_names = [f"{col}_lag_{lag}" for col in input_features for lag in self.lags_]

        arr: list[str] = np.asarray(feature_names, dtype=object).tolist()
        return arr

Methods

observation_horizon property

Return the number of past observations needed.

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters

Name | Type | Description | Default
input_features | array-like of str or None | Column names of the input features. If None, uses the feature names seen during fit. | None

Returns

Type | Description
list of str | Output feature names after transformation.
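The naming scheme can be reproduced with the same list comprehension that appears in the source listing above. The sketch below is standalone and covers value columns only; the second column name `"temp"` is a made-up example, and how the `"time"` column is handled is not shown here.

```python
def lag_feature_names(input_features, lags):
    """Build names of the form '<column>_lag_<lag>' (standalone sketch)."""
    # Columns vary slowest and lags fastest, so all lags of one column stay adjacent.
    return [f"{col}_lag_{lag}" for col in input_features for lag in lags]


names = lag_feature_names(["value", "temp"], [1, 2])
print(names)
```

Because the outer loop runs over columns, the names are grouped by column rather than by lag.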


Tutorials

The following example notebooks use this component:

  • How to Tune Fourier Seasonality Terms (Data-Features)
    Explore how Fourier harmonic count affects seasonal fit quality, compare Fourier vs Pattern seasonality, and tune harmonics jointly with GridSearchCV.

  • How to Handle Long Series (Data-Features)
    Limit history with observation_horizon, weight recent errors with exponential decay, and downsample high-frequency data.

  • How to Aggregate Scorer Results (Evaluation-Search)
    Demonstrate all scorer aggregation strategies (stepwise, vintagewise, componentwise, groupwise, coveragewise, all) on panel data with weighted group aggregation.

  • How to Forecast with CatBoost (Forecasting-Models)
    Plug CatBoostRegressor into PointReductionForecaster as a drop-in sklearn estimator, compare gradient-boosted versus Ridge linear baseline, and demonstrate the direct reduction strategy with tree-based models.

  • How to Choose a Decomposition Strategy (Forecasting-Models)
    Build 2- and 3-component DecompositionPipeline forecasters chaining trend, seasonality, and residual models with target pre-transformation.

  • How to Use Lagged Forecasts as Features (Forecasting-Models)
    Compare ForecastedFeatureForecaster strategies (actual, predicted, rewind) and split ratio tuning for chaining feature and target forecasters.

  • How to Configure LocalPanelForecaster (Panel-Data)
    Wrap any forecaster with LocalPanelForecaster for fully independent per-group clones, parallel fitting via n_jobs, and selective group operations.

  • How to Run Panel Cross-Validation (Panel-Data)
    Time series cross-validation on panel data with GridSearchCV, selective group observation, rewind operations, and groupwise performance comparison.

  • How to Forecast Panel Prediction Intervals (Panel-Data)
    Combine conformal and quantile regression intervals on panel data with per-group coverage analysis, calibration plots, and groupwise interval scoring.

  • How to Apply Stationarity to Panel Data (Panel-Data)
    Apply per-group stationarity transforms on panel data with SeasonalDifferencing, DecompositionPipeline (polynomial trend + pattern seasonality), and residuals.