tabularize
yohou.utils.tabularization.tabularize(df_time_series, lags)
Convert time series to tabular format using lags.
Creates a tabular dataset by generating lagged versions of each time series column. This is the core operation for reduction-based forecasting: it turns a time series into a feature table so that ordinary scikit-learn estimators can be used for time series prediction.
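To illustrate the reduction idea independently of yohou, the following pure-Python sketch turns a univariate series into lag-feature rows `X` and targets `y`, then fits a one-feature least-squares model where an sklearn regressor would normally go. The helpers `make_supervised` and `fit_lag1_ols` are hypothetical and not part of yohou; `tabularize` itself works on polars DataFrames.

```python
def make_supervised(series, lags):
    """Hypothetical helper: build (X, y) pairs from a univariate series.

    Each row of X holds series[t - i] for every i in lags, paired with the
    target y = series[t]. Positions t < max(lags) are skipped because some of
    their lagged values would not exist.
    """
    max_lag = max(lags)
    X = [[series[t - i] for i in lags] for t in range(max_lag, len(series))]
    y = [series[t] for t in range(max_lag, len(series))]
    return X, y


def fit_lag1_ols(X, y):
    """Ordinary least squares for a single feature: y ≈ slope * x + intercept."""
    xs = [row[0] for row in X]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (target - mean_y) for x, target in zip(xs, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


series = [10, 20, 30, 40, 50]
X, y = make_supervised(series, lags=[1])
slope, intercept = fit_lag1_ols(X, y)
# One-step-ahead forecast: feed the most recent observation as the lag-1 feature
next_value = slope * series[-1] + intercept
```

Here `X` is `[[10], [20], [30], [40]]`, `y` is `[20, 30, 40, 50]`, and the fitted model forecasts `60.0` for the next step.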
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| df_time_series | DataFrame | Time series DataFrame with columns to be lagged (excluding "time"). | required |
| lags | Sequence of int | Lag values to create. Each value i creates features shifted by i time steps. For example, lags=[1, 2, 3] creates lag_1, lag_2, and lag_3 features. | required |
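The lag semantics can be sketched on a plain Python list: shifting by `i` pads the first `i` positions with nulls (here `None`), so a row has every lagged value only once its index reaches `max(lags)`. The `shift` helper below is hypothetical; it mimics a null-padding shift and is not yohou's implementation.

```python
def shift(values, i):
    """Hypothetical helper: shift a column down by i steps, padding with None."""
    return [None] * i + values[: len(values) - i]


values = [10, 20, 30, 40, 50]
lags = [1, 3]  # lags need not be contiguous

# One shifted column per lag, named after the lag_i pattern
shifted = {f"value_lag_{i}": shift(values, i) for i in lags}

# A row is complete only if none of its lagged values is None;
# these are exactly the rows with index >= max(lags)
complete_rows = [
    t for t in range(len(values))
    if all(column[t] is not None for column in shifted.values())
]
```

With `lags=[1, 3]`, only rows 3 and 4 survive, i.e. `len(values) - max(lags)` rows.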
Returns

| Type | Description |
|---|---|
| DataFrame | Tabularized DataFrame with lagged feature columns. The first max(lags) rows are dropped because their lagged values would be null. Column names follow the pattern "{original_column}_lag_{i}". |
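A hypothetical re-implementation of the documented behavior on a dict of lists makes the row dropping and column naming concrete. Which columns get lagged is passed explicitly through `columns` because that is an assumption here; with `columns=["time", "value"]` the sketch reproduces the column set and values of the example below. `tabularize_sketch` is not part of yohou.

```python
def tabularize_sketch(frame, lags, columns):
    """Hypothetical stand-in for the lagging step, on a dict of lists.

    frame: mapping of column name -> list of values; must contain "time"
    lags: positive integers; lag i reads the value i rows earlier
    columns: names of the columns to lag (an assumption, passed explicitly)
    """
    n_rows = len(frame["time"])
    max_lag = max(lags)
    # Keep the timestamps of the rows that survive the drop
    out = {"time": frame["time"][max_lag:]}
    for col in columns:
        for i in lags:
            # Output rows start at max_lag, so t - i >= 0 and no nulls appear
            out[f"{col}_lag_{i}"] = [frame[col][t - i] for t in range(max_lag, n_rows)]
    return out


frame = {"time": [1, 2, 3, 4, 5], "value": [10, 20, 30, 40, 50]}
result = tabularize_sketch(frame, lags=[1, 2], columns=["time", "value"])
```

`result` has the keys `time`, `time_lag_1`, `time_lag_2`, `value_lag_1`, and `value_lag_2`, each with three entries, because `max(lags) = 2` of the five rows are dropped.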
Examples

```python
>>> import polars as pl
>>> from yohou.utils.tabularization import tabularize
>>> # Original time series
>>> df = pl.DataFrame({"time": [1, 2, 3, 4, 5], "value": [10, 20, 30, 40, 50]})
>>> # Create lag features for lags 1 and 2
>>> df_tabular = tabularize(df, lags=[1, 2])
>>> df_tabular
shape: (3, 5)
┌──────┬────────────┬────────────┬─────────────┬─────────────┐
│ time ┆ time_lag_1 ┆ time_lag_2 ┆ value_lag_1 ┆ value_lag_2 │
│ ---  ┆ ---        ┆ ---        ┆ ---         ┆ ---         │
│ i64  ┆ i64        ┆ i64        ┆ i64         ┆ i64         │
╞══════╪════════════╪════════════╪═════════════╪═════════════╡
│ 3    ┆ 2          ┆ 1          ┆ 20          ┆ 10          │
│ 4    ┆ 3          ┆ 2          ┆ 30          ┆ 20          │
│ 5    ┆ 4          ┆ 3          ┆ 40          ┆ 30          │
└──────┴────────────┴────────────┴─────────────┴─────────────┘
```
See Also

- BaseReductionForecaster: Uses tabularize for forecasting.
- LagTransformer: Transformer that applies similar lagging logic.