SimpleTimeImputer

yohou.preprocessing.imputation.SimpleTimeImputer

Bases: BaseTransformer

Time series imputation using interpolation or filling methods.

Imputes missing values using time series-aware methods: linear interpolation, forward fill, backward fill, nearest-value fill, or forward fill followed by backward fill.

Parameters

Name Type Description Default
method {"linear", "forward", "backward", "nearest", "fill_both"}

Imputation method:
- "linear": Linear interpolation between known values
- "forward": Forward fill (last observation carried forward)
- "backward": Backward fill (next observation carried backward)
- "nearest": Use nearest non-null value
- "fill_both": Forward fill then backward fill (handles edges)

"linear"
limit int or None

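Maximum number of consecutive missing values to fill. If None, no limit. In the source shown on this page, limit is passed only to the "forward", "backward", and "fill_both" fills; the "linear" and "nearest" branches do not use it.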

None

Attributes

Name Type Description
method_ str

Validated imputation method.

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> import numpy as np
>>> from yohou.preprocessing import SimpleTimeImputer
>>> X = pl.DataFrame({
...     "time": [datetime(2020, 1, i) for i in range(1, 8)],
...     "value": [1.0, np.nan, np.nan, 4.0, np.nan, 6.0, 7.0],
... })
>>> # Linear interpolation
>>> imputer = SimpleTimeImputer(method="linear")
>>> imputer.fit(X)
SimpleTimeImputer()
>>> X_imputed = imputer.transform(X)
>>> X_imputed["value"].null_count()
0
>>> # Forward fill with limit
>>> imputer = SimpleTimeImputer(method="forward", limit=1)
>>> imputer.fit(X)
SimpleTimeImputer(...)
>>> X_imputed = imputer.transform(X)
>>> "time" in X_imputed.columns
True

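See Also

- SimpleImputer (yohou.preprocessing.imputation.SimpleImputer): Simple constant-strategy imputation.
- SeasonalImputer (yohou.preprocessing.imputation.SeasonalImputer): Seasonal decomposition-based imputation.
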
Source Code

class SimpleTimeImputer(BaseTransformer):
    """Time series imputation using interpolation or filling methods.

    Imputes missing values using time series-aware methods like linear
    interpolation, forward fill, backward fill, or combinations.

    Parameters
    ----------
    method : {"linear", "forward", "backward", "nearest", "fill_both"}, default="linear"
        Imputation method:
        - "linear": Linear interpolation between known values
        - "forward": Forward fill (last observation carried forward)
        - "backward": Backward fill (next observation carried backward)
        - "nearest": Use nearest non-null value
        - "fill_both": Forward fill then backward fill (handles edges)
    limit : int or None, default=None
        Maximum number of consecutive NaN values to fill. If None, no limit.

    Attributes
    ----------
    method_ : str
        Validated imputation method.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> import numpy as np
    >>> from yohou.preprocessing import SimpleTimeImputer

    >>> X = pl.DataFrame({
    ...     "time": [datetime(2020, 1, i) for i in range(1, 8)],
    ...     "value": [1.0, np.nan, np.nan, 4.0, np.nan, 6.0, 7.0],
    ... })

    >>> # Linear interpolation
    >>> imputer = SimpleTimeImputer(method="linear")
    >>> imputer.fit(X)
    SimpleTimeImputer()
    >>> X_imputed = imputer.transform(X)
    >>> X_imputed["value"].null_count()
    0

    >>> # Forward fill with limit
    >>> imputer = SimpleTimeImputer(method="forward", limit=1)
    >>> imputer.fit(X)  # doctest: +ELLIPSIS
    SimpleTimeImputer(...)
    >>> X_imputed = imputer.transform(X)
    >>> "time" in X_imputed.columns
    True

    See Also
    --------
    - [`SimpleImputer`][yohou.preprocessing.imputation.SimpleImputer] : Simple constant-strategy imputation.
    - [`SeasonalImputer`][yohou.preprocessing.imputation.SeasonalImputer] : Seasonal decomposition-based imputation.

    """

    _valid_methods = {"linear", "forward", "backward", "nearest", "fill_both"}

    _parameter_constraints: dict = {
        "method": [StrOptions(_valid_methods)],
        "limit": [Interval(numbers.Integral, 1, None, closed="left"), None],
    }

    _tags = {"stateful": False, "invertible": False}

    def __init__(
        self,
        method: str = "linear",
        limit: int | None = None,
    ):
        self.method = method
        self.limit = limit

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params) -> "SimpleTimeImputer":
        """Fit the imputer (validates parameters).

        Parameters
        ----------
        X : pl.DataFrame
            Input time series with a ``"time"`` column (datetime) and one or
            more numeric columns.
        y : pl.DataFrame or None, default=None
            Ignored.  Present for API compatibility.
        **params : dict
            Metadata to route to nested estimators.

        Returns
        -------
        self
            The fitted transformer instance.

        """
        X = validate_transformer_data(self, X=X, reset=True)
        BaseTransformer.fit(self, X, y, **params)

        self.method_ = self.method

        return self

    def _transform(self, X: pl.DataFrame) -> pl.DataFrame:
        """Impute missing values in time series.

        Parameters
        ----------
        X : pl.DataFrame
            Validated input time series.

        Returns
        -------
        pl.DataFrame
            Imputed time series.

        """
        # Get data columns
        data_cols = [c for c in X.columns if c != "time"]

        # Build expressions for imputation
        exprs = [pl.col("time")]

        for col_name in data_cols:
            col = pl.col(col_name)

            if self.method_ == "linear":
                # Polars interpolate does linear by default
                imputed = col.interpolate()
            elif self.method_ == "forward":
                imputed = col.forward_fill(limit=self.limit)
            elif self.method_ == "backward":
                imputed = col.backward_fill(limit=self.limit)
            elif self.method_ == "nearest":
                # Approximates "nearest": forward fill, then backward fill any
                # leading nulls. Interior gaps take the previous value; limit is unused.
                imputed = col.forward_fill().backward_fill()
            elif self.method_ == "fill_both":
                # Forward fill then backward fill
                imputed = col.forward_fill(limit=self.limit).backward_fill(limit=self.limit)
            else:
                imputed = col

            exprs.append(imputed.alias(col_name))

        return X.select(exprs)

    def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
        """Get output feature names for transformation.

        Parameters
        ----------
        input_features : list of str or None, default=None
            Column names of the input features.  If ``None``, uses the
            feature names seen during ``fit``.

        Returns
        -------
        list of str
            Output feature names after transformation.

        """
        check_is_fitted(self, ["feature_names_in_"])
        input_features = _check_feature_names_in(self, input_features)
        return list(input_features)

Methods

fit(X, y=None, **params)

Fit the imputer (validates parameters).

Parameters
Name Type Description Default
X DataFrame

Input time series with a "time" column (datetime) and one or more numeric columns.

required
y DataFrame or None

Ignored. Present for API compatibility.

None
**params dict

Metadata to route to nested estimators.

{}
Returns
Type Description
self

The fitted transformer instance.

Source Code
@_fit_context(prefer_skip_nested_validation=True)
def fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None, **params) -> "SimpleTimeImputer":
    """Fit the imputer (validates parameters).

    Parameters
    ----------
    X : pl.DataFrame
        Input time series with a ``"time"`` column (datetime) and one or
        more numeric columns.
    y : pl.DataFrame or None, default=None
        Ignored.  Present for API compatibility.
    **params : dict
        Metadata to route to nested estimators.

    Returns
    -------
    self
        The fitted transformer instance.

    """
    X = validate_transformer_data(self, X=X, reset=True)
    BaseTransformer.fit(self, X, y, **params)

    self.method_ = self.method

    return self

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters
Name Type Description Default
input_features list of str or None

Column names of the input features. If None, uses the feature names seen during fit.

None
Returns
Type Description
list of str

Output feature names after transformation.

Source Code
def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
    """Get output feature names for transformation.

    Parameters
    ----------
    input_features : list of str or None, default=None
        Column names of the input features.  If ``None``, uses the
        feature names seen during ``fit``.

    Returns
    -------
    list of str
        Output feature names after transformation.

    """
    check_is_fitted(self, ["feature_names_in_"])
    input_features = _check_feature_names_in(self, input_features)
    return list(input_features)

Tutorials

The following example notebooks use this component:

  • How to Handle Missing Data (Data-Features)

    Compare SimpleTimeImputer, SeasonalImputer, SimpleImputer, and TransformedSpaceKNNImputer on synthetic block and scattered gaps in monthly tourism data.

  • How to Clean Time Series Data (Data-Features)

    End-to-end data cleaning pipeline combining SimpleTimeImputer and SeasonalImputer for missing values with OutlierThresholdHandler for anomaly clipping.

  • How to Preprocess Panel Data (Panel-Data)

    Automatic panel-aware transformation (StandardScaler, rolling stats, imputation) plus manual per-group workflows with get_group_df and dict_to_panel.