SimpleImputer

yohou.preprocessing.imputation.SimpleImputer

Bases: SklearnTransformer

Simple imputation using sklearn's SimpleImputer.

Replaces missing values using a simple strategy (mean, median, most frequent, or constant). Wraps sklearn's SimpleImputer while preserving the polars DataFrame structure and the time column.

Parameters

strategy : {"mean", "median", "most_frequent", "constant"}, default="mean"
    Imputation strategy:
    - "mean": Replace with mean of each column
    - "median": Replace with median of each column
    - "most_frequent": Replace with most frequent value
    - "constant": Replace with fill_value

fill_value : str or numerical value, default=None
    When strategy="constant", fill_value is used to replace missing values. For string or object columns, fill_value must be a string.

missing_values : int, float, str, or np.nan, default=np.nan
    The placeholder for missing values. All occurrences of missing_values will be imputed.
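
To make the four strategies concrete, here is a minimal pure-Python sketch of what each one computes for a single column. It is not yohou's or sklearn's implementation: the helper names `is_missing`, `fill_statistic`, and `impute` are hypothetical, and both tie-breaking for "most_frequent" and the handling of `fill_value=None` are simplified assumptions.

```python
import math
from statistics import mean, median, mode


def is_missing(value, missing_values=float("nan")):
    """Return True when value matches the placeholder (hypothetical helper)."""
    # NaN never compares equal to itself, so it needs an explicit check.
    if isinstance(missing_values, float) and math.isnan(missing_values):
        return isinstance(value, float) and math.isnan(value)
    return value == missing_values


def fill_statistic(column, strategy="mean", fill_value=None,
                   missing_values=float("nan")):
    """Compute the replacement value for one column (illustrative only)."""
    observed = [v for v in column if not is_missing(v, missing_values)]
    if strategy == "mean":
        return mean(observed)
    if strategy == "median":
        return median(observed)
    if strategy == "most_frequent":
        # Simplification: ties resolve to the first value encountered.
        return mode(observed)
    if strategy == "constant":
        # Simplification: fill_value is used as given, even when it is None.
        return fill_value
    raise ValueError(f"unknown strategy: {strategy!r}")


def impute(column, **options):
    """Replace every placeholder in column with the computed statistic."""
    stat = fill_statistic(column, **options)
    placeholder = options.get("missing_values", float("nan"))
    return [stat if is_missing(v, placeholder) else v for v in column]


nan = float("nan")
values = [1.0, nan, 3.0, nan, 5.0]

by_mean = impute(values)                                     # gaps filled with 3.0
by_constant = impute(values, strategy="constant", fill_value=0.0)
by_mode = impute([2, 2, nan, 7], strategy="most_frequent")
by_sentinel = impute([4, -1, 6], missing_values=-1)          # -1 marks a gap
```

The last call shows the role of `missing_values`: only entries equal to the placeholder are replaced, and they are excluded when the statistic is computed.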

Attributes

instance_ : SimpleImputer
    The fitted sklearn SimpleImputer instance.

statistics_ : ndarray of shape (n_features,)
    The imputation fill value for each feature (same as sklearn's statistics_).
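
The sketch below illustrates what a per-feature `statistics_` array implies in practice: the fill values are learned once in `fit` and then reused by `transform`, even on data that was not seen during fitting. `MeanImputerSketch` and `feature_names_` are hypothetical names for this illustration and plain dicts of lists stand in for DataFrames, so this is not the yohou class itself.

```python
import math
from statistics import mean


class MeanImputerSketch:
    """Hypothetical stand-in showing the role of statistics_ (not the yohou class)."""

    def fit(self, columns):
        # columns: dict mapping feature name -> list of floats (NaN marks a gap).
        self.feature_names_ = list(columns)
        self.statistics_ = [
            mean(v for v in columns[name] if not math.isnan(v))
            for name in self.feature_names_
        ]
        return self

    def transform(self, columns):
        # Fill gaps with the values learned in fit, not with statistics
        # recomputed from the data being transformed.
        return {
            name: [stat if math.isnan(v) else v for v in columns[name]]
            for name, stat in zip(self.feature_names_, self.statistics_)
        }


nan = float("nan")
train = {"a": [1.0, nan, 3.0], "b": [10.0, 20.0, nan]}
new = {"a": [nan, 8.0], "b": [nan, nan]}

sketch = MeanImputerSketch().fit(train)
filled = sketch.transform(new)
```

Column "b" of `new` contains no observed values at all, yet it can still be filled because the statistic comes from the training data.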

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> import numpy as np
>>> from yohou.preprocessing import SimpleImputer
>>> X = pl.DataFrame({
...     "time": [datetime(2020, 1, i) for i in range(1, 6)],
...     "value": [1.0, np.nan, 3.0, np.nan, 5.0],
... })
>>> imputer = SimpleImputer(strategy="mean")
>>> imputer.fit(X)
SimpleImputer(...)
>>> X_imputed = imputer.transform(X)
>>> X_imputed["value"].is_nan().sum()  # polars treats NaN and null separately, so count NaN
0

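See Also

  • TransformedSpaceKNNImputer : K-nearest neighbors imputation.
  • SimpleTimeImputer : Time series specific imputation methods.
  • sklearn.impute.SimpleImputer : Underlying implementation.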

Source Code

class SimpleImputer(SklearnTransformer):
    """Simple imputation using sklearn's SimpleImputer.

    Replaces missing values using a simple strategy (mean, median, most frequent,
    or constant). Wraps sklearn's SimpleImputer while preserving polars DataFrame
    structure and time column.

    Parameters
    ----------
    strategy : {"mean", "median", "most_frequent", "constant"}, default="mean"
        Imputation strategy:
        - "mean": Replace with mean of each column
        - "median": Replace with median of each column
        - "most_frequent": Replace with most frequent value
        - "constant": Replace with fill_value
    fill_value : str or numerical value, default=None
        When strategy="constant", fill_value is used to replace missing values.
        For string or object columns, fill_value must be a string.
    missing_values : int, float, str, or np.nan, default=np.nan
        The placeholder for missing values. All occurrences of missing_values
        will be imputed.

    Attributes
    ----------
    instance_ : SimpleImputer
        The fitted sklearn SimpleImputer instance.
    statistics_ : ndarray of shape (n_features,)
        The imputation fill value for each feature (same as sklearn's statistics_).

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> import numpy as np
    >>> from yohou.preprocessing import SimpleImputer

    >>> X = pl.DataFrame({
    ...     "time": [datetime(2020, 1, i) for i in range(1, 6)],
    ...     "value": [1.0, np.nan, 3.0, np.nan, 5.0],
    ... })
    >>> imputer = SimpleImputer(strategy="mean")
    >>> imputer.fit(X)  # doctest: +ELLIPSIS
    SimpleImputer(...)
    >>> X_imputed = imputer.transform(X)
    >>> X_imputed["value"].is_nan().sum()  # polars treats NaN and null separately, so count NaN
    0

    See Also
    --------
    - [`TransformedSpaceKNNImputer`][yohou.preprocessing.imputation.TransformedSpaceKNNImputer] : K-nearest neighbors imputation.
    - [`SimpleTimeImputer`][yohou.preprocessing.imputation.SimpleTimeImputer] : Time series specific imputation methods.
    - `sklearn.impute.SimpleImputer` : Underlying implementation.

    """

    _estimator_default_class = sklearn_SimpleImputer

    def __init__(self, strategy="mean", fill_value=None, missing_values=np.nan, copy=True, **kwargs):
        super().__init__(strategy=strategy, fill_value=fill_value, missing_values=missing_values, copy=copy, **kwargs)

    @property
    def statistics_(self):
        """Get imputation statistics from fitted imputer."""
        check_is_fitted(self, ["instance_"])
        return self.instance_.statistics_

Methods

statistics_ property

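Get imputation statistics from fitted imputer. The property calls check_is_fitted on instance_ and then returns instance_.statistics_, so reading it before fit raises an error instead of returning a value.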
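
The delegation pattern behind this property can be sketched in plain Python as follows. All class names here (`InnerSketch`, `WrapperSketch`, `NotFittedSketchError`) are hypothetical, and a simple `hasattr` check stands in for `check_is_fitted`; the point is only that the wrapper exposes the wrapped estimator's attribute and refuses to do so before fitting.

```python
class NotFittedSketchError(Exception):
    """Hypothetical error raised when statistics_ is read before fit."""


class InnerSketch:
    """Stands in for the wrapped estimator that owns statistics_."""

    def fit(self, values):
        self.statistics_ = [sum(values) / len(values)]
        return self


class WrapperSketch:
    """Stands in for a wrapper that stores the fitted estimator as instance_."""

    def fit(self, values):
        self.instance_ = InnerSketch().fit(values)
        return self

    @property
    def statistics_(self):
        # Guard first, then delegate to the fitted inner estimator.
        if not hasattr(self, "instance_"):
            raise NotFittedSketchError("call fit before reading statistics_")
        return self.instance_.statistics_


wrapper = WrapperSketch()
try:
    wrapper.statistics_
    raised_before_fit = False
except NotFittedSketchError:
    raised_before_fit = True

fitted_statistics = wrapper.fit([1.0, 3.0]).statistics_
```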

Tutorials

The following example notebooks use this component:

  • How to Handle Missing Data


    Data-Features

    Compare SimpleTimeImputer, SeasonalImputer, SimpleImputer, and TransformedSpaceKNNImputer on synthetic block and scattered gaps in monthly tourism data.
