SimpleImputer
yohou.preprocessing.imputation.SimpleImputer
Bases: SklearnTransformer
Simple imputation using sklearn's SimpleImputer.
Replaces missing values using a simple strategy (mean, median, most frequent, or constant). Wraps sklearn's SimpleImputer while preserving the polars DataFrame structure and the time column.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| strategy | "mean", "median", "most_frequent", or "constant" | Imputation strategy. "mean": replace with the mean of each column; "median": replace with the median of each column; "most_frequent": replace with the most frequent value; "constant": replace with fill_value. | "mean" |
| fill_value | str or numerical value | When strategy="constant", fill_value is used to replace missing values. For string or object columns, fill_value must be a string. | None |
| missing_values | int, float, str, or np.nan | The placeholder for missing values. All occurrences of missing_values will be imputed. | np.nan |
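To make the parameter semantics concrete, the following is a minimal pure-Python sketch of what each strategy computes for a single column. It is not the yohou or sklearn implementation: the helper names `is_missing`, `column_statistic`, and `impute` are hypothetical, and tie-breaking for "most_frequent" may differ from sklearn's.

```python
import math
from statistics import mean, median, mode


def is_missing(value, missing_values=float("nan")):
    """Return True if value matches the missing-value placeholder."""
    # NaN never compares equal to itself, so it needs a dedicated check.
    if isinstance(missing_values, float) and math.isnan(missing_values):
        return isinstance(value, float) and math.isnan(value)
    return value == missing_values


def column_statistic(column, strategy="mean", fill_value=None,
                     missing_values=float("nan")):
    """Compute the fill value for one column from its observed entries."""
    observed = [v for v in column if not is_missing(v, missing_values)]
    if strategy == "mean":
        return mean(observed)
    if strategy == "median":
        return median(observed)
    if strategy == "most_frequent":
        # statistics.mode returns the first mode encountered on ties.
        return mode(observed)
    if strategy == "constant":
        return fill_value
    raise ValueError(f"unknown strategy: {strategy!r}")


def impute(column, strategy="mean", fill_value=None,
           missing_values=float("nan")):
    """Replace every missing entry with the column's fill value."""
    stat = column_statistic(column, strategy, fill_value, missing_values)
    return [stat if is_missing(v, missing_values) else v for v in column]


nan = float("nan")
values = [1.0, nan, 3.0, nan, 5.0]
filled_mean = impute(values, strategy="mean")
filled_constant = impute(values, strategy="constant", fill_value=0.0)
# A non-NaN placeholder: here -1 marks the missing entry.
filled_sentinel = impute([4, -1, 6], strategy="mean", missing_values=-1)
```

In this sketch the value returned by `column_statistic` plays the role that one entry of `statistics_` plays for a fitted imputer: one fill value per column, computed only from the observed entries.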
Attributes
| Name | Type | Description |
|---|---|---|
| instance_ | SimpleImputer | The fitted sklearn SimpleImputer instance. |
| statistics_ | ndarray of shape (n_features,) | The imputation fill value for each feature (same as sklearn's statistics_). |
Examples
>>> import polars as pl
>>> from datetime import datetime
>>> import numpy as np
>>> from yohou.preprocessing import SimpleImputer
>>> X = pl.DataFrame({
... "time": [datetime(2020, 1, i) for i in range(1, 6)],
... "value": [1.0, np.nan, 3.0, np.nan, 5.0],
... })
>>> imputer = SimpleImputer(strategy="mean")
>>> imputer.fit(X)
SimpleImputer(...)
>>> X_imputed = imputer.transform(X)
>>> X_imputed["value"].null_count()
0
See Also
- TransformedSpaceKNNImputer: K-nearest neighbors imputation.
- SimpleTimeImputer: Time series specific imputation methods.
- sklearn.impute.SimpleImputer: Underlying implementation.
Methods
statistics_ (property)
Get the imputation statistics from the fitted imputer.
Tutorials
The following example notebooks use this component:
- How to Handle Missing Data (Data-Features): Compare SimpleTimeImputer, SeasonalImputer, SimpleImputer, and TransformedSpaceKNNImputer on synthetic block and scattered gaps in monthly tourism data.