RobustScaler¶
yohou.preprocessing.sklearn_wrappers.RobustScaler
Bases: SklearnScaler
Scale features using statistics that are robust to outliers.
This scaler removes the median and scales the data according to the quantile range (by default the IQR, or interquartile range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).
Centering and scaling happen independently on each feature by computing the
relevant statistics on the samples in the training set. Median and
interquartile range are then stored to be used on later data using the
transform() method.
Standardization of a dataset is a common preprocessing step for many machine learning estimators. Typically this is done by removing the mean and scaling to unit variance. However, outliers can strongly distort the sample mean and variance. In such cases, using the median and the interquartile range often gives better results.
This is a Yohou wrapper that preserves the polars DataFrame structure and "time" column.
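The effect of outliers on the two sets of statistics can be illustrated with a minimal standard-library sketch. It is not the wrapper's implementation. It assumes quartiles are computed with linear interpolation (the `"inclusive"` method of `statistics.quantiles`), and the variable names are illustrative only.

```python
import statistics

values = [10.0, 20.0, 30.0, 100.0, 50.0]  # 100 is an outlier

# Robust statistics: median and interquartile range.
# Assumption: quartiles use linear interpolation ("inclusive" method).
median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
robust = [(v - median) / iqr for v in values]

# Classical statistics for comparison: mean and population standard deviation.
mean = statistics.fmean(values)
std = statistics.pstdev(values)
standard = [(v - mean) / std for v in values]

print("median:", median, "IQR:", iqr)
print("mean:", mean, "std:", round(std, 3))
print("robust scaled:  ", [round(v, 3) for v in robust])
print("standard scaled:", [round(v, 3) for v in standard])
```

The single outlier pulls the mean well above the median, while the median and IQR are determined by the central values only.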
Parameters¶
| Name | Type | Description | Default |
|---|---|---|---|
| with_centering | bool | If True, center the data before scaling. | True |
| with_scaling | bool | If True, scale the data to the interquartile range. | True |
| quantile_range | tuple (q_min, q_max), 0.0 < q_min < q_max < 100.0 | Quantile range used to calculate the scale. | (25.0, 75.0) |
| unit_variance | bool | If True, scale data so that normally distributed features have a variance of 1. | False |
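How the four parameters interact can be sketched in plain Python. The helper names `percentile` and `robust_scale` below are hypothetical and are not part of Yohou or scikit-learn. The sketch assumes linear-interpolation percentiles and assumes that `unit_variance` divides the quantile range by the corresponding range of a standard normal distribution, which is how scikit-learn documents the option.

```python
from statistics import NormalDist, median


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of pre-sorted values, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def robust_scale(values, with_centering=True, with_scaling=True,
                 quantile_range=(25.0, 75.0), unit_variance=False):
    """Hypothetical helper: return (scaled values, center, scale) for one feature."""
    ordered = sorted(values)
    center = median(ordered) if with_centering else 0.0
    scale = 1.0
    if with_scaling:
        q_min, q_max = quantile_range
        scale = percentile(ordered, q_max) - percentile(ordered, q_min)
        if unit_variance:
            # Divide by the same quantile range of a standard normal, so that
            # normally distributed features end up with variance close to 1.
            nd = NormalDist()
            scale /= nd.inv_cdf(q_max / 100.0) - nd.inv_cdf(q_min / 100.0)
        if scale == 0.0:
            scale = 1.0  # guard for constant features (assumption of this sketch)
    return [(v - center) / scale for v in values], center, scale


values = [10.0, 20.0, 30.0, 100.0, 50.0]
_, center, scale = robust_scale(values)
_, _, wide_scale = robust_scale(values, quantile_range=(10.0, 90.0))
_, _, unit_scale = robust_scale(values, unit_variance=True)
unscaled, _, _ = robust_scale(values, with_scaling=False)
```

A wider `quantile_range` lets more of the tails influence the scale, so it trades some robustness for sensitivity to spread.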
Attributes¶
| Name | Type | Description |
|---|---|---|
| instance_ | RobustScaler | The fitted sklearn RobustScaler instance. |
| center_ | array of floats | The median value for each feature in the training set. |
| scale_ | array of floats | The (scaled) interquartile range for each feature in the training set. |
Examples¶
>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.preprocessing import RobustScaler
>>> X = pl.DataFrame({
... "time": [datetime(2024, 1, i) for i in range(1, 6)],
... "value": [10.0, 20.0, 30.0, 100.0, 50.0], # 100 is an outlier
... })
>>> scaler = RobustScaler()
>>> scaler.fit(X)
RobustScaler(...)
>>> X_scaled = scaler.transform(X)
>>> # Median-centered and scaled by IQR
>>> "time" in X_scaled.columns
True
See Also¶
StandardScaler: Scale using mean and standard deviation (sensitive to outliers).
Methods¶
Tutorials¶
The following example notebooks use this component:
- How to Use Scikit-learn Scalers (Data-Features): Wrap sklearn scalers (StandardScaler, MinMaxScaler, RobustScaler, PowerTransformer, PolynomialFeatures) for polars DataFrames with inverse transforms.