StandardScaler
yohou.preprocessing.sklearn_wrappers.StandardScaler
Bases: SklearnScaler
Standardize features by removing the mean and scaling to unit variance.
The standard score of a sample x is calculated as:
z = (x - u) / s
where u is the mean of the training samples or zero if with_mean=False,
and s is the standard deviation of the training samples or one if
with_std=False.
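The formula translates directly into a few lines of plain Python. This sketch is not Yohou's or scikit-learn's code. It assumes a single feature given as a list of floats, uses the population standard deviation (dividing by n), and falls back to s = 1 for a constant feature, mirroring how scikit-learn leaves zero-variance features unscaled.

```python
import math


def standard_score(values, with_mean=True, with_std=True):
    """Standard score z = (x - u) / s for one feature (a list of floats)."""
    n = len(values)
    mean = sum(values) / n
    # Population variance: divide by n, not n - 1.
    var = sum((x - mean) ** 2 for x in values) / n
    # u is the training mean, or 0 when with_mean=False.
    u = mean if with_mean else 0.0
    # s is the standard deviation, or 1 when with_std=False. A constant
    # feature (zero variance) also falls back to 1 to avoid dividing by zero.
    s = math.sqrt(var) if (with_std and var > 0) else 1.0
    return [(x - u) / s for x in values]


print([round(z, 4) for z in standard_score([10.0, 20.0, 30.0, 40.0, 50.0])])
# [-1.4142, -0.7071, 0.0, 0.7071, 1.4142]
print(standard_score([10.0, 20.0, 30.0], with_std=False))
# [-10.0, 0.0, 10.0]
```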
Centering and scaling happen independently on each feature by computing the
relevant statistics on the samples in the training set. Mean and standard
deviation are then stored to be used on later data using transform().
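To make "computed on the training set, then reused on later data" concrete, here is a minimal, hypothetical scaler working on rows of floats. The class name `TinyStandardScaler` is made up for illustration, and the attribute names simply borrow `mean_`, `var_` and `scale_` from the Attributes table. `fit()` estimates each column's statistics independently and stores them. `transform()` only reads them, so new rows are scaled with the training mean and standard deviation.

```python
import math


class TinyStandardScaler:
    """Hypothetical illustration only; not the Yohou or scikit-learn class."""

    def fit(self, rows):
        n = len(rows)
        columns = list(zip(*rows))  # one tuple per feature
        # Each feature's statistics are computed independently.
        self.mean_ = [sum(col) / n for col in columns]
        self.var_ = [
            sum((x - m) ** 2 for x in col) / n
            for col, m in zip(columns, self.mean_)
        ]
        # Zero-variance features get a scale of 1 so they are left as-is.
        self.scale_ = [math.sqrt(v) if v > 0 else 1.0 for v in self.var_]
        return self

    def transform(self, rows):
        # Uses the stored training statistics; `rows` never updates them.
        return [
            [(x - m) / s for x, m, s in zip(row, self.mean_, self.scale_)]
            for row in rows
        ]


train = [[10.0, 5.0], [20.0, 5.0], [30.0, 5.0]]
scaler = TinyStandardScaler().fit(train)
print(scaler.mean_)                       # [20.0, 5.0]
print(scaler.transform([[40.0, 6.0]]))    # second feature: (6 - 5) / 1 = 1.0
```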
Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not look more or less like standard normally distributed data (i.e., Gaussian with zero mean and unit variance).
StandardScaler is sensitive to outliers, and the features may scale
differently from each other in the presence of outliers. For outlier-robust
scaling, use RobustScaler instead.
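The outlier sensitivity is easy to see numerically. This sketch compares a plain z-score with a median/IQR score, which is the idea behind RobustScaler. Both functions are illustrative only. The quartile rule (`statistics.quantiles` with `method="inclusive"`, i.e. linear interpolation) is an assumption chosen to resemble numpy's default percentile computation. One outlier squeezes the z-scores of the remaining points into a narrow band, while their robust scores do not change.

```python
import statistics


def zscore(values):
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]


def robust_score(values):
    # Center on the median and scale by the interquartile range (Q3 - Q1).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    med = statistics.median(values)
    return [(x - med) / (q3 - q1) for x in values]


clean = [1.0, 2.0, 3.0, 4.0, 5.0]
dirty = [1.0, 2.0, 3.0, 4.0, 500.0]  # same data, last point replaced by an outlier

print([round(z, 3) for z in zscore(clean)[:4]])
print([round(z, 3) for z in zscore(dirty)[:4]])  # squeezed together near -0.5
print(robust_score(clean)[:4], robust_score(dirty)[:4])  # identical
```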
This is a Yohou wrapper that preserves the polars DataFrame structure and the "time" column.
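A wrapper that keeps the "time" column intact has to set the timestamps aside, scale the remaining feature columns, and reassemble the result in the original column order. The helper below is hypothetical (`scale_non_time_columns` is not part of Yohou's API) and uses a dict of lists in place of a polars DataFrame so it runs without extra dependencies. It only illustrates the pass-through idea, not how Yohou actually implements it.

```python
import statistics
from datetime import datetime


def scale_non_time_columns(frame, time_col="time"):
    """Hypothetical helper: standardize every column except `time_col`.

    `frame` maps column name -> list of values (a stand-in for a DataFrame).
    Column order is kept and the time column is copied through unchanged.
    """
    out = {}
    for name, values in frame.items():
        if name == time_col:
            out[name] = list(values)  # timestamps pass through untouched
            continue
        mu = statistics.fmean(values)
        sigma = statistics.pstdev(values) or 1.0  # constant column -> scale 1
        out[name] = [(v - mu) / sigma for v in values]
    return out


frame = {
    "time": [datetime(2024, 1, d) for d in range(1, 4)],
    "value": [1.0, 2.0, 3.0],
}
scaled = scale_non_time_columns(frame)
print(list(scaled))                      # ['time', 'value']
print(scaled["time"] == frame["time"])   # True
```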
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| with_mean | bool | If True, center the data before scaling. | True |
| with_std | bool | If True, scale the data to unit variance (or equivalently, unit standard deviation). | True |
Attributes
| Name | Type | Description |
|---|---|---|
| instance_ | StandardScaler | The fitted sklearn StandardScaler instance. |
| scale_ | ndarray of shape (n_features,) or None | Per-feature relative scaling of the data to achieve zero mean and unit variance. Equal to |
| mean_ | ndarray of shape (n_features,) or None | The mean value for each feature in the training set. Equal to |
| var_ | ndarray of shape (n_features,) or None | The variance for each feature in the training set. Equal to |
Examples
>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.preprocessing import StandardScaler
>>> X = pl.DataFrame({
... "time": [datetime(2024, 1, i) for i in range(1, 6)],
... "value": [10.0, 20.0, 30.0, 40.0, 50.0],
... })
>>> scaler = StandardScaler()
>>> scaler.fit(X)
StandardScaler(...)
>>> X_scaled = scaler.transform(X)
>>> # Values are standardized (mean 0, population standard deviation 1)
>>> round(X_scaled["value"].mean(), 10)
0.0
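"Unit variance" here refers to the population standard deviation (dividing by n). Measured with a sample standard deviation (dividing by n - 1, which is the default `ddof=1` of polars' `Series.std()`), the same standardized column comes out slightly above 1. The stdlib sketch below recomputes the example's five values without polars or Yohou to show the difference. It is an illustration, not a test of the wrapper itself.

```python
import statistics

values = [10.0, 20.0, 30.0, 40.0, 50.0]  # the "value" column from the example
mu = statistics.fmean(values)
sigma = statistics.pstdev(values)        # population std, as used for scaling
z = [(x - mu) / sigma for x in values]

print(round(statistics.pstdev(z), 10))   # 1.0    (divide by n)
print(round(statistics.stdev(z), 4))     # 1.118  (divide by n - 1)
```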
See Also
MinMaxScaler: Scale features to a given range.
RobustScaler: Scale using statistics robust to outliers.
Tutorials
The following example notebooks use this component:
- How to Use Scikit-learn Scalers (Data-Features): Wrap sklearn scalers (StandardScaler, MinMaxScaler, RobustScaler, PowerTransformer, PolynomialFeatures) for polars DataFrames with inverse transforms.
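Inverse transforms work because standardization is invertible whenever s is nonzero. Solving z = (x - u) / s for x gives x = z * s + u. This stdlib sketch round-trips the example's values through a forward and an inverse transform using the stored mean and scale. The function names `fit_stats`, `forward` and `inverse` are hypothetical and are not Yohou's API.

```python
import statistics


def fit_stats(values):
    """Return (mean, scale); scale falls back to 1 for a constant column."""
    return statistics.fmean(values), statistics.pstdev(values) or 1.0


def forward(values, mean, scale):
    return [(x - mean) / scale for x in values]


def inverse(scaled, mean, scale):
    # Solve z = (x - u) / s for x:  x = z * s + u
    return [z * scale + mean for z in scaled]


values = [10.0, 20.0, 30.0, 40.0, 50.0]
mean, scale = fit_stats(values)
restored = inverse(forward(values, mean, scale), mean, scale)
print([round(x, 9) for x in restored])  # [10.0, 20.0, 30.0, 40.0, 50.0]
```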