ASinhTransformer

`yohou.stationarity.transformers.ASinhTransformer`

Bases: `BaseTransformer`

Variance stabilization through arcsinh transform.

Applies the transformation:

\[y = \operatorname{asinh}\!\left(\frac{X - \tilde{X}}{\text{MAD}}\right)\]

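where \(\tilde{X}\) is the median of each non-time column and \(\text{MAD} = c \cdot \text{median}(|X - \tilde{X}|)\) is its scaled median absolute deviation. The scale factor \(c\) (the `scale` parameter, 1.4826 by default) makes the scaled MAD a consistent estimator of the standard deviation for normally distributed data. If a column's MAD is zero (for example, a constant column), 1.0 is used instead to avoid division by zero. The transform has the same form as the inverse-hyperbolic-sine translation in Johnson's \(S_U\) system of distributions [1], with location \(\tilde{X}\) and scale MAD.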
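Because asinh is strictly increasing on the whole real line, the transform can be inverted exactly by solving for \(X\); this matches the inverse applied in `_inverse_transform` (`sinh(X_t) * MAD + median`). Writing asinh in closed form also shows how it behaves near zero and in the tails (standard asymptotics, not specific to yohou):

\[
\begin{aligned}
X &= \tilde{X} + \text{MAD} \cdot \sinh(y), \\
\operatorname{asinh}(z) &= \ln\!\left(z + \sqrt{z^2 + 1}\right) \approx
\begin{cases}
z, & |z| \ll 1, \\
\operatorname{sign}(z)\,\ln(2|z|), & |z| \gg 1.
\end{cases}
\end{aligned}
\]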

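This transformation is useful for:

- Stabilizing variance in heteroscedastic time series.
- Handling data with outliers: for large \(|x|\), asinh grows like \(\operatorname{sign}(x)\ln(2|x|)\) and so compresses extreme values much as log does, but it stays finite and approximately linear near zero. The median and MAD used for centering and scaling are themselves robust to outliers.
- Data that can be zero or negative, where a log transform is undefined.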
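The behaviour behind the last two points can be checked directly. The following standalone sketch uses only Python's `math` module (not yohou); the helpers `asinh_near_zero_error` and `asinh_log_tail_error` are hypothetical names introduced here for illustration:

```python
import math


def asinh_near_zero_error(x: float) -> float:
    """|asinh(x) - x|; small when |x| << 1 (leading error term is |x|**3 / 6)."""
    return abs(math.asinh(x) - x)


def asinh_log_tail_error(x: float) -> float:
    """|asinh(x) - sign(x) * ln(2|x|)|; small when |x| >> 1 (about 1 / (4 x**2))."""
    return abs(math.asinh(x) - math.copysign(math.log(2 * abs(x)), x))


# asinh accepts zero and negative inputs; math.log raises ValueError for x <= 0.
for x in (-1000.0, -1.0, 0.0, 1.0, 1000.0):
    print(f"x = {x:>8}: asinh(x) = {math.asinh(x):+.4f}")

print(asinh_near_zero_error(1e-3))  # roughly 1e-10: linear regime
print(asinh_log_tail_error(-1e6))   # roughly 1e-13: logarithmic regime
```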

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `scale` | `float > 0` | Scale factor for MAD normalization. The default value makes the MAD consistent with the standard deviation for normal distributions. | `1.4826` |
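The default `scale` is the reciprocal of the standard normal 0.75 quantile: for normal data the population MAD equals \(\sigma \cdot \Phi^{-1}(0.75) \approx 0.6745\,\sigma\), so multiplying by \(1/0.6745 \approx 1.4826\) turns the raw MAD into an estimate of \(\sigma\). A minimal standard-library sketch (the `scaled_mad` helper is hypothetical, not part of yohou) derives the constant and checks it on a simulated normal sample:

```python
import random
import statistics

# For normal data the population MAD is sigma * Phi^{-1}(0.75), so the
# reciprocal of the 0.75 quantile rescales a raw MAD into a sigma estimate.
c = 1 / statistics.NormalDist().inv_cdf(0.75)
print(round(c, 4))  # 1.4826


def scaled_mad(values: list[float], scale: float = 1.4826) -> float:
    """Hypothetical helper: scale * median(|x - median(x)|)."""
    med = statistics.median(values)
    return scale * statistics.median(abs(v - med) for v in values)


# Simulated normal sample with sigma = 3 (fixed seed for reproducibility).
rng = random.Random(0)
sample = [rng.gauss(0.0, 3.0) for _ in range(100_000)]
print(scaled_mad(sample), statistics.stdev(sample))  # both close to 3.0
```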

Attributes

| Name | Type | Description |
| --- | --- | --- |
| `median_` | `dict[str, float]` | Median values for each column (excluding time). |
| `mad_` | `dict[str, float]` | Scaled MAD values for each column (excluding time). |
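As a sketch of how the fitted attributes are populated, the hypothetical helper below computes per-column dictionaries shaped like `median_` and `mad_` from plain Python lists. It mirrors `_fit` in the source code (time column skipped, zero MAD replaced by 1.0), but it is not yohou's API and does not handle nulls or polars input:

```python
import statistics


def fit_asinh_params(
    columns: dict[str, list[float]], scale: float = 1.4826
) -> tuple[dict[str, float], dict[str, float]]:
    """Hypothetical helper: per-column median and scaled MAD, skipping "time"."""
    median_: dict[str, float] = {}
    mad_: dict[str, float] = {}
    for name, values in columns.items():
        if name == "time":
            continue  # the time column is never normalized
        med = float(statistics.median(values))
        mad = scale * float(statistics.median(abs(v - med) for v in values))
        median_[name] = med
        # A constant column has MAD 0; fall back to 1.0 to avoid division by zero.
        mad_[name] = mad if mad != 0.0 else 1.0
    return median_, mad_


median_, mad_ = fit_asinh_params({
    "time": [1, 2, 3, 4, 5],  # placeholder values; ignored
    "value": [1.0, 10.0, 100.0, 1000.0, 10000.0],
    "flat": [5.0, 5.0, 5.0, 5.0, 5.0],
})
print(median_)  # {'value': 100.0, 'flat': 5.0}
print(mad_)     # 'value' is 1.4826 * 99 (about 146.78); 'flat' falls back to 1.0
```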

Examples

```python
>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.stationarity import ASinhTransformer
>>> X = pl.DataFrame({
...     "time": [datetime(2024, 1, i) for i in range(1, 6)],
...     "value": [1.0, 10.0, 100.0, 1000.0, 10000.0],
... })
>>> transformer = ASinhTransformer()
>>> transformer.fit(X)
ASinhTransformer(...)
>>> X_t = transformer.transform(X)
>>> "time" in X_t.columns
True
```
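To see what `fit` and `transform` compute numerically for the example's `value` column, both steps and the inverse can be reproduced with the standard library alone. This is a hypothetical standalone sketch, not yohou's implementation: it works on a plain list rather than a polars DataFrame and applies the documented formulas, including the inverse `sinh(X_t) * MAD + median` from `_inverse_transform`:

```python
import math
import statistics

# Hypothetical standalone re-computation for the example's "value" column.
values = [1.0, 10.0, 100.0, 1000.0, 10000.0]
scale = 1.4826

# Fit: median and scaled MAD, with the same zero-MAD fallback as `_fit`.
median = statistics.median(values)
mad = scale * statistics.median(abs(v - median) for v in values)
if mad == 0.0:
    mad = 1.0

# Transform: y = asinh((x - median) / MAD); the median itself maps to 0.0.
transformed = [math.asinh((v - median) / mad) for v in values]

# Inverse: x = sinh(y) * MAD + median recovers the input up to rounding.
recovered = [math.sinh(y) * mad + median for y in transformed]

print([round(y, 3) for y in transformed])
print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(values, recovered)))  # True
```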

References

[1] Johnson, N.L. (1949). "Systems of frequency curves generated by methods of translation." Biometrika, 36(1-2), 149-176. https://doi.org/10.1093/biomet/36.1-2.149

Source Code

class ASinhTransformer(BaseTransformer):
    r"""Variance stabilization through arcsinh transform.

    Applies the transformation:

    $$y = \operatorname{asinh}\!\left(\frac{X - \tilde{X}}{\text{MAD}}\right)$$

    where $\tilde{X}$ is the median and
    $\text{MAD} = c \cdot \text{median}(|X - \tilde{X}|)$ with scale factor
    $c = 1.4826$ by default to match the standard deviation for normally
    distributed data.

    This transformation is useful for:

    - Stabilizing variance in heteroscedastic time series
    - Handling data with outliers (asinh compresses large values logarithmically, like log, but stays finite near zero)
    - Data that can be negative (unlike log transform)

    Parameters
    ----------
    scale : float > 0, default=1.4826
        Scale factor for MAD normalization. Default value makes MAD
        consistent with standard deviation for normal distributions.

    Attributes
    ----------
    median_ : dict[str, float]
        Median values for each column (excluding time).

    mad_ : dict[str, float]
        Scaled MAD values for each column (excluding time).

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.stationarity import ASinhTransformer
    >>> X = pl.DataFrame({
    ...     "time": [datetime(2024, 1, i) for i in range(1, 6)],
    ...     "value": [1.0, 10.0, 100.0, 1000.0, 10000.0],
    ... })
    >>> transformer = ASinhTransformer()
    >>> transformer.fit(X)  # doctest: +ELLIPSIS
    ASinhTransformer(...)
    >>> X_t = transformer.transform(X)
    >>> "time" in X_t.columns
    True

    References
    ----------
    [1] Johnson, N.L. (1949). "Systems of frequency curves generated
        by methods of translation." Biometrika, 36(1-2), 149-176.
        https://doi.org/10.1093/biomet/36.1-2.149

    See Also
    --------
    - [`BoxCoxTransformer`][yohou.stationarity.transformers.BoxCoxTransformer] : Power transform for variance stabilization.
    - [`LogTransformer`][yohou.stationarity.transformers.LogTransformer] : Simpler variance stabilization for positive data.
    - `sklearn.preprocessing.PowerTransformer` : scikit-learn's power transforms.

    """

    _parameter_constraints: dict = {
        "scale": [Interval(numbers.Real, 0, None, closed="neither")],
    }

    _tags = {"invertible": True}

    def __init__(self, scale: StrictFloat = 1.4826):
        self.scale = scale

    def _fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None) -> None:
        """Fit the internal model."""
        # Compute median and MAD for each column (excluding time)
        X_numeric = X.select(~cs.by_name("time"))

        self.median_: dict[str, float] = {}
        self.mad_: dict[str, float] = {}

        for col in X_numeric.columns:
            col_data = X_numeric.get_column(col)
            median_val = col_data.median()
            # Cast to float for numeric operations (polars median returns numeric type)
            self.median_[col] = (
                float(median_val) if median_val is not None else 0.0  # ty: ignore[invalid-argument-type]
            )

            # Compute MAD: median(|X - median(X)|) * scale
            abs_dev = (col_data - self.median_[col]).abs()
            mad_val = abs_dev.median()
            mad_scaled = (
                float(mad_val) * self.scale if mad_val is not None else 1.0  # ty: ignore[invalid-argument-type]
            )

            # Avoid division by zero
            self.mad_[col] = mad_scaled if mad_scaled != 0.0 else 1.0

    def _transform(self, X: pl.DataFrame) -> pl.DataFrame:
        """Transform the input time series."""
        time = X.select(cs.by_name("time"))
        X_numeric = X.select(~cs.by_name("time"))

        # Apply asinh((X - median) / MAD) for each column
        transformed_cols = []
        for col in X_numeric.columns:
            col_data = X_numeric.get_column(col).to_numpy()
            normalized = (col_data - self.median_[col]) / self.mad_[col]
            transformed = pl.Series(panel_aware_prefix(col, "asinh"), np.arcsinh(normalized))
            transformed_cols.append(transformed)

        X_t = pl.DataFrame(transformed_cols)
        X_t = pl.concat([time, X_t], how="horizontal")

        return X_t

    def _inverse_transform(self, X_t: pl.DataFrame, X_p: pl.DataFrame | None = None) -> pl.DataFrame:
        """Inverse-transform the time series."""
        X_t, _ = validate_transformer_data(
            self,
            X=X_t,
            reset=False,
            inverse=True,
            X_p=X_p,
            observation_horizon=self.observation_horizon,
        )

        time = X_t.select(cs.by_name("time"))
        X_t_numeric = X_t.select(~cs.by_name("time"))

        # Apply inverse: sinh(X_t) * MAD + median for each column
        inverse_cols = []
        for i, col in enumerate(X_t_numeric.columns):
            # Get original column name from feature_names_in_
            orig_col = self.feature_names_in_[i]
            col_data = X_t_numeric.get_column(col).to_numpy()
            sinh_val = np.sinh(col_data)
            inverse_val = sinh_val * self.mad_[orig_col] + self.median_[orig_col]
            inverse_cols.append(pl.Series(orig_col, inverse_val))

        X = pl.DataFrame(inverse_cols)
        X = pl.concat([time, X], how="horizontal")

        return X

    def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
        """Get output feature names for transformation.

        Parameters
        ----------
        input_features : array-like of str or None, default=None
            Column names of the input features.  If ``None``, uses the
            feature names seen during ``fit``.

        Returns
        -------
        list of str
            Output feature names after transformation.

        """
        input_features = _check_feature_names_in(self, input_features)
        return [panel_aware_prefix(col, "asinh") for col in input_features]

Methods

`get_feature_names_out(input_features=None)`

Get output feature names for transformation.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `input_features` | `array-like of str or None` | Column names of the input features. If `None`, uses the feature names seen during `fit`. | `None` |

Returns

| Type | Description |
| --- | --- |
| `list of str` | Output feature names after transformation. |

Tutorials

The following example notebooks use this component:

How to Apply Stationarity Transforms

Data-Features

Catalogue of variance-stabilising and detrending transforms: LogTransformer, BoxCox, SeasonalDifferencing, SeasonalReturn, and ASinh with inverse verification.
