
ASinhTransformer

yohou.stationarity.transformers.ASinhTransformer

Bases: BaseTransformer

Variance stabilization through arcsinh transform.

Applies the transformation:

\[y = \operatorname{asinh}\!\left(\frac{X - \tilde{X}}{\text{MAD}}\right)\]

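where \(\tilde{X}\) is the column median and \(\text{MAD} = c \cdot \text{median}(|X - \tilde{X}|)\) is the scaled median absolute deviation. The default scale factor \(c = 1.4826 \approx 1/\Phi^{-1}(3/4)\), where \(\Phi\) is the standard normal CDF, makes the scaled MAD a consistent estimator of the standard deviation for normally distributed data. Both statistics are computed per column (excluding time) during fit.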
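Because \(\sinh\) inverts \(\operatorname{asinh}\), the transform can be undone exactly once the median and MAD have been fitted. Solving the forward equation for \(X\) gives the inverse that `_inverse_transform` in the source code applies to each column (sinh(X_t) * MAD + median):

\[
\begin{aligned}
y &= \operatorname{asinh}\!\left(\frac{X - \tilde{X}}{\text{MAD}}\right) \\
\sinh(y) &= \frac{X - \tilde{X}}{\text{MAD}} \\
X &= \tilde{X} + \text{MAD} \cdot \sinh(y)
\end{aligned}
\]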

This transformation is useful for:

  • Stabilizing variance in heteroscedastic time series
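  • Compressing outliers: asinh grows only logarithmically for large \(|x|\), and the median/MAD normalization is itself robust to extreme values
  • Data that can be zero or negative (asinh is defined on all of \(\mathbb{R}\), unlike the log transform)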
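These properties of asinh can be checked with Python's standard library alone. The snippet below is a small illustration built on `math.asinh`, not yohou code. It shows that asinh accepts zero and negative inputs, preserves sign, is nearly linear around zero, and grows like \(\ln(2|x|)\) for large magnitudes:

```python
import math

# Sample values spanning zero, negatives and large magnitudes.
values = [-1000.0, -1.0, 0.0, 1.0, 1000.0]

# asinh is defined on the whole real line; math.log would raise ValueError
# for 0.0 and for the negative entries.
transformed = [math.asinh(v) for v in values]

# asinh is odd, asinh(-x) == -asinh(x), so the sign of each value is preserved.
is_odd = all(math.isclose(math.asinh(-v), -math.asinh(v)) for v in values)

# Near zero, asinh(x) is almost linear: asinh(x) ~ x - x**3 / 6.
near_zero_gap = abs(math.asinh(0.01) - 0.01)

# For large |x|, asinh(x) ~ sign(x) * ln(2|x|): log-like compression of big values.
large_gap = abs(math.asinh(1e6) - math.log(2e6))

for v, t in zip(values, transformed):
    print(f"asinh({v:g}) = {t:.4f}")
print(is_odd, near_zero_gap < 1e-6, large_gap < 1e-9)
```

Values of order one are left nearly unchanged, while values in the thousands are pulled in logarithmically. This is why median/MAD normalization before the asinh matters: it decides which observations count as "large".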

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| scale | float > 0 | Scale factor for MAD normalization. The default makes the scaled MAD a consistent estimator of the standard deviation for normally distributed data. | 1.4826 |
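The default value of scale comes from the normal quantile function. For Gaussian data, the raw MAD equals \(\sigma \cdot \Phi^{-1}(3/4)\), where \(\Phi\) is the standard normal CDF, so dividing by \(\Phi^{-1}(3/4) \approx 0.6745\) rescales it to \(\sigma\). Here is a quick check with `statistics.NormalDist` from the standard library:

```python
from statistics import NormalDist

# For normally distributed data, median(|X - median(X)|) = sigma * Phi^{-1}(3/4),
# where Phi is the standard normal CDF. Multiplying the raw MAD by
# c = 1 / Phi^{-1}(3/4) therefore recovers sigma.
q75 = NormalDist().inv_cdf(0.75)  # third quartile of N(0, 1), about 0.6745
c = 1.0 / q75
print(round(c, 4))  # 1.4826
```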

Attributes

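| Name | Type | Description |
| --- | --- | --- |
| median_ | dict[str, float] | Median of each column (excluding time), computed during fit; 0.0 if a column has no non-null values. |
| mad_ | dict[str, float] | Scaled MAD of each column (excluding time). A zero or undefined MAD is replaced by 1.0 to avoid division by zero. |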
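To make the fitted attributes concrete, here is a minimal stdlib-only sketch of how median_ and mad_ can be derived. The helper `fit_asinh_stats` is hypothetical, and it assumes a plain dict of column name to list of floats with no nulls. The real `_fit` works on a polars DataFrame and additionally falls back to 0.0 / 1.0 when polars returns no median. The zero-MAD guard mirrors the one in the source.

```python
import statistics


def fit_asinh_stats(columns, scale=1.4826, time_col="time"):
    """Return (median_, mad_) dicts for every non-time column.

    Hypothetical helper for illustration only; `columns` maps a column name
    to a list of floats without missing values.
    """
    median_ = {}
    mad_ = {}
    for name, data in columns.items():
        if name == time_col:
            continue  # the time column is passed through, never scaled
        med = statistics.median(data)
        raw_mad = statistics.median(abs(x - med) for x in data)
        scaled = raw_mad * scale
        median_[name] = float(med)
        # Guard against division by zero, e.g. for a constant column.
        mad_[name] = scaled if scaled != 0.0 else 1.0
    return median_, mad_


median_, mad_ = fit_asinh_stats({
    "time": [1, 2, 3, 4, 5],  # stand-in timestamps
    "value": [1.0, 10.0, 100.0, 1000.0, 10000.0],
    "flat": [3.0, 3.0, 3.0, 3.0, 3.0],
})
print(median_)  # {'value': 100.0, 'flat': 3.0}
print(mad_)
```

The constant "flat" column has a raw MAD of zero, so its entry in mad_ becomes 1.0. The transform then reduces to asinh(X - median) for that column rather than dividing by zero.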

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.stationarity import ASinhTransformer
>>> X = pl.DataFrame({
...     "time": [datetime(2024, 1, i) for i in range(1, 6)],
...     "value": [1.0, 10.0, 100.0, 1000.0, 10000.0],
... })
>>> transformer = ASinhTransformer()
>>> transformer.fit(X)
ASinhTransformer(...)
>>> X_t = transformer.transform(X)
>>> "time" in X_t.columns
True
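The doctest above only checks that the time column survives the transform. To see the numbers involved, here is a stdlib-only sketch that applies the forward formula and its inverse by hand. It uses the same five values and assumes the default scale. The helpers `asinh_forward` and `asinh_inverse` are hypothetical illustrations, not part of yohou. The prefixed output column names that the real transformer produces are not modelled here.

```python
import math
import statistics


def asinh_forward(data, median, mad):
    """Element-wise y = asinh((x - median) / mad)."""
    return [math.asinh((x - median) / mad) for x in data]


def asinh_inverse(data_t, median, mad):
    """Element-wise x = median + mad * sinh(y), undoing asinh_forward."""
    return [median + mad * math.sinh(y) for y in data_t]


values = [1.0, 10.0, 100.0, 1000.0, 10000.0]  # same data as the doctest
median = statistics.median(values)  # 100.0
mad = 1.4826 * statistics.median(abs(x - median) for x in values)  # 1.4826 * 99.0

values_t = asinh_forward(values, median, mad)
recovered = asinh_inverse(values_t, median, mad)

# Five orders of magnitude land between about -0.63 and 4.9.
print([round(v, 3) for v in values_t])
print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(values, recovered)))  # True
```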

References

[1] Johnson, N.L. (1949). "Systems of frequency curves generated by methods of translation." Biometrika, 36(1-2), 149-176. https://doi.org/10.1093/biomet/36.1-2.149

See Also

  • BoxCoxTransformer : Power transform for variance stabilization.
  • LogTransformer : Simpler variance stabilization for positive data.
  • sklearn.preprocessing.PowerTransformer : sklearn's power transforms.

Source Code

class ASinhTransformer(BaseTransformer):
    r"""Variance stabilization through arcsinh transform.

    Applies the transformation:

    $$y = \operatorname{asinh}\!\left(\frac{X - \tilde{X}}{\text{MAD}}\right)$$

    where $\tilde{X}$ is the median and
    $\text{MAD} = c \cdot \text{median}(|X - \tilde{X}|)$ with scale factor
    $c = 1.4826$ by default to match the standard deviation for normally
    distributed data.

    This transformation is useful for:

    - Stabilizing variance in heteroscedastic time series
    - Handling data with outliers (asinh is less sensitive than log)
    - Data that can be negative (unlike log transform)

    Parameters
    ----------
    scale : float > 0, default=1.4826
        Scale factor for MAD normalization. Default value makes MAD
        consistent with standard deviation for normal distributions.

    Attributes
    ----------
    median_ : dict[str, float]
        Median values for each column (excluding time).

    mad_ : dict[str, float]
        Scaled MAD values for each column (excluding time).

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.stationarity import ASinhTransformer
    >>> X = pl.DataFrame({
    ...     "time": [datetime(2024, 1, i) for i in range(1, 6)],
    ...     "value": [1.0, 10.0, 100.0, 1000.0, 10000.0],
    ... })
    >>> transformer = ASinhTransformer()
    >>> transformer.fit(X)  # doctest: +ELLIPSIS
    ASinhTransformer(...)
    >>> X_t = transformer.transform(X)
    >>> "time" in X_t.columns
    True

    References
    ----------
    [1] Johnson, N.L. (1949). "Systems of frequency curves generated
        by methods of translation." Biometrika, 36(1-2), 149-176.
        https://doi.org/10.1093/biomet/36.1-2.149

    See Also
    --------
    - [`BoxCoxTransformer`][yohou.stationarity.transformers.BoxCoxTransformer] : Power transform for variance stabilization.
    - [`LogTransformer`][yohou.stationarity.transformers.LogTransformer] : Simpler variance stabilization for positive data.
    - `sklearn.preprocessing.PowerTransformer` : sklearn's power transforms.

    """

    _parameter_constraints: dict = {
        "scale": [Interval(numbers.Real, 0, None, closed="neither")],
    }

    _tags = {"invertible": True}

    def __init__(self, scale: StrictFloat = 1.4826):
        self.scale = scale

    def _fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None) -> None:
        """Fit the internal model."""
        # Compute median and MAD for each column (excluding time)
        X_numeric = X.select(~cs.by_name("time"))

        self.median_: dict[str, float] = {}
        self.mad_: dict[str, float] = {}

        for col in X_numeric.columns:
            col_data = X_numeric.get_column(col)
            median_val = col_data.median()
            # Cast to a Python float; polars returns None for an empty or all-null column, so fall back to 0.0
            self.median_[col] = (
                float(median_val) if median_val is not None else 0.0  # ty: ignore[invalid-argument-type]
            )

            # Compute MAD: median(|X - median(X)|) * scale
            abs_dev = (col_data - self.median_[col]).abs()
            mad_val = abs_dev.median()
            mad_scaled = (
                float(mad_val) * self.scale if mad_val is not None else 1.0  # ty: ignore[invalid-argument-type]
            )

            # Avoid division by zero
            self.mad_[col] = mad_scaled if mad_scaled != 0.0 else 1.0

    def _transform(self, X: pl.DataFrame) -> pl.DataFrame:
        """Transform the input time series."""
        time = X.select(cs.by_name("time"))
        X_numeric = X.select(~cs.by_name("time"))

        # Apply asinh((X - median) / MAD) for each column
        transformed_cols = []
        for col in X_numeric.columns:
            col_data = X_numeric.get_column(col).to_numpy()
            normalized = (col_data - self.median_[col]) / self.mad_[col]
            transformed = pl.Series(panel_aware_prefix(col, "asinh"), np.arcsinh(normalized))
            transformed_cols.append(transformed)

        X_t = pl.DataFrame(transformed_cols)
        X_t = pl.concat([time, X_t], how="horizontal")

        return X_t

    def _inverse_transform(self, X_t: pl.DataFrame, X_p: pl.DataFrame | None = None) -> pl.DataFrame:
        """Inverse-transform the time series."""
        X_t, _ = validate_transformer_data(
            self,
            X=X_t,
            reset=False,
            inverse=True,
            X_p=X_p,
            observation_horizon=self.observation_horizon,
        )

        time = X_t.select(cs.by_name("time"))
        X_t_numeric = X_t.select(~cs.by_name("time"))

        # Apply inverse: sinh(X_t) * MAD + median for each column
        inverse_cols = []
        for i, col in enumerate(X_t_numeric.columns):
            # Get original column name from feature_names_in_
            orig_col = self.feature_names_in_[i]
            col_data = X_t_numeric.get_column(col).to_numpy()
            sinh_val = np.sinh(col_data)
            inverse_val = sinh_val * self.mad_[orig_col] + self.median_[orig_col]
            inverse_cols.append(pl.Series(orig_col, inverse_val))

        X = pl.DataFrame(inverse_cols)
        X = pl.concat([time, X], how="horizontal")

        return X

    def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
        """Get output feature names for transformation.

        Parameters
        ----------
        input_features : array-like of str or None, default=None
            Column names of the input features.  If ``None``, uses the
            feature names seen during ``fit``.

        Returns
        -------
        list of str
            Output feature names after transformation.

        """
        input_features = _check_feature_names_in(self, input_features)
        return [panel_aware_prefix(col, "asinh") for col in input_features]

Methods

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_features | array-like of str or None | Column names of the input features. If None, uses the feature names seen during fit. | None |

Returns

| Type | Description |
| --- | --- |
| list of str | Output feature names after transformation. |


Tutorials

The following example notebooks use this component:

  • How to Apply Stationarity Transforms (Data-Features): a catalogue of variance-stabilising and detrending transforms (LogTransformer, BoxCox, SeasonalDifferencing, SeasonalReturn, and ASinh) with inverse verification.
