
BoxCoxTransformer

yohou.stationarity.transformers.BoxCoxTransformer

Bases: BaseTransformer

Time series transformer that applies the Box-Cox power transformation.

The Box-Cox transformation is a parametric power transformation that can stabilize variance and bring the data closer to a normal distribution:

\[y = \begin{cases} \frac{(x + \text{offset})^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \ln(x + \text{offset}) & \text{if } \lambda = 0 \end{cases}\]
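As a quick check of the formula, here is a minimal standalone sketch in plain Python. It is not the library's implementation, and the helper name `box_cox` is hypothetical. It illustrates that `lmbda=0.5` gives a shifted and scaled square root, and that very small values of lambda approach the log branch.

```python
import math


def box_cox(x: float, lmbda: float = 0.0, offset: float = 0.0) -> float:
    """Evaluate the Box-Cox formula for one value (illustrative helper, not yohou API)."""
    shifted = x + offset
    if shifted <= 0:
        raise ValueError("x + offset must be strictly positive")
    if lmbda == 0:
        return math.log(shifted)
    return (shifted**lmbda - 1) / lmbda


values = [1.0, 4.0, 9.0, 16.0, 25.0]

# For lmbda = 0.5 the formula reduces to 2 * (sqrt(v) - 1).
sqrt_like = [box_cox(v, lmbda=0.5) for v in values]

# As lmbda shrinks towards 0, the power branch approaches ln(x).
near_log = box_cox(3.0, lmbda=1e-8)
print(sqrt_like, near_log, math.log(3.0))
```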

Parameters

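| Name | Type | Description | Default |
| --- | --- | --- | --- |
| lmbda | float | The transformation parameter. If 0, applies the log transform. Common values: 0 (log), 0.5 (square root), 1 (linear, no change of shape), 2 (square). For nonzero lambda the result is also shifted and scaled by the -1 and 1/lambda terms of the formula. | 0.0 |
| offset | float >= 0.0 | Offset added to the input time series before the Box-Cox transform. Useful for making x + offset strictly positive, for example when the series contains zeros. | 0.0 |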

Attributes

| Name | Type | Description |
| --- | --- | --- |
| n_features_in_ | int | Number of features seen during fit. |
| feature_names_in_ | list of str | Names of features seen during fit (excluding the "time" column). |

Notes

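Box-Cox requires strictly positive input after the offset is applied, that is, x + offset > 0 for every value. The log branch is undefined at or below zero, and fractional powers of negative numbers are not real.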
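The positivity requirement can be illustrated with a minimal sketch in plain Python. The series and the offset of 1.0 are assumptions chosen for illustration only; nothing here calls the library.

```python
import math

# A series containing a zero: log(0) is undefined, so a shift is needed.
series = [0.0, 3.0, 8.0, 15.0]

# Assumed offset for illustration; any value with min(series) + offset > 0 works.
offset = 1.0
if min(series) + offset <= 0:
    raise ValueError("offset is too small to make the series strictly positive")

# The lmbda = 0 branch of the formula: ln(x + offset).
log_shifted = [math.log(v + offset) for v in series]
print(log_shifted)
```

Keep in mind that the offset changes the transformed values, so the same offset has to be used when inverting the transform.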

References

[1] Box, G.E.P., & Cox, D.R. (1964). "An analysis of transformations." Journal of the Royal Statistical Society: Series B, 26(2), 211-252.

[2] Hyndman, R.J., & Athanasopoulos, G. (2021). "Forecasting: principles and practice," 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3. Chapter 3.1.

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.stationarity import BoxCoxTransformer
>>> X = pl.DataFrame({
...     "time": [datetime(2024, 1, i) for i in range(1, 6)],
...     "value": [1.0, 4.0, 9.0, 16.0, 25.0],
... })
>>> transformer = BoxCoxTransformer(lmbda=0.5)  # Square-root-type transform: 2 * (sqrt(x) - 1)
>>> transformer.fit(X)
BoxCoxTransformer(...)
>>> X_t = transformer.transform(X)
>>> "time" in X_t.columns
True
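The transformer is tagged as invertible, and the source comments give the inverse as (lambda * y + 1)^(1/lambda), followed by subtracting the offset. The following is a minimal plain-Python sketch of that round trip. The helper names `box_cox` and `inv_box_cox` are hypothetical and are not part of the yohou API.

```python
import math


def box_cox(x: float, lmbda: float, offset: float = 0.0) -> float:
    """Forward formula (illustrative helper)."""
    shifted = x + offset
    return math.log(shifted) if lmbda == 0 else (shifted**lmbda - 1) / lmbda


def inv_box_cox(y: float, lmbda: float, offset: float = 0.0) -> float:
    """Inverse formula: undo the power or log step, then remove the offset."""
    shifted = math.exp(y) if lmbda == 0 else (lmbda * y + 1) ** (1 / lmbda)
    return shifted - offset


values = [1.0, 4.0, 9.0, 16.0, 25.0]
for lmbda in (0.0, 0.5, 2.0):
    restored = [inv_box_cox(box_cox(v, lmbda), lmbda) for v in values]
    # The round trip recovers the inputs up to floating-point error.
    assert all(math.isclose(r, v) for r, v in zip(restored, values))
```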

See Also

  • LogTransformer : Logarithmic transformation (Box-Cox with lambda=0).
  • ASinhTransformer : Inverse hyperbolic sine transformation for data with zeros.
  • sklearn.preprocessing.PowerTransformer : sklearn's power transformations.

Source Code

class BoxCoxTransformer(BaseTransformer):
    r"""Box-Cox power transformation time series transformer.

    The Box-Cox transformation is a parametric transformation that stabilizes
    variance and makes the data more normally distributed:

    $$y = \begin{cases} \frac{(x + \text{offset})^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \ln(x + \text{offset}) & \text{if } \lambda = 0 \end{cases}$$

    Parameters
    ----------
    lmbda : float, default=0.0
        The transformation parameter. If 0, applies log transform.
        Common values: 0 (log), 0.5 (square root), 1 (no transform), 2 (square).

    offset : float >= 0.0, default=0.0
        Offset to apply to the input time series before the Box-Cox transform.
        Useful for ensuring data is strictly positive.

    Attributes
    ----------
    n_features_in_ : int
        Number of features seen during fit.
    feature_names_in_ : list of str
        Names of features seen during fit (excluding "time" column).

    Notes
    -----
    Box-Cox requires strictly positive input data.

    References
    ----------
    [1] Box, G.E.P., & Cox, D.R. (1964). "An analysis of
        transformations." Journal of the Royal Statistical Society:
        Series B, 26(2), 211-252.
    [2] Hyndman, R.J., & Athanasopoulos, G. (2021). "Forecasting:
        principles and practice," 3rd edition, OTexts: Melbourne, Australia.
        OTexts.com/fpp3. Chapter 3.1.


    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.stationarity import BoxCoxTransformer
    >>> X = pl.DataFrame({
    ...     "time": [datetime(2024, 1, i) for i in range(1, 6)],
    ...     "value": [1.0, 4.0, 9.0, 16.0, 25.0],
    ... })
    >>> transformer = BoxCoxTransformer(lmbda=0.5)  # Square root transform
    >>> transformer.fit(X)  # doctest: +ELLIPSIS
    BoxCoxTransformer(...)
    >>> X_t = transformer.transform(X)
    >>> "time" in X_t.columns
    True

    See Also
    --------
    - [`LogTransformer`][yohou.stationarity.transformers.LogTransformer] : Logarithmic transformation (Box-Cox with lambda=0).
    - [`ASinhTransformer`][yohou.stationarity.transformers.ASinhTransformer] : Inverse hyperbolic sine transformation for data with zeros.
    - `sklearn.preprocessing.PowerTransformer` : sklearn's power transformations.

    """

    _parameter_constraints: dict = {
        "lmbda": [Interval(numbers.Real, None, None, closed="neither")],
        "offset": [Interval(numbers.Real, 0, None, closed="left")],
    }

    _tags = {"invertible": True}

    def __init__(self, lmbda: StrictFloat = 0.0, offset: StrictFloat = 0.0):
        self.lmbda = lmbda
        self.offset = offset

    def __sklearn_tags__(self) -> Tags:
        """Get estimator tags.

        Returns
        -------
        Tags
            Estimator tags with yohou-specific attributes.

        """
        tags = super().__sklearn_tags__()
        assert tags.input_tags is not None
        # Box-Cox requires positive data (after offset)
        tags.input_tags.min_value = -self.offset if self.offset > 0.0 else 0.0
        return tags

    def _transform(self, X: pl.DataFrame) -> pl.DataFrame:
        """Transform the input time series."""
        time = X.select(cs.by_name("time"))
        X_shifted = X.select(~cs.by_name("time")) + self.offset

        if self.lmbda == 0:
            X_t = X_shifted.with_columns(pl.all().log())
        else:
            # Box-Cox: (x^lambda - 1) / lambda
            X_t = X_shifted.with_columns((pl.all().pow(self.lmbda) - 1) / self.lmbda)

        feature_names = self.get_feature_names_out()
        X_t = X_t.rename(dict(zip(X_t.columns, feature_names, strict=False)))
        X_t = pl.concat([time, X_t], how="horizontal")

        return X_t

    def _inverse_transform(self, X_t: pl.DataFrame, X_p: pl.DataFrame | None = None) -> pl.DataFrame:
        """Inverse-transform the time series.

        Parameters
        ----------
        X_t : pl.DataFrame
            Transformed time series.
        X_p : pl.DataFrame or None
            Past observations.

        Returns
        -------
        pl.DataFrame
            Inverse-transformed time series.

        """
        X_t, _ = validate_transformer_data(
            self,
            X=X_t,
            reset=False,
            inverse=True,
            X_p=X_p,
            observation_horizon=self.observation_horizon,
        )

        time = X_t.select(cs.by_name("time"))

        if self.lmbda == 0:
            X = X_t.select(~cs.by_name("time")).with_columns(pl.all().exp()) - self.offset
        else:
            # Inverse Box-Cox: (lambda * y + 1)^(1/lambda)
            X = (
                X_t.select(~cs.by_name("time")).with_columns((pl.all() * self.lmbda + 1).pow(1 / self.lmbda))
                - self.offset
            )

        X = X.rename(dict(zip(X.columns, self.feature_names_in_, strict=False)))
        X = pl.concat([time, X], how="horizontal")

        return X

    def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
        """Get output feature names for transformation.

        Parameters
        ----------
        input_features : array-like of str or None, default=None
            Column names of the input features.  If ``None``, uses the
            feature names seen during ``fit``.

        Returns
        -------
        list of str
            Output feature names after transformation.

        """
        input_features = _check_feature_names_in(self, input_features)
        feature_names = [panel_aware_prefix(col, f"boxcox_l_{self.lmbda}_off_{self.offset}") for col in input_features]

        return feature_names

Methods

__sklearn_tags__()

Get estimator tags.

Returns
| Type | Description |
| --- | --- |
| Tags | Estimator tags with yohou-specific attributes. |

Source Code
def __sklearn_tags__(self) -> Tags:
    """Get estimator tags.

    Returns
    -------
    Tags
        Estimator tags with yohou-specific attributes.

    """
    tags = super().__sklearn_tags__()
    assert tags.input_tags is not None
    # Box-Cox requires positive data (after offset)
    tags.input_tags.min_value = -self.offset if self.offset > 0.0 else 0.0
    return tags

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_features | array-like of str or None | Column names of the input features. If None, uses the feature names seen during fit. | None |
Returns
| Type | Description |
| --- | --- |
| list of str | Output feature names after transformation. |

Source Code
def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
    """Get output feature names for transformation.

    Parameters
    ----------
    input_features : array-like of str or None, default=None
        Column names of the input features.  If ``None``, uses the
        feature names seen during ``fit``.

    Returns
    -------
    list of str
        Output feature names after transformation.

    """
    input_features = _check_feature_names_in(self, input_features)
    feature_names = [panel_aware_prefix(col, f"boxcox_l_{self.lmbda}_off_{self.offset}") for col in input_features]

    return feature_names
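
The prefix handed to `panel_aware_prefix` is built from the parameter values with an f-string. The sketch below reproduces only that prefix string, using example parameter values. How `panel_aware_prefix` combines the prefix with each column name is not shown on this page, so that step is deliberately left out.

```python
# Example parameter values (the defaults would be lmbda=0.0, offset=0.0).
lmbda, offset = 0.5, 0.0

# Same f-string as in get_feature_names_out; floats are rendered with their repr.
prefix = f"boxcox_l_{lmbda}_off_{offset}"
print(prefix)  # boxcox_l_0.5_off_0.0
```

Because the values are interpolated as written, two transformers with different `lmbda` or `offset` values get different prefixes.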

Tutorials

The following example notebooks use this component:

  • How to Apply Stationarity Transforms


    Data-Features

    Catalogue of variance-stabilising and detrending transforms: LogTransformer, BoxCox, SeasonalDifferencing, SeasonalReturn, and ASinh with inverse verification.
