NumericalFilter

yohou.preprocessing.signal.NumericalFilter

Bases: BaseTransformer

Apply digital IIR filters to time series data.

Applies standard digital filters (Butterworth, Chebyshev, Bessel, etc.) for lowpass, highpass, bandpass, or bandstop filtering. Useful for noise removal, drift correction, and signal preprocessing.

Parameters

Name Type Description Default
design (butterworth, chebyshev1, chebyshev2, elliptic, bessel)

Filter design method:
- "butterworth": Butterworth (maximally flat passband)
- "chebyshev1": Chebyshev Type I (passband ripple)
- "chebyshev2": Chebyshev Type II (stopband ripple)
- "elliptic": Elliptic/Cauer (passband and stopband ripple)
- "bessel": Bessel (approximately linear phase in the passband)

"butterworth"
mode (lowpass, highpass, bandpass, bandstop)

Filter mode. For bandpass/bandstop, cutoff_frequency should be a 2-tuple.

"lowpass"
order int

Filter order (at least 1). A higher order gives a sharper cutoff but more phase distortion.

4
cutoff_frequency float or tuple of float

Cutoff frequency as a fraction of the Nyquist frequency, strictly between 0 and 1. For bandpass/bandstop, provide (low_freq, high_freq).

0.1
passband_ripple float or None

Passband ripple in dB (used by chebyshev1 and elliptic). If None, 1.0 is used for those designs.

None
stopband_attenuation float or None

Stopband attenuation in dB (used by chebyshev2 and elliptic). If None, 40.0 is used for those designs.

None
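Because cutoff_frequency is expressed as a fraction of the Nyquist frequency, a cutoff given in Hz has to be divided by half the sampling rate first. The sketch below shows that conversion and the corresponding SciPy design calls. The helper name `normalized_cutoff` and the 1 Hz sampling rate are assumptions made for illustration and are not part of yohou.

```python
import scipy.signal


def normalized_cutoff(cutoff_hz: float, sample_rate_hz: float) -> float:
    """Convert a cutoff in Hz to a fraction of the Nyquist frequency."""
    nyquist_hz = sample_rate_hz / 2.0
    wn = cutoff_hz / nyquist_hz
    if not 0.0 < wn < 1.0:
        raise ValueError("cutoff must lie strictly between 0 and the Nyquist frequency")
    return wn


# Assumed setup: data sampled once per second, desired cutoff at 0.1 Hz.
wn = normalized_cutoff(cutoff_hz=0.1, sample_rate_hz=1.0)  # 0.1 / 0.5

# Lowpass: a single cutoff; an order-4 design has 5 coefficients per polynomial.
b_low, a_low = scipy.signal.butter(N=4, Wn=wn, btype="lowpass", output="ba")

# Bandpass: a (low, high) pair; SciPy doubles the order for band filters.
band = (normalized_cutoff(0.05, 1.0), normalized_cutoff(0.2, 1.0))
b_band, a_band = scipy.signal.butter(N=4, Wn=band, btype="bandpass", output="ba")
```

The same normalized values can be passed as cutoff_frequency, either as a float or as a (low_freq, high_freq) tuple.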

Attributes

Name Type Description
b_ ndarray

Numerator coefficients of the filter.

a_ ndarray

Denominator coefficients of the filter.

zi_ dict of ndarray

Filter delay state per column. Updated after each transform call to enable streaming.
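One way to sanity-check coefficients such as b_ and a_ is to evaluate the frequency response they define. The sketch below designs the filter directly with SciPy instead of reading the attributes from a fitted NumericalFilter (an assumption made to keep the example standalone), then checks that a Butterworth lowpass has unit gain at DC and roughly -3 dB at the cutoff.

```python
import numpy as np
import scipy.signal

cutoff = 0.2  # fraction of Nyquist
b, a = scipy.signal.butter(N=4, Wn=cutoff, btype="lowpass", output="ba")

# With the default fs, freqz takes frequencies in rad/sample, so Nyquist is pi.
w = np.array([0.0, cutoff * np.pi, 0.9 * np.pi])
_, h = scipy.signal.freqz(b, a, worN=w)

gain_dc, gain_cutoff, gain_stop = np.abs(h)
gain_cutoff_db = 20 * np.log10(gain_cutoff)  # close to -3 dB for Butterworth
```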

Notes

Statefulness: The filter maintains internal state (delay line values) between transform calls. This enables streaming/chunked processing without transients at chunk boundaries.

Use rewind(X) to clear the filter state and start fresh.
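The effect of carrying the delay state across calls can be illustrated with plain SciPy. This is a minimal sketch of the underlying mechanism, not the yohou implementation itself; the signal, the seed, and the split into four chunks are arbitrary choices. It shows that filtering chunk by chunk while passing the final state forward gives the same output as filtering the whole series at once.

```python
import numpy as np
import scipy.signal

b, a = scipy.signal.butter(N=4, Wn=0.2, btype="lowpass", output="ba")

rng = np.random.default_rng(0)
t = np.arange(120)
x = np.sin(2 * np.pi * 0.05 * t) + 0.5 * rng.standard_normal(len(t))

# Initial state: steady state for a constant input equal to the first sample.
zi0 = scipy.signal.lfilter_zi(b, a) * x[0]

# One-shot filtering of the whole series.
y_full, _ = scipy.signal.lfilter(b, a, x, zi=zi0)

# Chunked filtering: the final state of each chunk seeds the next one.
zi = zi0
pieces = []
for chunk in np.array_split(x, 4):
    y_chunk, zi = scipy.signal.lfilter(b, a, chunk, zi=zi)
    pieces.append(y_chunk)
y_stream = np.concatenate(pieces)

max_abs_diff = float(np.max(np.abs(y_full - y_stream)))
```

Dropping the `zi` hand-off between chunks would restart the filter at every boundary and introduce a transient there.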

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> import numpy as np
>>> from yohou.preprocessing import NumericalFilter
>>> # Generate noisy signal
>>> times = pl.datetime_range(
...     start=datetime(2020, 1, 1), end=datetime(2020, 1, 1, 0, 1), interval="1s", eager=True
... )
>>> t = np.arange(len(times))
>>> signal = np.sin(2 * np.pi * 0.05 * t) + 0.5 * np.random.randn(len(t))
>>> X = pl.DataFrame({"time": times, "signal": signal.tolist()})
>>> # Apply lowpass filter (causal, stateful)
>>> transformer = NumericalFilter(design="butterworth", mode="lowpass", order=4, cutoff_frequency=0.2)
>>> transformer.fit(X)
NumericalFilter(...)
>>> X_filtered = transformer.transform(X)
>>> "time" in X_filtered.columns
True
>>> # Filter state is preserved for subsequent chunks
>>> # Use transformer.rewind(X) to clear state

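See Also

NumericalIntegrator : Numerical integration.
NumericalDifferentiator : Numerical differentiation.
scipy.signal.butter : Butterworth filter design.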

Source Code

class NumericalFilter(BaseTransformer):
    """Apply digital IIR filters to time series data.

    Applies standard digital filters (Butterworth, Chebyshev, Bessel, etc.)
    for lowpass, highpass, bandpass, or bandstop filtering. Useful for noise
    removal, drift correction, and signal preprocessing.

    Parameters
    ----------
    design : {"butterworth", "chebyshev1", "chebyshev2", "elliptic", "bessel"}, default="butterworth"
        Filter design method:
        - "butterworth": Butterworth (maximally flat passband)
        - "chebyshev1": Chebyshev Type I (passband ripple)
        - "chebyshev2": Chebyshev Type II (stopband ripple)
        - "elliptic": Elliptic/Cauer (passband and stopband ripple)
        - "bessel": Bessel (approximately linear phase in the passband)
    mode : {"lowpass", "highpass", "bandpass", "bandstop"}, default="lowpass"
        Filter mode. For bandpass/bandstop, cutoff_frequency should be a 2-tuple.
    order : int, default=4
        Filter order (at least 1). A higher order gives a sharper cutoff but
        more phase distortion.
    cutoff_frequency : float or tuple of float, default=0.1
        Cutoff frequency as a fraction of the Nyquist frequency, strictly
        between 0 and 1. For bandpass/bandstop, provide (low_freq, high_freq).
    passband_ripple : float or None, default=None
        Passband ripple in dB (used by chebyshev1 and elliptic). If None,
        1.0 is used for those designs.
    stopband_attenuation : float or None, default=None
        Stopband attenuation in dB (used by chebyshev2 and elliptic). If None,
        40.0 is used for those designs.

    Attributes
    ----------
    b_ : ndarray
        Numerator coefficients of the filter.
    a_ : ndarray
        Denominator coefficients of the filter.
    zi_ : dict of ndarray
        Filter delay state per column. Updated after each transform call
        to enable streaming.

    Notes
    -----
    **Statefulness**: The filter maintains internal state (delay line values)
    between transform calls. This enables streaming/chunked processing without
    transients at chunk boundaries.

    Use ``rewind(X)`` to clear the filter state and start fresh.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> import numpy as np
    >>> from yohou.preprocessing import NumericalFilter

    >>> # Generate noisy signal
    >>> times = pl.datetime_range(
    ...     start=datetime(2020, 1, 1), end=datetime(2020, 1, 1, 0, 1), interval="1s", eager=True
    ... )
    >>> t = np.arange(len(times))
    >>> signal = np.sin(2 * np.pi * 0.05 * t) + 0.5 * np.random.randn(len(t))
    >>> X = pl.DataFrame({"time": times, "signal": signal.tolist()})

    >>> # Apply lowpass filter (causal, stateful)
    >>> transformer = NumericalFilter(design="butterworth", mode="lowpass", order=4, cutoff_frequency=0.2)
    >>> transformer.fit(X)
    NumericalFilter(...)
    >>> X_filtered = transformer.transform(X)
    >>> "time" in X_filtered.columns
    True
    >>> # Filter state is preserved for subsequent chunks
    >>> # Use transformer.rewind(X) to clear state

    See Also
    --------
    - [`NumericalIntegrator`][yohou.preprocessing.signal.NumericalIntegrator] : Numerical integration.
    - [`NumericalDifferentiator`][yohou.preprocessing.signal.NumericalDifferentiator] : Numerical differentiation.
    - `scipy.signal.butter` : Butterworth filter design.

    """

    _valid_designs = {"butterworth", "chebyshev1", "chebyshev2", "elliptic", "bessel"}
    _valid_modes = {"lowpass", "highpass", "bandpass", "bandstop"}

    _parameter_constraints: dict = {
        "design": [StrOptions(_valid_designs)],
        "mode": [StrOptions(_valid_modes)],
        "order": [Interval(numbers.Integral, 1, None, closed="left")],
        "cutoff_frequency": [Interval(numbers.Real, 0.0, 1.0, closed="neither"), tuple, list],
        "passband_ripple": [Interval(numbers.Real, 0.0, None, closed="neither"), None],
        "stopband_attenuation": [Interval(numbers.Real, 0.0, None, closed="neither"), None],
    }

    _tags = {"stateful": True}

    def __init__(
        self,
        design: str = "butterworth",
        mode: str = "lowpass",
        order: int = 4,
        cutoff_frequency: float | tuple[float, float] = 0.1,
        passband_ripple: float | None = None,
        stopband_attenuation: float | None = None,
    ):
        self.design = design
        self.mode = mode
        self.order = order
        self.cutoff_frequency = cutoff_frequency
        self.passband_ripple = passband_ripple
        self.stopband_attenuation = stopband_attenuation

    def _fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None) -> None:
        """Design the filter coefficients and reset the per-column delay state."""
        # Get filter design function
        filter_funcs = {
            "butterworth": scipy.signal.butter,
            "chebyshev1": scipy.signal.cheby1,
            "chebyshev2": scipy.signal.cheby2,
            "elliptic": scipy.signal.ellip,
            "bessel": scipy.signal.bessel,
        }
        filter_func = filter_funcs[self.design]

        # Build filter kwargs
        kwargs = {
            "N": self.order,
            "Wn": self.cutoff_frequency,
            "btype": self.mode,
            "output": "ba",
        }

        # Add ripple parameters for specific filter types
        if self.design == "chebyshev1":
            kwargs["rp"] = self.passband_ripple if self.passband_ripple is not None else 1.0
        elif self.design == "chebyshev2":
            kwargs["rs"] = self.stopband_attenuation if self.stopband_attenuation is not None else 40.0
        elif self.design == "elliptic":
            kwargs["rp"] = self.passband_ripple if self.passband_ripple is not None else 1.0
            kwargs["rs"] = self.stopband_attenuation if self.stopband_attenuation is not None else 40.0

        # Design filter
        self.b_, self.a_ = filter_func(**kwargs)

        # Initialize filter state dict
        self.zi_: dict[str, np.ndarray] = {}

    def rewind(self, X: pl.DataFrame) -> "NumericalFilter":
        """Rewind the filter state and observation horizon.

        Clears the stored filter delay state and rewinds the observation
        window, so the next transform call starts fresh.

        Parameters
        ----------
        X : pl.DataFrame
            Input time series to set new observation window.

        Returns
        -------
        self

        """
        # Rewind filter delay state
        self.zi_ = {}
        # Call parent rewind
        BaseTransformer.rewind(self, X)
        return self

    def _transform(self, X: pl.DataFrame) -> pl.DataFrame:
        """Apply digital filter to time series.

        Parameters
        ----------
        X : pl.DataFrame
            Validated input time series.

        Returns
        -------
        pl.DataFrame
            Filtered time series.

        """
        # Get data columns
        data_cols = [c for c in X.columns if c != "time"]

        # Apply filter to each column
        result_cols = {"time": X["time"]}

        for col_name in data_cols:
            signal_data = X[col_name].to_numpy()

            # Causal filtering with state preservation
            if col_name in self.zi_:
                # Use stored state from previous transform
                zi = self.zi_[col_name]
            else:
                # Initialize state from first sample
                zi = scipy.signal.lfilter_zi(self.b_, self.a_) * signal_data[0]

            filtered, zf = scipy.signal.lfilter(self.b_, self.a_, signal_data, zi=zi)
            # Store final state for next transform
            self.zi_[col_name] = zf

            result_cols[col_name] = pl.Series(filtered)

        return pl.DataFrame(result_cols)

    def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
        """Get output feature names for transformation.

        Parameters
        ----------
        input_features : list of str or None, default=None
            Column names of the input features.  If ``None``, uses the
            feature names seen during ``fit``.

        Returns
        -------
        list of str
            Output feature names after transformation.

        """
        check_is_fitted(self, ["feature_names_in_"])
        input_features = _check_feature_names_in(self, input_features)
        return list(input_features)

Methods

rewind(X)

Rewind the filter state and observation horizon.

Clears the stored filter delay state and rewinds the observation window, so the next transform call starts fresh.

Parameters
Name Type Description Default
X DataFrame

Input time series to set new observation window.

required
Returns
Type Description
self
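To make the effect of clearing the delay state concrete, the sketch below uses a hypothetical single-column stand-in class (`StreamingLowpass`, not part of yohou) that mirrors the state handling described above. Its `rewind()` takes no argument and only resets the filter state; the observation-window handling of the real method is left out. After a rewind, filtering the same input reproduces the first result, whereas a second call without rewinding continues from the stored state.

```python
import numpy as np
import scipy.signal


class StreamingLowpass:
    """Hypothetical single-column stand-in, not the yohou class."""

    def __init__(self, order: int = 4, cutoff: float = 0.2):
        self.b, self.a = scipy.signal.butter(order, cutoff, btype="lowpass", output="ba")
        self.zi = None  # no stored delay state yet

    def transform(self, x: np.ndarray) -> np.ndarray:
        if self.zi is None:
            # First call: start from the steady state for the first sample.
            self.zi = scipy.signal.lfilter_zi(self.b, self.a) * x[0]
        y, self.zi = scipy.signal.lfilter(self.b, self.a, x, zi=self.zi)
        return y

    def rewind(self) -> "StreamingLowpass":
        # Drop the stored state so the next transform starts fresh.
        self.zi = None
        return self


x = 1.0 + np.sin(2 * np.pi * 0.05 * np.arange(60))
flt = StreamingLowpass()
first = flt.transform(x)
continued = flt.transform(x)           # continues from the stored state
restarted = flt.rewind().transform(x)  # starts fresh again
```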
Source Code
def rewind(self, X: pl.DataFrame) -> "NumericalFilter":
    """Rewind the filter state and observation horizon.

    Clears the stored filter delay state and rewinds the observation
    window, so the next transform call starts fresh.

    Parameters
    ----------
    X : pl.DataFrame
        Input time series to set new observation window.

    Returns
    -------
    self

    """
    # Rewind filter delay state
    self.zi_ = {}
    # Call parent rewind
    BaseTransformer.rewind(self, X)
    return self

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters
Name Type Description Default
input_features list of str or None

Column names of the input features. If None, uses the feature names seen during fit.

None
Returns
Type Description
list of str

Output feature names after transformation.

Source Code
def get_feature_names_out(self, input_features: list[str] | None = None) -> list[str]:
    """Get output feature names for transformation.

    Parameters
    ----------
    input_features : list of str or None, default=None
        Column names of the input features.  If ``None``, uses the
        feature names seen during ``fit``.

    Returns
    -------
    list of str
        Output feature names after transformation.

    """
    check_is_fitted(self, ["feature_names_in_"])
    input_features = _check_feature_names_in(self, input_features)
    return list(input_features)

Tutorials

The following example notebooks use this component:

  • How to Apply Signal Processing Filters


    Data-Features

    Apply NumericalFilter (Butterworth, Chebyshev, Bessel), NumericalDifferentiator, and NumericalIntegrator for signal smoothing and rate-of-change extraction.
