HolidayFeatureTransformer

yohou.preprocessing.calendar.HolidayFeatureTransformer

Bases: BaseTransformer

Extract holiday indicator features from a user-provided holiday calendar.

Produces a binary holiday_indicator column (and optional proximity features) by matching the "time" column against a provided DataFrame of holiday dates. Output columns are prefixed with holiday_.

Given a set of holiday dates \(H = \{h_1, h_2, \ldots\}\), the indicator for each timestamp \(t\) is:

\[\text{is_holiday}(t) = \mathbb{1}[\text{date}(t) \in H]\]

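When days_to_next=True or days_since_last=True, the transformer also computes the corresponding proximity feature, in whole days:

\[\text{days_to_next}(t) = \min_{h \in H,\ h \geq \text{date}(t)} (h - \text{date}(t)) \quad \text{(null if no such holiday)}\]
\[\text{days_since_last}(t) = \min_{h \in H,\ h < \text{date}(t)} (\text{date}(t) - h) \quad \text{(null if no earlier holiday)}\]

On a holiday itself, days_to_next is 0, while days_since_last counts back to the most recent holiday strictly before that date.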
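
The definitions above can be traced by hand on a small calendar. The following is a minimal standard-library sketch, not the library's implementation: `holiday_features` is a hypothetical helper, and it assumes the lookup behaves like the `searchsorted(..., side="left")` call in the source on this page, so that on a holiday itself the "last" holiday is the one strictly before that date.

```python
from bisect import bisect_left
from datetime import date


def holiday_features(day, holidays):
    """Return (indicator, days_to_next, days_since_last) for one date.

    `holidays` must be a sorted list of `datetime.date` objects.
    """
    i = bisect_left(holidays, day)  # index of the first holiday >= day
    has_next = i < len(holidays)
    indicator = int(has_next and holidays[i] == day)
    days_to_next = (holidays[i] - day).days if has_next else None
    # holidays[i - 1] is the last holiday strictly before `day`
    days_since_last = (day - holidays[i - 1]).days if i > 0 else None
    return indicator, days_to_next, days_since_last


holidays = [date(2020, 12, 25), date(2021, 1, 1)]
for day in (date(2020, 12, 23), date(2020, 12, 25), date(2020, 12, 28), date(2021, 1, 2)):
    print(day, holiday_features(day, holidays))
```

With this calendar, 23 December gives (0, 2, None), 25 December gives (1, 0, None), 28 December gives (0, 4, 3), and 2 January gives (0, None, 1).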

Parameters

Name Type Description Default
holidays DataFrame or None

DataFrame with a "date" column containing holiday dates. The column must be of Date or Datetime type.

None
days_to_next bool

If True, produce a holiday_days_to_next integer column with the number of days until the next holiday (null if none).

False
days_since_last bool

If True, produce a holiday_days_since_last integer column with the number of days since the last holiday (null if none).

False

Attributes

Name Type Description
holiday_dates_ list of date

Sorted list of holiday dates used for matching.
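
As a rough illustration of what ends up in holiday_dates_, the sketch below reduces a mix of dates and datetimes to calendar dates and sorts them. It is a plain-Python analogue of casting the "date" column to Date and sorting, under the assumption that duplicates are kept; `normalize_holidays` is a hypothetical helper, not part of the library.

```python
from datetime import date, datetime


def normalize_holidays(values):
    """Reduce datetimes to calendar dates and return them sorted."""
    # datetime is a subclass of date, so test for it first.
    return sorted(v.date() if isinstance(v, datetime) else v for v in values)


raw = [datetime(2021, 1, 1, 9, 30), date(2020, 12, 25), datetime(2020, 12, 25, 0, 0)]
holiday_dates = normalize_holidays(raw)
print(holiday_dates)
```

Duplicate dates are harmless for the indicator, since membership is a set lookup.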

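See Also

  • CalendarFeatureTransformer : Calendar features (month, day of week, etc.).
  • FourierFeatureTransformer : Sin/cos harmonics for cyclical encoding.
  • TimeIndexTransformer : Numeric time index for trend features.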

Examples

>>> import polars as pl
>>> from datetime import datetime, date
>>> from yohou.preprocessing.calendar import HolidayFeatureTransformer
>>> time = pl.datetime_range(
...     start=datetime(2020, 12, 23), end=datetime(2020, 12, 27), interval="1d", eager=True
... )
>>> X = pl.DataFrame({"time": time, "value": range(len(time))})
>>> holidays = pl.DataFrame({"date": [date(2020, 12, 25)]})
>>> transformer = HolidayFeatureTransformer(holidays=holidays)
>>> transformer.fit(X)
HolidayFeatureTransformer(holidays=shape: (1, 1)...)
>>> X_t = transformer.transform(X)
>>> X_t["holiday_indicator"].to_list()
[0, 0, 1, 0, 0]

Source Code

class HolidayFeatureTransformer(BaseTransformer):
    r"""Extract holiday indicator features from a user-provided holiday calendar.

    Produces a binary ``holiday_indicator`` column (and optional proximity
    features) by matching the ``"time"`` column against a provided DataFrame
    of holiday dates. Output columns are prefixed with ``holiday_``.

    Given a set of holiday dates $H = \{h_1, h_2, \ldots\}$, the indicator
    for each timestamp $t$ is:

    $$\text{is_holiday}(t) = \mathbb{1}[\text{date}(t) \in H]$$

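    When ``days_to_next=True`` or ``days_since_last=True``, the transformer
    also computes the corresponding proximity feature, in whole days:

    $$\text{days_to_next}(t) = \min_{h \in H,\ h \geq \text{date}(t)} (h - \text{date}(t)) \quad \text{(null if no such holiday)}$$

    $$\text{days_since_last}(t) = \min_{h \in H,\ h < \text{date}(t)} (\text{date}(t) - h) \quad \text{(null if no earlier holiday)}$$

    On a holiday itself, ``days_to_next`` is 0, while ``days_since_last``
    counts back to the most recent holiday strictly before that date.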

    Parameters
    ----------
    holidays : pl.DataFrame or None, default=None
        DataFrame with a ``"date"`` column containing holiday dates.
        The column must be of ``Date`` or ``Datetime`` type.
    days_to_next : bool, default=False
        If ``True``, produce a ``holiday_days_to_next`` integer column
        with the number of days until the next holiday (null if none).
    days_since_last : bool, default=False
        If ``True``, produce a ``holiday_days_since_last`` integer column
        with the number of days since the last holiday (null if none).

    Attributes
    ----------
    holiday_dates_ : list of date
        Sorted list of holiday dates used for matching.

    See Also
    --------
    - [`CalendarFeatureTransformer`][yohou.preprocessing.calendar.CalendarFeatureTransformer] : Calendar features (month, day of week, etc.).
    - [`FourierFeatureTransformer`][yohou.preprocessing.time_features.FourierFeatureTransformer] : Sin/cos harmonics for cyclical encoding.
    - [`TimeIndexTransformer`][yohou.preprocessing.time_features.TimeIndexTransformer] : Numeric time index for trend features.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime, date
    >>> from yohou.preprocessing.calendar import HolidayFeatureTransformer
    >>> time = pl.datetime_range(
    ...     start=datetime(2020, 12, 23), end=datetime(2020, 12, 27), interval="1d", eager=True
    ... )
    >>> X = pl.DataFrame({"time": time, "value": range(len(time))})
    >>> holidays = pl.DataFrame({"date": [date(2020, 12, 25)]})
    >>> transformer = HolidayFeatureTransformer(holidays=holidays)
    >>> transformer.fit(X)  # doctest: +ELLIPSIS
    HolidayFeatureTransformer(holidays=shape: (1, 1)...)
    >>> X_t = transformer.transform(X)
    >>> X_t["holiday_indicator"].to_list()
    [0, 0, 1, 0, 0]

    """

    _parameter_constraints: dict = {
        "holidays": "no_validation",
        "days_to_next": ["boolean"],
        "days_since_last": ["boolean"],
    }

    _PREFIX = "holiday"

    def __init__(
        self,
        holidays: pl.DataFrame | None = None,
        days_to_next: bool = False,
        days_since_last: bool = False,
    ):
        self.holidays = holidays
        self.days_to_next = days_to_next
        self.days_since_last = days_since_last

    def _fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None) -> None:
        """Validate the holiday calendar and store its sorted dates."""
        if self.holidays is None:
            raise ValueError("holidays must be provided as a polars DataFrame, got None")
        if not isinstance(self.holidays, pl.DataFrame):
            raise ValueError(f"holidays must be a polars DataFrame, got {type(self.holidays).__name__}")
        if "date" not in self.holidays.columns:
            raise ValueError("holidays DataFrame must have a 'date' column")
        date_dtype = self.holidays["date"].dtype
        if date_dtype not in (pl.Date, pl.Datetime, pl.Datetime("us"), pl.Datetime("ns"), pl.Datetime("ms")):
            raise ValueError(f"holidays 'date' column must be Date or Datetime type, got {date_dtype}")

        dates_series = self.holidays["date"].cast(pl.Date)
        self.holiday_dates_ = sorted(dates_series.to_list())

        generated_names = [f"{self._PREFIX}_indicator"]
        if self.days_to_next:
            generated_names.append(f"{self._PREFIX}_days_to_next")
        if self.days_since_last:
            generated_names.append(f"{self._PREFIX}_days_since_last")
        existing = set(X.columns) - {"time"}
        conflicts = set(generated_names) & existing
        if conflicts:
            raise ValueError(f"Generated column names {sorted(conflicts)} conflict with existing columns in X.")

    def _transform(self, X: pl.DataFrame) -> pl.DataFrame:
        """Extract holiday features from the time column.

        Parameters
        ----------
        X : pl.DataFrame
            Validated input time series.

        Returns
        -------
        pl.DataFrame
            DataFrame with ``"time"`` column and holiday indicator features.

        """
        dates = X["time"].cast(pl.Date)
        holiday_set = set(self.holiday_dates_)
        is_holiday = dates.map_elements(lambda d: 1 if d in holiday_set else 0, return_dtype=pl.Int32).alias(
            f"{self._PREFIX}_indicator"
        )

        new_cols = [is_holiday]

        if self.days_to_next or self.days_since_last:
            holiday_arr = np.array(self.holiday_dates_, dtype="datetime64[D]")
            dates_arr = dates.to_numpy().astype("datetime64[D]")

            if len(holiday_arr) == 0:
                if self.days_to_next:
                    new_cols.append(
                        pl.Series(
                            f"{self._PREFIX}_days_to_next",
                            [None] * len(X),
                            dtype=pl.Int32,
                        )
                    )
                if self.days_since_last:
                    new_cols.append(
                        pl.Series(
                            f"{self._PREFIX}_days_since_last",
                            [None] * len(X),
                            dtype=pl.Int32,
                        )
                    )
            else:
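                # side="left" gives the index of the first holiday >= each date,
                # so the entry just before it is the last holiday strictly earlier.
                idx_next = np.searchsorted(holiday_arr, dates_arr, side="left")
                idx_prev = idx_next - 1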

                if self.days_to_next:
                    to_next = []
                    for i, idx in enumerate(idx_next):
                        if idx < len(holiday_arr):
                            delta = int((holiday_arr[idx] - dates_arr[i]) / np.timedelta64(1, "D"))
                            to_next.append(delta)
                        else:
                            to_next.append(None)
                    new_cols.append(pl.Series(f"{self._PREFIX}_days_to_next", to_next, dtype=pl.Int32))

                if self.days_since_last:
                    since_last = []
                    for i, idx in enumerate(idx_prev):
                        if idx >= 0:
                            delta = int((dates_arr[i] - holiday_arr[idx]) / np.timedelta64(1, "D"))
                            since_last.append(delta)
                        else:
                            since_last.append(None)
                    new_cols.append(pl.Series(f"{self._PREFIX}_days_since_last", since_last, dtype=pl.Int32))

        return X.select(pl.col("time")).with_columns(*new_cols)

    def get_feature_names_out(self, input_features=None) -> list[str]:
        """Get output feature names for transformation.

        Parameters
        ----------
        input_features : array-like of str or None, default=None
            Input feature names (unused, for API compatibility).

        Returns
        -------
        list of str
            Generated holiday feature column names.

        """
        check_is_fitted(self, ["holiday_dates_"])
        generated = [f"{self._PREFIX}_indicator"]
        if self.days_to_next:
            generated.append(f"{self._PREFIX}_days_to_next")
        if self.days_since_last:
            generated.append(f"{self._PREFIX}_days_since_last")
        return generated

Methods

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters
Name Type Description Default
input_features array-like of str or None

Input feature names (unused, for API compatibility).

None
Returns
Type Description
list of str

Generated holiday feature column names.
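
The naming rule, and the collision check performed at fit time, can be sketched without the library. Both functions below are hypothetical stand-ins that mirror the logic visible in the class source, assuming the fixed "holiday" prefix; they are not part of the public API.

```python
def holiday_feature_names(days_to_next=False, days_since_last=False, prefix="holiday"):
    """Build the output column names implied by the two proximity flags."""
    names = [f"{prefix}_indicator"]
    if days_to_next:
        names.append(f"{prefix}_days_to_next")
    if days_since_last:
        names.append(f"{prefix}_days_since_last")
    return names


def find_conflicts(generated, input_columns):
    """Generated names that already exist among the non-time input columns."""
    return sorted(set(generated) & (set(input_columns) - {"time"}))


names = holiday_feature_names(days_to_next=True)
print(names)
print(find_conflicts(names, ["time", "value", "holiday_indicator"]))
```

The indicator name is always present, and a non-empty conflict list corresponds to the case where fitting raises a ValueError.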


Tutorials

The following example notebooks use this component:

  • How to Add Calendar, Fourier, and Holiday Features


    Data-Features

    Enrich your feature matrix with time-derived signals using CalendarFeatureTransformer, FourierFeatureTransformer, and HolidayFeatureTransformer.
