
CalendarFeatureTransformer

yohou.preprocessing.calendar.CalendarFeatureTransformer

Bases: BaseTransformer

Extract calendar-based features from the time column.

Creates new integer feature columns derived from the time column, which helps reduction forecasters capture seasonal and calendar effects. Output columns are prefixed with cal_. The transformed frame contains only the time column and the generated features.

Each feature is a deterministic function of the timestamp \(t\):

\[f_j(t) = \text{calendar}_{j}(t) \quad \text{for } j \in \{\text{month}, \text{day_of_week}, \ldots\}\]

For example, \(f_{\text{month}}(t) \in \{1, \ldots, 12\}\) and \(f_{\text{is_weekend}}(t) \in \{0, 1\}\).
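To make the mapping concrete, here is a minimal standard-library sketch of a few such features for a single timestamp. It is not the library's implementation: the helper name `calendar_features` is hypothetical, and the day-of-week convention (ISO, Monday=1 through Sunday=7) is an assumption that may differ from what the transformer emits.

```python
import calendar
from datetime import datetime


def calendar_features(t: datetime) -> dict[str, int]:
    """Map one timestamp to a few integer calendar features (illustrative only)."""
    # Number of days in t's month, used for the month-end flag.
    last_day = calendar.monthrange(t.year, t.month)[1]
    return {
        "cal_month": t.month,                        # 1..12
        "cal_day_of_week": t.isoweekday(),           # assumed ISO: Monday=1 .. Sunday=7
        "cal_quarter": (t.month - 1) // 3 + 1,       # 1..4
        "cal_is_weekend": int(t.isoweekday() >= 6),  # 0 or 1
        "cal_is_month_end": int(t.day == last_day),  # 0 or 1
    }


# 29 February 2020 is a Saturday and the last day of a leap-year February.
features = calendar_features(datetime(2020, 2, 29))
print(features)
```

Every value depends only on the timestamp, so the same features can be computed for future dates at prediction time.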

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| features | list of str or None | Calendar features to extract. If None, extracts all features applicable to the detected time interval. Valid options: "year", "month", "week", "day_of_week", "day_of_month", "day_of_year", "hour", "minute", "quarter", "is_weekend", "is_month_start", "is_month_end", "is_quarter_start", "is_quarter_end", "is_year_start", "is_year_end". | None |

Attributes

| Name | Type | Description |
| --- | --- | --- |
| applicable_features_ | list of str | Calendar features that will be extracted during transform. |

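See Also

  • HolidayFeatureTransformer: Binary holiday indicator from user-provided dates.
  • FourierFeatureTransformer: Sin/cos harmonics for cyclical encoding.
  • TimeIndexTransformer: Numeric time index for trend features.
  • FunctionTransformer: Custom function-based transforms.
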
Examples

>>> import polars as pl
>>> from datetime import datetime
>>> from yohou.preprocessing.calendar import CalendarFeatureTransformer
>>> time = pl.datetime_range(
...     start=datetime(2020, 1, 1), end=datetime(2020, 3, 1), interval="1d", eager=True
... )
>>> X = pl.DataFrame({"time": time, "value": range(len(time))})
>>> transformer = CalendarFeatureTransformer(features=["month", "day_of_week"])
>>> transformer.fit(X)
CalendarFeatureTransformer(features=['month', 'day_of_week'])
>>> X_t = transformer.transform(X)
>>> "cal_month" in X_t.columns
True

Source Code

class CalendarFeatureTransformer(BaseTransformer):
    r"""Extract calendar-based features from the time column.

    Creates new integer feature columns derived from the ``"time"`` column,
    which helps reduction forecasters capture seasonal and calendar
    effects. Output columns are prefixed with ``cal_``.

    Each feature is a deterministic function of the timestamp $t$:

    $$f_j(t) = \text{calendar}_{j}(t) \quad \text{for } j \in \{\text{month}, \text{day_of_week}, \ldots\}$$

    For example, $f_{\text{month}}(t) \in \{1, \ldots, 12\}$ and
    $f_{\text{is_weekend}}(t) \in \{0, 1\}$.

    Parameters
    ----------
    features : list of str or None, default=None
        Calendar features to extract. If ``None``, extracts all features
        applicable to the detected time interval. Valid options:
        ``"year"``, ``"month"``, ``"week"``, ``"day_of_week"``,
        ``"day_of_month"``, ``"day_of_year"``, ``"hour"``, ``"minute"``,
        ``"quarter"``, ``"is_weekend"``, ``"is_month_start"``,
        ``"is_month_end"``, ``"is_quarter_start"``, ``"is_quarter_end"``,
        ``"is_year_start"``, ``"is_year_end"``.

    Attributes
    ----------
    applicable_features_ : list of str
        Calendar features that will be extracted during transform.

    See Also
    --------
    - [`HolidayFeatureTransformer`][yohou.preprocessing.calendar.HolidayFeatureTransformer] : Binary holiday indicator from user-provided dates.
    - [`FourierFeatureTransformer`][yohou.preprocessing.time_features.FourierFeatureTransformer] : Sin/cos harmonics for cyclical encoding.
    - [`TimeIndexTransformer`][yohou.preprocessing.time_features.TimeIndexTransformer] : Numeric time index for trend features.
    - [`FunctionTransformer`][yohou.preprocessing.function.FunctionTransformer] : Custom function-based transforms.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> from yohou.preprocessing.calendar import CalendarFeatureTransformer
    >>> time = pl.datetime_range(
    ...     start=datetime(2020, 1, 1), end=datetime(2020, 3, 1), interval="1d", eager=True
    ... )
    >>> X = pl.DataFrame({"time": time, "value": range(len(time))})
    >>> transformer = CalendarFeatureTransformer(features=["month", "day_of_week"])
    >>> transformer.fit(X)
    CalendarFeatureTransformer(features=['month', 'day_of_week'])
    >>> X_t = transformer.transform(X)
    >>> "cal_month" in X_t.columns
    True

    """

    _parameter_constraints: dict = {
        "features": [list, None],
    }

    def __init__(
        self,
        features: list[str] | None = None,
    ):
        self.features = features

    def _fit(self, X: pl.DataFrame, y: pl.DataFrame | None = None) -> None:
        """Validate requested features and resolve those applicable to the data interval."""
        if self.features is not None:
            invalid = set(self.features) - set(ALL_FEATURES)
            if invalid:
                raise ValueError(f"Unknown features: {sorted(invalid)}. Valid features: {list(ALL_FEATURES)}")
            inapplicable = [f for f in self.features if not _interval_supports_feature(self.interval_, f)]
            if inapplicable:
                raise ValueError(
                    f"Features {inapplicable} are not applicable to data with "
                    f"interval '{self.interval_}'. These features require "
                    f"sub-daily data."
                )
            self.applicable_features_ = list(self.features)
        else:
            self.applicable_features_ = [f for f in ALL_FEATURES if _interval_supports_feature(self.interval_, f)]

        generated_names = [f"cal_{f}" for f in self.applicable_features_]
        existing = set(X.columns) - {"time"}
        conflicts = set(generated_names) & existing
        if conflicts:
            raise ValueError(
                f"Generated column names {sorted(conflicts)} conflict with "
                f"existing columns in X. Rename input columns or select "
                f"different features."
            )

    def _transform(self, X: pl.DataFrame) -> pl.DataFrame:
        """Extract calendar features from the time column.

        Parameters
        ----------
        X : pl.DataFrame
            Validated input time series.

        Returns
        -------
        pl.DataFrame
            DataFrame with ``"time"`` column and extracted calendar features.

        """
        feature_exprs = [_extract_feature(f) for f in self.applicable_features_]

        return X.select(pl.col("time"), *feature_exprs)

    def get_feature_names_out(self, input_features=None) -> list[str]:
        """Get output feature names for transformation.

        Parameters
        ----------
        input_features : array-like of str or None, default=None
            Input feature names (unused, for API compatibility).

        Returns
        -------
        list of str
            All non-time output column names.

        """
        check_is_fitted(self, ["applicable_features_"])
        return [f"cal_{f}" for f in self.applicable_features_]

Methods

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_features | array-like of str or None | Input feature names (unused, for API compatibility). | None |

Returns

| Type | Description |
| --- | --- |
| list of str | All non-time output column names. |

Source Code
def get_feature_names_out(self, input_features=None) -> list[str]:
    """Get output feature names for transformation.

    Parameters
    ----------
    input_features : array-like of str or None, default=None
        Input feature names (unused, for API compatibility).

    Returns
    -------
    list of str
        All non-time output column names.

    """
    check_is_fitted(self, ["applicable_features_"])
    return [f"cal_{f}" for f in self.applicable_features_]
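The naming scheme and the column-conflict check performed at fit time can be reproduced in isolation. The following is a minimal sketch that mirrors the logic shown in the class source. The standalone helpers `output_names` and `find_conflicts` are hypothetical and not part of the library.

```python
def output_names(features: list[str]) -> list[str]:
    """Prefix each feature with ``cal_`` to form its output column name."""
    return [f"cal_{f}" for f in features]


def find_conflicts(input_columns: list[str], features: list[str]) -> list[str]:
    """Return generated names that collide with non-time input columns."""
    existing = set(input_columns) - {"time"}
    return sorted(set(output_names(features)) & existing)


names = output_names(["month", "day_of_week"])
conflicts = find_conflicts(["time", "value", "cal_month"], ["month", "day_of_week"])
print(names)      # ['cal_month', 'cal_day_of_week']
print(conflicts)  # ['cal_month']
```

A non-empty conflict list corresponds to the case where fitting raises a ValueError, which is resolved by renaming the input column or selecting different features.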

Tutorials

The following example notebooks use this component:

  • How to Add Calendar, Fourier, and Holiday Features


    Data-Features

    Enrich your feature matrix with time-derived signals using CalendarFeatureTransformer, FourierFeatureTransformer, and HolidayFeatureTransformer.
