TemporalSimilarity

yohou.interval.similarity.TemporalSimilarity

Bases: BaseSimilarity

Temporal similarity using Fourier features for weighting observations.

Computes observation weights by measuring the distance between cyclic temporal features extracted from prediction timestamps. Observations at similar seasonal positions (e.g. same day of week, same month of year) receive higher weights.

Timestamps are converted to step indices relative to the first observed timestamp, then encoded as sin/cos pairs at the specified seasonal periods. Distances in this feature space are converted to weights using the same softmax formula as DistanceSimilarity.
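The encoding described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's implementation: `fourier_features` and its arguments are hypothetical names, and the input is assumed to be step indices that have already been computed from timestamps.

```python
import numpy as np


def fourier_features(steps, periods, harmonics=None):
    """Encode step indices as sin/cos pairs (illustrative helper, not the library API)."""
    steps = np.asarray(steps, dtype=np.float64)
    # Assumed default: first harmonic only for every period.
    if harmonics is None:
        harmonics = {p: [1] for p in periods}
    columns = []
    for p in periods:
        for k in harmonics.get(p, [1]):
            angle = 2.0 * np.pi * k * steps / p
            columns.append(np.sin(angle))
            columns.append(np.cos(angle))
    return np.column_stack(columns)


# Two weeks of daily steps with a weekly period.
features = fourier_features(np.arange(14), periods=[7.0])

# Steps 0 and 7 fall on the same weekly position, so their features coincide.
same_weekday = np.allclose(features[0], features[7])
```

Each (period, harmonic) pair contributes one sin column and one cos column, so a single weekly period with the default harmonic yields a two-column feature matrix.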

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| seasonalities | list of float | Seasonal periods in time steps (e.g. [7.0, 365.25] for weekly and yearly cycles on daily data). fit raises ValueError if this is None or empty. | None |
| harmonics | dict mapping float to list of int, or None | Harmonics to include per seasonality period. Keys must match entries in seasonalities. Each value is a list of positive integers specifying which harmonics to use. Defaults to {s: [1]} for each s in seasonalities. | None |
| metric | str | Distance metric for scipy.spatial.distance.cdist. | "euclidean" |
| metric_params | dict or None | Additional keyword arguments forwarded to the distance metric. | None |
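To make the interaction between seasonalities and harmonics concrete, the sketch below counts how many feature columns a configuration produces. `n_fourier_columns` is a hypothetical helper written for this illustration; it assumes two columns (sin and cos) per harmonic and a fallback to the first harmonic for any seasonality missing from the mapping.

```python
def n_fourier_columns(seasonalities, harmonics=None):
    """Count sin/cos columns: two per (seasonality, harmonic) pair (illustrative only)."""
    if harmonics is None:
        harmonics = {s: [1] for s in seasonalities}
    # A seasonality absent from the mapping is assumed to use the first harmonic.
    return sum(2 * len(harmonics.get(s, [1])) for s in seasonalities)


# Default: one harmonic per period.
default_cols = n_fourier_columns([7.0, 365.25])

# Three weekly harmonics plus one yearly harmonic.
custom_cols = n_fourier_columns(
    [7.0, 365.25],
    harmonics={7.0: [1, 2, 3], 365.25: [1]},
)
```

Higher harmonics let the encoding resolve finer within-period structure at the cost of a wider feature matrix.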

Attributes

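| Name | Type | Description |
|---|---|---|
| first_time_ | datetime | Reference timestamp: the first timestamp in the calibration predictions. |
| interval_td_ | timedelta | Time interval between consecutive timestamps, auto-detected from calibration data as the difference between the first two timestamps (timedelta(0) if there is only one). |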

Notes

Sin/cos encoding ensures that cyclic distances are correctly captured (e.g. December 31 is close to January 1). Multiple seasonalities combine naturally by concatenating feature vectors.
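The wrap-around property can be checked numerically. The sketch below is an assumption-laden illustration: it uses a 365-step period (a non-leap year) rather than 365.25, and `cyclic_point` is a hypothetical helper, not part of the library.

```python
import math


def cyclic_point(step, period):
    """Map a step index to a point on the unit circle (illustrative only)."""
    angle = 2.0 * math.pi * step / period
    return (math.sin(angle), math.cos(angle))


period = 365.0  # simplification: non-leap year, daily steps
jan1 = cyclic_point(0, period)
jul1 = cyclic_point(181, period)
dec31 = cyclic_point(364, period)

# Euclidean distances in the encoded space.
wrap_distance = math.dist(dec31, jan1)      # one day apart across the year boundary
half_year_distance = math.dist(jul1, jan1)  # roughly half a period apart
```

On raw step indices December 31 is 364 steps from January 1, but in the encoded space it is nearly as close as any adjacent day, while a date half a year away sits near the maximum possible distance of 2.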

The weight normalisation matches DistanceSimilarity exactly:

\[w_{ji} = \frac{\exp(-d(x_j, x_i))}{\sum_k \exp(-d(x_j, x_k))} \cdot n_{\text{features}}\]

followed by

\[w_{ji} \leftarrow \frac{w_{ji}}{1 + \sum_k w_{jk}}\]

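After the second step each row of weights sums to n_features / (1 + n_features) rather than 1, where n_features is the number of Fourier feature columns. The remaining 1 / (1 + n_features) is the probability mass reserved for a uniform component, following the conformal prediction literature.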
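The two normalisation steps can be traced on a small distance matrix. This is a minimal sketch under stated assumptions: `normalise_weights` is a hypothetical standalone function, and the distances are made-up numbers rather than output from a fitted model.

```python
import numpy as np


def normalise_weights(distances, n_features):
    """Apply the two-step weight normalisation to a distance matrix (illustrative only)."""
    weights = np.exp(-distances)
    # Step 1: softmax over each row, scaled by the number of features.
    weights = weights / weights.sum(axis=1, keepdims=True) * n_features
    # Step 2: shrink each row so that it no longer sums to its full mass.
    return weights / (1.0 + weights.sum(axis=1, keepdims=True))


# Two query points against three calibration points (made-up distances).
distances = np.array([[0.0, 1.0, 2.0], [0.5, 0.5, 0.5]])
weights = normalise_weights(distances, n_features=2)
row_sums = weights.sum(axis=1)
```

With n_features = 2, every row sums to 2/3: smaller distances receive larger weights, and equal distances receive equal weights.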

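See Also

DistanceSimilarity (yohou.interval.similarity.DistanceSimilarity): Value-based distance similarity.
BaseSimilarity (yohou.interval.base.BaseSimilarity): Abstract similarity base class.
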
Examples

>>> from datetime import datetime, timedelta
>>> import polars as pl
>>> import numpy as np
>>> from yohou.interval.similarity import TemporalSimilarity
>>>
>>> # Daily data with 3 weeks of calibration
>>> dates = [datetime(2021, 1, 1) + timedelta(days=i) for i in range(21)]
>>> y = pl.DataFrame({"time": dates, "value": np.random.randn(21)})
>>> y_pred = pl.DataFrame({"time": dates, "value": np.random.randn(21)})
>>>
>>> # Fit with weekly seasonality
>>> sim = TemporalSimilarity(seasonalities=[7.0])
>>> _ = sim.fit(y, y_pred)
>>>
>>> # Predict weights for the next day (a Friday)
>>> new_date = [datetime(2021, 1, 22)]
>>> y_pred_new = pl.DataFrame({"time": new_date, "value": [0.5]})
>>> weights = sim.predict(y_pred_new)
>>> weights.shape
(1, 21)

Source Code

class TemporalSimilarity(BaseSimilarity):
    r"""Temporal similarity using Fourier features for weighting observations.

    Computes observation weights by measuring the distance between
    cyclic temporal features extracted from prediction timestamps.
    Observations at similar seasonal positions (e.g. same day of week,
    same month of year) receive higher weights.

    Timestamps are converted to step indices relative to the first
    observed timestamp, then encoded as sin/cos pairs at the specified
    seasonal periods. Distances in this feature space are converted to
    weights using the same softmax formula as ``DistanceSimilarity``.

    Parameters
    ----------
    seasonalities : list of float
        Seasonal periods in time steps (e.g. ``[7.0, 365.25]`` for
        weekly and yearly cycles on daily data).
    harmonics : dict mapping float to list of int, or None, default=None
        Harmonics to include per seasonality period. Keys must match
        entries in ``seasonalities``. Each value is a list of positive
        integers specifying which harmonics to use. Defaults to
        ``{s: [1]}`` for each ``s`` in ``seasonalities``.
    metric : str, default="euclidean"
        Distance metric for ``scipy.spatial.distance.cdist``.
    metric_params : dict or None, default=None
        Additional keyword arguments forwarded to the distance metric.

    Attributes
    ----------
    first_time_ : datetime
        Reference timestamp from the first calibration prediction.
    interval_td_ : timedelta
        Time interval between consecutive timestamps, auto-detected
        from calibration data.

    Notes
    -----
    Sin/cos encoding ensures that cyclic distances are correctly
    captured (e.g. December 31 is close to January 1). Multiple
    seasonalities combine naturally by concatenating feature vectors.

    The weight normalisation matches ``DistanceSimilarity`` exactly:

    $$w_{ji} = \frac{\exp(-d(x_j, x_i))}{\sum_k \exp(-d(x_j, x_k))} \cdot n_{\text{features}}$$

    followed by

    $$w_{ji} \leftarrow \frac{w_{ji}}{1 + \sum_k w_{jk}}$$

    This reserves probability mass for a uniform component, following
    the conformal prediction literature.

    See Also
    --------
    - [`DistanceSimilarity`][yohou.interval.similarity.DistanceSimilarity] : Value-based distance similarity.
    - [`BaseSimilarity`][yohou.interval.base.BaseSimilarity] : Abstract similarity base class.

    Examples
    --------
    >>> from datetime import datetime, timedelta
    >>> import polars as pl
    >>> import numpy as np
    >>> from yohou.interval.similarity import TemporalSimilarity
    >>>
    >>> # Daily data with 3 weeks of calibration
    >>> dates = [datetime(2021, 1, 1) + timedelta(days=i) for i in range(21)]
    >>> y = pl.DataFrame({"time": dates, "value": np.random.randn(21)})
    >>> y_pred = pl.DataFrame({"time": dates, "value": np.random.randn(21)})
    >>>
    >>> # Fit with weekly seasonality
    >>> sim = TemporalSimilarity(seasonalities=[7.0])
    >>> _ = sim.fit(y, y_pred)
    >>>
    >>> # Predict weights for the next day (a Friday)
    >>> new_date = [datetime(2021, 1, 22)]
    >>> y_pred_new = pl.DataFrame({"time": new_date, "value": [0.5]})
    >>> weights = sim.predict(y_pred_new)
    >>> weights.shape
    (1, 21)

    """

    _parameter_constraints: dict = {
        "seasonalities": [list],
        "harmonics": [dict, None],
        "metric": [str],
        "metric_params": [dict, None],
    }

    def __init__(
        self,
        seasonalities: list[float] | None = None,
        harmonics: dict[float, list[int]] | None = None,
        metric: str = "euclidean",
        metric_params: dict[str, object] | None = None,
    ) -> None:
        self.seasonalities = seasonalities
        self.harmonics = harmonics
        self.metric = metric
        self.metric_params = metric_params if metric_params is not None else {}

    def _resolve_harmonics(self) -> dict[float, list[int]]:
        """Resolve harmonics, defaulting to first harmonic per seasonality.

        Returns
        -------
        dict[float, list[int]]
            Mapping from seasonality period to list of harmonic indices.

        """
        if self.harmonics is not None:
            return self.harmonics
        return {s: [1] for s in self.seasonalities}  # ty: ignore[not-iterable]

    def _extract_features(self, times: pl.Series) -> np.ndarray:
        """Extract Fourier features from a datetime series.

        Parameters
        ----------
        times : pl.Series
            Datetime series from which to compute features.

        Returns
        -------
        np.ndarray
            Feature matrix of shape ``(len(times), n_features)``.

        """
        time_diff = times - self.first_time_
        t = time_diff.dt.total_seconds().to_numpy().astype(np.float64)
        interval_seconds = self.interval_td_.total_seconds()
        if interval_seconds != 0:
            t = t / interval_seconds

        harmonics = self._resolve_harmonics()
        features = []
        for s in self.seasonalities:  # ty: ignore[not-iterable]
            for k in harmonics.get(s, [1]):
                angle = 2.0 * math.pi * k * t / s
                features.append(np.sin(angle))
                features.append(np.cos(angle))

        return np.column_stack(features)

    def fit(
        self,
        y: pl.DataFrame,
        y_pred: pl.DataFrame,
        X_actual: pl.DataFrame | None = None,
    ) -> "TemporalSimilarity":
        """Fit the temporal similarity from calibration predictions.

        Auto-detects the time interval from consecutive timestamps in
        ``y_pred`` and stores a reference timestamp and Fourier feature
        matrix for later distance computation.

        Parameters
        ----------
        y : pl.DataFrame
            Target time series (unused, accepted for API consistency).
        y_pred : pl.DataFrame
            Point forecast time series with a ``"time"`` column.
        X_actual : pl.DataFrame or None, default=None
            Exogenous features (unused, accepted for API consistency).

        Returns
        -------
        self

        """
        if self.seasonalities is None or len(self.seasonalities) == 0:
            raise ValueError("seasonalities must be a non-empty list of floats")

        times = y_pred["time"]
        self.first_time_ = times[0]

        if len(times) > 1:
            self.interval_td_ = times[1] - times[0]
        else:
            self.interval_td_ = timedelta(0)

        self._features_observed = self._extract_features(times)
        self._n_features = self._features_observed.shape[1]

        return self

    def observe(
        self,
        y: pl.DataFrame,
        y_pred: pl.DataFrame,
        X_actual: pl.DataFrame | None = None,
    ) -> "TemporalSimilarity":
        """Observe new data and extend the reference feature matrix.

        Parameters
        ----------
        y : pl.DataFrame
            New target observations (unused, accepted for API
            consistency).
        y_pred : pl.DataFrame
            New predictions with a ``"time"`` column.
        X_actual : pl.DataFrame or None, default=None
            Exogenous features (unused, accepted for API consistency).

        Returns
        -------
        self

        """
        new_features = self._extract_features(y_pred["time"])
        self._features_observed = np.vstack([self._features_observed, new_features])
        return self

    def rewind(
        self,
        y: pl.DataFrame,
        y_pred: pl.DataFrame,
        X_actual: pl.DataFrame | None = None,
    ) -> "TemporalSimilarity":
        """Rewind the most recently observed data.

        Removes the last ``len(y)`` rows from the internal feature
        matrix, reversing the effect of the corresponding ``observe()``
        call.

        Parameters
        ----------
        y : pl.DataFrame
            Target observations to rewind (used only for row count).
        y_pred : pl.DataFrame
            Predictions to rewind (used only for row count).
        X_actual : pl.DataFrame or None, default=None
            Exogenous features to rewind (unused).

        Returns
        -------
        self

        """
        n_rewind = len(y)
        self._features_observed = self._features_observed[: len(self._features_observed) - n_rewind]
        return self

    def predict(
        self,
        y_pred: pl.DataFrame,
        X_actual: pl.DataFrame | None = None,
    ) -> np.ndarray[tuple[int, int], np.dtype[np.floating[Any]]]:
        """Compute temporal similarity weights for new predictions.

        Parameters
        ----------
        y_pred : pl.DataFrame
            New predictions with a ``"time"`` column.
        X_actual : pl.DataFrame or None, default=None
            Exogenous features (unused).

        Returns
        -------
        np.ndarray
            Weight matrix of shape ``(n_predictions, n_calibration)``.

        """
        new_features = self._extract_features(y_pred["time"])

        distances: np.ndarray = cdist(
            new_features,
            self._features_observed,
            metric=self.metric,
            **self.metric_params,
        )  # ty: ignore[no-matching-overload]
        weights = np.exp(-distances)

        weights = weights / np.sum(weights, axis=1)[:, np.newaxis] * self._n_features
        weights = weights / (1 + np.sum(weights, axis=1)[:, np.newaxis])

        return weights

Methods

fit(y, y_pred, X_actual=None)

Fit the temporal similarity from calibration predictions.

Auto-detects the time interval from consecutive timestamps in y_pred and stores a reference timestamp and Fourier feature matrix for later distance computation.
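The interval detection and the conversion to step indices can be sketched with the standard library alone. `to_step_indices` is a hypothetical helper for illustration; it assumes, as the source listing suggests, that the interval is taken from the first two timestamps only.

```python
from datetime import datetime, timedelta


def to_step_indices(times):
    """Convert timestamps to step indices relative to the first one (illustrative only)."""
    first_time = times[0]
    # Assumption: the interval is the gap between the first two timestamps.
    interval = times[1] - times[0] if len(times) > 1 else timedelta(0)
    seconds = [(t - first_time).total_seconds() for t in times]
    interval_seconds = interval.total_seconds()
    if interval_seconds != 0:
        return [s / interval_seconds for s in seconds]
    # With a single timestamp there is no interval, so raw seconds are kept.
    return seconds


hourly = [datetime(2021, 1, 1) + timedelta(hours=i) for i in range(4)]
steps = to_step_indices(hourly)

# A gap in the data shows up as a skipped step index.
gappy = [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 4)]
gappy_steps = to_step_indices(gappy)
```

Because only the first gap defines the step size under this assumption, irregular spacing at the start of the calibration data would change the meaning of every seasonal period.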

Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| y | DataFrame | Target time series (unused, accepted for API consistency). | required |
| y_pred | DataFrame | Point forecast time series with a "time" column. | required |
| X_actual | DataFrame or None | Exogenous features (unused, accepted for API consistency). | None |

Returns

| Type | Description |
|---|---|
| TemporalSimilarity | The fitted instance (self). |
Source Code
def fit(
    self,
    y: pl.DataFrame,
    y_pred: pl.DataFrame,
    X_actual: pl.DataFrame | None = None,
) -> "TemporalSimilarity":
    """Fit the temporal similarity from calibration predictions.

    Auto-detects the time interval from consecutive timestamps in
    ``y_pred`` and stores a reference timestamp and Fourier feature
    matrix for later distance computation.

    Parameters
    ----------
    y : pl.DataFrame
        Target time series (unused, accepted for API consistency).
    y_pred : pl.DataFrame
        Point forecast time series with a ``"time"`` column.
    X_actual : pl.DataFrame or None, default=None
        Exogenous features (unused, accepted for API consistency).

    Returns
    -------
    self

    """
    if self.seasonalities is None or len(self.seasonalities) == 0:
        raise ValueError("seasonalities must be a non-empty list of floats")

    times = y_pred["time"]
    self.first_time_ = times[0]

    if len(times) > 1:
        self.interval_td_ = times[1] - times[0]
    else:
        self.interval_td_ = timedelta(0)

    self._features_observed = self._extract_features(times)
    self._n_features = self._features_observed.shape[1]

    return self

observe(y, y_pred, X_actual=None)

Observe new data and extend the reference feature matrix.

Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| y | DataFrame | New target observations (unused, accepted for API consistency). | required |
| y_pred | DataFrame | New predictions with a "time" column. | required |
| X_actual | DataFrame or None | Exogenous features (unused, accepted for API consistency). | None |

Returns

| Type | Description |
|---|---|
| TemporalSimilarity | The updated instance (self). |
Source Code
def observe(
    self,
    y: pl.DataFrame,
    y_pred: pl.DataFrame,
    X_actual: pl.DataFrame | None = None,
) -> "TemporalSimilarity":
    """Observe new data and extend the reference feature matrix.

    Parameters
    ----------
    y : pl.DataFrame
        New target observations (unused, accepted for API
        consistency).
    y_pred : pl.DataFrame
        New predictions with a ``"time"`` column.
    X_actual : pl.DataFrame or None, default=None
        Exogenous features (unused, accepted for API consistency).

    Returns
    -------
    self

    """
    new_features = self._extract_features(y_pred["time"])
    self._features_observed = np.vstack([self._features_observed, new_features])
    return self

rewind(y, y_pred, X_actual=None)

Rewind the most recently observed data.

Removes the last len(y) rows from the internal feature matrix, reversing the effect of the corresponding observe() call.
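The observe/rewind round trip amounts to appending rows and then slicing them off again. The sketch below works on plain arrays with made-up shapes; the variable names mirror the source listing but the code is a standalone illustration, not the class itself.

```python
import numpy as np

# Stand-in for the stored calibration features: 21 rows, 2 Fourier columns.
features_observed = np.zeros((21, 2))

# Stand-in for the features of 3 newly observed predictions.
new_features = np.ones((3, 2))

# observe: append the new rows to the reference matrix.
extended = np.vstack([features_observed, new_features])

# rewind: drop the same number of rows from the end.
n_rewind = 3
restored = extended[: len(extended) - n_rewind]
```

Since the row count comes from y while observe appends rows derived from y_pred, the round trip is exact only when both frames passed to the two calls have the same number of rows.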

Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| y | DataFrame | Target observations to rewind (used only for row count). | required |
| y_pred | DataFrame | Predictions to rewind (unused; the row count is taken from y). | required |
| X_actual | DataFrame or None | Exogenous features to rewind (unused). | None |

Returns

| Type | Description |
|---|---|
| TemporalSimilarity | The updated instance (self). |
Source Code
def rewind(
    self,
    y: pl.DataFrame,
    y_pred: pl.DataFrame,
    X_actual: pl.DataFrame | None = None,
) -> "TemporalSimilarity":
    """Rewind the most recently observed data.

    Removes the last ``len(y)`` rows from the internal feature
    matrix, reversing the effect of the corresponding ``observe()``
    call.

    Parameters
    ----------
    y : pl.DataFrame
        Target observations to rewind (used only for row count).
    y_pred : pl.DataFrame
        Predictions to rewind (used only for row count).
    X_actual : pl.DataFrame or None, default=None
        Exogenous features to rewind (unused).

    Returns
    -------
    self

    """
    n_rewind = len(y)
    self._features_observed = self._features_observed[: len(self._features_observed) - n_rewind]
    return self

predict(y_pred, X_actual=None)

Compute temporal similarity weights for new predictions.

Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| y_pred | DataFrame | New predictions with a "time" column. | required |
| X_actual | DataFrame or None | Exogenous features (unused). | None |

Returns

| Type | Description |
|---|---|
| ndarray | Weight matrix of shape (n_predictions, n_calibration). |

Source Code
def predict(
    self,
    y_pred: pl.DataFrame,
    X_actual: pl.DataFrame | None = None,
) -> np.ndarray[tuple[int, int], np.dtype[np.floating[Any]]]:
    """Compute temporal similarity weights for new predictions.

    Parameters
    ----------
    y_pred : pl.DataFrame
        New predictions with a ``"time"`` column.
    X_actual : pl.DataFrame or None, default=None
        Exogenous features (unused).

    Returns
    -------
    np.ndarray
        Weight matrix of shape ``(n_predictions, n_calibration)``.

    """
    new_features = self._extract_features(y_pred["time"])

    distances: np.ndarray = cdist(
        new_features,
        self._features_observed,
        metric=self.metric,
        **self.metric_params,
    )  # ty: ignore[no-matching-overload]
    weights = np.exp(-distances)

    weights = weights / np.sum(weights, axis=1)[:, np.newaxis] * self._n_features
    weights = weights / (1 + np.sum(weights, axis=1)[:, np.newaxis])

    return weights