ExpandingWindowSplitter¶

`yohou.model_selection.split.ExpandingWindowSplitter` ¶

Bases: BaseSplitter

Expanding window time series cross-validation splitter.

Provides train/test row indices for splitting time series data observed at fixed time intervals. The test indices in each split are higher than those in the previous split, so shuffling the data before cross-validation is inappropriate.

The training set grows with each split (expanding window), so each training set is a superset of the ones before it, unless `max_train_size` truncates it. This is useful when more data generally leads to better models and when you want to simulate accumulating historical data over time.

Parameters¶

Name	Type	Description	Default
`n_splits`	`int`	Number of splits. Must be at least 2.	`3`
`max_train_size`	`int or None`	Maximum size of a single training set. If `None`, all available training data is used.	`None`
`test_size`	`int or None`	Number of samples in each test set. If `None`, defaults to `n_samples // (n_splits + 1)`, which is the maximum allowed value with no overlap between test sets.	`None`
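
To make the default for `test_size` concrete, here is a minimal pure-Python sketch that mirrors the index arithmetic in the `_iter_test_indices` source shown further down this page. The helper name `window_bounds` is hypothetical and not part of yohou, and the sketch omits the library's input validation.

```python
def window_bounds(n_samples, n_splits, test_size=None):
    """Return (start, stop) row ranges of the test windows, oldest first.

    Hypothetical helper for illustration only; not part of yohou.
    """
    if test_size is None:
        # Documented default: n_samples // (n_splits + 1)
        test_size = n_samples // (n_splits + 1)
    first_start = n_samples - n_splits * test_size
    # Windows are anchored to the end of the series and laid back to back.
    return [
        (start, start + test_size)
        for start in range(first_start, n_samples, test_size)
        if start >= 0
    ]


default_windows = window_bounds(100, 3)
explicit_windows = window_bounds(100, 3, test_size=10)
print(default_windows)   # [(25, 50), (50, 75), (75, 100)]
print(explicit_windows)  # [(70, 80), (80, 90), (90, 100)]
```

With the default, the 100 samples are divided into four equal blocks: the first block is only ever used for training, and each of the remaining three serves as a test window.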

Examples¶

>>> import polars as pl
>>> from datetime import datetime, timedelta
>>> from yohou.model_selection import ExpandingWindowSplitter
>>>
>>> # Create time series
>>> time = [datetime(2020, 1, 1) + timedelta(days=i) for i in range(100)]
>>> y = pl.DataFrame({"time": time, "value": range(100)})
>>>
>>> # 3 splits with 10-day test windows
>>> splitter = ExpandingWindowSplitter(n_splits=3, test_size=10)
>>> splits = list(splitter.split(y))
>>> len(splits)
3
>>>
>>> # First split: train on [0:70], test on [70:80]
>>> train, test = splits[0]
>>> len(train), len(test)
(70, 10)
>>>
>>> # Second split: train on [0:80], test on [80:90] (training set grows)
>>> train, test = splits[1]
>>> len(train), len(test)
(80, 10)
>>>

Notes¶

- Training sets grow with each split (expanding window)
- Test sets do not overlap
- All data is used in temporal order
- For panel data, splits all groups together using row indices
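
The first three properties in these notes can be checked on plain row indices. The following is a sketch only: `expanding_splits` is a hypothetical stand-in that reproduces the index arithmetic from the source on this page (without `max_train_size` and without validation), not the library class itself.

```python
def expanding_splits(n_samples, n_splits, test_size):
    """Yield (train, test) as Python ranges; train is every row before test.

    Hypothetical stand-in for illustration only; not part of yohou.
    """
    for start in range(n_samples - n_splits * test_size, n_samples, test_size):
        yield range(0, start), range(start, start + test_size)


splits = list(expanding_splits(n_samples=100, n_splits=3, test_size=10))

# Each training set contains the previous one (expanding window).
nested = all(
    set(earlier[0]) <= set(later[0])
    for earlier, later in zip(splits, splits[1:])
)
# No row appears in more than one test set.
test_rows = [row for _, test in splits for row in test]
disjoint = len(test_rows) == len(set(test_rows))
# Every training row precedes every test row of the same split.
ordered = all(train[-1] < test[0] for train, test in splits)

print(nested, disjoint, ordered)  # True True True
```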

Source Code¶

class ExpandingWindowSplitter(BaseSplitter):
    """Expanding window time series cross-validation splitter.

    Provides train/test indices to split time series data samples
    that are observed at fixed time intervals, in train/test sets.
    In each split, test indices must be higher than before, and thus
    shuffling in cross validator is inappropriate.

    The training set grows with each split (expanding window), meaning
    successive training sets are supersets of those that come before them.
    This is useful when more data generally leads to better models and
    when you want to simulate accumulating historical data over time.

    Parameters
    ----------
    n_splits : int, default=3
        Number of splits. Must be at least 2.
    max_train_size : int, default=None
        Maximum size for a single training set. If None, all available
        training data is used.
    test_size : int, default=None
        Used to limit the size of the test set. Defaults to
        ``n_samples // (n_splits + 1)``, which is the maximum allowed
        value with no overlap between test sets.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime, timedelta
    >>> from yohou.model_selection import ExpandingWindowSplitter
    >>>
    >>> # Create time series
    >>> time = [datetime(2020, 1, 1) + timedelta(days=i) for i in range(100)]
    >>> y = pl.DataFrame({"time": time, "value": range(100)})
    >>>
    >>> # 3 splits with 10-day test windows
    >>> splitter = ExpandingWindowSplitter(n_splits=3, test_size=10)
    >>> splits = list(splitter.split(y))
    >>> len(splits)
    3
    >>>
    >>> # First split: train on [0:70], test on [70:80]
    >>> train, test = splits[0]
    >>> len(train), len(test)
    (70, 10)
    >>>
    >>> # Second split: train on [0:80], test on [80:90] (training set grows)
    >>> train, test = splits[1]
    >>> len(train), len(test)
    (80, 10)
    >>>

    Notes
    -----
    - Training sets grow with each split (expanding window)
    - Test sets do not overlap
    - All data is used in temporal order
    - For panel data, splits all groups together using row indices

    See Also
    --------
    - [`SlidingWindowSplitter`][yohou.model_selection.split.SlidingWindowSplitter] : Fixed-size rolling window splitter

    """

    _parameter_constraints: dict = {
        "n_splits": [Interval(numbers.Integral, 2, None, closed="left")],
        "max_train_size": [Interval(numbers.Integral, 1, None, closed="left"), None],
        "test_size": [Interval(numbers.Integral, 1, None, closed="left"), None],
    }

    _tags: ClassVar[dict[str, Any]] = {"splitter_type": "expanding"}

    def __init__(
        self,
        n_splits: int = 3,
        *,
        max_train_size: int | None = None,
        test_size: int | None = None,
    ) -> None:
        self.n_splits = n_splits
        self.max_train_size = max_train_size
        self.test_size = test_size

        # Validate parameters
        self._validate_params()

    def split(
        self,
        y: pl.DataFrame,
        X_actual: pl.DataFrame | None = None,
    ) -> Iterator[tuple[np.ndarray[Any, np.dtype[np.intp]], np.ndarray[Any, np.dtype[np.intp]]]]:
        """Generate indices to split time series data with expanding windows.

        Parameters
        ----------
        y : pl.DataFrame
            Target time series used to generate train/test split indices.
            Must have a ``"time"`` column.
        X_actual : pl.DataFrame or None, default=None
            Actual features.  Not used for splitting but accepted for
            API consistency.

        Yields
        ------
        train : ndarray
            Training set row indices for that split.
        test : ndarray
            Test set row indices for that split.

        """
        # Validate data
        y, X_actual = validate_splitter_data(self, y=y, X_actual=X_actual)

        n_samples = len(y)
        indices = np.arange(n_samples)
        max_train_size = self.max_train_size

        # Delegate to concrete implementation
        for test_index in self._iter_test_indices(y, X_actual):
            train_end = test_index[0]
            train_index = indices[indices < train_end]

            # Apply max_train_size if specified
            if max_train_size is not None and len(train_index) > max_train_size:
                train_index = train_index[-max_train_size:]

            yield train_index, test_index

    def _iter_test_indices(
        self,
        y: pl.DataFrame,
        X_actual: pl.DataFrame | None = None,
    ) -> Iterator[np.ndarray[Any, np.dtype[np.intp]]]:
        """Generate test indices for expanding window splits.

        Parameters
        ----------
        y : pl.DataFrame
            Target time series.
        X_actual : pl.DataFrame or None, default=None
            Actual features. Not used for splitting but accepted for
            API consistency.

        Yields
        ------
        test : ndarray
            Test set indices for this split.

        """
        n_samples = len(y)
        n_splits = self.n_splits
        n_folds = n_splits + 1
        test_size = self.test_size if self.test_size is not None else n_samples // n_folds

        if n_folds > n_samples:
            raise ValueError(f"Cannot have number of folds={n_folds} greater than the number of samples={n_samples}.")

        if test_size >= n_samples:
            raise ValueError(f"test_size={test_size} should be less than the number of samples={n_samples}.")

        test_starts = range(n_samples - n_splits * test_size, n_samples, test_size)

        for test_start in test_starts:
            if test_start < 0:
                continue
            yield np.arange(test_start, test_start + test_size, dtype=np.intp)

    def get_n_splits(
        self,
        y: pl.DataFrame | None = None,
        X_actual: pl.DataFrame | None = None,
    ) -> int:
        """Return the number of cross-validation folds.

        Parameters
        ----------
        y : pl.DataFrame or None, default=None
            Not used.  Accepted for API consistency.
        X_actual : pl.DataFrame or None, default=None
            Not used.  Accepted for API consistency.

        Returns
        -------
        int
            The number of cross-validation folds.

        """
        return self.n_splits

Methods¶

`split(y, X_actual=None)` ¶

Generate indices to split time series data with expanding windows.

Parameters¶

Name	Type	Description	Default
`y`	`DataFrame`	Target time series used to generate train/test split indices. Must have a `"time"` column.	required
`X_actual`	`DataFrame or None`	Actual features. Not used for splitting but accepted for API consistency.	`None`

Yields¶

Name	Type	Description
`train`	`ndarray`	Training set row indices for that split.
`test`	`ndarray`	Test set row indices for that split.
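
The effect of `max_train_size` on the yielded training indices can be sketched without the library. The helper below is hypothetical; it follows the truncation rule in the `split` source (keep the last `max_train_size` rows when the training set is longer than that) and reports each training set as a `(start, stop)` row range.

```python
def capped_train_bounds(n_samples, n_splits, test_size, max_train_size=None):
    """Return (train_start, train_stop) for each split.

    Hypothetical helper for illustration only; not part of yohou.
    """
    bounds = []
    for test_start in range(n_samples - n_splits * test_size, n_samples, test_size):
        train_start = 0
        if max_train_size is not None and test_start > max_train_size:
            # Keep only the most recent max_train_size rows.
            train_start = test_start - max_train_size
        bounds.append((train_start, test_start))
    return bounds


uncapped = capped_train_bounds(100, 3, 10)
capped = capped_train_bounds(100, 3, 10, max_train_size=75)
print(uncapped)  # [(0, 70), (0, 80), (0, 90)]
print(capped)    # [(0, 70), (5, 80), (15, 90)]
```

Once the cap takes effect, the training window stops growing and moves forward at a fixed length, so later training sets are no longer supersets of earlier ones.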

Source Code¶

def split(
    self,
    y: pl.DataFrame,
    X_actual: pl.DataFrame | None = None,
) -> Iterator[tuple[np.ndarray[Any, np.dtype[np.intp]], np.ndarray[Any, np.dtype[np.intp]]]]:
    """Generate indices to split time series data with expanding windows.

    Parameters
    ----------
    y : pl.DataFrame
        Target time series used to generate train/test split indices.
        Must have a ``"time"`` column.
    X_actual : pl.DataFrame or None, default=None
        Actual features.  Not used for splitting but accepted for
        API consistency.

    Yields
    ------
    train : ndarray
        Training set row indices for that split.
    test : ndarray
        Test set row indices for that split.

    """
    # Validate data
    y, X_actual = validate_splitter_data(self, y=y, X_actual=X_actual)

    n_samples = len(y)
    indices = np.arange(n_samples)
    max_train_size = self.max_train_size

    # Delegate to concrete implementation
    for test_index in self._iter_test_indices(y, X_actual):
        train_end = test_index[0]
        train_index = indices[indices < train_end]

        # Apply max_train_size if specified
        if max_train_size is not None and len(train_index) > max_train_size:
            train_index = train_index[-max_train_size:]

        yield train_index, test_index

`get_n_splits(y=None, X_actual=None)` ¶

Return the number of cross-validation folds.

Parameters¶

Name	Type	Description	Default
`y`	`DataFrame or None`	Not used. Accepted for API consistency.	`None`
`X_actual`	`DataFrame or None`	Not used. Accepted for API consistency.	`None`

Returns¶

Type	Description
`int`	The number of cross-validation folds.
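
Judging from the source shown on this page, `get_n_splits` returns the configured `n_splits` without inspecting the data, while `_iter_test_indices` skips any test window whose start index would be negative. The sketch below uses a hypothetical helper, `count_yielded_splits`, to count the windows that loop would produce; it suggests that iterating over `split` can yield fewer splits than `get_n_splits` reports when `n_splits * test_size` exceeds the number of samples.

```python
def count_yielded_splits(n_samples, n_splits, test_size):
    """Count the test windows the loop would yield, skipping negative starts.

    Hypothetical helper for illustration only; not part of yohou.
    """
    starts = range(n_samples - n_splits * test_size, n_samples, test_size)
    return sum(1 for start in starts if start >= 0)


configured = 3
full = count_yielded_splits(n_samples=100, n_splits=configured, test_size=10)
short = count_yielded_splits(n_samples=25, n_splits=configured, test_size=10)
print(full, short)  # 3 2
```

If downstream code relies on the split count, counting the items actually yielded by `split` is the safer check for short series.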

Source Code¶

def get_n_splits(
    self,
    y: pl.DataFrame | None = None,
    X_actual: pl.DataFrame | None = None,
) -> int:
    """Return the number of cross-validation folds.

    Parameters
    ----------
    y : pl.DataFrame or None, default=None
        Not used.  Accepted for API consistency.
    X_actual : pl.DataFrame or None, default=None
        Not used.  Accepted for API consistency.

    Returns
    -------
    int
        The number of cross-validation folds.

    """
    return self.n_splits

Tutorials¶

The following example notebooks use this component:

How to Tune Fourier Seasonality Terms

Data-Features

Explore how Fourier harmonic count affects seasonal fit quality, compare Fourier vs Pattern seasonality, and tune harmonics jointly with GridSearchCV.

How to Handle Short Series

Data-Features

Use Fourier seasonality, simple train/test splits, and panel pooling when individual series are too short for standard approaches.

Cross-Validation for Time Series

Evaluation-Search

Evaluate forecasters with cross_val_score, cross_validate, and cross_val_predict using temporal splitters.

How to Run Panel Cross-Validation

Panel-Data

Time series cross-validation on panel data with GridSearchCV, selective group observation, rewind operations, and groupwise performance comparison.
