Cross-Validation Splitters¶
In this tutorial, we will create temporal cross-validation folds that respect time ordering, visualize them with plot_splits, and compare how expanding and sliding window strategies affect fold geometry. Along the way, we will control training set growth with max_train_size and adjust fold overlap with stride.
Prerequisites¶
- Completed Getting Started
1. Prepare Data¶
We use a single series from the Tourism Monthly dataset: 187 months of visitor arrivals from January 1979 to July 1994.
from yohou.datasets import fetch_tourism_monthly
bunch = fetch_tourism_monthly(n_series=1)
y = bunch.frame
print(f"Total rows: {len(y)}")
print(y.head(3))
Total rows: 187
shape: (3, 2)
┌─────────────────────┬──────────┐
│ time                ┆ tourists │
│ ---                 ┆ ---      │
│ datetime[μs]        ┆ f64      │
╞═════════════════════╪══════════╡
│ 1979-01-01 00:00:00 ┆ 1149.87  │
│ 1979-02-01 00:00:00 ┆ 1053.8   │
│ 1979-03-01 00:00:00 ┆ 1388.88  │
└─────────────────────┴──────────┘
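As a quick sanity check on the row count, the sketch below counts monthly timestamps between the two endpoints stated above using only the standard library. The helper name `months_inclusive` is illustrative and not part of yohou.

```python
from datetime import date


def months_inclusive(start: date, end: date) -> int:
    """Count monthly timestamps from start to end, both endpoints included."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


# January 1979 through July 1994, as described for the Tourism Monthly series.
n_months = months_inclusive(date(1979, 1, 1), date(1994, 7, 1))
print(n_months)  # 187
```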
2. Expanding Window¶
ExpandingWindowSplitter grows the training set with each fold while keeping the test window the same size. Each successive training set is a superset of the previous one:
from yohou.model_selection import ExpandingWindowSplitter
from yohou.plotting import plot_splits
expanding = ExpandingWindowSplitter(n_splits=4, test_size=12)
print(f"Number of folds: {expanding.get_n_splits()}")
Visualize the folds with plot_splits:
fig = plot_splits(y, expanding)
fig.show()
Notice that each fold's training region (left bar) grows longer while the test region (right bar) stays the same width. Fold 1 uses the least training data; fold 4 uses the most. The test windows do not overlap, so each fold evaluates a different time period.
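To make the fold geometry concrete, here is a minimal pure-Python sketch of an expanding-window layout. It is not yohou's implementation: the helper `expanding_window_folds` and its `initial_train_size` parameter are hypothetical, and where the library actually places the first fold may differ.

```python
def expanding_window_folds(n_rows, n_splits, test_size, initial_train_size):
    """Return (train, test) index ranges for an expanding-window layout.

    Illustrative helper only. `initial_train_size` is a hypothetical
    parameter chosen for this sketch, not a yohou argument.
    """
    needed = initial_train_size + n_splits * test_size
    if needed > n_rows:
        raise ValueError(f"need at least {needed} rows, got {n_rows}")
    folds = []
    for k in range(n_splits):
        test_start = initial_train_size + k * test_size
        train = range(0, test_start)  # always starts at row 0, so it only grows
        test = range(test_start, test_start + test_size)  # fixed width
        folds.append((train, test))
    return folds


folds = expanding_window_folds(
    n_rows=187, n_splits=4, test_size=12, initial_train_size=60
)
for i, (train, test) in enumerate(folds, start=1):
    print(
        f"fold {i}: train rows {train.start}-{train.stop - 1} ({len(train)} rows), "
        f"test rows {test.start}-{test.stop - 1} ({len(test)} rows)"
    )
```

Under these assumed settings the training lengths are 60, 72, 84, and 96 rows, every test window has 12 rows, and each test window begins exactly where the previous one ends.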
3. Sliding Window¶
SlidingWindowSplitter uses a fixed-size training window that slides forward. This is useful when older data may no longer be representative (concept drift):
Concept drift
Many real-world time series change their statistical properties over time. Consumer behavior shifts, markets evolve, and sensor characteristics degrade. When the data-generating process changes, older observations can mislead the model. A sliding window limits training to recent data, allowing the model to adapt to the current regime rather than averaging across regimes that may no longer be relevant.
from yohou.model_selection import SlidingWindowSplitter
sliding = SlidingWindowSplitter(n_splits=4, train_size=60, test_size=12)
fig = plot_splits(y, sliding)
fig.show()
Notice that every fold's training region is the same width (60 rows). Both the training and test windows move forward together. The oldest data drops off as newer data enters, keeping the model focused on recent patterns.
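The same geometry can be sketched in plain Python. This is an illustration under stated assumptions, not yohou's implementation: the helper `sliding_window_folds` is hypothetical, the first fold is assumed to start at row 0, and the step between folds is assumed to equal `test_size`.

```python
def sliding_window_folds(n_rows, n_splits, train_size, test_size):
    """Return (train, test) index ranges for a fixed-size sliding window.

    Illustrative helper only. Assumes the first training window starts at
    row 0 and each fold advances by `test_size` rows.
    """
    needed = train_size + n_splits * test_size
    if needed > n_rows:
        raise ValueError(f"need at least {needed} rows, got {n_rows}")
    folds = []
    for k in range(n_splits):
        train_start = k * test_size  # the window start moves forward each fold
        train = range(train_start, train_start + train_size)
        test = range(train.stop, train.stop + test_size)
        folds.append((train, test))
    return folds


folds = sliding_window_folds(n_rows=187, n_splits=4, train_size=60, test_size=12)
for (prev_train, _), (train, test) in zip(folds, folds[1:]):
    dropped = len(set(prev_train) - set(train))  # oldest rows leaving the window
    print(f"train rows {train.start}-{train.stop - 1}, dropped {dropped} oldest rows")
```

Every training range has exactly 60 rows, and each step drops the 12 oldest rows while admitting the 12 rows that were the previous fold's test window.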
4. Control the Training Size¶
ExpandingWindowSplitter can cap the training set with max_train_size. The training window expands fold by fold until it reaches the cap, then stops growing:
expanding_capped = ExpandingWindowSplitter(
    n_splits=4, test_size=12, max_train_size=80
)
fig = plot_splits(y, expanding_capped)
fig.show()
Notice that the early folds grow as before, but once the training region reaches 80 rows it stops expanding. This gives you some of the recency benefit of a sliding window while still using more data than a fixed window in the early folds.
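The capping rule can be sketched as a one-line truncation of the training range. This is a minimal illustration, not yohou's implementation: the helper `capped_train_window` and the fold boundaries in `train_ends` are hypothetical, and the sketch assumes the cap keeps the most recent rows.

```python
def capped_train_window(train_end, max_train_size=None):
    """Training rows ending just before `train_end`.

    Illustrative helper only. When `max_train_size` is set, the range is
    truncated to the most recent `max_train_size` rows (an assumption
    about how the cap is applied).
    """
    start = 0 if max_train_size is None else max(0, train_end - max_train_size)
    return range(start, train_end)


# Hypothetical fold boundaries: each fold's training data ends at these rows.
train_ends = [60, 72, 84, 96]
windows = [capped_train_window(end, max_train_size=80) for end in train_ends]
lengths = [len(w) for w in windows]
print(lengths)  # [60, 72, 80, 80]
```

The first two folds grow exactly as an uncapped expanding window would; from the third fold on, the start of the range moves forward so the length stays at 80.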
5. Sliding Window with Stride¶
By default, SlidingWindowSplitter moves forward by test_size rows between folds. The stride parameter lets you control the step size independently:
sliding_stride = SlidingWindowSplitter(
    n_splits=4, train_size=60, test_size=12, stride=6
)
fig = plot_splits(y, sliding_stride)
fig.show()
Notice that consecutive test windows now overlap: each fold advances by only 6 rows instead of 12. Overlapping folds produce more evaluation points from the same data, but the fold scores will be correlated because they share test observations.
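The overlap follows directly from the arithmetic: when stride is smaller than test_size, consecutive test windows share test_size minus stride rows. The sketch below checks this with a hypothetical helper, `strided_test_windows`, and an assumed first test row of 60; it is not yohou's implementation.

```python
def strided_test_windows(first_test_start, n_splits, test_size, stride):
    """Test index ranges when each fold advances by `stride` rows.

    Illustrative helper only; `first_test_start` is a hypothetical parameter.
    """
    return [
        range(first_test_start + k * stride, first_test_start + k * stride + test_size)
        for k in range(n_splits)
    ]


windows = strided_test_windows(first_test_start=60, n_splits=4, test_size=12, stride=6)

# Rows shared by each pair of consecutive test windows.
shared = [len(set(a) & set(b)) for a, b in zip(windows, windows[1:])]
print(shared)  # [6, 6, 6]

# Scored predictions versus distinct rows actually covered.
n_scored = sum(len(w) for w in windows)
n_distinct = len(set().union(*windows))
print(n_scored, n_distinct)  # 48 30
```

With these assumed settings, four folds score 48 predictions but cover only 30 distinct rows, which is why the fold scores are not independent of one another.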
What You Built¶
In this tutorial, we:
- Created temporal cross-validation folds with ExpandingWindowSplitter and SlidingWindowSplitter
- Visualized splits with plot_splits to verify temporal ordering and fold geometry
- Capped training growth with max_train_size to balance data volume against recency
- Controlled fold overlap with stride to trade more evaluation points for correlated scores
Next Steps¶
- Forecasting Workflow to use splitters inside GridSearchCV
- Model Selection for the conceptual background on temporal CV
- How to Tune Hyperparameters for using splitters inside grid search and randomized search