Forecasting Workflow¶
In this tutorial, we will evaluate two forecasters using temporal cross-validation, search for the best hyperparameters, and inspect residuals to diagnose model weaknesses.
Prerequisites¶
- Completed Getting Started
Setup¶
We use the monthly tourism dataset: 187 months of visitor arrivals to a single Australian region (T1). First, load the data and define a 12-month forecasting horizon:
from yohou.datasets import fetch_tourism_monthly
from yohou.model_selection import train_test_split
bunch = fetch_tourism_monthly()
y = (
    bunch.frame
    .select("time", "T1__tourists")
    .drop_nulls()
    .rename({"T1__tourists": "tourists"})
)
forecasting_horizon = 12
y_train, y_test = train_test_split(y, test_size=forecasting_horizon)
Next, fit a SeasonalNaive baseline:
from yohou.point import SeasonalNaive
baseline = SeasonalNaive(seasonality=12)
baseline.fit(y_train, forecasting_horizon=forecasting_horizon)
y_pred_baseline = baseline.predict(forecasting_horizon=forecasting_horizon)
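To make the baseline concrete, here is a minimal plain-Python sketch of the seasonal naive rule: each forecast repeats the value observed one season earlier. It does not use Yohou. The function name `seasonal_naive_forecast` and the toy series are illustrative assumptions, and the library's own implementation may differ in details.

```python
def seasonal_naive_forecast(history, seasonality, horizon):
    """Repeat the last full season of `history` for `horizon` steps."""
    if len(history) < seasonality:
        raise ValueError("need at least one full season of history")
    last_season = history[-seasonality:]
    # Step h (0-based) reuses the value at the same position in the last season.
    return [last_season[h % seasonality] for h in range(horizon)]


# Toy series with a period of 4 (hypothetical values, not the tourism data).
toy_history = [10, 20, 30, 40, 12, 22, 32, 42]
toy_forecast = seasonal_naive_forecast(toy_history, seasonality=4, horizon=6)
print(toy_forecast)  # [12, 22, 32, 42, 12, 22]
```

With `seasonality=12` on monthly data, the same rule predicts each month with the value from the same month one year earlier.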
Now build a Ridge pipeline with SeasonalDifferencing and lag features. If the pipeline looks unfamiliar, see Getting Started for a step-by-step walkthrough:
from sklearn.linear_model import Ridge
from yohou.compose import FeaturePipeline
from yohou.point import PointReductionForecaster
from yohou.preprocessing import LagTransformer
from yohou.stationarity import SeasonalDifferencing
forecaster = PointReductionForecaster(
    estimator=Ridge(),
    target_transformer=SeasonalDifferencing(seasonality=12),
    feature_transformer=FeaturePipeline([
        ("lags", LagTransformer(lag=[1, 2, 3, 12])),
    ]),
)
forecaster.fit(y_train, forecasting_horizon=forecasting_horizon)
y_pred_ridge = forecaster.predict(forecasting_horizon=forecasting_horizon)
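Conceptually, the pipeline above turns forecasting into tabular regression: the target is seasonally differenced, and lagged values become the feature columns. The plain-Python sketch below illustrates that idea on a toy series. The helpers `seasonal_difference` and `build_lag_rows` are hypothetical names for illustration only. How Yohou handles warm-up rows and multi-step prediction internally is not shown here and may differ.

```python
def seasonal_difference(values, seasonality):
    """Return values[t] - values[t - seasonality] for every t with a seasonal predecessor."""
    return [values[t] - values[t - seasonality] for t in range(seasonality, len(values))]


def build_lag_rows(values, lags):
    """Build (features, target) pairs; features are the values at the given lags."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(values)):
        features = [values[t - lag] for lag in lags]
        rows.append((features, values[t]))
    return rows


# Toy series with a period of 4 (hypothetical values, not the tourism data).
toy = [10, 20, 30, 40, 13, 24, 35, 46, 17, 29, 41, 53]
diffed = seasonal_difference(toy, seasonality=4)
print(diffed)  # [3, 4, 5, 6, 4, 5, 6, 7]

rows = build_lag_rows(diffed, lags=[1, 2, 4])
print(rows[0])  # ([6, 5, 3], 4)
```

Each row is one training example for the regressor. Note that both steps consume observations: differencing drops the first season, and the largest lag drops further rows.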
Score with Multiple Metrics¶
Now score both models on the single train/test split. Scorers in Yohou are stateful: call fit(y_train) first so that scaled metrics like MeanAbsoluteScaledError can compute their normalisation from the training data:
from yohou.metrics import MeanAbsoluteError, MeanAbsoluteScaledError
mae = MeanAbsoluteError()
mae.fit(y_train)
mase = MeanAbsoluteScaledError(seasonality=12)
mase.fit(y_train)
for name, y_pred in [("SeasonalNaive", y_pred_baseline), ("Ridge", y_pred_ridge)]:
    print(f"{name:15s} MAE={mae.score(y_test, y_pred):.2f} MASE={mase.score(y_test, y_pred):.2f}")
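Notice that both MASE values are above 1.0. MASE divides the holdout error by the in-sample error of a seasonal naive forecast on the training data, so a value above 1.0 means the model's error on this holdout is larger than the seasonal naive method's typical error during training. Even SeasonalNaive itself can score above 1.0 when the test period is harder than the training history. Cross-validation across multiple folds will tell us whether this pattern holds.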
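The arithmetic behind MASE can be sketched in a few lines of plain Python. This follows the standard textbook definition (holdout MAE divided by the in-sample MAE of a seasonal naive forecast). It is assumed, not verified, that Yohou's MeanAbsoluteScaledError matches it exactly. All names and numbers below are illustrative.

```python
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mase(y_train, y_test, y_pred, seasonality):
    # Scale: in-sample MAE of the seasonal naive forecast on the training data.
    naive_errors = [
        abs(y_train[t] - y_train[t - seasonality])
        for t in range(seasonality, len(y_train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    return mean_absolute_error(y_test, y_pred) / scale


# Hypothetical toy data: the in-sample seasonal naive error is 2 at every step.
toy_train = [10, 20, 30, 40, 12, 22, 32, 42]
toy_test = [15, 25, 35, 45]
# Seasonal naive forecast for the holdout, off by 3 at every step.
toy_pred = [12, 22, 32, 42]
print(mase(toy_train, toy_test, toy_pred, seasonality=4))  # 1.5
```

Here the seasonal naive forecast itself scores 1.5, because the holdout jump (3) is larger than the typical in-sample seasonal change (2).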
Evaluate with Cross-Validation and Hyperparameter Search¶
ExpandingWindowSplitter creates multiple temporal train/test folds by growing the training window. GridSearchCV evaluates each parameter combination across all folds and selects the best:
from yohou.model_selection import ExpandingWindowSplitter, GridSearchCV
cv = ExpandingWindowSplitter(n_splits=3, test_size=forecasting_horizon)
search = GridSearchCV(
    forecaster=forecaster,
    param_grid={"estimator__alpha": [0.1, 1.0, 10.0, 100.0]},
    scoring=MeanAbsoluteScaledError(seasonality=12),
    cv=cv,
)
search.fit(y, forecasting_horizon=forecasting_horizon)
print(f"Best params: {search.best_params_}")
print(f"CV MASE: {-search.best_score_:.2f}")
Notice that best_score_ is negative. Yohou follows scikit-learn's convention of negating scores so that higher is always better. Negate it to recover the actual MASE.
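The sign convention is easy to see in isolation. The sketch below uses made-up error values (purely hypothetical, not results from this dataset) to show why negating an error lets a search always pick the maximum.

```python
# Hypothetical mean MASE per candidate alpha (illustrative numbers, not real results).
mean_mase_by_alpha = {0.1: 1.10, 1.0: 0.95, 10.0: 0.90, 100.0: 1.30}

# Negate each error so that "higher is better" holds for every metric.
negated_scores = {alpha: -error for alpha, error in mean_mase_by_alpha.items()}

# The search can then simply maximise the negated score.
best_alpha = max(negated_scores, key=negated_scores.get)
best_score = negated_scores[best_alpha]

print(best_alpha)   # 10.0
print(-best_score)  # 0.9
```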
The CV MASE of 0.87 is below 1.0, which indicates that, aggregated over the three folds, the tuned Ridge pipeline's error is lower than the in-sample seasonal naive error. This suggests the single holdout was harder than average.
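To visualise what an expanding window looks like, the following plain-Python sketch computes fold boundaries for 187 observations, three splits, and a 12-step test window. The function `expanding_window_folds` is a hypothetical helper. It assumes test windows are contiguous, non-overlapping, and aligned to the end of the series, which may not match the exact boundaries ExpandingWindowSplitter produces.

```python
def expanding_window_folds(n_samples, n_splits, test_size):
    """Return (train_indices, test_indices) pairs; the last test window ends at the final sample."""
    folds = []
    for i in range(n_splits):
        test_end = n_samples - (n_splits - 1 - i) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested folds")
        # Training always starts at index 0 and grows with each fold.
        folds.append((range(0, test_start), range(test_start, test_end)))
    return folds


folds = expanding_window_folds(n_samples=187, n_splits=3, test_size=12)
for train_idx, test_idx in folds:
    print(f"train: 0..{train_idx[-1]}  test: {test_idx[0]}..{test_idx[-1]}")
# train: 0..150  test: 151..162
# train: 0..162  test: 163..174
# train: 0..174  test: 175..186
```

Every test window lies strictly after its training window, so no fold trains on future data.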
Inspect Residuals¶
Let's refit the best forecaster from the search on the training data and inspect what the model gets wrong with plot_residuals:
from yohou.plotting import plot_residuals
best = search.best_forecaster_
best.fit(y_train, forecasting_horizon=forecasting_horizon)
y_pred_tuned = best.predict(forecasting_horizon=forecasting_horizon)
plot_residuals(y_pred_tuned, y_test, title="Residuals: Ridge (Tuned)")
You should see a scatter of residuals over the test period. If the residuals cluster near zero with no obvious pattern, the model is capturing the main signal. Spikes at seasonal lags or a visible trend suggest missing structure. See Residual Diagnostics for a full interpretation guide.
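Two quick numeric checks can complement the plot: the mean residual (a value far from zero suggests bias) and the autocorrelation at a given lag (a large value suggests leftover structure). The sketch below is plain Python on toy numbers. The helper names are hypothetical, and residuals are taken as actual minus predicted, which is an assumed sign convention.

```python
def residuals(actual, predicted):
    # Assumed convention: residual = actual - predicted.
    return [a - p for a, p in zip(actual, predicted)]


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0
    num = sum((values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n))
    return num / denom


# Hypothetical toy values, not the tourism data.
toy_actual = [100, 110, 120, 130, 140, 150]
toy_predicted = [101, 109, 121, 129, 141, 149]
res = residuals(toy_actual, toy_predicted)
print(res)                                    # [-1, 1, -1, 1, -1, 1]
print(sum(res) / len(res))                    # 0.0
print(round(autocorrelation(res, lag=2), 3))  # 0.667
```

In this toy case the mean residual is zero, so there is no overall bias, yet the strong lag-2 autocorrelation reveals an alternating pattern the forecast missed.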
Compare Models Visually¶
Now plot both forecasts against the actual test values:
from yohou.plotting import plot_forecast, plot_score_summary
plot_forecast(
    y_test,
    {"SeasonalNaive": y_pred_baseline, "Ridge (Tuned)": y_pred_tuned},
    y_train=y_train,
    n_history=36,
    title="Model Comparison: Tourism Forecast",
    y_label="Monthly visitors",
)
plot_score_summary(
    {"MAE": mae, "MASE": mase},
    y_test,
    {"SeasonalNaive": y_pred_baseline, "Ridge (Tuned)": y_pred_tuned},
    title="Score Comparison",
)
Notice how plot_forecast overlays predicted and actual values so you can spot where each model over- or under-shoots. plot_score_summary condenses the comparison into a single bar chart.
What You Built¶
You have completed the full evaluation workflow:
- Scored models with MeanAbsoluteError and MeanAbsoluteScaledError on a single train/test split
- Used ExpandingWindowSplitter and GridSearchCV to evaluate across temporal folds and tune hyperparameters
- Refitted the best forecaster and inspected residuals with plot_residuals
- Compared forecasts visually with plot_forecast and plot_score_summary
Next Steps¶
- Exogenous Features: Add external regressors to your forecasting pipeline
- Model Selection: Expanding vs. sliding windows, fold design, and when CV estimates are trustworthy
- Forecast Accuracy: When to use MAE, MASE, or percentage metrics
- Choose a Forecasting Method: Try nonlinear regressors and compare estimator families
- Interval Forecasting: Add prediction intervals with SplitConformalForecaster