How to Forecast with Class Probabilities¶
Point forecasters can predict categorical targets as hard labels. When you also need per-class probabilities (for risk-aware decisions, calibration analysis, or proper scoring rules), use a class-probability forecaster instead. This guide covers the probability-aware workflow.
Prerequisites¶
- Familiarity with the fit-predict workflow (Getting Started)
- Familiarity with train/test evaluation (Evaluate Forecast Accuracy)
Prepare Data and Train/Test Split¶
Use one of the built-in classification datasets, or prepare your own DataFrame
with a "time" column and one or more string-valued target columns. Split
the data before fitting so the evaluation later reflects true out-of-sample
performance:
from yohou.datasets import fetch_air_quality_classification
from yohou.model_selection import train_test_split
data = fetch_air_quality_classification()
y = data.y
# DataFrame with "time" and "air_quality" columns
# air_quality values: "good", "moderate", "unhealthy", "hazardous"
y_train, y_test = train_test_split(y, test_size=24)
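For intuition, a chronological holdout can be sketched in plain Python. This assumes that test_size=24 reserves the final 24 observations for testing, which this guide does not state explicitly. The helper chronological_split and the synthetic rows are hypothetical and not part of yohou.

```python
from datetime import datetime, timedelta


def chronological_split(rows, test_size):
    """Hold out the last `test_size` rows; all earlier rows form the training set."""
    if not 0 < test_size < len(rows):
        raise ValueError("test_size must be between 1 and len(rows) - 1")
    ordered = sorted(rows, key=lambda r: r["time"])
    return ordered[:-test_size], ordered[-test_size:]


# Synthetic hourly series with string-valued labels (illustrative data only).
start = datetime(2024, 1, 1)
labels = ["good", "moderate", "unhealthy", "hazardous"]
rows = [
    {"time": start + timedelta(hours=i), "air_quality": labels[i % 4]}
    for i in range(72)
]

train, test = chronological_split(rows, test_size=24)
print(len(train), len(test))
```

Because every training timestamp precedes every test timestamp, no future information leaks into the fit.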
Fit a Class-Probability Forecaster¶
ClassProbaReductionForecaster wraps any scikit-learn classifier that implements
fit(), predict(), and predict_proba(). The default estimator is
LogisticRegression;
pass a different classifier through the estimator argument:
from sklearn.ensemble import GradientBoostingClassifier
from yohou.class_proba import ClassProbaReductionForecaster
forecaster = ClassProbaReductionForecaster(
estimator=GradientBoostingClassifier(n_estimators=100),
)
forecaster.fit(y_train, forecasting_horizon=24)
Get Predictions¶
Soft probabilities (recommended for decision-making):
y_proba = forecaster.predict_class_proba()
# Columns: time, vintage_time, air_quality_proba_good, air_quality_proba_moderate, ...
Hard labels (argmax of probabilities):
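Whichever call produces them, hard labels are the per-row argmax over the class-probability columns. The sketch below illustrates that relationship in plain Python. It assumes the air_quality_proba_<class> column naming shown above; the helper hard_labels and the probability values are hypothetical, not part of yohou.

```python
PREFIX = "air_quality_proba_"  # assumed column prefix, taken from the comment above


def hard_labels(proba_rows, prefix=PREFIX):
    """Return the most probable class name for each row of probabilities."""
    labels = []
    for row in proba_rows:
        # Keep only probability columns and strip the prefix to recover class names.
        class_probs = {
            key[len(prefix):]: value
            for key, value in row.items()
            if key.startswith(prefix)
        }
        # On ties, max() keeps the first class encountered.
        labels.append(max(class_probs, key=class_probs.get))
    return labels


# Illustrative probabilities only; each row sums to 1.
proba_rows = [
    {PREFIX + "good": 0.7, PREFIX + "moderate": 0.2, PREFIX + "unhealthy": 0.1},
    {PREFIX + "good": 0.2, PREFIX + "moderate": 0.5, PREFIX + "unhealthy": 0.3},
    {PREFIX + "good": 0.1, PREFIX + "moderate": 0.3, PREFIX + "unhealthy": 0.6},
]

labels = hard_labels(proba_rows)
print(labels)
```

Collapsing to the argmax discards the confidence information, which is why the probability output is the better input for risk-aware decisions.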
Evaluate with Classification Metrics¶
Proper scoring rules are optimized in expectation by the true class
probabilities, so they reward calibrated forecasts, whereas accuracy only checks
whether the top-ranked class is correct. Prefer LogLoss or BrierScore over
Accuracy for model selection:
from yohou.metrics import LogLoss, BrierScore, Accuracy
log_loss = LogLoss().fit(y_train).score(y_test, y_proba)
brier = BrierScore().fit(y_train).score(y_test, y_proba)
accuracy = Accuracy().fit(y_train).score(y_test, y_pred)
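To see why accuracy alone can mislead, the following sketch computes the three metrics by hand for two hypothetical models that pick the same top class on every step but differ in confidence. The functions use the textbook multi-class definitions; the normalization used by yohou's LogLoss and BrierScore may differ, so treat the numbers as illustrative rather than as what the library returns.

```python
import math

CLASSES = ["good", "moderate", "unhealthy"]


def log_loss(y_true, proba):
    """Mean negative log-probability assigned to the observed class."""
    total = sum(math.log(p[CLASSES.index(y)]) for y, p in zip(y_true, proba))
    return -total / len(y_true)


def brier_score(y_true, proba):
    """Mean squared distance between each probability vector and the one-hot outcome."""
    total = 0.0
    for y, p in zip(y_true, proba):
        total += sum((p_k - (1.0 if c == y else 0.0)) ** 2 for c, p_k in zip(CLASSES, p))
    return total / len(y_true)


def accuracy(y_true, proba):
    """Share of steps where the most probable class equals the observed class."""
    hits = sum(CLASSES[p.index(max(p))] == y for y, p in zip(y_true, proba))
    return hits / len(y_true)


y_true = ["good", "moderate", "good", "unhealthy"]

# Model A is confident when it is right; model B hedges on every step.
# Both pick "moderate" on the third step, so both make the same single error.
proba_a = [[0.90, 0.05, 0.05], [0.1, 0.8, 0.1], [0.4, 0.5, 0.1], [0.05, 0.05, 0.90]]
proba_b = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]

for name, proba in [("A", proba_a), ("B", proba_b)]:
    print(name, accuracy(y_true, proba), log_loss(y_true, proba), brier_score(y_true, proba))
```

Both models reach the same accuracy, yet model A scores lower (better) on log loss and Brier score because it assigns more probability to the classes that actually occur.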
Visualize Results¶
plot_forecast auto-detects categorical and probability columns:
from yohou.plotting import plot_forecast
# Hard labels: step chart
plot_forecast(y_test, y_pred)
# Probabilities: stacked area chart
plot_forecast(y_test, y_proba)
Use plot_calibration to assess whether predicted probabilities match observed
frequencies.
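The comparison behind a calibration (reliability) plot can be sketched for a single class as follows: bin the predicted probabilities, then compare each bin's mean prediction with the observed frequency of that class. The helper reliability_table and the data are hypothetical; this does not reproduce the signature or output of plot_calibration.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Return (bin index, count, mean predicted probability, observed frequency) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        # Equal-width bins on [0, 1]; p == 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, o))
    table = []
    for idx, members in enumerate(bins):
        if not members:
            continue  # skip bins that received no predictions
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(o for _, o in members) / len(members)
        table.append((idx, len(members), mean_p, freq))
    return table


# Predicted probability of the class "good" and whether "good" was observed (1) or not (0).
probs = [0.10, 0.15, 0.30, 0.35, 0.70, 0.75, 0.90, 0.95]
outcomes = [0, 0, 0, 1, 1, 1, 1, 1]

table = reliability_table(probs, outcomes)
for idx, count, mean_p, freq in table:
    print(f"bin {idx}: n={count}, mean predicted={mean_p:.3f}, observed={freq:.3f}")
```

A well-calibrated forecaster has observed frequencies close to the mean predicted probability in every bin; large gaps indicate over- or under-confidence.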
See Also¶
- Class-Probability Forecasting for theory and mathematical details
- Evaluate Forecast Accuracy for the complete metrics guide
- API Reference: yohou.class_proba