Class-Probability Forecasting
Time series forecasting conventionally targets numeric values, but many real-world problems involve categorical outcomes: air quality levels (Good, Moderate, Unhealthy), demand categories (Low, Normal, High), or equipment states (Running, Idle, Fault). Class-probability forecasting extends Yohou's fit-observe-predict workflow to these settings, producing a probability distribution over categories at each future timestep rather than a single numeric value. Each forecast is therefore a point on the probability simplex: a vector of non-negative values summing to one that represents the model's belief about the likelihood of each class.
Prediction Types
Yohou supports three kinds of predictions, each suited to a different target type:
- Point predictions: \(\hat{y}_t \in \mathbb{R}\), a single numeric value per timestep.
- Interval predictions: \([\hat{y}_t^L, \hat{y}_t^U]\), bounds on a numeric value with coverage guarantees.
- Class-probability predictions: \(\hat{p}_t \in \Delta^{K-1}\), a probability simplex over \(K\) categorical classes.
where \(\Delta^{K-1} = \{p \in \mathbb{R}^K : p_k \geq 0, \sum_{k=1}^K p_k = 1\}\).
Point and interval forecasters operate on continuous numeric targets.
Class-probability forecasters operate on categorical targets (string or integer
labels). Internally, the categories are label-encoded to integers for model
training and decoded back to the original labels at prediction time. The encoding
sorts classes alphabetically, so for classes ["sunny", "rainy", "cloudy"] the
internal mapping is {"cloudy": 0, "rainy": 1, "sunny": 2}. This mapping is
stored in the label_to_code_ fitted attribute and remains fixed for the
lifetime of the forecaster.
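The sorted label encoding can be sketched in a few lines of plain Python. This only mirrors the idea behind label_to_code_ and is not Yohou's implementation; fit_label_encoding, encode, and decode are hypothetical helper names.

```python
def fit_label_encoding(labels):
    """Assign integer codes to the distinct labels in sorted order (hypothetical helper)."""
    classes = sorted(set(labels))
    label_to_code = {label: code for code, label in enumerate(classes)}
    code_to_label = {code: label for label, code in label_to_code.items()}
    return label_to_code, code_to_label


def encode(labels, label_to_code):
    """Labels -> integer codes (used before training)."""
    return [label_to_code[label] for label in labels]


def decode(codes, code_to_label):
    """Integer codes -> original labels (used at prediction time)."""
    return [code_to_label[code] for code in codes]


y = ["sunny", "rainy", "cloudy", "sunny", "rainy"]
label_to_code, code_to_label = fit_label_encoding(y)
codes = encode(y, label_to_code)
print(label_to_code)  # {'cloudy': 0, 'rainy': 1, 'sunny': 2}
print(codes)          # [2, 1, 0, 2, 1]
```

Note that Python's sorted() compares strings by code point, so in this sketch "Sunny" would sort before "cloudy", and mixing string and integer labels would raise a TypeError; how Yohou handles such cases is not assumed here.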
The Reduction Approach
ClassProbaReductionForecaster applies the same reduction pattern described in
Reduction Forecasting: it tabularizes a time series into
feature rows and trains a scikit-learn estimator on the result. The difference is that
the estimator is a classifier (any scikit-learn classifier implementing
predict_proba()), and the target is categorical rather than numeric.
The pipeline at fit time adds two steps before the standard tabularization:
- Class discovery: unique class labels are extracted from the target column(s) and sorted alphabetically.
- Label encoding: categorical targets are converted to integer codes (e.g., {"cloudy": 0, "rainy": 1, "sunny": 2}).
- Tabularization and fitting: the encoded series is tabularized and the classifier is trained, following the same mechanics as PointReductionForecaster.
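To make the tabularization step concrete, here is a minimal sketch that turns an encoded series into lagged feature rows and multi-step target rows. The function tabularize and its n_lags and horizon parameters are hypothetical; Yohou's own tabularization also handles feature transformers and exogenous features, which this sketch omits.

```python
def tabularize(codes, n_lags, horizon):
    """Build (X, Y): each X row holds n_lags past codes, each Y row the next `horizon` codes."""
    X, Y = [], []
    for end in range(n_lags, len(codes) - horizon + 1):
        X.append(codes[end - n_lags:end])   # lag window ending just before `end`
        Y.append(codes[end:end + horizon])  # the steps the classifier must predict
    return X, Y


codes = [2, 1, 0, 2, 1, 1, 0]  # label-encoded weather: 0=cloudy, 1=rainy, 2=sunny
X, Y = tabularize(codes, n_lags=3, horizon=2)
for x_row, y_row in zip(X, Y):
    print(x_row, "->", y_row)
# [2, 1, 0] -> [2, 1]
# [1, 0, 2] -> [1, 1]
# [0, 2, 1] -> [1, 0]
```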
At prediction time, the classifier's predict_proba() output is mapped back to
columns named after the original class labels.
All reduction concepts (target and feature transformers, target_as_feature,
step_feature_alignment, sample weighting) work the same way as for point
forecasters. See Reduction Forecasting for the full
treatment.
Multi-step strategies
ClassProbaReductionForecaster supports the "multi-output" and "direct"
reduction strategies. The "dir-rec" strategy available for point and interval
forecasters is not supported. Recursive probability chaining requires feeding
predicted class labels back as features, and errors in early steps compound
through the chain. For numeric targets this is manageable, but for categorical
targets a single misclassified step can shift the entire downstream feature
distribution. The observe() method provides a manual alternative for
step-by-step forecasting when needed.
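The difference between the two supported strategies can be sketched at the dataset level. This assumes the usual meaning of the names ("multi-output" fits one estimator on all horizon columns; "direct" fits one estimator per step); multi_output_dataset and direct_datasets are hypothetical helpers, not Yohou functions.

```python
def tabularize(codes, n_lags, horizon):
    """Lagged feature rows X and multi-step target rows Y (hypothetical helper)."""
    X, Y = [], []
    for end in range(n_lags, len(codes) - horizon + 1):
        X.append(codes[end - n_lags:end])
        Y.append(codes[end:end + horizon])
    return X, Y


def multi_output_dataset(X, Y):
    # "multi-output": a single estimator is fit on all horizon columns at once
    return [(X, Y)]


def direct_datasets(X, Y):
    # "direct": one estimator per horizon step h, each fit on column h of Y
    return [(X, [row[h] for row in Y]) for h in range(len(Y[0]))]


codes = [2, 1, 0, 2, 1, 1, 0]
X, Y = tabularize(codes, n_lags=3, horizon=2)
print(len(multi_output_dataset(X, Y)))  # 1
for h, (_, y_h) in enumerate(direct_datasets(X, Y), start=1):
    print(f"step {h}: {y_h}")
# step 1: [2, 1, 1]
# step 2: [1, 1, 0]
```

In both cases every feature row contains only observed lags, so no predicted label is ever fed back as an input. That is the property recursive chaining gives up.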
Predictions: Hard Labels vs. Soft Probabilities
A fitted class-probability forecaster offers two prediction methods:
predict_class_proba() returns a DataFrame with probability columns for each
class and target. For a target column "weather" with classes
["cloudy", "rainy", "sunny"], the output contains columns
weather_proba_cloudy, weather_proba_rainy, and weather_proba_sunny.
Each row's probabilities sum to 1.
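The column naming and the sum-to-one property can be illustrated with a small sketch. The helper proba_columns is hypothetical, and the probability rows are made-up numbers standing in for a classifier's predict_proba() output. The real predict_class_proba() returns a DataFrame that also carries time columns.

```python
import math


def proba_columns(target, classes, proba_rows):
    """Label each probability vector as <target>_proba_<class> after checking it lies on the simplex."""
    columns = [f"{target}_proba_{label}" for label in classes]
    for row in proba_rows:
        if any(p < 0 for p in row) or not math.isclose(sum(row), 1.0):
            raise ValueError(f"not a probability vector: {row}")
    return [dict(zip(columns, row)) for row in proba_rows]


classes = ["cloudy", "rainy", "sunny"]             # sorted class order
proba_rows = [[0.2, 0.7, 0.1], [0.05, 0.05, 0.9]]  # one row per future timestep (illustrative)
table = proba_columns("weather", classes, proba_rows)
print(list(table[0]))  # ['weather_proba_cloudy', 'weather_proba_rainy', 'weather_proba_sunny']
```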
predict() returns hard class labels by taking the argmax of the probability
distribution: \(\hat{y}_t = \arg\max_k \hat{p}_{t,k}\). This discards
calibration information and returns a single class label per timestep.
Prefer predict_class_proba() when downstream decisions depend on confidence
levels. A weather routing system might treat a 51% chance of rain very differently
from a 95% chance, even though both produce the same hard label.
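```python
# Assumes `forecaster` is an already-fitted ClassProbaReductionForecaster
# trained on a single "weather" target with classes cloudy, rainy, sunny.

# Soft probabilities: preserves uncertainty
y_proba = forecaster.predict_class_proba()
# Columns: time, vintage_time, weather_proba_cloudy, weather_proba_rainy, weather_proba_sunny

# Hard labels: argmax only
y_pred = forecaster.predict()
# Columns: time, vintage_time, weather
```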
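The argmax step behind hard labels is easy to sketch in plain Python. The example below shows that the 51% and 95% cases above collapse to the same label. hard_label is a hypothetical helper. Ties go to the first class in sorted order because Python's max() returns the first maximal element, and Yohou's tie-breaking may differ.

```python
classes = ["cloudy", "rainy", "sunny"]
uncertain = [0.30, 0.51, 0.19]  # 51% chance of rain
confident = [0.03, 0.95, 0.02]  # 95% chance of rain


def hard_label(probs, classes):
    """Return the class with the highest probability (first one wins on ties)."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return classes[best]


print(hard_label(uncertain, classes), hard_label(confident, classes))  # rainy rainy
```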
Multi-target outputs
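When the training data contains multiple categorical target columns, probability columns are produced for each target independently, following the same <target>_proba_<class> naming scheme with each target's own set of classes.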
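A sketch of the resulting column layout for two targets follows. The target names "weather" and "demand" are hypothetical, and multi_target_columns is an illustrative helper rather than a Yohou function. The demand classes also show that alphabetical sorting does not follow a natural ordinal order (High, Low, Normal).

```python
def multi_target_columns(classes_by_target):
    """Each target contributes <target>_proba_<class> columns for its own sorted classes."""
    return [
        f"{target}_proba_{label}"
        for target, classes in classes_by_target.items()
        for label in sorted(classes)
    ]


# Hypothetical targets: "weather" and "demand", each with its own class set
columns = multi_target_columns({
    "weather": {"sunny", "rainy", "cloudy"},
    "demand": {"Low", "Normal", "High"},
})
print(columns)
# ['weather_proba_cloudy', 'weather_proba_rainy', 'weather_proba_sunny',
#  'demand_proba_High', 'demand_proba_Low', 'demand_proba_Normal']
```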
Panel data outputs
For panel data, group prefixes are prepended with the __ separator:
location_1__weather_proba_cloudy, location_1__weather_proba_rainy,
location_2__weather_proba_cloudy, location_2__weather_proba_rainy
All panel groups sharing a base target name (e.g., weather) must have the
same set of classes.
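The panel naming scheme can be built and parsed with string operations. This sketch uses hypothetical helpers (panel_proba_columns, parse_proba_column). It assumes that class labels contain neither "__" nor "_proba_", because either substring would make the split ambiguous.

```python
def panel_proba_columns(groups, target, classes):
    """Prepend '<group>__' to each <target>_proba_<class> column name."""
    return [f"{group}__{target}_proba_{label}" for group in groups for label in sorted(classes)]


def parse_proba_column(column):
    """Split '<group>__<target>_proba_<class>' into parts (group is None without a prefix)."""
    group, _, rest = column.rpartition("__")
    target, _, label = rest.partition("_proba_")
    return (group or None), target, label


columns = panel_proba_columns(["location_1", "location_2"], "weather", {"rainy", "cloudy"})
print(columns)
# ['location_1__weather_proba_cloudy', 'location_1__weather_proba_rainy',
#  'location_2__weather_proba_cloudy', 'location_2__weather_proba_rainy']
print(parse_proba_column(columns[0]))  # ('location_1', 'weather', 'cloudy')
```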
Streaming with Observe and Rewind
Like other Yohou forecasters, class-probability forecasters support streaming
predictions through observe() and rewind(). After fitting, call observe()
to feed new ground-truth observations into the forecaster's buffer, then call
predict_class_proba() or predict() to generate updated predictions that
incorporate the new data. Call rewind() to roll back observations.
This is particularly useful when you need step-by-step forecasting with intermediate decisions: predict one step, observe the outcome, then predict the next step with the updated history.
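The predict-observe-predict loop can be illustrated with a toy stand-in. ToyFrequencyForecaster is hypothetical and not a Yohou class: instead of a reduction model it simply reports class frequencies over the last few labels in its buffer. In this sketch, rewind() discards everything observed since fit(). That is an assumed reading of "roll back observations", not Yohou's documented semantics.

```python
from collections import Counter


class ToyFrequencyForecaster:
    """Hypothetical stand-in (not Yohou's API): class frequencies over the last `window` labels."""

    def __init__(self, window=4):
        self.window = window

    def fit(self, y):
        self.classes_ = sorted(set(y))
        self._fit_history = list(y)
        self._history = list(y)
        return self

    def observe(self, y_new):
        # Append new ground truth to the buffer; the model itself is not refit.
        self._history.extend(y_new)
        return self

    def rewind(self):
        # Assumption for this sketch: drop everything observed since fit().
        self._history = list(self._fit_history)
        return self

    def predict_class_proba(self):
        recent = Counter(self._history[-self.window:])
        n = sum(recent.values())
        return {label: recent[label] / n for label in self.classes_}


f = ToyFrequencyForecaster(window=4).fit(["sunny", "sunny", "rainy", "sunny"])
print(f.predict_class_proba())  # {'rainy': 0.25, 'sunny': 0.75}
f.observe(["rainy", "rainy"])
print(f.predict_class_proba())  # {'rainy': 0.75, 'sunny': 0.25}
f.rewind()
print(f.predict_class_proba())  # {'rainy': 0.25, 'sunny': 0.75}
```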
Scoring and Evaluation
Yohou provides three families of scorers for class-probability forecasts:
Proper scoring rules operate directly on predicted probability distributions.
LogLoss, BrierScore, and RankedProbabilityScore are strictly proper: their
expected value is uniquely minimized when the predicted probabilities equal the
true class distribution. This property makes them the most reliable choice for
model selection. Among these:
- LogLoss penalizes confident wrong predictions most harshly (predicting 0.01 for the true class is catastrophic).
- BrierScore measures the mean squared difference between predicted probabilities and one-hot encoded true labels. Because it is bounded, it is more forgiving of near-misses.
- RankedProbabilityScore compares cumulative distributions and respects ordinal class ordering. An optional class_order parameter specifies the ordering explicitly.
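For a single timestep, the three proper scores can be computed by hand. This sketch uses hypothetical functions, not Yohou's scorers. Normalization conventions differ (the Brier score is sometimes divided by K, the RPS by K - 1), and Yohou's choice is not assumed. The explicit ordinal order plays the role of class_order, since alphabetical sorting would put "High" before "Low".

```python
import math
from itertools import accumulate


def log_loss(probs, true_idx):
    return -math.log(probs[true_idx])


def brier(probs, true_idx):
    # Sum of squared differences to the one-hot outcome (some definitions divide by K).
    return sum((p - float(k == true_idx)) ** 2 for k, p in enumerate(probs))


def rps(probs, true_idx):
    # Squared differences of cumulative distributions over ordered classes
    # (some definitions divide by K - 1); the last cumulative term is always 0.
    onehot = [float(k == true_idx) for k in range(len(probs))]
    cum_p, cum_o = list(accumulate(probs)), list(accumulate(onehot))
    return sum((fp - fo) ** 2 for fp, fo in zip(cum_p[:-1], cum_o[:-1]))


order = ["Low", "Normal", "High"]  # explicit ordinal order, like class_order
true_idx = order.index("High")
p = [0.1, 0.6, 0.3]
print(round(log_loss(p, true_idx), 4), round(brier(p, true_idx), 4), round(rps(p, true_idx), 4))
# 1.204 0.86 0.5

near, far = [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]  # all mass on "Normal" vs. all mass on "Low"
print(brier(near, true_idx), brier(far, true_idx))  # 2.0 2.0
print(rps(near, true_idx), rps(far, true_idx))      # 1.0 2.0
```

The last two lines show why ordering matters. The Brier score treats both wrong answers alike, while the RPS penalizes the prediction that is further from the true category more heavily.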
Hard-label scorers convert probabilities to class labels via argmax, then
compute standard classification metrics.
Accuracy,
Precision,
Recall, and
FBetaScore
all discard confidence information. Precision, Recall, and FBetaScore
support average modes ("macro", "micro", "weighted") for multiclass
targets.
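Once probabilities are reduced to argmax labels, accuracy and the three averaging modes can be sketched as follows. These are hypothetical plain-Python functions, not Yohou's scorers. Assigning precision 0.0 to a class that is never predicted is one common convention and may not match Yohou's behavior.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, average):
    classes = sorted(set(y_true) | set(y_pred))
    tp = {c: sum(t == p == c for t, p in zip(y_true, y_pred)) for c in classes}
    n_pred = {c: sum(p == c for p in y_pred) for c in classes}
    support = {c: sum(t == c for t in y_true) for c in classes}
    # A class that is never predicted gets precision 0.0 (one common convention).
    per_class = {c: tp[c] / n_pred[c] if n_pred[c] else 0.0 for c in classes}
    if average == "macro":     # unweighted mean over classes
        return sum(per_class.values()) / len(classes)
    if average == "micro":     # pool all decisions before dividing
        return sum(tp.values()) / sum(n_pred.values())
    if average == "weighted":  # mean over classes, weighted by true-class support
        return sum(per_class[c] * support[c] for c in classes) / len(y_true)
    raise ValueError(f"unknown average: {average!r}")


y_true = ["sunny", "sunny", "rainy", "cloudy", "sunny", "rainy"]
y_pred = ["sunny", "rainy", "rainy", "sunny", "sunny", "rainy"]  # argmax labels
print(round(accuracy(y_true, y_pred), 4))  # 0.6667
for avg in ("macro", "micro", "weighted"):
    print(avg, round(precision(y_true, y_pred, avg), 4))
# macro 0.4444
# micro 0.6667
# weighted 0.5556
```

For single-label multiclass data, micro-averaged precision equals accuracy, while macro averaging lets the never-predicted "cloudy" class pull the score down.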
Ranking scorers evaluate how well predicted probabilities separate classes
across decision thresholds.
ROCAuC and
PRAuC measure
discrimination ability (whether the model assigns higher probabilities to
correct classes) without requiring well-calibrated probability values. Both use
a one-vs-rest strategy for multiclass problems.
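One-vs-rest ROC AUC can be sketched with the rank interpretation of AUC: the probability that a randomly chosen positive scores higher than a randomly chosen negative. The helpers are hypothetical, and macro averaging across classes is an assumption about how per-class AUCs are combined. binary_auc needs at least one positive and one negative per class.

```python
def binary_auc(scores, is_positive):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ovr_auc(proba_rows, y_true, classes):
    """One-vs-rest: AUC of column k against 'true class is k', macro-averaged over classes."""
    aucs = []
    for k, label in enumerate(classes):
        scores = [row[k] for row in proba_rows]
        aucs.append(binary_auc(scores, [t == label for t in y_true]))
    return sum(aucs) / len(aucs)


classes = ["cloudy", "rainy", "sunny"]
proba_rows = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7], [0.3, 0.4, 0.3]]
y_true = ["cloudy", "rainy", "sunny", "cloudy"]

argmax_labels = [classes[row.index(max(row))] for row in proba_rows]
print(argmax_labels)                         # ['cloudy', 'rainy', 'sunny', 'rainy']
print(ovr_auc(proba_rows, y_true, classes))  # 1.0
```

The last row is misclassified by argmax, yet every per-class ranking is perfect. Discrimination and hard-label accuracy measure different things.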
For model selection, prefer proper scoring rules when calibration matters. Use hard-label scorers when only the final class assignment matters. Use ranking scorers when you care about the model's ability to distinguish between classes regardless of calibration. See Forecast Accuracy for the mathematical definitions and a broader discussion of proper scoring rules.
Calibration
A forecaster is well-calibrated if, across all timesteps where it predicts class \(k\) with probability \(p\), class \(k\) actually occurs in roughly a fraction \(p\) of those timesteps. Calibration is distinct from discrimination (the ability to rank likely outcomes higher). A model can discriminate well while producing systematically overconfident or underconfident probabilities.
Calibration matters because consumers of probability forecasts take the numbers at face value. A logistics planner who sees 80% probability of high demand allocates resources accordingly. If the model is overconfident and the true rate is closer to 50%, those resource decisions are systematically wrong.
plot_calibration() produces reliability diagrams that plot predicted
probabilities against observed frequencies. A perfectly calibrated model follows
the diagonal. Deviations above the diagonal indicate underconfidence (predicted
60%, observed 80%); deviations below indicate overconfidence.
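The points on a reliability diagram come from binning predicted probabilities for one class and comparing each bin's mean prediction with the observed frequency. reliability_points is a hypothetical sketch of that computation, not the implementation behind plot_calibration(). The nine predictions are illustrative.

```python
def reliability_points(probs, occurred, n_bins=5):
    """Equal-width binning of predicted probabilities for one class.

    Returns (mean predicted probability, observed frequency) for each non-empty bin,
    i.e. the points a reliability diagram plots against the diagonal.
    """
    bins = [[] for _ in range(n_bins)]
    for p, hit in zip(probs, occurred):
        idx = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the last bin
        bins[idx].append((p, hit))
    points = []
    for contents in bins:
        if contents:
            mean_p = sum(p for p, _ in contents) / len(contents)
            freq = sum(hit for _, hit in contents) / len(contents)
            points.append((round(mean_p, 3), round(freq, 3)))
    return points


# Predicted probability of "rainy" at nine timesteps, and whether it rained (1/0).
probs = [0.6] * 5 + [0.9] * 4
occurred = [1, 1, 1, 1, 0] + [1, 1, 0, 0]
for mean_p, freq in reliability_points(probs, occurred):
    if freq > mean_p:
        side = "above (underconfident)"
    elif freq < mean_p:
        side = "below (overconfident)"
    else:
        side = "on the diagonal"
    print(mean_p, freq, side)
# 0.6 0.8 above (underconfident)
# 0.9 0.5 below (overconfident)
```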
The reduction approach inherits the calibration properties of its backbone
classifier. Classifiers trained by minimizing log loss, such as
GradientBoostingClassifier with its default loss, tend to produce reasonably
calibrated probabilities, while others like RandomForestClassifier may benefit
from post-hoc calibration (e.g., scikit-learn's CalibratedClassifierCV).
Panel Data
Class-probability forecasters support panel data natively through the
panel_strategy parameter. With "global" (the default), a single estimator is
trained across all groups. Classes are discovered per base target name (after
stripping group prefixes), so group_0__weather and group_1__weather share
the same class set. Scorers support per-group filtering and weighting through
the groups parameter and "groupwise" aggregation.
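Class discovery per base target name can be sketched by stripping the "<group>__" prefix before pooling labels. The helpers discover_classes and check_same_classes are hypothetical. The check implements one strict reading of the requirement that all groups share the same class set, namely that every group observed exactly the same labels.

```python
def discover_classes(panel):
    """Pool labels by base target name, i.e. the column name with any '<group>__' prefix stripped."""
    pooled = {}
    for column, values in panel.items():
        base = column.rpartition("__")[2]
        pooled.setdefault(base, set()).update(values)
    return {base: sorted(labels) for base, labels in pooled.items()}


def check_same_classes(panel):
    """Raise if groups sharing a base target name observed different class sets."""
    by_base = {}
    for column, values in panel.items():
        by_base.setdefault(column.rpartition("__")[2], {})[column] = frozenset(values)
    for base, sets in by_base.items():
        if len(set(sets.values())) > 1:
            raise ValueError(f"class sets differ for {base!r}: {sets}")


panel = {
    "group_0__weather": ["sunny", "rainy", "sunny"],
    "group_1__weather": ["rainy", "sunny", "rainy"],
}
check_same_classes(panel)       # passes: both groups saw {rainy, sunny}
print(discover_classes(panel))  # {'weather': ['rainy', 'sunny']}
```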
Ensembles
VotingClassProbaForecaster combines multiple class-probability forecasters using
two methods. Soft voting (the default) averages class probabilities across
base forecasters, optionally with custom weights. It preserves calibration
better than hard voting because it operates on the full probability simplex.
Hard voting lets each base forecaster vote for its argmax class, and the
majority wins. All base forecasters must discover the same classes for a given
target. See Ensemble Forecasting for the general
theory.
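The two voting rules can be compared on one timestep with a sketch. soft_vote and hard_vote are hypothetical helpers, not VotingClassProbaForecaster internals. In this sketch hard-vote ties go to the class voted first, because Counter.most_common keeps first-encountered order for equal counts. Yohou's tie-breaking is not assumed.

```python
from collections import Counter


def soft_vote(probas, weights=None):
    """Weighted average of per-forecaster probability vectors over the same classes."""
    weights = weights or [1.0] * len(probas)
    total = sum(weights)
    n_classes = len(probas[0])
    return [sum(w * p[k] for w, p in zip(weights, probas)) / total for k in range(n_classes)]


def hard_vote(probas, classes):
    """Each forecaster votes for its argmax class; the most common vote wins."""
    votes = [classes[max(range(len(p)), key=p.__getitem__)] for p in probas]
    return Counter(votes).most_common(1)[0][0]


classes = ["cloudy", "rainy", "sunny"]
probas = [
    [0.0, 0.9, 0.1],    # forecaster 1: confident rainy
    [0.0, 0.4, 0.6],    # forecaster 2: leans sunny
    [0.0, 0.45, 0.55],  # forecaster 3: leans sunny
]
print([round(x, 3) for x in soft_vote(probas)])  # [0.0, 0.583, 0.417]
print(hard_vote(probas, classes))                # sunny
```

Here the two rules disagree. Hard voting ignores how confident each base forecaster is, while soft voting lets the single confident forecaster outweigh two marginal ones.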
Connections
- Preprocessing: transformers such as CalendarFeatureTransformer, FourierFeatureTransformer, and HolidayFeatureTransformer supply exogenous features derived from the time column.
- Hyperparameter search: GridSearchCV and RandomizedSearchCV accept class-proba scorers such as LogLoss as the scoring parameter.
- Theory: Forecast Accuracy covers metric theory including proper scoring rules.
- Tutorial: Class-Probability Forecasting walks through a complete classification workflow.
- Practice: How to Forecast with Class Probabilities provides step-by-step recipes.
- API: the full reference is at yohou.class_proba.
- Examples: interactive notebooks are available in the Class-Probability Examples.