
make_exogenous_regression

yohou.datasets._generators.make_exogenous_regression(*, n_samples=200, forecasting_horizon=6, noise=0.1, forecast_bias=0.5, random_state=42)

Generate a synthetic regression dataset with exogenous features.

Creates hourly electricity prices driven by temperature and a holiday indicator with a known linear relationship: price = 50 + 2 * temperature + 10 * is_holiday + noise.
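The noise-free part of this relationship can be written as a small function. This is a standalone sketch using only the standard library, not the library's implementation; `expected_price` is a hypothetical helper name introduced here for illustration.

```python
def expected_price(temperature: float, is_holiday: float) -> float:
    """Noise-free part of the documented relationship (hypothetical helper)."""
    return 50.0 + 2.0 * temperature + 10.0 * is_holiday


# A 15-degree hour on a regular day versus on a holiday.
regular = expected_price(15.0, 0.0)
holiday = expected_price(15.0, 1.0)
print(regular, holiday)  # 80.0 90.0
```

Because the coefficients are known in advance, a fitted linear model can be checked against an intercept of 50, a temperature coefficient of 2, and a holiday coefficient of 10.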

Three exogenous feature types are produced:

  • X_actual (observation features): realized temperature readings with a 24 hour sinusoidal cycle.
  • X_future (known future): a deterministic is_holiday indicator (Sundays = 1.0) covering the full time range.
  • X_forecast (external forecasts): weather temperature forecasts with one vintage per observation, each covering the next forecasting_horizon steps. Relative to actuals, each forecast carries a systematic bias (forecast_bias, 0.5 by default) plus random noise.
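The hourly time index, the Sunday indicator, and the 24 hour temperature cycle described above can be sketched with the standard library alone. The start date, the mean of 15, and the amplitude of 5 are taken from the source listing on this page; the helper names `holiday_flags` and `baseline_temperature` are hypothetical, and the sketch omits the Gaussian noise that the generator adds to the temperature.

```python
import math
from datetime import datetime, timedelta

START = datetime(2024, 1, 1)  # first timestamp in the source listing; a Monday


def holiday_flags(n_samples: int) -> list[float]:
    """Return 1.0 for hourly timestamps that fall on a Sunday, else 0.0."""
    return [
        1.0 if (START + timedelta(hours=i)).weekday() == 6 else 0.0
        for i in range(n_samples)
    ]


def baseline_temperature(hour_index: int) -> float:
    """Noise-free 24 hour sinusoid with mean 15 and amplitude 5."""
    return 15.0 + 5.0 * math.sin(2 * math.pi * hour_index / 24)


sunday_hours_200 = sum(holiday_flags(200))
sunday_hours_100 = sum(holiday_flags(100))
print(sunday_hours_200, sunday_hours_100)  # 24.0 0.0
```

Since the series starts on a Monday, the first Sunday begins at hour 144. A call with n_samples=100 therefore appears to contain no holiday hours at all, which is worth keeping in mind when a model needs to learn the holiday effect.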

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| n_samples | int | Number of hourly observations. | 200 |
| forecasting_horizon | int | Number of forward steps per X_forecast vintage. | 6 |
| noise | float | Standard deviation of the target noise term. | 0.1 |
| forecast_bias | float | Systematic bias added to weather forecasts relative to actuals. | 0.5 |
| random_state | int | Seed for reproducibility. | 42 |
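To see what forecast_bias means in practice, the following sketch simulates forecasts as actual plus bias plus noise and recovers the bias as the mean forecast error. It assumes the per-forecast noise scale of 0.3 that appears in the source listing, uses the standard library `random` module rather than the generator's NumPy random generator, and picks an arbitrary seed, so the numbers will not match the dataset itself.

```python
import random
import statistics

rng = random.Random(0)   # arbitrary seed for this sketch, unrelated to random_state
forecast_bias = 0.5      # documented default
forecast_noise_sd = 0.3  # per-forecast noise scale seen in the source listing

# Stand-in "actual" temperatures around 15 degrees.
actual = [15.0 + rng.gauss(0, 0.5) for _ in range(2000)]
# Each forecast is the actual value shifted by the bias, plus independent noise.
forecast = [a + forecast_bias + rng.gauss(0, forecast_noise_sd) for a in actual]

errors = [f - a for f, a in zip(forecast, actual)]
mean_error = statistics.fmean(errors)
print(round(mean_error, 2))
```

The mean error should land close to forecast_bias, which is the quantity a bias-correction step would try to estimate.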

Returns

Type Description
Bunch

Dictionary-like object with the following attributes:

  • y (pl.DataFrame): Target with columns ["time", "price"].
  • X_actual (pl.DataFrame): Observation features with columns ["time", "temperature"].
  • X_future (pl.DataFrame): Known future features with columns ["time", "is_holiday"].
  • X_forecast (pl.DataFrame): External forecasts with columns ["vintage_time", "time", "wx_temp"]. One vintage per observation from index forecasting_horizon onward.
  • frame (pl.DataFrame): y, X_actual, and X_future joined on "time". X_forecast is excluded because it has a different schema.
  • feature_names (list of str): ["temperature", "is_holiday", "wx_temp"].
  • target_names (list of str): ["price"].
  • frequency (str): "1h".
  • DESCR (str): Human readable description.
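The long format of X_forecast is easier to picture by enumerating its (vintage_time, time) pairs. The sketch below mirrors the nested loop in the source listing on this page using only the standard library; `vintage_pairs` is a hypothetical helper, and the forecast values themselves are left out.

```python
from datetime import datetime, timedelta


def vintage_pairs(
    n_samples: int, forecasting_horizon: int
) -> list[tuple[datetime, datetime]]:
    """Return (vintage_time, time) pairs following the loop in the source listing."""
    start = datetime(2024, 1, 1)
    times = [start + timedelta(hours=i) for i in range(n_samples)]
    pairs = []
    for i in range(forecasting_horizon, n_samples):
        for step in range(1, forecasting_horizon + 1):
            if i + step < n_samples:  # targets past the last observation are dropped
                pairs.append((times[i], times[i + step]))
    return pairs


pairs = vintage_pairs(n_samples=200, forecasting_horizon=6)
n_rows = len(pairs)
n_vintages = len({vintage for vintage, _ in pairs})
print(n_rows, n_vintages)  # 1143 193
```

With the default arguments this yields 1143 rows across 193 vintages. Vintages issued within the last forecasting_horizon observations cover fewer steps, and the final observation issues none, so code that expects a fixed number of rows per vintage should account for the shorter tail.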

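See Also

  • make_exogenous_classification: Classification variant with categorical target.
  • fetch_tourism_monthly: Real monthly tourism dataset (univariate).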

Examples

>>> from yohou.datasets import make_exogenous_regression
>>> data = make_exogenous_regression(n_samples=100)
>>> data.y.columns
['time', 'price']
>>> data.X_forecast.columns
['vintage_time', 'time', 'wx_temp']

Source Code

# Imports this excerpt relies on (not shown in the original listing).
# Assumption: Bunch is sklearn.utils.Bunch; the module may import it from elsewhere.
from datetime import datetime, timedelta

import numpy as np
import polars as pl
from sklearn.utils import Bunch


def make_exogenous_regression(
    *,
    n_samples: int = 200,
    forecasting_horizon: int = 6,
    noise: float = 0.1,
    forecast_bias: float = 0.5,
    random_state: int = 42,
) -> Bunch:
    """Generate a synthetic regression dataset with exogenous features.

    Creates hourly electricity prices driven by temperature and a holiday
    indicator with a known linear relationship:
    ``price = 50 + 2 * temperature + 10 * is_holiday + noise``.

    Three exogenous feature types are produced:

    - **X_actual** (observation features): realized temperature readings
      with a 24 hour sinusoidal cycle.
    - **X_future** (known future): a deterministic ``is_holiday`` indicator
      (Sundays = 1.0) covering the full time range.
    - **X_forecast** (external forecasts): weather temperature forecasts
      with one vintage per observation, each covering the next
      ``forecasting_horizon`` steps. Forecasts carry a small systematic
      bias relative to actuals.

    Parameters
    ----------
    n_samples : int, default=200
        Number of hourly observations.
    forecasting_horizon : int, default=6
        Number of forward steps per X_forecast vintage.
    noise : float, default=0.1
        Standard deviation of the target noise term.
    forecast_bias : float, default=0.5
        Systematic bias added to weather forecasts relative to actuals.
    random_state : int, default=42
        Seed for reproducibility.

    Returns
    -------
    Bunch
        Dictionary-like object with the following attributes:

        y : pl.DataFrame
            Target with columns ``["time", "price"]``.
        X_actual : pl.DataFrame
            Observation features with columns ``["time", "temperature"]``.
        X_future : pl.DataFrame
            Known future features with columns ``["time", "is_holiday"]``.
        X_forecast : pl.DataFrame
            External forecasts with columns
            ``["vintage_time", "time", "wx_temp"]``. One vintage per
            observation from index ``forecasting_horizon`` onward.
        frame : pl.DataFrame
            ``y``, ``X_actual``, and ``X_future`` joined on ``"time"``.
            ``X_forecast`` is excluded because it has a different schema.
        feature_names : list of str
            ``["temperature", "is_holiday", "wx_temp"]``.
        target_names : list of str
            ``["price"]``.
        frequency : str
            ``"1h"``.
        DESCR : str
            Human readable description.

    See Also
    --------
    - [`make_exogenous_classification`][yohou.datasets._generators.make_exogenous_classification] : Classification variant with categorical target.
    - [`fetch_tourism_monthly`][yohou.datasets._fetchers.fetch_tourism_monthly] : Real monthly tourism dataset (univariate).

    Examples
    --------
    >>> from yohou.datasets import make_exogenous_regression
    >>> data = make_exogenous_regression(n_samples=100)
    >>> data.y.columns
    ['time', 'price']
    >>> data.X_forecast.columns
    ['vintage_time', 'time', 'wx_temp']

    """
    rng = np.random.default_rng(random_state)
    times = pl.Series(
        "time",
        [datetime(2024, 1, 1) + timedelta(hours=i) for i in range(n_samples)],
    )
    t = np.arange(n_samples, dtype=float)

    actual_temp = 15.0 + 5.0 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.5, n_samples)
    holidays = np.array([
        1.0 if (datetime(2024, 1, 1) + timedelta(hours=i)).weekday() == 6 else 0.0 for i in range(n_samples)
    ])
    price = 50.0 + 2.0 * actual_temp + 10.0 * holidays + rng.normal(0, noise, n_samples)

    y = pl.DataFrame({"time": times, "price": price})
    X_actual = pl.DataFrame({"time": times, "temperature": actual_temp})
    X_future = pl.DataFrame({"time": times, "is_holiday": holidays})

    forecast_rows: list[dict[str, object]] = []
    for i in range(forecasting_horizon, n_samples):
        for step in range(1, forecasting_horizon + 1):
            if i + step < n_samples:
                forecast_rows.append({
                    "vintage_time": times[i],
                    "time": times[i + step],
                    "wx_temp": float(actual_temp[i + step] + forecast_bias + rng.normal(0, 0.3)),
                })
    X_forecast = pl.DataFrame(forecast_rows)

    frame = y.join(X_actual, on="time").join(X_future, on="time")

    return Bunch(
        y=y,
        X_actual=X_actual,
        X_future=X_future,
        X_forecast=X_forecast,
        frame=frame,
        feature_names=["temperature", "is_holiday", "wx_temp"],
        target_names=["price"],
        frequency="1h",
        DESCR=(
            "Synthetic hourly electricity prices with exogenous features.\n"
            "Target: price = 50 + 2 * temperature + 10 * is_holiday + noise.\n"
            "X_actual: realized temperature (sinusoidal 24h cycle + noise).\n"
            "X_future: is_holiday indicator (Sundays = 1.0).\n"
            "X_forecast: weather temperature forecasts with systematic bias."
        ),
    )

Tutorials

The following example notebooks use this component:

  • How to Align Exogenous Features Across Pipeline Steps


    Data-Features

    Control which step-indexed columns each direct-strategy estimator sees using the step_feature_alignment parameter of PointReductionForecaster.


  • How to Produce Multi-Vintage Predictions


    Forecasting-Models

    Generate multiple predictions from different weather forecast vintages without refitting, using the X_forecast predict-time override.


  • Exogenous Features (X_actual, X_future, X_forecast)


    Getting-Started

    Build a forecasting model with actual observations, known-future indicators, and multi-vintage external forecasts on synthetic electricity price data.
