fetch_demand_classification

yohou.datasets._fetchers.fetch_demand_classification(*, data_home=None, download_if_missing=True, n_retries=3, delay=1.0)

Fetch a categorical electricity demand dataset from Monash/Zenodo.

Downloads the Australian Electricity Demand dataset and bins Victoria's half-hourly demand into three levels (low, medium, high) using tercile thresholds. The remaining four states (NSW, QLD, SA, TAS) become exogenous features. Rows with null Victoria demand are dropped.
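The binning rule can be illustrated on a toy series. The following is a minimal sketch that uses only the Python standard library, not the polars code the function itself runs. `bin_terciles` is a hypothetical helper, and `statistics.quantiles` may interpolate the two thresholds differently from the quantile method yohou uses, so values near a boundary could land in a different bin.

```python
import statistics


def bin_terciles(values):
    """Label each non-null value as "low", "medium", or "high".

    Hypothetical helper for illustration; not part of yohou.
    """
    # Drop nulls first, mirroring the documented handling of missing demand.
    clean = [v for v in values if v is not None]
    # Two cut points split the data into three roughly equal groups.
    q33, q66 = statistics.quantiles(clean, n=3)
    labels = []
    for v in clean:
        if v < q33:
            labels.append("low")
        elif v < q66:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


# Toy half-hourly demand values with one missing reading.
demand = [10.0, None, 20.0, 30.0, 40.0, 50.0, 60.0]
labels = bin_terciles(demand)
print(labels)
```

Because the thresholds are computed from the data, each class receives roughly a third of the rows, and the null reading produces no label at all.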

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data_home | str, PathLike, or None | Specify another download and cache folder for the datasets. By default all yohou data is stored in ~/yohou_data/. | None |
| download_if_missing | bool | If False, raise an OSError if the data is not locally available instead of trying to download it. | True |
| n_retries | int | Number of retries when HTTP errors are encountered. | 3 |
| delay | float | Number of seconds between retries. | 1.0 |
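The interaction between `data_home` and `download_if_missing=False` can be sketched as a small cache-only wrapper. This is an illustration under stated assumptions: `load_cached_only` and `_offline_stub` are hypothetical names, and the stub only imitates the documented behavior (raising `OSError` when data is missing locally) so the snippet runs without yohou or network access.

```python
import os


def load_cached_only(fetcher, data_home):
    """Return the fetcher's result, or None when nothing is cached locally."""
    try:
        return fetcher(
            data_home=os.path.expanduser(data_home),
            download_if_missing=False,
        )
    except OSError:
        # Documented behavior: OSError signals the data is not available locally.
        return None


def _offline_stub(*, data_home=None, download_if_missing=True):
    """Stand-in fetcher that behaves like an empty cache (illustration only)."""
    if not download_if_missing:
        raise OSError(f"no cached data under {data_home}")
    return {"DESCR": "stub"}


result = load_cached_only(_offline_stub, "~/yohou_data")
print(result)
```

With yohou installed, `fetch_demand_classification` could be passed as `fetcher` in place of the stub. The wrapper treats any `OSError` as a cache miss, which may be too broad if other I/O failures need to be told apart.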

Returns

Type Description
Bunch

Dictionary-like object with the following attributes:

y : pl.DataFrame
    DataFrame with "time" (Datetime) and "demand_level" (Utf8) columns. The "demand_level" column contains one of "low", "medium", or "high".
X_actual : pl.DataFrame
    DataFrame with "time" and 4 state demand columns ("nsw__demand", "qun__demand", "sa__demand", "tas__demand").
feature_names : list of str
    Feature column names (excludes "time").
target_names : list of str
    ["demand_level"].
classes : list of str
    ["high", "low", "medium"] (sorted).
DESCR : str
    Human-readable dataset description.

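See Also

fetch_electricity_demand : Full Australian electricity demand dataset.
fetch_air_quality_classification : Categorical air quality dataset.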

Examples

>>> from yohou.datasets import fetch_demand_classification
>>> data = fetch_demand_classification()
>>> data.y.columns
['time', 'demand_level']
>>> sorted(data.classes)
['high', 'low', 'medium']
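A quick sanity check after loading is to confirm that the three classes are roughly balanced, as tercile binning implies. The sketch below works on a plain list of labels so that it runs without downloading anything. `class_balance` is a hypothetical helper, and the toy list stands in for `data.y["demand_level"].to_list()`.

```python
from collections import Counter


def class_balance(levels):
    """Return the share of rows per class, keyed by sorted class label."""
    counts = Counter(levels)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}


# Toy labels standing in for data.y["demand_level"].to_list()
toy_levels = ["low", "high", "medium", "low", "medium", "high"]
balance = class_balance(toy_levels)
print(balance)
```

On the real dataset the shares should each sit near one third; a large deviation would suggest many tied demand values at a threshold.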

Source Code

def fetch_demand_classification(
    *,
    data_home: str | os.PathLike | None = None,
    download_if_missing: bool = True,
    n_retries: int = 3,
    delay: float = 1.0,
) -> Bunch:
    """Fetch a categorical electricity demand dataset from Monash/Zenodo.

    Downloads the Australian Electricity Demand dataset and bins
    Victoria's half-hourly demand into three levels (low, medium, high)
    using tercile thresholds. The remaining four states (NSW, QLD, SA,
    TAS) become exogenous features. Rows with null Victoria demand are
    dropped.

    Parameters
    ----------
    data_home : str, PathLike, or None
        Specify another download and cache folder for the datasets.
        By default all yohou data is stored in ``~/yohou_data/``.
    download_if_missing : bool, default=True
        If ``False``, raise an ``OSError`` if the data is not locally
        available instead of trying to download it.
    n_retries : int, default=3
        Number of retries when HTTP errors are encountered.
    delay : float, default=1.0
        Number of seconds between retries.

    Returns
    -------
    Bunch
        Dictionary-like object with the following attributes:

        y : pl.DataFrame
            DataFrame with ``"time"`` (Datetime) and ``"demand_level"``
            (Utf8) columns. The ``"demand_level"`` column contains one
            of ``"low"``, ``"medium"``, or ``"high"``.
        X_actual : pl.DataFrame
            DataFrame with ``"time"`` and 4 state demand columns
            (``"nsw__demand"``, ``"qun__demand"``, ``"sa__demand"``,
            ``"tas__demand"``).
        feature_names : list of str
            Feature column names (excludes ``"time"``).
        target_names : list of str
            ``["demand_level"]``.
        classes : list of str
            ``["high", "low", "medium"]`` (sorted).
        DESCR : str
            Human-readable dataset description.

    See Also
    --------
    - [`fetch_electricity_demand`][yohou.datasets._fetchers.fetch_electricity_demand] : Full Australian electricity demand dataset.
    - [`fetch_air_quality_classification`][yohou.datasets._fetchers.fetch_air_quality_classification] : Categorical air quality dataset.

    Examples
    --------
    >>> from yohou.datasets import fetch_demand_classification
    >>> data = fetch_demand_classification()  # doctest: +SKIP
    >>> data.y.columns  # doctest: +SKIP
    ['time', 'demand_level']
    >>> sorted(data.classes)  # doctest: +SKIP
    ['high', 'low', 'medium']

    """
    if _is_wasm():
        return _fetch_classification_wasm("demand_classification")

    bunch = fetch_electricity_demand(
        data_home=data_home,
        download_if_missing=download_if_missing,
        n_retries=n_retries,
        delay=delay,
    )
    frame = bunch.frame

    target_col = "vic__demand"
    feature_cols = ["nsw__demand", "qun__demand", "sa__demand", "tas__demand"]

    # Drop rows with null target
    frame = frame.drop_nulls(subset=[target_col])

    # Compute tercile thresholds
    q33 = frame[target_col].quantile(1 / 3)
    q66 = frame[target_col].quantile(2 / 3)

    demand_level = (
        pl
        .when(pl.col(target_col) < q33)
        .then(pl.lit("low"))
        .when(pl.col(target_col) < q66)
        .then(pl.lit("medium"))
        .otherwise(pl.lit("high"))
    )

    y = frame.select("time", demand_level.alias("demand_level"))
    X_actual = frame.select("time", *feature_cols)

    classes = sorted(set(y["demand_level"].to_list()))

    return Bunch(
        y=y,
        X_actual=X_actual,
        feature_names=[c for c in X_actual.columns if c != "time"],
        target_names=["demand_level"],
        classes=classes,
        DESCR=(
            "Electricity demand classification dataset derived from the "
            "Australian Electricity Demand dataset (Monash/Zenodo). "
            "Victoria's half-hourly demand is binned into three tercile-based "
            "levels: low, medium, high. Features are the demand series from "
            "the remaining four Australian states (NSW, QLD, SA, TAS)."
        ),
    )