fetch_dominick

yohou.datasets._fetchers.fetch_dominick(*, n_series=50, data_home=None, download_if_missing=True, n_retries=3, delay=1.0)

Fetch the Dominick dataset from Monash/Zenodo.

Weekly time series representing the profit of individual stock keeping units from a retailer (Dominick's Finer Foods). The full dataset contains 115 704 series; by default only the first 50 are loaded to keep memory usage reasonable.

Parameters

n_series : int or None, default=50
    Maximum number of series to include. None loads all 115 704 series (several GB of memory). The default of 50 keeps the dataset small enough for interactive examples.

data_home : str, PathLike, or None, default=None
    Specify another download and cache folder for the datasets. By default all yohou data is stored in ~/yohou_data/.

download_if_missing : bool, default=True
    If False, raise an OSError if the data is not locally available instead of trying to download it.

n_retries : int, default=3
    Number of retries when HTTP errors are encountered.

delay : float, default=1.0
    Number of seconds between retries.
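
How n_retries and delay interact can be pictured with a small retry loop. The sketch below is not yohou's implementation, which is not shown on this page. `call_with_retries` is a hypothetical helper, and it assumes that n_retries counts retries after the first attempt (so up to n_retries + 1 calls in total) and that delay is a fixed pause with no backoff.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fetch: Callable[[], T],
    *,
    n_retries: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``fetch`` until it succeeds or the retry budget is spent.

    Hypothetical helper, not part of yohou. Assumes ``n_retries`` counts
    retries after the first attempt and ``delay`` is a fixed pause.
    """
    retries_left = n_retries
    while True:
        try:
            return fetch()
        except OSError:
            # urllib's URLError and HTTPError are subclasses of OSError.
            if retries_left == 0:
                raise  # budget exhausted: surface the last error
            retries_left -= 1
            sleep(delay)  # wait a fixed number of seconds before retrying
```

Under these assumptions, the defaults n_retries=3 and delay=1.0 would allow at most four attempts and add at most three seconds of waiting before the error reaches the caller.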

Returns

Bunch
    Dictionary-like object with the following attributes:

    frame : pl.DataFrame
        DataFrame with "time" (Datetime) and up to 115 704 series columns using the __ separator convention (e.g. "T1__profit").
    feature_names : list of str
        Non-time column names.
    DESCR : str
        Full description of the dataset.
    frequency : str
        "1w".
    n_series : int
        Number of series actually loaded.
    filename : str
        Path to the cached parquet file.
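
The __ separator convention can be unpacked with plain string handling. The sketch below uses a hypothetical helper, `split_series_column`, that is not part of yohou. It assumes the part before the first "__" is the series identifier and the rest is the value name, as in the documented example "T1__profit". The column list is illustrative and was not read from the real dataset.

```python
def split_series_column(name: str, sep: str = "__") -> tuple[str, str]:
    """Split a column such as "T1__profit" into ("T1", "profit").

    Hypothetical helper, not part of yohou. Splits on the first
    occurrence of ``sep`` and rejects names that do not contain it.
    """
    series_id, found, variable = name.partition(sep)
    if not found:
        raise ValueError(f"{name!r} does not contain the separator {sep!r}")
    return series_id, variable


# Illustrative column names following the documented convention.
columns = ["time", "T1__profit", "T2__profit"]

# Skip the time column and keep only the series identifiers.
series_ids = [split_series_column(c)[0] for c in columns if c != "time"]
```

With a real Bunch, the same helper could be applied to each entry of feature_names, which is documented as the list of non-time column names.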

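See Also

fetch_tourism_monthly : Monthly tourism series.
fetch_hospital : Monthly hospital patient count series.
get_data_home : Return the path of the data directory.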

References

[1] Godahewa, R., Bergmeir, C., Webb, G. I., Hyndman, R. J., & Montero-Manso, P. (2021). "Monash Time Series Forecasting Archive." Neural Information Processing Systems Track on Datasets and Benchmarks. https://doi.org/10.5281/zenodo.4654802

Examples

>>> from yohou.datasets import fetch_dominick
>>> bunch = fetch_dominick()
>>> bunch.frame.columns[:2]
['time', 'T1__profit']
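
For offline use, the documented OSError raised when download_if_missing is False can be turned into a simple cache check. The sketch below uses a hypothetical wrapper, `load_from_cache_only`, that is not part of yohou. It takes the fetcher as an argument so that it works with any function that accepts download_if_missing. It also treats every OSError as a cache miss, which is a simplifying assumption.

```python
from typing import Any, Callable, Optional


def load_from_cache_only(fetcher: Callable[..., Any], **kwargs: Any) -> Optional[Any]:
    """Return the fetcher's result without downloading, or None on a miss.

    Hypothetical wrapper, not part of yohou. Relies on the documented
    behaviour that ``download_if_missing=False`` raises ``OSError`` when
    the data is not available locally.
    """
    try:
        return fetcher(download_if_missing=False, **kwargs)
    except OSError:
        # Simplification: any OSError is treated as "not cached".
        return None


# Possible usage, assuming yohou is installed:
#     from yohou.datasets import fetch_dominick
#     bunch = load_from_cache_only(fetch_dominick, n_series=10)
#     if bunch is None:
#         print("Dominick data is not cached yet")
```

Passing the fetcher in, instead of importing it inside the helper, keeps the sketch independent of yohou and easy to exercise with a stub.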

Source Code

def fetch_dominick(
    *,
    n_series: int | None = 50,
    data_home: str | os.PathLike | None = None,
    download_if_missing: bool = True,
    n_retries: int = 3,
    delay: float = 1.0,
) -> Bunch:
    """Fetch the Dominick dataset from Monash/Zenodo.

    Weekly time series representing the profit of individual stock
    keeping units from a retailer (Dominick's Finer Foods).  The full
    dataset contains 115 704 series; by default only the first 50 are
    loaded to keep memory usage reasonable.

    Parameters
    ----------
    n_series : int or None, default=50
        Maximum number of series to include.  ``None`` loads all
        115 704 series (several GB of memory).  The default of 50
        keeps the dataset small enough for interactive examples.
    data_home : str, PathLike, or None, default=None
        Specify another download and cache folder for the datasets.
        By default all yohou data is stored in ``~/yohou_data/``.
    download_if_missing : bool, default=True
        If ``False``, raise an ``OSError`` if the data is not locally
        available instead of trying to download it.
    n_retries : int, default=3
        Number of retries when HTTP errors are encountered.
    delay : float, default=1.0
        Number of seconds between retries.

    Returns
    -------
    Bunch
        Dictionary-like object with the following attributes:

        frame : pl.DataFrame
            DataFrame with ``"time"`` (Datetime) and up to 115 704
            series columns using the ``__`` separator convention
            (e.g. ``"T1__profit"``).
        feature_names : list of str
            Non-time column names.
        DESCR : str
            Full description of the dataset.
        frequency : str
            ``"1w"``.
        n_series : int
            Number of series actually loaded.
        filename : str
            Path to the cached parquet file.

    See Also
    --------
    - [`fetch_tourism_monthly`][yohou.datasets._fetchers.fetch_tourism_monthly] : Monthly tourism series.
    - [`fetch_hospital`][yohou.datasets._fetchers.fetch_hospital] : Monthly hospital patient count series.
    - [`get_data_home`][yohou.datasets._fetchers.get_data_home] : Return the path of the data directory.

    References
    ----------
    [1] Godahewa, R., Bergmeir, C., Webb, G. I., Hyndman, R. J., &
        Montero-Manso, P. (2021). "Monash Time Series Forecasting Archive."
        Neural Information Processing Systems Track on Datasets and
        Benchmarks. https://doi.org/10.5281/zenodo.4654802

    Examples
    --------
    >>> from yohou.datasets import fetch_dominick
    >>> bunch = fetch_dominick()  # doctest: +SKIP
    >>> bunch.frame.columns[:2]  # doctest: +SKIP
    ['time', 'T1__profit']

    """
    return _fetch_dataset(
        metadata=DOMINICK,
        dataset_name="dominick",
        value_column_name="profit",
        n_series=n_series,
        data_home=data_home,
        download_if_missing=download_if_missing,
        n_retries=n_retries,
        delay=delay,
    )