check_interval_consistency

yohou.utils.validation.check_interval_consistency(df)

Validate that a time series has uniform time spacing.

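Checks that all consecutive time steps in the DataFrame share the same interval. Both fixed intervals (daily, hourly) and variable-length calendar intervals (monthly, quarterly, yearly) are supported. Sub-day series whose gaps differ by at most one hour, such as hourly data across a DST change, are also accepted.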
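
To see why monthly, quarterly, and yearly spacing needs separate handling, note that month-start timestamps are not a constant number of days apart. The sketch below uses only the standard library. `add_months` and `is_calendar_regular` are hypothetical helpers written for this illustration and are not part of yohou.

```python
from datetime import datetime


def add_months(ts: datetime, months: int) -> datetime:
    """Shift ts forward by a number of calendar months (hypothetical helper).

    Assumes the day of month exists in the target month (e.g. day 1).
    """
    total = ts.year * 12 + (ts.month - 1) + months
    return ts.replace(year=total // 12, month=total % 12 + 1)


def is_calendar_regular(times: list[datetime], months: int) -> bool:
    """Return True if each timestamp is exactly `months` calendar months after the previous one."""
    return all(add_months(a, months) == b for a, b in zip(times, times[1:]))


month_starts = [datetime(2020, m, 1) for m in range(1, 5)]

# Raw deltas differ because months have different lengths.
raw_days = [(b - a).days for a, b in zip(month_starts, month_starts[1:])]
print(raw_days)  # [31, 29, 31]

# A calendar-aware comparison still recognizes the series as regular.
print(is_calendar_regular(month_starts, months=1))  # True
```

Comparing raw timedeltas alone would flag this series as inconsistent, which is why a calendar-aware step is needed for these frequencies.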

Parameters

Name | Type | Description | Default
df | DataFrame | Time series DataFrame with a "time" column containing datetime values. | required

Returns

Type | Description
str | String representation of the interval. Examples: "1d", "1h", "1w", "1mo", "3mo", "1q", "1y"
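
As a rough illustration of how a fixed-length gap could map onto such a string, here is a minimal sketch. `format_interval` is a hypothetical helper, not the library's own formatter, and the `m` and `s` suffixes are assumptions that extend the examples above. Calendar units such as "1mo", "1q", and "1y" are left out on purpose because they cannot be derived from a timedelta.

```python
from datetime import timedelta

# Unit suffixes and their length in seconds, largest first.
# Only fixed-length units appear here; month, quarter, and year vary in length.
_UNITS = [("w", 7 * 86400), ("d", 86400), ("h", 3600), ("m", 60), ("s", 1)]


def format_interval(delta: timedelta) -> str:
    """Format a positive, whole-second timedelta as e.g. '1d' or '90m' (hypothetical helper)."""
    seconds = int(delta.total_seconds())
    if seconds <= 0 or seconds != delta.total_seconds():
        raise ValueError("expected a positive whole number of seconds")
    # Pick the largest unit that divides the gap evenly.
    for suffix, size in _UNITS:
        if seconds % size == 0:
            return f"{seconds // size}{suffix}"
    raise AssertionError("unreachable: every positive integer is divisible by 1")


print(format_interval(timedelta(days=1)))      # 1d
print(format_interval(timedelta(days=7)))      # 1w
print(format_interval(timedelta(minutes=90)))  # 90m
```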

Raises

Type | Description
ValueError | If df is None, if it contains fewer than two time points, or if the time intervals are not consistent throughout the DataFrame.

Examples

>>> import polars as pl
>>> from datetime import datetime
>>> # Valid: uniform 1-day intervals
>>> df = pl.DataFrame({
...     "time": pl.datetime_range(
...         start=datetime(2020, 1, 1), end=datetime(2020, 1, 5), interval="1d", eager=True
...     ),
...     "value": [10, 20, 30, 40, 50],
... })
>>> interval = check_interval_consistency(df)
>>> interval
'1d'
>>> # Invalid: inconsistent intervals
>>> df_bad = pl.DataFrame({
...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 4)],
...     "value": [10, 20, 30],
... })
>>> check_interval_consistency(df_bad)
Traceback (most recent call last):
    ...
ValueError: ...
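
The difference between the two examples comes down to the gaps between consecutive timestamps. The following is a minimal sketch of that step using plain Python lists instead of a polars DataFrame. `unique_deltas` is a name chosen for this illustration.

```python
from datetime import datetime, timedelta

good = [datetime(2020, 1, d) for d in range(1, 6)]
bad = [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 4)]


def unique_deltas(times):
    """Return the sorted distinct gaps between consecutive timestamps."""
    return sorted({b - a for a, b in zip(times, times[1:])})


print(unique_deltas(good))  # one distinct gap: 1 day
print(unique_deltas(bad))   # two distinct gaps: 1 day and 2 days
```

A single distinct gap means the series is uniformly spaced. More than one means the function has to decide whether the variation still fits a regular calendar pattern.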

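See Also

check_inputs : Validates multiple DataFrames have matching intervals
check_continuity : Validates temporal continuity between DataFrames
add_interval : Add intervals to datetime values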

Source Code

from datetime import timedelta

import polars as pl


def check_interval_consistency(df: pl.DataFrame) -> str:
    """Validate that a time series has uniform time spacing.

    Checks that all consecutive time steps in the DataFrame have the same interval.
    Supports both fixed intervals (daily, hourly) and variable-length intervals
    (monthly, quarterly, yearly).

    Parameters
    ----------
    df : pl.DataFrame
        Time series DataFrame with a "time" column containing datetime values.

    Returns
    -------
    str
        String representation of the interval.
        Examples: "1d", "1h", "1w", "1mo", "3mo", "1q", "1y"

    Raises
    ------
    ValueError
        If `df` is None, if it contains fewer than two time points, or if the
        time intervals are not consistent throughout the DataFrame.

    Examples
    --------
    >>> import polars as pl
    >>> from datetime import datetime
    >>> # Valid: uniform 1-day intervals
    >>> df = pl.DataFrame({
    ...     "time": pl.datetime_range(
    ...         start=datetime(2020, 1, 1), end=datetime(2020, 1, 5), interval="1d", eager=True
    ...     ),
    ...     "value": [10, 20, 30, 40, 50],
    ... })
    >>> interval = check_interval_consistency(df)
    >>> interval
    '1d'

    >>> # Invalid: inconsistent intervals
    >>> df_bad = pl.DataFrame({
    ...     "time": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 4)],
    ...     "value": [10, 20, 30],
    ... })
    >>> check_interval_consistency(df_bad)  # doctest: +SKIP
    Traceback (most recent call last):
        ...
    ValueError: ...

    See Also
    --------
    - [`check_inputs`][yohou.utils.validation.check_inputs] : Validates multiple DataFrames have matching intervals
    - [`check_continuity`][yohou.utils.validation.check_continuity] : Validates temporal continuity between DataFrames
    - [`add_interval`][yohou.utils.validation.add_interval] : Add intervals to datetime values

    """
    if df is None:
        raise ValueError("DataFrame cannot be None")

    time_series = df["time"].to_list()

    if len(time_series) < 2:
        raise ValueError("Need at least 2 time points to infer interval")

    # Calculate deltas
    deltas = [time_series[i + 1] - time_series[i] for i in range(len(time_series) - 1)]
    unique_deltas = sorted(set(deltas))

    # Whole-day component of each distinct delta; a maximum of 0 means every delta is under one day
    delta_days = [d.days for d in unique_deltas]
    max_delta = max(delta_days)

    # Sub-day intervals with small variation (e.g., hourly with DST)
    if max_delta == 0:
        # All deltas are sub-day
        delta_seconds = [d.total_seconds() for d in unique_deltas]
        if max(delta_seconds) - min(delta_seconds) <= 3600:  # spread of at most 1 hour
            # Take the median over all deltas, not only the distinct ones, so the
            # typical step wins over an isolated DST-shifted one.
            all_seconds = sorted(d.total_seconds() for d in deltas)
            median_seconds = all_seconds[len(all_seconds) // 2]
            return _timedelta_to_string(timedelta(seconds=median_seconds))

    # Infer based on delta distribution
    freq = _infer_freq_from_deltas(time_series, unique_deltas)
    if freq is not None:
        return freq

    # Could not infer - raise detailed error
    raise ValueError(
        "Time series has inconsistent intervals. "
        f"Found {len(unique_deltas)} different intervals: {unique_deltas}. "
        "Cannot infer a regular frequency pattern."
    )
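
The sub-day branch accepts a series whose shortest and longest gaps differ by at most one hour, which covers hourly wall-clock data around a DST change. The sketch below isolates that check on naive timestamps. `subday_step_seconds` is a hypothetical name, and taking the median over every gap rather than over the distinct gaps is a choice made for this example.

```python
from datetime import datetime, timedelta
from typing import Optional


def subday_step_seconds(times: list[datetime], tolerance: float = 3600.0) -> Optional[float]:
    """Return the median gap in seconds if all gaps are sub-day and within tolerance.

    Returns None when any gap is a day or longer, or when the spread between
    the shortest and longest gap exceeds `tolerance` seconds.
    """
    gaps = sorted((b - a).total_seconds() for a, b in zip(times, times[1:]))
    if not gaps or gaps[-1] >= 86400:
        return None
    if gaps[-1] - gaps[0] > tolerance:
        return None
    return gaps[len(gaps) // 2]


# Naive wall-clock hours with 02:00 missing, as in a spring-forward DST change.
hours = [datetime(2021, 3, 14, h) for h in (0, 1, 3, 4, 5)]
print(subday_step_seconds(hours))  # 3600.0
```

The population used for the median matters here: the distinct gaps are 3600 and 7200 seconds, and indexing a two-element sorted list at `len // 2` selects the larger value.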