
check_time_column

yohou.utils.validation.check_time_column(df, df_name='DataFrame')

Validate that the 'time' column exists, has dtype pl.Datetime or pl.Date, contains no nulls, and is sorted in ascending order.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | DataFrame to validate. | required |
| df_name | str | Name of DataFrame in error message. | "DataFrame" |

Raises

| Type | Description |
| --- | --- |
| ValueError | If time column is missing, has wrong dtype, contains nulls, or is not sorted. |
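
The checks run in a fixed order (missing column, dtype, nulls, sorting), so the first failing condition decides which error is raised. The sketch below mimics that order in plain Python over a dict of column lists, to show which condition wins when several apply. It is an illustration only, not the library's implementation: `first_time_problem` is a hypothetical helper, the short string labels stand in for the `ValueError` messages, and plain lists stand in for a polars column.

```python
from __future__ import annotations

from datetime import date


def first_time_problem(columns: dict[str, list]) -> str | None:
    """Return a label for the first failing check, or None if all pass.

    Hypothetical stand-in that mirrors the order of checks in
    check_time_column (missing, dtype, nulls, sorted) on plain lists.
    """
    if "time" not in columns:
        return "missing"
    values = columns["time"]
    # datetime is a subclass of date, so one isinstance check covers both.
    # None is let through here, like nulls in a typed column.
    if not all(v is None or isinstance(v, date) for v in values):
        return "dtype"
    if any(v is None for v in values):
        return "nulls"
    # Ascending means non-decreasing in this sketch, so ties pass.
    if any(a > b for a, b in zip(values, values[1:])):
        return "unsorted"
    return None


ok = {"time": [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 2)]}
no_time = {"value": [1, 2]}
strings = {"time": ["2024-01-01", "2024-01-02"]}
with_null = {"time": [date(2024, 1, 2), None, date(2024, 1, 1)]}
unsorted = {"time": [date(2024, 1, 2), date(2024, 1, 1)]}

for name, table in [("ok", ok), ("no_time", no_time), ("strings", strings),
                    ("with_null", with_null), ("unsorted", unsorted)]:
    print(name, first_time_problem(table))
```

Note that `with_null` is also out of order, yet the sketch reports the null problem because that check comes first. With the real function the practical consequence is the same: fixing one reported error can reveal the next one.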

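See Also

- yohou.utils.validation.check_interval_consistency : Validate uniform time spacing.
- yohou.utils.validation.check_continuity : Validate temporal continuity between DataFrames.
- yohou.utils.validation.check_inputs : Validate consistent intervals across y and X_actual.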

Source Code

def check_time_column(df: pl.DataFrame, df_name: str = "DataFrame") -> None:
    """Validate that time column exists, has proper dtype, no nulls, and is sorted.

    Parameters
    ----------
    df : pl.DataFrame
        DataFrame to validate.
    df_name : str, default="DataFrame"
        Name of DataFrame in error message.

    Raises
    ------
    ValueError
        If time column is missing, has wrong dtype, contains nulls, or is not sorted.

    See Also
    --------
    - [`check_interval_consistency`][yohou.utils.validation.check_interval_consistency] : Validate uniform time spacing.
    - [`check_continuity`][yohou.utils.validation.check_continuity] : Validate temporal continuity between DataFrames.
    - [`check_inputs`][yohou.utils.validation.check_inputs] : Validate consistent intervals across y and X_actual.

    """
    if "time" not in df.columns:
        raise ValueError(f"{df_name} must contain a 'time' column. Found columns: {list(df.columns)}")

    time_col = df["time"]
    # Check dtype
    if not isinstance(time_col.dtype, pl.Datetime | pl.Date):
        raise ValueError(f"'time' column in {df_name} must have dtype pl.Datetime or pl.Date, but got {time_col.dtype}")

    # Check for nulls
    if time_col.null_count() > 0:
        raise ValueError(
            f"'time' column in {df_name} contains {time_col.null_count()} null values. "
            "'time' column must not have missing values."
        )

    # Check sorting (ascending)
    if not time_col.is_sorted():
        raise ValueError(f"'time' column in {df_name} must be sorted in ascending order. Call df.sort('time') to fix.")