validate_splitter_data

yohou.utils.validate_data.validate_splitter_data(splitter, y, X_actual)

validate_splitter_data(
    splitter: BaseSplitter,
    y: pl.DataFrame,
    X_actual: pl.DataFrame | None,
) -> tuple[pl.DataFrame, pl.DataFrame | None]
validate_splitter_data(
    splitter: BaseSplitter,
    y: None,
    X_actual: pl.DataFrame | None,
) -> tuple[None, pl.DataFrame | None]

Validate and prepare input data for time series splitters.

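Checks that y and X_actual, when provided, each have a valid "time" column and an internally consistent panel structure, and that their panel groups match when both are given. When y is provided, the time interval is inferred and stored on the splitter as interval_; when y is None, interval_ is not set.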
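The control flow described above can be sketched without the library. This is a minimal stand-in, not yohou's implementation: frames are plain dicts of columns; DummySplitter, infer_interval, and validate_sketch are hypothetical names; and the constant-step rule used to infer the interval is an assumption, since the source only states that the interval is inferred. Panel checks are omitted.

```python
from __future__ import annotations

from datetime import datetime, timedelta


class DummySplitter:
    """Hypothetical stand-in for a BaseSplitter subclass."""


def infer_interval(times: list[datetime]) -> timedelta:
    # Assumption: the interval is the single constant step between timestamps.
    steps = {later - earlier for earlier, later in zip(times, times[1:])}
    if len(steps) != 1:
        raise ValueError("'time' column must be regularly spaced")
    return steps.pop()


def validate_sketch(splitter, y: dict | None, X_actual: dict | None):
    # Each frame that is provided must carry a "time" column.
    for name, frame in (("y", y), ("X_actual", X_actual)):
        if frame is not None and "time" not in frame:
            raise ValueError(f"{name} must contain a 'time' column")
    # The interval is inferred from y only, so it is set only when y is given.
    if y is not None:
        splitter.interval_ = infer_interval(y["time"])
    return y, X_actual


start = datetime(2024, 1, 1)
y = {
    "time": [start + timedelta(days=i) for i in range(4)],
    "value": [1.0, 2.0, 3.0, 4.0],
}

with_y = DummySplitter()
validate_sketch(with_y, y, None)

without_y = DummySplitter()
validate_sketch(without_y, None, {"time": y["time"], "feature": [0, 1, 0, 1]})
```

After these calls, with_y.interval_ holds a one-day timedelta, while without_y has no interval_ attribute because y was None.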

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| splitter | BaseSplitter | The splitter instance. interval_ will be set on it when y is not None. | required |
| y | pl.DataFrame or None | Target time series. When None, only X_actual is validated. | required |
| X_actual | pl.DataFrame or None | Exogenous features. When None, only y is validated. | required |

Returns

| Type | Description |
| --- | --- |
| tuple of (pl.DataFrame or None, pl.DataFrame or None) | Validated (y, X_actual) pair. |
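The two signatures listed at the top of this page mean that the first element of the returned tuple is None exactly when y is None. A minimal sketch of how such a pair of signatures can be declared with typing.overload follows. Frame and passthrough are hypothetical stand-ins (for pl.DataFrame and for the real function), and the body performs no validation.

```python
from __future__ import annotations

from typing import overload


class Frame:
    """Hypothetical stand-in for pl.DataFrame."""


@overload
def passthrough(y: Frame, X_actual: Frame | None) -> tuple[Frame, Frame | None]: ...
@overload
def passthrough(y: None, X_actual: Frame | None) -> tuple[None, Frame | None]: ...
def passthrough(
    y: Frame | None, X_actual: Frame | None
) -> tuple[Frame | None, Frame | None]:
    # The overloads only inform static type checkers; at runtime this single
    # implementation handles both cases.
    return y, X_actual


frame = Frame()
y_out, X_out = passthrough(frame, None)      # a type checker infers y_out: Frame
none_out, X_out2 = passthrough(None, frame)  # a type checker infers none_out: None
```

With this pattern, callers that pass a non-None y do not need to re-check the first returned element for None before using it.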

Raises

| Type | Description |
| --- | --- |
| ValueError | If time columns are missing, panel groups are inconsistent, or input validation fails. |

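See Also

BaseSplitter (yohou.model_selection.split.BaseSplitter): Base class for time series CV splitters.
check_inputs (yohou.utils.validation.check_inputs): Low-level input validation helper.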

Source Code

def validate_splitter_data(
    splitter: BaseSplitter, y: pl.DataFrame | None, X_actual: pl.DataFrame | None
) -> tuple[pl.DataFrame | None, pl.DataFrame | None]:
    """Validate and prepare input data for time series splitters.

    Checks that ``y`` and ``X_actual`` have valid ``"time"`` columns, consistent
    panel structure, and matching panel groups.  When ``y`` is provided the
    time interval is inferred and stored on the splitter as ``interval_``.

    Parameters
    ----------
    splitter : BaseSplitter
        The splitter instance.  ``interval_`` will be set on it when
        ``y`` is not ``None``.
    y : pl.DataFrame or None
        Target time series.  When ``None``, only ``X_actual`` is validated.
    X_actual : pl.DataFrame or None
        Exogenous features.  When ``None``, only ``y`` is validated.

    Returns
    -------
    tuple of (pl.DataFrame or None, pl.DataFrame or None)
        Validated ``(y, X_actual)`` pair.

    Raises
    ------
    ValueError
        If time columns are missing, panel groups are inconsistent, or
        input validation fails.

    See Also
    --------
    - [`BaseSplitter`][yohou.model_selection.split.BaseSplitter] : Base class for time series CV splitters.
    - [`check_inputs`][yohou.utils.validation.check_inputs] : Low-level input validation helper.

    """
    if y is not None:
        check_time_column(y)
        check_panel_internal_consistency(y, "y")

    if X_actual is not None:
        check_time_column(X_actual)
        check_panel_internal_consistency(X_actual, "X_actual")

        if y is not None:
            check_panel_groups_match(y, X_actual)

    # Type narrowing: check_inputs requires non-None y
    if y is not None:
        interval = check_inputs(y, X_actual)
        splitter.interval_ = interval

    return y, X_actual