check_insufficient_data_raises

yohou.testing.transformer.check_insufficient_data_raises(transformer, X, y=None)

Check behavior when data length < observation_horizon.

Transformers should either raise appropriate errors or gracefully handle insufficient data when given data shorter than their observation_horizon.

Parameters

Name         Type             Description                    Default
transformer  BaseTransformer  Unfitted transformer           required
X            DataFrame        Test data (will be truncated)  required
y            DataFrame        Target data                    None

Notes

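This check verifies that transformers do not fail unexpectedly on insufficient data. A transformer passes if it either raises ValueError, IndexError, pl.exceptions.ShapeError, or pl.exceptions.ComputeError, or returns a result (possibly empty or truncated) that still has a time column. The check returns without testing anything when observation_horizon is 0 or when X has no more rows than observation_horizon.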
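
The two accepted outcomes can be illustrated with a minimal sketch. It does not use yohou or polars: `Frame`, `StrictLag`, `LenientLag`, and `insufficient_data_outcome` are hypothetical stand-ins invented for this example, and the helper only mirrors the control flow of the check (it returns a label instead of `None`, and catches only the built-in exception types).

```python
class Frame:
    """Hypothetical stand-in for a DataFrame: a dict of equal-length columns."""

    def __init__(self, data):
        self.data = {name: list(values) for name, values in data.items()}

    @property
    def columns(self):
        return list(self.data)

    def __len__(self):
        return len(next(iter(self.data.values()), []))

    def head(self, n):
        # Keep the first n rows of every column (n >= 0 in this sketch).
        return Frame({name: values[:n] for name, values in self.data.items()})


class StrictLag:
    """Toy transformer that raises when it receives too few rows."""

    observation_horizon = 3

    def transform(self, X):
        if len(X) < self.observation_horizon:
            raise ValueError(
                f"need at least {self.observation_horizon} rows, got {len(X)}"
            )
        return X


class LenientLag:
    """Toy transformer that returns an empty frame instead of raising."""

    observation_horizon = 3

    def transform(self, X):
        if len(X) < self.observation_horizon:
            return X.head(0)
        return X


def insufficient_data_outcome(transformer, X):
    """Simplified mirror of the check; reports which path was taken."""
    horizon = transformer.observation_horizon
    if horizon == 0 or len(X) <= horizon:
        # Nothing to test: stateless transformer or X too short to truncate.
        return "skipped"
    X_short = X.head(horizon - 1)
    try:
        result = transformer.transform(X_short)
    except (ValueError, IndexError):
        return "raised"
    # Graceful handling must still produce a frame with a time column.
    assert "time" in result.columns, "Result must have time column"
    return "handled"


X = Frame({"time": [1, 2, 3, 4, 5], "value": [10.0, 11.0, 12.0, 13.0, 14.0]})
outcomes = {
    "strict": insufficient_data_outcome(StrictLag(), X),
    "lenient": insufficient_data_outcome(LenientLag(), X),
    "too_short": insufficient_data_outcome(StrictLag(), X.head(3)),
}
print(outcomes)
```

Both toy transformers would pass. The "skipped" case shows why the test data should have more rows than the horizon: otherwise the check returns early without exercising the transformer.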

Source Code

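import polars as pl
from sklearn.base import clone  # assumed: clone comes from scikit-learn; the listing omits its imports


def check_insufficient_data_raises(transformer, X: pl.DataFrame, y: pl.DataFrame | None = None) -> None: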
    """Check behavior when data length < observation_horizon.

    Transformers should either raise appropriate errors or gracefully handle
    insufficient data when given data shorter than their observation_horizon.

    Parameters
    ----------
    transformer : BaseTransformer
        Unfitted transformer
    X : pl.DataFrame
        Test data (will be truncated)
    y : pl.DataFrame, optional
        Target data

    Notes
    -----
    This check verifies that transformers don't crash unexpectedly with
    insufficient data. They may either raise a clear error or return
    an empty/truncated result.

    """
    transformer_clone = clone(transformer)
    transformer_clone.fit(X, y)

    horizon = transformer_clone.observation_horizon

    if horizon == 0:
        # Stateless transformers don't need minimum data
        return

    if len(X) <= horizon:
        # Need longer X to test
        return

    # Create data shorter than horizon (an empty frame when horizon == 1)
    X_short = X.head(horizon - 1)

    try:
        result = transformer_clone.transform(X_short)
        # If it succeeds, verify result is valid (has time column, non-negative length)
        assert "time" in result.columns, "Result must have time column"
        assert len(result) >= 0, "Result length must be non-negative"
        # Graceful handling is acceptable (e.g., returning empty dataframe)
    except (ValueError, IndexError, pl.exceptions.ShapeError, pl.exceptions.ComputeError):
        # Expected behavior - transformer raises appropriate error
        pass