check_schema
yohou.utils.validation.check_schema(df, expected_schema, groups=None)
Validate a DataFrame's schema and return it with its columns in canonical order.
Checks that the data has the same column names and dtypes as expected, and returns the DataFrame with the "time" column first, followed by the schema columns in the order given by the expected schema.
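The validate-then-reorder behavior described above can be illustrated with a minimal, polars-free sketch. It is not yohou's implementation: the helper name `ordered_columns` is hypothetical, and it assumes the schema is available as a plain mapping of column name to dtype (for a polars DataFrame, `dict(df.schema)`).

```python
from collections.abc import Mapping


def ordered_columns(actual: Mapping[str, object], expected: Mapping[str, object]) -> list[str]:
    """Hypothetical helper: check the non-time columns against `expected`
    and return the column order ["time"] + expected columns.
    """
    # Assumption: treat a missing "time" column as an error; the documented
    # function only says the input "should include" it.
    if "time" not in actual:
        raise ValueError('Missing "time" column.')
    # Compare everything except the time column; dict equality ignores order.
    incoming = {name: dtype for name, dtype in actual.items() if name != "time"}
    if incoming != dict(expected):
        raise ValueError(f"Schema mismatch. Expected: {dict(expected)}, got: {incoming}")
    # "time" first, then the schema columns in the order given by `expected`.
    return ["time", *expected]
```

With polars, the returned list can be passed to `df.select(...)`, for example `df.select(ordered_columns(dict(df.schema), {"value": pl.Int64}))`. Keeping the check on plain mappings makes the ordering logic easy to test without building DataFrames.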
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame to validate (should include a "time" column). | required |
| expected_schema | dict[str, DataType] | Expected schema for the non-time columns. For panel data, use unprefixed column names. | required |
| groups | list[str] or None | Group prefixes for panel data. If provided, the expected schema is built by prefixing each column name with the group and a double underscore (e.g., "panel__series_0"). Use None for global data. | None |
Returns
| Type | Description |
|---|---|
| DataFrame | DataFrame with columns in proper order: ["time"] + schema columns. |
Raises
| Type | Description |
|---|---|
| ValueError | If the incoming schema does not match the expected schema. |
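A schema mismatch can take several forms. The sketch below, a hypothetical `schema_diff` helper that is not part of yohou, shows one way to break a mismatch into missing columns, unexpected columns, and dtype conflicts, which can make an error message easier to act on. It assumes both schemas are plain name-to-dtype mappings that exclude the time column.

```python
from collections.abc import Mapping


def schema_diff(actual: Mapping[str, object], expected: Mapping[str, object]) -> dict[str, list[str]]:
    """Hypothetical helper: classify differences between two schemas."""
    return {
        # Expected columns absent from the incoming data.
        "missing": [name for name in expected if name not in actual],
        # Incoming columns that the expected schema does not mention.
        "unexpected": [name for name in actual if name not in expected],
        # Columns present in both schemas but with different dtypes.
        "dtype_mismatch": [
            name for name in expected if name in actual and actual[name] != expected[name]
        ],
    }
```

Raising ValueError only when any of the three lists is non-empty gives the same pass/fail outcome as a plain dictionary equality check, while reporting which columns caused the failure.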
Examples
>>> import polars as pl
>>> from yohou.utils.validation import check_schema
>>> # Non-panel data validation
>>> df = pl.DataFrame({"value": [10, 20], "time": [1, 2]})
>>> expected_schema = {"value": pl.Int64}
>>> result = check_schema(df, expected_schema)
>>> list(result.columns)
['time', 'value']
>>> # Schema mismatch raises error
>>> df_wrong = pl.DataFrame({"time": [1, 2], "value": [10.0, 20.0]}) # Float64
>>> check_schema(df_wrong, expected_schema)
Traceback (most recent call last):
...
ValueError: Schema mismatch. Expected: {'value': Int64}, got: {'value': Float64}
>>> # Panel data validation (constructs prefixed schema automatically)
>>> df_panel = pl.DataFrame({"panel__s1": [15, 25], "time": [1, 2], "panel__s0": [10, 20]})
>>> expected_schema = {"s0": pl.Int64, "s1": pl.Int64}
>>> result = check_schema(df_panel, expected_schema, groups=["panel"])
>>> list(result.columns)
['time', 'panel__s0', 'panel__s1']
See Also
check_inputs: Validates time intervals.
BaseForecaster: Uses this function to validate incoming data.
Notes
For panel data, this function builds the expected schema by prefixing each unprefixed column name in expected_schema with a group (e.g., "sales__store_1"). The returned DataFrame places "time" first, followed by the schema columns in schema order.
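The prefixed-schema construction can be sketched as a dictionary comprehension. This is an illustration under stated assumptions, not yohou's code: the hypothetical `panel_schema` helper assumes the `group__column` naming shown in the examples, and it orders columns group by group, then by the order of `expected_schema`; the actual ordering for multiple groups is not documented here.

```python
from collections.abc import Mapping, Sequence


def panel_schema(expected: Mapping[str, object], groups: Sequence[str]) -> dict[str, object]:
    """Hypothetical helper: prefix every column name with every group."""
    # Group-major order is an assumption; yohou may order columns differently.
    return {
        f"{group}__{column}": dtype
        for group in groups
        for column, dtype in expected.items()
    }
```

For the panel example, `panel_schema({"s0": pl.Int64, "s1": pl.Int64}, ["panel"])` yields the keys `panel__s0` and `panel__s1`, matching the column order shown in the Examples.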