
yohou.utils

Validation, panel data, weighting, tags, discovery, and other utility functions.

Discovery

Name Description
all_estimators Get a list of all estimators from yohou.
all_displays Get a list of all displays from yohou.
all_functions Get a list of all functions from yohou.

Panel data

Name Description
inspect_panel Inspect DataFrame columns to distinguish global and local (panel) data.
get_group_df Extract and rename columns for a specific panel group.
dict_to_panel Convert a dict of group DataFrames to a single DataFrame with prefixed columns.
select_panel_columns Select panel group columns and optionally global columns of a DataFrame.
panel_aware_rename Apply a rename function to a column name while preserving the panel group prefix.
panel_aware_prefix Add a prefix to a column name while preserving the panel group prefix.
panel_aware_suffix Add a suffix to a column name while preserving the panel group prefix.
check_groups Validate and normalize panel group names for forecaster operations.
check_groups_exist Validate that all requested panel groups exist in the fitted forecaster.
check_panel_groups_match Validate that y and X have matching panel group structures.
check_panel_internal_consistency Validate that all panel groups in a DataFrame have the same local column structure.
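
The panel helpers above all work on column names that carry a group prefix. As a minimal sketch of that idea, assuming panel columns are named `<group>__<column>` (the `__` separator is taken from the `validate_column_names` description below), the following shows how a prefix can be added to a column while the group prefix stays in place. The function names `split_panel_column` and `prefix_keeping_group` are hypothetical and are not part of yohou; the library's own helpers may behave differently.

```python
from typing import Optional, Tuple

SEP = "__"  # assumed panel separator; not read from yohou itself


def split_panel_column(name: str) -> Tuple[Optional[str], str]:
    """Split 'group__col' into ('group', 'col'); global columns get group None."""
    group, sep, local = name.partition(SEP)
    if not sep:
        # No separator found: treat the column as global (shared by all groups).
        return None, name
    return group, local


def prefix_keeping_group(name: str, prefix: str) -> str:
    """Add `prefix` to the local column name without touching the group prefix."""
    group, local = split_panel_column(name)
    renamed = f"{prefix}{local}"
    return renamed if group is None else f"{group}{SEP}{renamed}"


print(prefix_keeping_group("store_a__sales", "lag1_"))
print(prefix_keeping_group("temperature", "lag1_"))
```

The split happens at the first `__`, so the sketch relies on `__` never appearing inside a group name or a local column name.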

Data validation

Name Description
validate_forecaster_data Validate data for forecasters.
validate_transformer_data Validate data for transformers.
validate_scorer_data Validate and prepare scorer input data.
validate_splitter_data Validate data for splitters.
validate_plotting_data Validate a DataFrame for plotting and resolve columns.
validate_plotting_params Validate common plotting function parameters.
validate_search_data Validate input data for hyperparameter search (GridSearchCV, RandomizedSearchCV).
validate_time_weight Validate the time_weight parameter for forecasters and scorers.
validate_column_names Validate that __ separator is used only for panel data group names.

Time series validation

Name Description
check_time_column Validate that time column exists, has proper dtype, no nulls, and is sorted.
check_interval_consistency Validate that a time series has uniform time spacing.
check_continuity Validate temporal continuity between consecutive DataFrames.
check_sufficient_rows Validate that a DataFrame has enough rows for the operation.
check_inputs Validate that target and feature DataFrames have consistent time intervals.
check_schema Validate DataFrame schema and return with proper column ordering.
check_X_actual_required Validate that X_actual is provided when required for recursive prediction.
check_forecasting_horizon_positive Validate that the forecasting horizon is positive.
check_scorer_column_selection Subselect columns based on scorer configuration.
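
To illustrate what "uniform time spacing" means for `check_interval_consistency`, here is a minimal stdlib sketch. It assumes a plain list of `datetime` values and a fixed-length interval; the function name `has_uniform_spacing` is hypothetical, and the library's own check operates on DataFrames and may differ in how it reports failures.

```python
from datetime import datetime, timedelta
from typing import Sequence


def has_uniform_spacing(times: Sequence[datetime]) -> bool:
    """Return True if consecutive timestamps are all separated by one positive step."""
    if len(times) < 2:
        return True  # nothing to compare
    # Collect the distinct gaps between neighbouring timestamps.
    steps = {later - earlier for earlier, later in zip(times, times[1:])}
    # Uniform and sorted means exactly one distinct gap, and it is positive.
    return len(steps) == 1 and next(iter(steps)) > timedelta(0)


start = datetime(2024, 1, 1)
hourly = [start + timedelta(hours=i) for i in range(4)]
with_gap = hourly[:2] + [start + timedelta(hours=5)]

print(has_uniform_spacing(hourly))
print(has_uniform_spacing(with_gap))
```

Comparing raw differences only works for fixed-length intervals. Calendar intervals such as months have varying lengths, so a check of this kind would need calendar-aware stepping for them.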

Weighting

Name Description
exponential_decay_weight Generate exponential decay weights giving more weight to recent times.
linear_decay_weight Generate linear decay weights giving more weight to recent times.
seasonal_emphasis_weight Generate weights emphasizing specific seasonal positions.
compose_weights Compose multiple weight functions by multiplication.
validate_callable_signature Validate that a callable has a valid signature for time weighting.
normalize_weights Normalize weights so they sum to the number of elements.
validate_weight_array Validate a resolved weight array for NaN, negatives, infinities, and all-zero.
resolve_dict_weights Map a {key: weight} dict to an aligned numpy array.
combine_weight_vectors Combine weight vectors multiplicatively and normalize.
resolve_weight_to_array Resolve a weight specification (callable, DataFrame, or dict) to a numpy array.
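
The weighting entries above describe three steps: generate weights that favour recent observations, combine weight vectors by multiplication, and normalize so the weights sum to the number of elements. A minimal sketch of those steps with plain Python lists follows. The names `exp_decay`, `normalize`, and `combine`, and the `half_life` parameter, are hypothetical illustrations; they are not yohou's signatures, and the library resolves weights to numpy arrays rather than lists.

```python
import math
from typing import List, Sequence


def exp_decay(n: int, half_life: float) -> List[float]:
    """Weights that halve every `half_life` steps back from the latest observation."""
    return [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]


def normalize(weights: Sequence[float]) -> List[float]:
    """Rescale weights so they sum to len(weights)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    n = len(weights)
    return [w * n / total for w in weights]


def combine(*vectors: Sequence[float]) -> List[float]:
    """Multiply weight vectors element-wise, then normalize the product."""
    product = [math.prod(ws) for ws in zip(*vectors)]
    return normalize(product)


recency = exp_decay(4, half_life=1.0)   # oldest observation first
every_other = [1.0, 0.0, 1.0, 0.0]      # toy mask emphasizing positions 0 and 2
print(normalize(recency))
print(combine(recency, every_other))
```

Normalizing to the number of elements, rather than to 1, keeps the average weight at 1, so a weighted metric stays on the same scale as its unweighted counterpart.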

Time intervals

Name Description
add_interval Add n intervals to a datetime (handles variable-length intervals).
interval_to_timedelta Convert a fixed interval to a timedelta, or return None for variable intervals.
parse_interval Parse an interval string into (multiplier, unit).
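
The distinction between fixed and variable intervals can be sketched as follows. This example assumes Polars-style interval strings such as `"15m"`, `"2h"`, or `"1mo"`, and treats months, quarters, and years as variable-length. The function names `parse` and `to_timedelta` and the unit tables are hypothetical; the set of units yohou actually accepts is not stated here.

```python
import re
from datetime import timedelta
from typing import Optional, Tuple

# Assumed unit tables for illustration only.
FIXED_UNITS = {
    "s": timedelta(seconds=1),
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
}
VARIABLE_UNITS = {"mo", "q", "y"}  # calendar units whose length varies


def parse(interval: str) -> Tuple[int, str]:
    """Split an interval string such as '15m' into (15, 'm')."""
    match = re.fullmatch(r"(\d+)([a-z]+)", interval)
    if match is None:
        raise ValueError(f"cannot parse interval: {interval!r}")
    return int(match.group(1)), match.group(2)


def to_timedelta(interval: str) -> Optional[timedelta]:
    """Return a timedelta for fixed units, or None for calendar-dependent units."""
    multiplier, unit = parse(interval)
    if unit in VARIABLE_UNITS:
        return None
    if unit not in FIXED_UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return multiplier * FIXED_UNITS[unit]


print(parse("15m"))
print(to_timedelta("2h"))
print(to_timedelta("1mo"))
```

Returning `None` for variable intervals signals to the caller that plain timedelta arithmetic is not valid and that calendar-aware stepping, as `add_interval` is described to provide, is needed instead.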

Polars helpers

Name Description
cast Cast columns according to schema with integer rounding.
get_numeric_columns Get a list of numeric column names from a DataFrame.
tabularize Convert a time series to tabular format using lags.
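
As an illustration of what converting a time series to tabular format using lags involves, the sketch below builds one row per time step from a plain list, with one column per lag plus the current value as the target. The function name `lag_table` and the column names `lag_<k>` and `target` are hypothetical; yohou's `tabularize` works on DataFrames, and its actual output layout is not specified here.

```python
from typing import Dict, List, Sequence


def lag_table(values: Sequence[float], lags: Sequence[int]) -> List[Dict[str, float]]:
    """Build rows of lagged values; rows without a full lag history are dropped."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(values)):
        # Each lag k looks k steps back from the current position t.
        row = {f"lag_{k}": values[t - k] for k in lags}
        row["target"] = values[t]
        rows.append(row)
    return rows


for row in lag_table([10, 20, 30, 40], lags=[1, 2]):
    print(row)
```

The first `max(lags)` observations produce no rows because their lag history is incomplete, which is why tabular forecasting setups typically require a minimum number of input rows.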