
inspect_panel

yohou.utils.panel.inspect_panel(df)

Inspect DataFrame columns to distinguish global and local (panel) data.

Global columns apply to all time series (e.g., a single univariate series or features common across all panels). Local columns use the __ separator to indicate panel data groups, following the pattern <GROUP>__<SERIES> (e.g., store_1__sales, store_2__sales).
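The naming convention can be illustrated without yohou itself. The following is a minimal sketch, assuming that the text before the first __ is the group and that "time" is skipped; split_columns is a hypothetical helper written for illustration, not the library's implementation, and it may differ from inspect_panel on unusual names.

```python
def split_columns(columns):
    """Partition column names into global names and panel groups (illustrative only)."""
    global_names = []
    panel_groups = {}
    for col in columns:
        if col == "time":
            continue  # the time column is neither global nor panel
        group, sep, series = col.partition("__")
        if sep and group and series:
            # "<group>__<series>": file the full column name under its group
            panel_groups.setdefault(group, []).append(col)
        else:
            global_names.append(col)
    return global_names, panel_groups


columns = ["time", "holiday", "store_1__sales", "store_1__returns", "store_2__sales"]
global_names, panel_groups = split_columns(columns)
print(global_names)  # ['holiday']
print(panel_groups)
# {'store_1': ['store_1__sales', 'store_1__returns'], 'store_2': ['store_2__sales']}
```

Here "holiday" is global because it has no separator, while the three store columns are grouped under their prefixes.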

Parameters

Name | Type | Description | Default
df | DataFrame | Input DataFrame with potential mix of global and group columns. Must contain a "time" column (which is ignored in the output). | required

Returns

Name | Type | Description
global_names | list of str | Names of columns without __ separator (excluding "time").
panel_groups | dict of str to list of str | Mapping from group prefixes to their full column names. Example: {"store_1": ["store_1__sales", "store_1__returns"]}
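Because panel_groups stores full column names, a common follow-up is to recover the bare series names for each group. The snippet below is a sketch on a hand-written mapping of the shape described above; series_by_group is a hypothetical name used only for illustration.

```python
# Mapping of the shape returned as panel_groups (written by hand for illustration).
panel_groups = {
    "store_1": ["store_1__sales", "store_1__returns"],
    "store_2": ["store_2__sales"],
}

# Strip "<group>__" from each column name, splitting on the first separator only.
series_by_group = {
    group: [col.split("__", 1)[1] for col in cols]
    for group, cols in panel_groups.items()
}
print(series_by_group)
# {'store_1': ['sales', 'returns'], 'store_2': ['sales']}
```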

Examples

>>> import polars as pl
>>> # Global time series (single series)
>>> df_global = pl.DataFrame({"time": [1, 2, 3], "value": [10, 20, 30]})
>>> global_names, panel_groups = inspect_panel(df_global)
>>> global_names
['value']
>>> panel_groups
{}
>>> # Panel data with __ separator (<entity>__<variable>)
>>> df_panel = pl.DataFrame({
...     "time": [1, 2, 3],
...     "store_1__sales": [100, 110, 120],
...     "store_2__sales": [150, 160, 170],
... })
>>> global_names, panel_groups = inspect_panel(df_panel)
>>> global_names
[]
>>> panel_groups
{'store_1': ['store_1__sales'], 'store_2': ['store_2__sales']}
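According to the source listing on this page, inspect_panel also raises a ValueError when a global column has the same name as the part of a panel column after the separator. The following is a minimal sketch of that check on plain Python values; find_conflicts is a hypothetical helper for illustration and returns the clashing names instead of raising.

```python
def find_conflicts(global_names, panel_groups):
    """Return global names that also appear as the suffix of a panel column."""
    suffixes = {
        col.split("__", 1)[1]
        for cols in panel_groups.values()
        for col in cols
    }
    return sorted(suffixes.intersection(global_names))


# 'sales' is both a global column and the series part of 'store_1__sales'.
print(find_conflicts(["sales"], {"store_1": ["store_1__sales"]}))    # ['sales']
# No overlap: 'holiday' never appears after a group prefix.
print(find_conflicts(["holiday"], {"store_1": ["store_1__sales"]}))  # []
```

A non-empty result corresponds to the ambiguous case that the library rejects.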

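See Also

select_panel_columns (yohou.utils.panel.select_panel_columns): Filter DataFrame to panel group columns and global columns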

Source Code

import re

import polars as pl


def inspect_panel(df: pl.DataFrame) -> tuple[list[str], dict[str, list[str]]]:
    """Inspect DataFrame columns to distinguish global and local (panel) data.

    Global columns apply to all time series (e.g., single univariate series or
    features common across all panels). Local columns use the __ separator to
    indicate panel data groups following the pattern <GROUP>__<SERIES>
    (e.g., store_1__sales, store_2__sales).

    Parameters
    ----------
    df : pl.DataFrame
        Input DataFrame with potential mix of global and group columns.
        Must contain a "time" column (which is ignored in the output).

    Returns
    -------
    global_names : list of str
        Names of columns without __ separator (excluding "time").

    panel_groups : dict of str to list of str
        Mapping from group prefixes to their full column names.
        Example: {"store_1": ["store_1__sales", "store_1__returns"]}

    Examples
    --------
    >>> import polars as pl
    >>> # Global time series (single series)
    >>> df_global = pl.DataFrame({"time": [1, 2, 3], "value": [10, 20, 30]})
    >>> global_names, panel_groups = inspect_panel(df_global)
    >>> global_names
    ['value']
    >>> panel_groups
    {}

    >>> # Panel data with __ separator (<entity>__<variable>)
    >>> df_panel = pl.DataFrame({
    ...     "time": [1, 2, 3],
    ...     "store_1__sales": [100, 110, 120],
    ...     "store_2__sales": [150, 160, 170],
    ... })
    >>> global_names, panel_groups = inspect_panel(df_panel)
    >>> global_names
    []
    >>> panel_groups
    {'store_1': ['store_1__sales'], 'store_2': ['store_2__sales']}

    See Also
    --------
    - [`select_panel_columns`][yohou.utils.panel.select_panel_columns] : Filter DataFrame to panel group columns and global columns
    """
    # Pattern to match <GROUP>__<SERIES> format
    # Non-greedy prefix splits at the first __, so group names may contain
    # single underscores (e.g., new_south_wales__trips)
    group_pattern = re.compile(r"^(.+?)__(.+)$")

    global_names = []
    panel_groups: dict[str, list[str]] = {}

    for col in df.columns:
        if col == "time":
            continue

        match = group_pattern.match(col)
        if match:
            # This is a panel data column
            group_prefix = match.group(1)
            if group_prefix not in panel_groups:
                panel_groups[group_prefix] = []
            panel_groups[group_prefix].append(col)
        else:
            # This is a global column
            global_names.append(col)

    # Validate that unprefixed panel column names don't conflict with global columns
    if panel_groups and global_names:
        # Extract unprefixed names from all panel columns
        unprefixed_panel_names = set()
        for group_cols in panel_groups.values():
            for col in group_cols:
                # Extract the part after __
                unprefixed_name = col.split("__", 1)[1]
                unprefixed_panel_names.add(unprefixed_name)

        # Check for conflicts with global column names
        conflicts = unprefixed_panel_names.intersection(global_names)
        if conflicts:
            raise ValueError(
                f"Panel column names (after removing group prefix) conflict with global column names: {sorted(conflicts)}. "
                f"Panel columns with __ separator cannot have the same name as global columns. "
                f"For example, if you have 'x__a' and a global column 'a', this creates ambiguity."
            )

    return global_names, panel_groups
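The behaviour of the pattern in the listing above is easiest to see on a few names. This short sketch reuses the same regular expression with only the standard library; the sample names other than new_south_wales__trips are made up for illustration.

```python
import re

# Same pattern as in the listing above.
group_pattern = re.compile(r"^(.+?)__(.+)$")

for name in ["new_south_wales__trips", "a__b__c", "value", "single_underscore"]:
    match = group_pattern.match(name)
    print(name, "->", match.groups() if match else None)
# new_south_wales__trips -> ('new_south_wales', 'trips')
# a__b__c -> ('a', 'b__c')
# value -> None
# single_underscore -> None
```

Names with more than one separator are split at the first __, so everything after it is treated as the series part.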

Tutorials

The following example notebooks use this component:

  • How to Aggregate Scorer Results (Evaluation-Search): Demonstrate all scorer aggregation strategies (stepwise, vintagewise, componentwise, groupwise, coveragewise, all) on panel data with weighted group aggregation.

  • Panel Data Forecasting (Getting-Started): Forecast multiple related time series simultaneously using the __ naming convention, LocalPanelForecaster, and per-group scoring.

  • How to Configure LocalPanelForecaster (Panel-Data): Wrap any forecaster with LocalPanelForecaster for fully independent per-group clones, parallel fitting via n_jobs, and selective group operations.

  • How to Forecast Panel Prediction Intervals (Panel-Data): Combine conformal and quantile regression intervals on panel data with per-group coverage analysis, calibration plots, and groupwise interval scoring.

  • How to Preprocess Panel Data (Panel-Data): Automatic panel-aware transformation (StandardScaler, rolling stats, imputation) plus manual per-group workflows with get_group_df and dict_to_panel.

  • How to Apply Stationarity to Panel Data (Panel-Data): Apply per-group stationarity transforms on panel data with SeasonalDifferencing, DecompositionPipeline (polynomial trend + pattern seasonality), and residuals.

  • Quickstart (Quickstart): Comprehensive end-to-end tour of yohou beyond the Getting Started tutorials, covering data loading, baseline forecasting, preprocessing pipelines, decomposition, cross-validation search, and interval prediction.