select_panel_columns

yohou.utils.panel.select_panel_columns(df, groups, include_global=True)

Select panel group columns and optionally global columns of a DataFrame.

For panel data (DataFrames whose column names use the __ separator to mark groups, as in sales__store_1), this function keeps only the "time" column, the columns whose names start with one of the given group prefixes, and optionally the global columns.
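The naming convention can be illustrated on bare column names. The following is a minimal sketch, not the library's code: `classify_columns` is a hypothetical helper that buckets names into the three kinds mentioned above, assuming a column belongs to a group when its name starts with `<group>__`.

```python
def classify_columns(columns: list[str], groups: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper: bucket column names as time, panel, or global."""
    # A panel column starts with "<group>__" for one of the given groups.
    prefixes = tuple(f"{group}__" for group in groups)
    buckets: dict[str, list[str]] = {"time": [], "panel": [], "global": []}
    for col in columns:
        if col == "time":
            buckets["time"].append(col)
        elif col.startswith(prefixes):
            buckets["panel"].append(col)
        else:
            buckets["global"].append(col)
    return buckets


columns = ["time", "global_feature", "sales__store_1", "inventory__store_1"]
buckets = classify_columns(columns, ["sales", "inventory"])
print(buckets["panel"])   # ['sales__store_1', 'inventory__store_1']
print(buckets["global"])  # ['global_feature']
```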

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input DataFrame with a potential mix of global and group columns. Must contain a "time" column. | required |
| groups | list of str or None | List of all group prefixes in the dataset. All columns matching any <group>__* pattern are kept. If None, no filtering is performed and df is returned unchanged. | required |
| include_global | bool | Whether to keep global columns (without __) in addition to time and panel group columns. True: keep time, all panel group columns, and all global columns (for X feature data). False: keep only time and all panel group columns (for y target data). | True |
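Since `groups` is expected to list every group prefix, it can help to derive that list from the column names. The sketch below is an assumption-laden illustration: `infer_group_prefixes` is a hypothetical helper, not part of the library, and it assumes the group prefix is the text before the first `__`. The `inspect_panel` function listed under See Also may cover this need, but its return format is not shown on this page.

```python
def infer_group_prefixes(columns: list[str]) -> list[str]:
    """Hypothetical helper: collect the text before the first "__" in each name."""
    prefixes: list[str] = []
    for col in columns:
        if "__" not in col:
            continue  # "time" or a global column: no group prefix
        prefix = col.split("__", 1)[0]
        if prefix not in prefixes:
            prefixes.append(prefix)  # keep first-seen order, skip duplicates
    return prefixes


columns = ["time", "global_feature", "sales__store_1", "sales__store_2", "inventory__store_1"]
print(infer_group_prefixes(columns))  # ['sales', 'inventory']
```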

Returns

| Type | Description |
|---|---|
| DataFrame | Filtered DataFrame containing "time", columns matching any panel group prefix, and optionally global columns. |

Examples

>>> import polars as pl
>>> from yohou.utils.panel import select_panel_columns
>>> # Panel data with group columns and global column
>>> df = pl.DataFrame({
...     "time": [1, 2, 3],
...     "global_feature": [10.0, 20.0, 30.0],
...     "sales__store_1": [100, 110, 120],
...     "sales__store_2": [150, 160, 170],
...     "inventory__store_1": [50, 55, 60],
...     "inventory__store_2": [75, 80, 85],
... })
>>> # Filter for target (y) - exclude global features
>>> y_filtered = select_panel_columns(df, ["sales", "inventory"], include_global=False)
>>> set(y_filtered.columns) == {
...     "time",
...     "sales__store_1",
...     "sales__store_2",
...     "inventory__store_1",
...     "inventory__store_2",
... }
True
>>> # Filter for features (X) - include global features
>>> X_filtered = select_panel_columns(df, ["sales", "inventory"], include_global=True)
>>> set(X_filtered.columns) == {
...     "time",
...     "global_feature",
...     "sales__store_1",
...     "sales__store_2",
...     "inventory__store_1",
...     "inventory__store_2",
... }
True
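Two behaviours follow from the source listing on this page rather than from the doctests: with `groups=None` nothing is filtered, even when `include_global=False`, and a column whose prefix is not listed in `groups` is treated like a global column. The sketch below mirrors the selection logic on plain column names so that it runs without polars; `kept_columns` is a hypothetical stand-in, not the library function.

```python
from typing import Optional


def kept_columns(
    columns: list[str],
    groups: Optional[list[str]],
    include_global: bool = True,
) -> list[str]:
    """Hypothetical stand-in: names select_panel_columns would keep, per its source."""
    if groups is None:
        return list(columns)  # no filtering at all; include_global is ignored
    keep = ["time"]
    for col in columns:
        if col == "time":
            continue
        is_panel = any(col.startswith(f"{group}__") for group in groups)
        if is_panel or include_global:
            keep.append(col)
    return keep


columns = ["time", "global_feature", "sales__store_1", "returns__store_1"]

# groups=None: everything is kept, even with include_global=False.
print(kept_columns(columns, None, include_global=False))
# ['time', 'global_feature', 'sales__store_1', 'returns__store_1']

# "returns" is not listed, so its column is handled like a global one.
print(kept_columns(columns, ["sales"], include_global=True))
# ['time', 'global_feature', 'sales__store_1', 'returns__store_1']
print(kept_columns(columns, ["sales"], include_global=False))
# ['time', 'sales__store_1']
```

If unlisted groups should be dropped while true global columns are kept, the caller would need to filter them out separately; the function as shown does not distinguish the two.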

See Also

  • inspect_panel : Inspect DataFrame to identify global and local columns

Source Code

import polars as pl


def select_panel_columns(
    df: pl.DataFrame,
    groups: list[str] | None,
    include_global: bool = True,
) -> pl.DataFrame:
    """Select panel group columns and optionally global columns of a DataFrame.

    For panel data (DataFrames with columns using __ separator for groups),
    this function filters columns to keep only the "time" column, columns
    matching any of the panel group prefixes, and optionally global columns.

    Parameters
    ----------
    df : pl.DataFrame
        Input DataFrame with potential mix of global and group columns.
        Must contain a "time" column.

    groups : list of str or None
        List of all group prefixes in the dataset. All columns matching
        any <group>__* pattern will be kept. If None, no filtering is performed.

    include_global : bool, default=True
        Whether to keep global columns (without __) in addition to time and
        panel group columns.
        - True: Keep time + all panel groups + all global columns for X
        - False: Keep only time + all panel groups (for y target data)

    Returns
    -------
    pl.DataFrame
        Filtered DataFrame containing "time", columns matching any panel
        group prefix, and optionally global columns.

    Examples
    --------
    >>> import polars as pl
    >>> # Panel data with group columns and global column
    >>> df = pl.DataFrame({
    ...     "time": [1, 2, 3],
    ...     "global_feature": [10.0, 20.0, 30.0],
    ...     "sales__store_1": [100, 110, 120],
    ...     "sales__store_2": [150, 160, 170],
    ...     "inventory__store_1": [50, 55, 60],
    ...     "inventory__store_2": [75, 80, 85],
    ... })
    >>> # Filter for target (y) - exclude global features
    >>> y_filtered = select_panel_columns(df, ["sales", "inventory"], include_global=False)
    >>> set(y_filtered.columns) == {
    ...     "time",
    ...     "sales__store_1",
    ...     "sales__store_2",
    ...     "inventory__store_1",
    ...     "inventory__store_2",
    ... }
    True

    >>> # Filter for features (X) - include global features
    >>> X_filtered = select_panel_columns(df, ["sales", "inventory"], include_global=True)
    >>> set(X_filtered.columns) == {
    ...     "time",
    ...     "global_feature",
    ...     "sales__store_1",
    ...     "sales__store_2",
    ...     "inventory__store_1",
    ...     "inventory__store_2",
    ... }
    True

    See Also
    --------
    - [`inspect_panel`][yohou.utils.panel.inspect_panel] : Inspect DataFrame to identify global and local columns
    """
    # If no local groups, return DataFrame unchanged (no filtering needed)
    if groups is None:
        return df

    # Determine which columns to keep
    cols_to_keep = ["time"]

    for col in df.columns:
        if col == "time":
            continue

        # Check if this column belongs to any panel group
        is_panel = False
        for group_prefix in groups:
            if col.startswith(f"{group_prefix}__"):
                is_panel = True
                break

        if is_panel:
            cols_to_keep.append(col)
        elif include_global:
            # Global column (doesn't match any group prefix)
            cols_to_keep.append(col)

    return df.select(cols_to_keep)
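A side effect of the loop above is that "time" always comes first in the result, with the remaining columns in their original order. The sketch below shows this on plain column names and also illustrates that the inner prefix loop could be collapsed, because `str.startswith` accepts a tuple of prefixes. `ordered_selection` is a hypothetical name, and the sketch assumes `groups` is not None (that case returns early in the listing).

```python
def ordered_selection(columns: list[str], groups: list[str], include_global: bool = True) -> list[str]:
    """Hypothetical sketch: column names in the order the listing would select them."""
    prefixes = tuple(f"{group}__" for group in groups)
    # Keep a non-time column if it is a panel column, or if globals are included.
    rest = [
        col
        for col in columns
        if col != "time" and (include_global or col.startswith(prefixes))
    ]
    return ["time", *rest]  # "time" is always placed first


columns = ["sales__store_1", "time", "global_feature"]
print(ordered_selection(columns, ["sales"]))
# ['time', 'sales__store_1', 'global_feature']
print(ordered_selection(columns, ["sales"], include_global=False))
# ['time', 'sales__store_1']
```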