plot_outliers

yohou.plotting.exploration.plot_outliers(df, *, columns=None, method='zscore', threshold=3.0, groups=None, facet_by='member', facet_n_cols=2, color_palette=None, show_legend=True, title=None, x_label=None, y_label=None, width=None, height=None, connect_gaps=False, resampler=None, line_width=2.0, line_opacity=0.7, outlier_color='#DC2626', outlier_size=8.0, outlier_symbol='x', show_bounds=True)

Plot time series with outlier points highlighted.

Overlays the original line plot with scatter markers at detected outlier positions. The detection method and threshold are configurable.

Parameters

Name Type Description Default
df DataFrame

Input DataFrame with 'time' column and numeric columns to plot.

required
columns str | list[str] | None

Column(s) to analyze. If None, uses all numeric columns except 'time'.

None
method Literal['zscore', 'iqr', 'percentile']

Outlier detection method:
- "zscore": points with |z-score| > threshold (typical value 3.0)
- "iqr": points outside [Q1 - threshold*IQR, Q3 + threshold*IQR] (typical value 1.5)
- "percentile": points above the threshold-th percentile or below the (100-threshold)-th percentile (typical value 95.0, which flags values below the 5th or above the 95th percentile)

"zscore"
threshold float

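Detection threshold. Interpretation depends on method. The default of 3.0 applies to every method, so pass an explicit value when using "iqr" or "percentile".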

3.0
groups list[str] | None

Panel group prefixes to plot.

None
facet_by Literal['group', 'member'] | None

Faceting axis for panel data. "group" creates one subplot per group, "member" one per member. None disables faceting. Ignored for non-panel data.

"member"
facet_n_cols int

Number of columns in facet grid.

2
color_palette list[str] | None

Custom color palette for the series lines.

None
show_legend bool

Whether to display the legend.

True
title str | None

Plot title.

None
x_label str | None

X-axis label.

None
y_label str | None

Y-axis label.

None
width int | None

Plot width in pixels.

None
height int | None

Plot height in pixels.

None
connect_gaps bool

If True, connect lines across missing data gaps.

False
resampler bool | Literal['widget'] | None

Enable plotly-resampler for large datasets. True returns a FigureResampler, "widget" a FigureWidgetResampler, None reads from get_config.

None
line_width float

Width of the series line in pixels.

2.0
line_opacity float

Opacity of the series line.

0.7
outlier_color str

Color for outlier markers.

"#DC2626"
outlier_size float

Size of outlier markers in pixels.

8.0
outlier_symbol str

Marker symbol for outlier points.

"x"
show_bounds bool

Whether to show threshold boundary lines.

True
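
The three detection rules described under method can be sketched in plain Python. This is a minimal illustration and not the library's code: the helper names `quantile`, `outlier_bounds`, and `outlier_indices` are hypothetical, the standard deviation is the sample one (ddof=1), and the quantile uses linear interpolation, which may differ from the interpolation Polars applies by default.

```python
from statistics import mean, stdev


def quantile(values, q):
    """Linearly interpolated quantile (illustrative stand-in, not the Polars one)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def outlier_bounds(values, method="zscore", threshold=3.0):
    """Return the (lower, upper) bounds implied by a detection rule."""
    if method == "zscore":
        mu, sigma = mean(values), stdev(values)  # sample std, ddof=1
        return mu - threshold * sigma, mu + threshold * sigma
    if method == "iqr":
        q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
        iqr = q3 - q1
        return q1 - threshold * iqr, q3 + threshold * iqr
    if method == "percentile":
        # threshold is a percentile in (50, 100], e.g. 95.0
        return quantile(values, 1 - threshold / 100), quantile(values, threshold / 100)
    raise ValueError(f"Unknown method: {method}")


def outlier_indices(values, method="zscore", threshold=3.0):
    """Indices of points strictly outside the bounds."""
    lower, upper = outlier_bounds(values, method, threshold)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


y = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 50]
zscore_hits = outlier_indices(y, "zscore", 3.0)
iqr_hits = outlier_indices(y, "iqr", 1.5)
percentile_hits = outlier_indices(y, "percentile", 95.0)
```

With this data each rule flags only the last value (index 11). Calling `outlier_indices(y, "percentile", 3.0)` flags all 12 points, because the lower bound then sits above the upper bound; this is why the percentile rule needs an explicit threshold rather than the 3.0 default.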

Returns

Type Description
Figure

Plotly figure object.

Raises

Type Description
TypeError

If df is not a Polars DataFrame.

ValueError

If method is unknown or threshold is invalid.
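
The method check is visible in the source further down this page; which threshold values count as invalid is not spelled out. A caller-side guard along the following lines may help catch mismatched method and threshold pairs before plotting. The function `check_threshold` and the accepted ranges are assumptions made for illustration and are not part of yohou.

```python
def check_threshold(method: str, threshold: float) -> None:
    """Hypothetical pre-check; raises ValueError on a suspicious combination."""
    if method not in ("zscore", "iqr", "percentile"):
        raise ValueError(f"Unknown method: {method}. Valid options: zscore, iqr, percentile")
    if method == "percentile":
        # Assumed range: above 50 the lower bound stays below the upper bound.
        if not 50 < threshold <= 100:
            raise ValueError(f"percentile threshold must be in (50, 100], got {threshold}")
    elif threshold <= 0:
        # Assumed: a non-positive multiplier is never intended for zscore or iqr.
        raise ValueError(f"{method} threshold must be positive, got {threshold}")


check_threshold("zscore", 3.0)       # passes silently
check_threshold("percentile", 95.0)  # passes silently
```

Under these assumed ranges, `check_threshold("percentile", 3.0)` raises ValueError, which flags the case where the signature default is reused with the percentile method.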

Examples

>>> import polars as pl
>>> from yohou.plotting import plot_outliers
>>> df = pl.DataFrame({
...     "time": pl.date_range(pl.date(2020, 1, 1), pl.date(2020, 12, 31), "1mo", eager=True),
...     "y": [100, 120, 115, 130, 140, 135, 150, 160, 155, 170, 180, 175],
... })
>>> fig = plot_outliers(df, columns="y", method="zscore")
>>> len(fig.data) > 0
True

See Also

plot_time_series : Plot basic time series.
plot_boxplot : Plot boxplots grouped by time periods.

Source Code

def plot_outliers(
    df: pl.DataFrame,
    *,
    columns: str | list[str] | None = None,
    method: Literal["zscore", "iqr", "percentile"] = "zscore",
    threshold: float = 3.0,
    groups: list[str] | None = None,
    facet_by: Literal["group", "member"] | None = "member",
    facet_n_cols: int = 2,
    color_palette: list[str] | None = None,
    show_legend: bool = True,
    title: str | None = None,
    x_label: str | None = None,
    y_label: str | None = None,
    width: int | None = None,
    height: int | None = None,
    connect_gaps: bool = False,
    resampler: bool | Literal["widget"] | None = None,
    line_width: float = 2.0,
    line_opacity: float = 0.7,
    outlier_color: str = "#DC2626",
    outlier_size: float = 8.0,
    outlier_symbol: str = "x",
    show_bounds: bool = True,
) -> go.Figure:
    """
    Plot time series with outlier points highlighted.

    Overlays the original line plot with scatter markers at detected outlier
    positions. The detection method and threshold are configurable.

    Parameters
    ----------
    df : pl.DataFrame
        Input DataFrame with 'time' column and numeric columns to plot.
    columns : str | list[str] | None, default=None
        Column(s) to analyze. If None, uses all numeric columns except 'time'.
    method : {"zscore", "iqr", "percentile"}, default="zscore"
        Outlier detection method:
        - "zscore": points with |z-score| > threshold (typical value 3.0)
        - "iqr": points outside [Q1 - threshold*IQR, Q3 + threshold*IQR] (typical value 1.5)
        - "percentile": points above the threshold-th percentile or below
          the (100-threshold)-th percentile (typical value 95.0, which flags
          values below the 5th or above the 95th percentile)
    threshold : float, default=3.0
        Detection threshold. Interpretation depends on *method*. The default
        of 3.0 applies to every method, so pass an explicit value when using
        "iqr" or "percentile".
    groups : list[str] | None, default=None
        Panel group prefixes to plot.
    facet_by : Literal["group", "member"] | None, default="member"
        Faceting axis for panel data.  ``"group"`` creates one subplot per
        group, ``"member"`` one per member.  ``None`` disables faceting.
        Ignored for non-panel data.
    facet_n_cols : int, default=2
        Number of columns in facet grid.
    color_palette : list[str] | None, default=None
        Custom color palette for the series lines.
    show_legend : bool, default=True
        Whether to display the legend.
    title : str | None, default=None
        Plot title.
    x_label : str | None, default=None
        X-axis label.
    y_label : str | None, default=None
        Y-axis label.
    width : int | None, default=None
        Plot width in pixels.
    height : int | None, default=None
        Plot height in pixels.
    connect_gaps : bool, default=False
        If True, connect lines across missing data gaps.
    resampler : bool | Literal["widget"] | None, default=None
        Enable plotly-resampler for large datasets.  ``True`` returns a
        ``FigureResampler``, ``"widget"`` a ``FigureWidgetResampler``,
        ``None`` reads from `get_config`.
    line_width : float, default=2.0
        Width of the series line in pixels.
    line_opacity : float, default=0.7
        Opacity of the series line.
    outlier_color : str, default="#DC2626"
        Color for outlier markers.
    outlier_size : float, default=8.0
        Size of outlier markers in pixels.
    outlier_symbol : str, default="x"
        Marker symbol for outlier points.
    show_bounds : bool, default=True
        Whether to show threshold boundary lines.

    Returns
    -------
    go.Figure
        Plotly figure object.

    Raises
    ------
    TypeError
        If df is not a Polars DataFrame.
    ValueError
        If method is unknown or threshold is invalid.

    Examples
    --------
    >>> import polars as pl
    >>> from yohou.plotting import plot_outliers

    >>> df = pl.DataFrame({
    ...     "time": pl.date_range(pl.date(2020, 1, 1), pl.date(2020, 12, 31), "1mo", eager=True),
    ...     "y": [100, 120, 115, 130, 140, 135, 150, 160, 155, 170, 180, 175],
    ... })
    >>> fig = plot_outliers(df, columns="y", method="zscore")
    >>> len(fig.data) > 0
    True

    See Also
    --------
    [`plot_time_series`][yohou.plotting.plot_time_series] : Plot basic time series.
    [`plot_boxplot`][yohou.plotting.plot_boxplot] : Plot boxplots grouped by time periods.
    """
    # Validate inputs
    validate_plotting_data(df)

    if method not in ("zscore", "iqr", "percentile"):
        msg = f"Unknown method: {method}. Valid options: zscore, iqr, percentile"
        raise ValueError(msg)
    validate_plotting_params(width=width, height=height)

    def _compute_outlier_mask(series: pl.Series) -> tuple[pl.Series, float | None, float | None]:
        """Return (is_outlier_bool_series, lower_bound, upper_bound)."""
        if method == "zscore":
            mean = series.mean()
            std = series.std()
            if not isinstance(mean, (int, float)) or not isinstance(std, (int, float)) or std == 0:
                return pl.Series([False] * len(series)), None, None
            lower = mean - threshold * std
            upper = mean + threshold * std
            mask = ((series - mean).abs() / std) > threshold
        elif method == "iqr":
            q1 = series.quantile(0.25)
            q3 = series.quantile(0.75)
            if not isinstance(q1, (int, float)) or not isinstance(q3, (int, float)):
                return pl.Series([False] * len(series)), None, None
            iqr = q3 - q1
            lower = q1 - threshold * iqr
            upper = q3 + threshold * iqr
            mask = (series < lower) | (series > upper)
        else:  # percentile
            lower = series.quantile(1 - threshold / 100)
            upper = series.quantile(threshold / 100)
            if not isinstance(lower, (int, float)) or not isinstance(upper, (int, float)):
                return pl.Series([False] * len(series)), None, None
            mask = (series < lower) | (series > upper)
        return mask.fill_null(False), lower, upper

    if groups is None and columns is None and _auto_detect_panel(df):
        groups = []

    if groups is not None:
        _color_mgr = PanelColorManager(color_palette)
        _legend_tracker = LegendTracker(show_legend=show_legend)

        def _render_outlier(ctx: RenderContext) -> None:
            """Render time series with outlier highlights for a single panel."""
            base = [c for c in ctx.sub_df.columns if c != "time"][0]
            _c = _color_mgr.get_color(ctx.display_name)
            _lg = linked_legendgroup_kwargs(ctx.display_name, _legend_tracker, is_primary=True)
            # Line trace
            ctx.fig.add_trace(
                go.Scatter(
                    x=ctx.sub_df["time"],
                    y=ctx.sub_df[base],
                    mode="lines",
                    line={"color": _c, "width": line_width},
                    opacity=line_opacity,
                    connectgaps=connect_gaps,
                    hovertemplate=(f"<b>{ctx.display_name}</b><br>%{{x}}<br>%{{y:.2f}}<extra></extra>"),
                    **_lg,
                ),
                row=ctx.row,
                col=ctx.col,
            )
            # Outlier markers
            mask, lower, upper = _compute_outlier_mask(ctx.sub_df[base])
            df_out = ctx.sub_df.filter(mask)
            _lg_sec = linked_legendgroup_kwargs(ctx.display_name, _legend_tracker, is_primary=False)
            if len(df_out) > 0:
                ctx.fig.add_trace(
                    go.Scatter(
                        x=df_out["time"],
                        y=df_out[base],
                        mode="markers",
                        marker={"color": outlier_color, "size": outlier_size, "symbol": outlier_symbol},
                        hovertemplate=(f"<b>{base} OUTLIER</b><br>%{{x}}<br>%{{y:.2f}}<extra></extra>"),
                        **_lg_sec,
                    ),
                    row=ctx.row,
                    col=ctx.col,
                )
            # Threshold bounds
            if show_bounds and lower is not None and upper is not None:
                for val in (lower, upper):
                    ctx.fig.add_trace(
                        go.Scatter(
                            x=[ctx.sub_df["time"].min(), ctx.sub_df["time"].max()],
                            y=[val, val],
                            mode="lines",
                            line={"dash": "dash", "color": _c, "width": 1},
                            hoverinfo="skip",
                            **_lg_sec,
                        ),
                        row=ctx.row,
                        col=ctx.col,
                    )

        effective_facet_by = facet_by or "member"
        fig = facet_figure(
            df,
            _render_outlier,
            groups=groups,
            columns=columns,
            facet_by=effective_facet_by,
            facet_n_cols=facet_n_cols,
            title=title or "Outlier Detection",
            x_label=x_label or "Time",
            y_label=y_label,
            width=width,
            height=height,
            resampler=resampler,
        )
        fig.update_layout(showlegend=show_legend)
        return fig

    # Non-panel case: column-mode facet_figure
    plot_columns = validate_plotting_data(df, columns=columns, exclude=["time"])
    _colors = resolve_color_palette(color_palette, len(plot_columns))
    _col_colors = dict(zip(plot_columns, _colors, strict=False))

    def _render_outlier(ctx: RenderContext) -> None:
        """Render outlier detection for one column into a subplot."""
        base = ctx.display_name
        col_color = _col_colors[base]
        series = ctx.sub_df[base]
        mask, lower, upper = _compute_outlier_mask(series)

        ctx.fig.add_trace(
            go.Scatter(
                x=ctx.sub_df["time"],
                y=series,
                mode="lines",
                name=base,
                line={"color": col_color, "width": line_width},
                opacity=line_opacity,
                connectgaps=connect_gaps,
                hovertemplate=f"<b>{base}</b><br>%{{x}}<br>%{{y:.2f}}<extra></extra>",
            ),
            row=ctx.row,
            col=ctx.col,
        )

        # Filter the same frame the mask was computed from
        df_outliers = ctx.sub_df.filter(mask)
        if len(df_outliers) > 0:
            ctx.fig.add_trace(
                go.Scatter(
                    x=df_outliers["time"],
                    y=df_outliers[base],
                    mode="markers",
                    showlegend=False,
                    marker={"color": outlier_color, "size": outlier_size, "symbol": outlier_symbol},
                    hovertemplate=f"<b>{base} OUTLIER</b><br>%{{x}}<br>%{{y:.2f}}<extra></extra>",
                ),
                row=ctx.row,
                col=ctx.col,
            )

        if show_bounds and lower is not None and upper is not None:
            time_min = ctx.sub_df["time"].min()
            time_max = ctx.sub_df["time"].max()
            for bound_val in (lower, upper):
                ctx.fig.add_trace(
                    go.Scatter(
                        x=[time_min, time_max],
                        y=[bound_val, bound_val],
                        mode="lines",
                        showlegend=False,
                        line={"dash": "dash", "color": col_color, "width": 1},
                        hoverinfo="skip",
                    ),
                    row=ctx.row,
                    col=ctx.col,
                )

    fig = facet_figure(
        df,
        _render_outlier,
        columns=plot_columns,
        facet_n_cols=facet_n_cols,
        title=title or "Outlier Detection",
        x_label=x_label or "Time",
        y_label=y_label,
        width=width,
        height=height,
        resampler=resampler,
    )
    fig.update_layout(showlegend=show_legend)

    return fig

Tutorials

The following example notebooks use this component:

  • Exploratory Visualization


    Visualization

    Exploratory time series visualisation with raw series plots, rolling statistics overlays, seasonal overlays, subseries diagnostics, distribution boxplots, missing data pattern auditing, outlier detection, and resampling comparison.
