---
name: time_series_anomaly_detection
description: Detect anomalies in time series data using the Prophet framework (Meta), which models seasonality, trend, holiday effects, and other needed regressors, to identify unusual surges or slumps in trends. This is a general methodology analysts can use to understand which changes in their tracked metrics manifest anomalous patterns.
---
# Time Series Anomaly Detection Framework
This comprehensive time series anomaly detection framework leverages Facebook Prophet's time-series trend prediction as a counterfactual in the target window and compares it with the actual data. An anomaly index measures how far the actual value deviates from the expected trend (counterfactual) for each category or group within that period. Ranking by the anomaly index identifies standout groups/categories that manifest unexpected surges or slumps, with a quantitative index ranging from -100 to 100.
## Core Components
### TimeSeriesAnomalyDetector
Main class for anomaly detection with configurable parameters.
#### Constructor Parameters
- `min_training_days` (int): Minimum days of available training data required per category (default: 180). Groups with fewer than this many days of data are skipped
- `confidence_interval` (float): Prophet confidence interval width (default: 0.68 ≈ ±1 standard deviation), used as the baseline for measuring unexpected deviation
- `changepoint_prior_scale` (float): Flexibility of trend changes (default: 0.05). For smaller datasets, use 0.1
- `seasonality_prior_scale` (float): Strength of seasonality (default: 10.0)
#### Anomaly Index Calculation (CRITICAL: this is the key to identifying unusual patterns)
- **Within confidence interval**: Index = 0
- **Above upper bound**: Index = (actual - upper) / std
- **Below lower bound**: Index = (actual - lower) / std
- **Scaled Index**: 100 × tanh(raw_index), which maps the raw index to [-100, 100]. After this transformation:
  - Index = 1 → ~76 points
  - Index = 2 → ~96 points
  - Index = 3 → ~99.5 points (approaches ±100)
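To make the scaling concrete, here is a minimal sketch of the per-day computation. It assumes the forecast std is approximated by the half-width of the 0.68 interval (`yhat_upper - yhat`); the actual implementation in `anomaly_detection.py` may differ in details.

```python
import numpy as np

def daily_anomaly_index(actual: float, yhat: float,
                        yhat_lower: float, yhat_upper: float) -> float:
    """Sketch of the scaled daily anomaly index in [-100, 100].

    Assumes std ~ half-width of the 0.68 confidence interval (~1 std).
    """
    std = max(yhat_upper - yhat, 1e-9)     # avoid division by zero
    if yhat_lower <= actual <= yhat_upper:
        raw = 0.0                          # within the interval: no anomaly
    elif actual > yhat_upper:
        raw = (actual - yhat_upper) / std  # positive surge
    else:
        raw = (actual - yhat_lower) / std  # negative slump
    return 100 * np.tanh(raw)              # squash to [-100, 100]

# Example: actual 150 vs. interval [90, 110] around yhat = 100
print(daily_anomaly_index(150, 100, 90, 110))  # raw = 4 -> ~99.9
```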
## Usage
```python
from anomaly_detection import TimeSeriesAnomalyDetector

# Initialize detector
detector = TimeSeriesAnomalyDetector(
    min_training_days=180,    # depends on the dataset span (50%+ of the time span is usually a good threshold)
    confidence_interval=0.68  # 1 std, for good interpretability both visually and statistically
)

# Run anomaly detection
results = detector.detect_anomalies(
    df=transaction_df,
    date_col='<temporal column>',
    category_col='<grouping column>',
    value_col='<metric to analyze for anomalies>',
    cutoff_date='<training and prediction split date, format YYYY-MM-DD>',
    prediction_end='<end of prediction window>',
    agg_func='sum',  # Aggregation: 'sum', 'mean', 'count', or None (e.g., if the data is not transactional and value_col is already model-ready)
)

# Access results
anomaly_summary = results['anomaly_summary']  # All categories with anomaly indices

# Get top 10 positive anomaly groups
top_surge = results['anomaly_summary'].head(10)['<grouping column>'].tolist()

# Get top 10 negative anomaly groups (summary is sorted descending, so the tail holds the most negative)
top_slump = results['anomaly_summary'].tail(10)['<grouping column>'].tolist()
```
## Input Format
**Required DataFrame columns:**
- Date column (e.g., `Order Date`, `Transaction Date`, `Purchase Date`) - temporal dimension
- Category column (e.g., `Category`, `Region`) - grouping dimension; the granularity at which the method is applied, one model per group
- Value column (e.g., `Price`, `Spend`, `Revenue`, `Quantity`) - metric to detect anomalies in (important: directly or after aggregation, this is the y of the model)
- Optional: Additional columns (e.g., `User ID`)
**Parameters:**
- `cutoff_date`: Divides training data (before) from prediction period (after)
- `prediction_end`: End of anomaly detection window
- `agg_func`: How to aggregate values per category per day
- `'sum'`: Sum all values of the target column for each category-date pair (e.g., for user-level purchase data)
- `'mean'`: Average values
- `'count'`: Count records
- `None`: No aggregation (e.g., non-transactional data already at one row per category per day)
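Conceptually, `agg_func='sum'` collapses the raw data to one row per category per day before modeling. A minimal pandas sketch of the equivalent aggregation (all column names here are placeholders, not the skill's API):

```python
import pandas as pd

# Hypothetical raw transactional data; 'Order Date', 'Category', 'Revenue'
# are placeholder column names
daily = (
    transaction_df
    .assign(ds=pd.to_datetime(transaction_df['Order Date']).dt.normalize())
    .groupby(['Category', 'ds'], as_index=False)['Revenue']
    .sum()  # agg_func='sum'; swap in .mean() or .count() for the other modes
)
# Result: one row per (Category, day), ready for per-category Prophet fits
```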
## Output
`detect_anomalies()` returns a dictionary with 3 keys:
```python
{
'anomaly_summary': pd.DataFrame, # Summary metrics per category
'predictions': Dict[str, pd.DataFrame], # Prophet forecasts per category
'models': Dict[str, Prophet] # Trained models per category
}
```
### Output `anomaly_summary` Preview
| Column | Type | Description |
|--------|------|-------------|
| `{category_col}` | str | Category name (column name matches your input) |
| `Anomaly_Index` | float | Scaled anomaly score in [-100, 100] |
| `days_with_data` | int | Days with actual data in prediction window |
| `avg_daily_anomaly` | float | Average of daily scaled anomalies |
| `max_daily_anomaly` | float | Maximum daily scaled anomaly |
| `min_daily_anomaly` | float | Minimum daily scaled anomaly |
| `total_actual` | float | Sum of actual values in prediction window |
| `total_predicted` | float | Sum of predicted (yhat) values |
**Sorted by `Anomaly_Index` descending**
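Since the skill hardcodes no top-N or threshold logic (see Key Features below), filtering the summary is left to you. A minimal sketch, assuming an illustrative ±50 cutoff on the scaled index:

```python
import pandas as pd

# Illustrative filtering; the ±50 threshold is an assumption, tune per use case
summary = results['anomaly_summary']
surges = summary[summary['Anomaly_Index'] > 50]
slumps = summary[summary['Anomaly_Index'] < -50]
flagged = pd.concat([surges, slumps]).sort_values('Anomaly_Index', ascending=False)
```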
### `predictions` Dictionary
Key = category name, Value = DataFrame with columns:
- `ds`: Date
- `yhat`: Prophet prediction
- `yhat_lower`: Lower confidence bound
- `yhat_upper`: Upper confidence bound
### `models` Dictionary
Key = category name, Value = Fitted Prophet model object
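To visually sanity-check a single category, you can overlay the per-category forecast with its confidence band. A sketch using matplotlib, where `'Electronics'` is a hypothetical category key and `results` comes from the Usage section above:

```python
import matplotlib.pyplot as plt

cat = 'Electronics'  # hypothetical category key
fc = results['predictions'][cat]

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(fc['ds'], fc['yhat'], label='yhat (counterfactual)')
ax.fill_between(fc['ds'], fc['yhat_lower'], fc['yhat_upper'],
                alpha=0.3, label='confidence band')
ax.set_title(f'Prophet forecast vs. bounds: {cat}')
ax.legend()
plt.show()
```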
## Key Features
- Pure anomaly detection without business logic (no hardcoded top N)
- Automatic daily aggregation by grouping dimension
- Prophet model training with seasonality (daily, weekly, yearly)
- Counterfactual prediction for intervention analysis
- Scaled anomaly index [-100, 100] for interpretability
## Best Practices
- Ensure sufficient training data (180+ days recommended)
- Use `cutoff_date` to define pre/post intervention periods
- Use `agg_func=None` if data already at daily category level
- Apply your own filtering logic on `anomaly_summary` (top N, thresholds, etc.)
- Chain with DID analysis for causal inference on selected categories (see the sketch below)
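As a hedged illustration of the DID chaining mentioned above (not part of this skill), one could mark the flagged categories as treated, use the cutoff as the intervention date, and fit a standard difference-in-differences interaction with statsmodels. All column names and the date are assumptions; `top_surge` comes from the Usage section:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical daily panel: one row per (Category, ds) with metric 'y'
panel = (
    transaction_df
    .assign(ds=pd.to_datetime(transaction_df['Order Date']).dt.normalize())
    .groupby(['Category', 'ds'], as_index=False)['Revenue'].sum()
    .rename(columns={'Revenue': 'y'})
)
panel['treated'] = panel['Category'].isin(top_surge).astype(int)
panel['post'] = (panel['ds'] >= pd.Timestamp('2024-01-01')).astype(int)  # substitute your cutoff_date

# Classic 2x2 DID: the treated:post interaction estimates the intervention effect
did = smf.ols('y ~ treated * post', data=panel).fit()
print(did.params['treated:post'])
```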