home / skills / benchflow-ai / skillsbench / conditioning

This skill pre-processes gravitational wave data by high-pass filtering, resampling, crop-wrapping, and PSD estimation to prepare for matched filtering.

npx playbooks add skill benchflow-ai/skillsbench --skill conditioning

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
5.2 KB
---
name: conditioning
description: Data conditioning techniques for gravitational wave detector data. Use when preprocessing raw detector strain data before matched filtering, including high-pass filtering, resampling, removing filter wraparound artifacts, and estimating power spectral density (PSD). Works with PyCBC TimeSeries data.
---

# Gravitational Wave Data Conditioning

Data conditioning is essential before matched filtering. Raw gravitational wave detector data contains low-frequency noise, instrumental artifacts, and needs proper sampling rates for computational efficiency.

## Overview

The conditioning pipeline typically involves:
1. High-pass filtering (remove low-frequency noise below ~15 Hz)
2. Resampling (downsample to appropriate sampling rate)
3. Crop filter wraparound (remove edge artifacts from filtering)
4. PSD estimation (calculate power spectral density for matched filtering)

## High-Pass Filtering

Remove low-frequency noise and instrumental artifacts:

```python
from pycbc.filter import highpass

# High-pass filter at 15 Hz (typical for LIGO/Virgo data)
strain_filtered = highpass(strain, 15.0)

# Common cutoff frequencies:
# 15 Hz: Standard for ground-based detectors
# 20 Hz: Higher cutoff, more aggressive noise removal
# 10 Hz: Lower cutoff, preserves more low-frequency content
```

**Why 15 Hz?** Ground-based detectors like LIGO/Virgo have significant low-frequency noise. High-pass filtering removes this noise while preserving the gravitational wave signal (typically >20 Hz for binary mergers).

## Resampling

Downsample the data to reduce computational cost:

```python
from pycbc.filter import resample_to_delta_t

# Resample to 2048 Hz (common for matched filtering)
delta_t = 1.0 / 2048
strain_resampled = resample_to_delta_t(strain_filtered, delta_t)

# Or to 4096 Hz for higher resolution
delta_t = 1.0 / 4096
strain_resampled = resample_to_delta_t(strain_filtered, delta_t)

# Common sampling rates:
# 2048 Hz: Standard, computationally efficient
# 4096 Hz: Higher resolution, better for high-mass systems
```

**Note**: Resampling should happen AFTER high-pass filtering to avoid aliasing. The Nyquist frequency (half the sampling rate) must be above the signal frequency of interest.

## Crop Filter Wraparound

Remove edge artifacts introduced by filtering:

```python
# Crop 2 seconds from both ends to remove filter wraparound
conditioned = strain_resampled.crop(2, 2)

# The crop() method removes time from start and end:
# crop(start_seconds, end_seconds)
# Common values: 2-4 seconds on each end

# Verify the duration
print(f"Original duration: {strain_resampled.duration} s")
print(f"Cropped duration: {conditioned.duration} s")
```

**Why crop?** Digital filters introduce artifacts at the edges of the time series. These artifacts can cause false triggers in matched filtering.

## Power Spectral Density (PSD) Estimation

Calculate the PSD needed for matched filtering:

```python
from pycbc.psd import interpolate, inverse_spectrum_truncation

# Estimate PSD using Welch's method
# seg_len: segment length in seconds (typically 4 seconds)
psd = conditioned.psd(4)

# Interpolate PSD to match data frequency resolution
psd = interpolate(psd, conditioned.delta_f)

# Inverse spectrum truncation for numerical stability
# This limits the effective filter length
psd = inverse_spectrum_truncation(
    psd,
    int(4 * conditioned.sample_rate),
    low_frequency_cutoff=15
)

# Check PSD properties
print(f"PSD length: {len(psd)}")
print(f"PSD delta_f: {psd.delta_f}")
print(f"PSD frequency range: {psd.sample_frequencies[0]:.2f} - {psd.sample_frequencies[-1]:.2f} Hz")
```

### PSD Parameters Explained

- **Segment length (4 seconds)**: Longer segments give better frequency resolution but fewer averages. 4 seconds is a good balance.
- **Low frequency cutoff (15 Hz)**: Should match your high-pass filter cutoff. Frequencies below this are not well-characterized.

## Best Practices

1. **Always high-pass filter first**: Remove low-frequency noise before resampling
2. **Choose appropriate sampling rate**: 2048 Hz is standard, 4096 Hz for high-mass systems
3. **Crop enough time**: 2 seconds is minimum, but may need more for longer templates
4. **Match PSD cutoff to filter**: PSD low-frequency cutoff should match high-pass filter frequency
5. **Verify data quality**: Plot the conditioned strain to check for issues

## Dependencies

```bash
pip install pycbc
```

## References

- [PyCBC Tutorial: Waveform & Matched Filter](https://colab.research.google.com/github/gwastro/pycbc-tutorials/blob/master/tutorial/3_WaveformMatchedFilter.ipynb) - Practical tutorial covering data conditioning, PSD estimation, and matched filtering
- [PyCBC Filter Documentation](https://pycbc.org/pycbc/latest/html/pycbc.filter.html)
- [PyCBC PSD Documentation](https://pycbc.org/pycbc/latest/html/pycbc.psd.html)

## Common Issues

**Problem**: PSD estimation fails with "must contain at least one sample" error
- **Solution**: Ensure data is long enough after cropping (need several segments for Welch method)

**Problem**: Filter wraparound artifacts in matched filtering
- **Solution**: Increase crop amount or check that filtering happened before cropping

**Problem**: Poor SNR due to low-frequency noise
- **Solution**: Increase high-pass filter cutoff frequency or check PSD inverse spectrum truncation

Overview

This skill provides data conditioning routines for gravitational-wave detector strain data prior to matched filtering. It standardizes high-pass filtering, resampling, crop of filter wraparound, and robust PSD estimation for PyCBC TimeSeries inputs. The goal is reliable, repeatable preprocessing that reduces noise and avoids common artifacts.

How this skill works

The skill applies a high-pass filter to remove low-frequency detector noise, then resamples the time series to a chosen sampling rate to balance resolution and cost. It crops the ends to remove filter wraparound artifacts and computes a PSD using Welch averaging, interpolation, and inverse-spectrum truncation for numerical stability. All steps use PyCBC TimeSeries-compatible operations and configurable cutoffs and segment lengths.

When to use it

  • Before matched filtering of LIGO/Virgo/KAGRA strain data to prepare clean inputs.
  • When downsampling data to 2048 Hz or 4096 Hz to reduce computational cost.
  • If low-frequency noise contaminates signal bands below ~15–20 Hz.
  • When PSD estimates are required for whitening or matched-filter SNR calculation.
  • To remove edge artifacts that could generate false triggers in pipelines.

Best practices

  • High-pass filter first (typical cutoff 15 Hz) to avoid aliasing on resampling.
  • Resample after filtering; 2048 Hz is standard, use 4096 Hz when higher resolution is needed.
  • Crop 2–4 seconds from each end (adjust upward for long templates) to remove wraparound artifacts.
  • Use 4 s PSD segments as a default; match PSD low-frequency cutoff to the filter cutoff.
  • Perform visual checks of conditioned strain and PSD to validate preprocessing steps.

Example use cases

  • Prepare detector strain for matched filtering to search for compact binary coalescences.
  • Downsample long-duration data for bulk template-bank filtering with reduced CPU cost.
  • Estimate stable PSDs for whitening and SNR calculations in parameter-estimation pipelines.
  • Remove edge artifacts before triggering candidate events to reduce false alarms.

FAQ

Why use 15 Hz for the high-pass cutoff?

Ground-based detectors have strong low-frequency noise; 15 Hz commonly balances noise removal while preserving typical binary merger signals above ~20 Hz.

How much should I crop after filtering?

Crop at least 2 seconds from each end; increase to 3–4 seconds or more if templates are long or if wraparound persists.