home / skills / benchflow-ai / skillsbench / conditioning
/tasks/gravitational-wave-detection/environment/skills/conditioning
This skill pre-processes gravitational wave data by high-pass filtering, resampling, crop-wrapping, and PSD estimation to prepare for matched filtering.
npx playbooks add skill benchflow-ai/skillsbench --skill conditioningReview the files below or copy the command above to add this skill to your agents.
---
name: conditioning
description: Data conditioning techniques for gravitational wave detector data. Use when preprocessing raw detector strain data before matched filtering, including high-pass filtering, resampling, removing filter wraparound artifacts, and estimating power spectral density (PSD). Works with PyCBC TimeSeries data.
---
# Gravitational Wave Data Conditioning
Data conditioning is essential before matched filtering. Raw gravitational wave detector data contains low-frequency noise, instrumental artifacts, and needs proper sampling rates for computational efficiency.
## Overview
The conditioning pipeline typically involves:
1. High-pass filtering (remove low-frequency noise below ~15 Hz)
2. Resampling (downsample to appropriate sampling rate)
3. Crop filter wraparound (remove edge artifacts from filtering)
4. PSD estimation (calculate power spectral density for matched filtering)
## High-Pass Filtering
Remove low-frequency noise and instrumental artifacts:
```python
from pycbc.filter import highpass
# High-pass filter at 15 Hz (typical for LIGO/Virgo data)
strain_filtered = highpass(strain, 15.0)
# Common cutoff frequencies:
# 15 Hz: Standard for ground-based detectors
# 20 Hz: Higher cutoff, more aggressive noise removal
# 10 Hz: Lower cutoff, preserves more low-frequency content
```
**Why 15 Hz?** Ground-based detectors like LIGO/Virgo have significant low-frequency noise. High-pass filtering removes this noise while preserving the gravitational wave signal (typically >20 Hz for binary mergers).
## Resampling
Downsample the data to reduce computational cost:
```python
from pycbc.filter import resample_to_delta_t
# Resample to 2048 Hz (common for matched filtering)
delta_t = 1.0 / 2048
strain_resampled = resample_to_delta_t(strain_filtered, delta_t)
# Or to 4096 Hz for higher resolution
delta_t = 1.0 / 4096
strain_resampled = resample_to_delta_t(strain_filtered, delta_t)
# Common sampling rates:
# 2048 Hz: Standard, computationally efficient
# 4096 Hz: Higher resolution, better for high-mass systems
```
**Note**: Resampling should happen AFTER high-pass filtering to avoid aliasing. The Nyquist frequency (half the sampling rate) must be above the signal frequency of interest.
## Crop Filter Wraparound
Remove edge artifacts introduced by filtering:
```python
# Crop 2 seconds from both ends to remove filter wraparound
conditioned = strain_resampled.crop(2, 2)
# The crop() method removes time from start and end:
# crop(start_seconds, end_seconds)
# Common values: 2-4 seconds on each end
# Verify the duration
print(f"Original duration: {strain_resampled.duration} s")
print(f"Cropped duration: {conditioned.duration} s")
```
**Why crop?** Digital filters introduce artifacts at the edges of the time series. These artifacts can cause false triggers in matched filtering.
## Power Spectral Density (PSD) Estimation
Calculate the PSD needed for matched filtering:
```python
from pycbc.psd import interpolate, inverse_spectrum_truncation
# Estimate PSD using Welch's method
# seg_len: segment length in seconds (typically 4 seconds)
psd = conditioned.psd(4)
# Interpolate PSD to match data frequency resolution
psd = interpolate(psd, conditioned.delta_f)
# Inverse spectrum truncation for numerical stability
# This limits the effective filter length
psd = inverse_spectrum_truncation(
psd,
int(4 * conditioned.sample_rate),
low_frequency_cutoff=15
)
# Check PSD properties
print(f"PSD length: {len(psd)}")
print(f"PSD delta_f: {psd.delta_f}")
print(f"PSD frequency range: {psd.sample_frequencies[0]:.2f} - {psd.sample_frequencies[-1]:.2f} Hz")
```
### PSD Parameters Explained
- **Segment length (4 seconds)**: Longer segments give better frequency resolution but fewer averages. 4 seconds is a good balance.
- **Low frequency cutoff (15 Hz)**: Should match your high-pass filter cutoff. Frequencies below this are not well-characterized.
## Best Practices
1. **Always high-pass filter first**: Remove low-frequency noise before resampling
2. **Choose appropriate sampling rate**: 2048 Hz is standard, 4096 Hz for high-mass systems
3. **Crop enough time**: 2 seconds is minimum, but may need more for longer templates
4. **Match PSD cutoff to filter**: PSD low-frequency cutoff should match high-pass filter frequency
5. **Verify data quality**: Plot the conditioned strain to check for issues
## Dependencies
```bash
pip install pycbc
```
## References
- [PyCBC Tutorial: Waveform & Matched Filter](https://colab.research.google.com/github/gwastro/pycbc-tutorials/blob/master/tutorial/3_WaveformMatchedFilter.ipynb) - Practical tutorial covering data conditioning, PSD estimation, and matched filtering
- [PyCBC Filter Documentation](https://pycbc.org/pycbc/latest/html/pycbc.filter.html)
- [PyCBC PSD Documentation](https://pycbc.org/pycbc/latest/html/pycbc.psd.html)
## Common Issues
**Problem**: PSD estimation fails with "must contain at least one sample" error
- **Solution**: Ensure data is long enough after cropping (need several segments for Welch method)
**Problem**: Filter wraparound artifacts in matched filtering
- **Solution**: Increase crop amount or check that filtering happened before cropping
**Problem**: Poor SNR due to low-frequency noise
- **Solution**: Increase high-pass filter cutoff frequency or check PSD inverse spectrum truncation
This skill provides data conditioning routines for gravitational-wave detector strain data prior to matched filtering. It standardizes high-pass filtering, resampling, crop of filter wraparound, and robust PSD estimation for PyCBC TimeSeries inputs. The goal is reliable, repeatable preprocessing that reduces noise and avoids common artifacts.
The skill applies a high-pass filter to remove low-frequency detector noise, then resamples the time series to a chosen sampling rate to balance resolution and cost. It crops the ends to remove filter wraparound artifacts and computes a PSD using Welch averaging, interpolation, and inverse-spectrum truncation for numerical stability. All steps use PyCBC TimeSeries-compatible operations and configurable cutoffs and segment lengths.
Why use 15 Hz for the high-pass cutoff?
Ground-based detectors have strong low-frequency noise; 15 Hz commonly balances noise removal while preserving typical binary merger signals above ~20 Hz.
How much should I crop after filtering?
Crop at least 2 seconds from each end; increase to 3–4 seconds or more if templates are long or if wraparound persists.