This skill explains how to access and manipulate ObsPy waveform, event, and inventory data via the core Data API for seismology workflows.

npx playbooks add skill benchflow-ai/skillsbench --skill obspy-data-api

---
name: obspy-data-api
description: An overview of the core data API of ObsPy, a Python framework for processing seismological data. It is useful for parsing common seismological file formats, or manipulating custom data into standard objects for downstream use cases such as ObsPy's signal processing routines or SeisBench's modeling API.
---

# ObsPy Data API

## Waveform Data

### Summary

Seismograms in a variety of formats (e.g. SAC, MiniSEED, GSE2, SEISAN, Q) can be imported into a `Stream` object using the `read()` function.

Streams are list-like objects that contain multiple `Trace` objects, i.e. gapless, continuous time series together with related header/meta information.

Each `Trace` object has a `data` attribute pointing to a NumPy `ndarray` holding the actual time series, and a `stats` attribute containing all meta information in a dict-like `Stats` object. The `starttime` and `endtime` attributes of the `Stats` object are `UTCDateTime` objects.

A multitude of helper methods are attached to `Stream` and `Trace` objects for handling and modifying the waveform data.

### Stream and Trace Class Structure

**Hierarchy:** `Stream` → `Trace` (multiple)

**Trace - DATA:**
- `data` → NumPy array
- `stats`:
  - `network`, `station`, `location`, `channel` — Determine physical location and instrument
  - `starttime`, `sampling_rate`, `delta`, `endtime`, `npts` — Interrelated: `delta` is `1 / sampling_rate`, and `endtime` is derived from `starttime`, `npts`, and `delta`

**Trace - METHODS:**
- `taper()` — Tapers the data.
- `filter()` — Filters the data.
- `resample()` — Resamples the data in the frequency domain.
- `integrate()` — Integrates the data with respect to time.
- `remove_response()` — Deconvolves the instrument response.

### Example

A `Stream` with an example seismogram can be created by calling `read()` without any arguments. Local files can be read by passing a filename; files stored on HTTP servers (e.g. at https://examples.obspy.org) can be read by passing their URL.

```python
>>> from obspy import read
>>> st = read()
>>> print(st)
3 Trace(s) in Stream:
BW.RJOB..EHZ | 2009-08-24T00:20:03.000000Z - ... | 100.0 Hz, 3000 samples
BW.RJOB..EHN | 2009-08-24T00:20:03.000000Z - ... | 100.0 Hz, 3000 samples
BW.RJOB..EHE | 2009-08-24T00:20:03.000000Z - ... | 100.0 Hz, 3000 samples
>>> tr = st[0]
>>> print(tr)
BW.RJOB..EHZ | 2009-08-24T00:20:03.000000Z - ... | 100.0 Hz, 3000 samples
>>> tr.data
array([ 0.        ,  0.00694644,  0.07597424, ...,  1.93449584,
        0.98196204,  0.44196924])
>>> print(tr.stats)
         network: BW
         station: RJOB
        location:
         channel: EHZ
       starttime: 2009-08-24T00:20:03.000000Z
         endtime: 2009-08-24T00:20:32.990000Z
   sampling_rate: 100.0
           delta: 0.01
            npts: 3000
           calib: 1.0
           ...
>>> tr.stats.starttime
UTCDateTime(2009, 8, 24, 0, 20, 3)
```

## Event Metadata

Event metadata are handled in a hierarchy of classes closely modelled after the de facto standard format [QuakeML](https://quake.ethz.ch/quakeml/). See `read_events()` and `Catalog.write()` for supported formats.

### Event Class Structure

**Hierarchy:** `Catalog` → `events` → `Event` (multiple)

**Event contains:**
- `origins` → `Origin` (multiple)
  - `latitude`, `longitude`, `depth`, `time`, ...
- `magnitudes` → `Magnitude` (multiple)
  - `mag`, `magnitude_type`, ...
- `picks`
- `focal_mechanisms`

## Station Metadata

Station metadata are handled in a hierarchy of classes closely modelled after the de facto standard format [FDSN StationXML](https://www.fdsn.org/xml/station/), which was developed as a human-readable XML replacement for Dataless SEED. See `read_inventory()` and `Inventory.write()` for supported formats.

### Inventory Class Structure

**Hierarchy:** `Inventory` → `networks` → `Network` → `stations` → `Station` → `channels` → `Channel`

**Network:**
- `code`, `description`, ...

**Station:**
- `code`, `latitude`, `longitude`, `elevation`, `start_date`, `end_date`, ...

**Channel:**
- `code`, `location_code`, `latitude`, `longitude`, `elevation`, `depth`, `dip`, `azimuth`, `sample_rate`, `start_date`, `end_date`, `response`, ...

## Classes & Functions

| Class/Function | Description |
|----------------|-------------|
| `read` | Read waveform files into an ObsPy `Stream` object. |
| `Stream` | List-like object of multiple ObsPy `Trace` objects. |
| `Trace` | An object containing data of a continuous series, such as a seismic trace. |
| `Stats` | A container for additional header information of an ObsPy `Trace` object. |
| `UTCDateTime` | A UTC-based datetime object. |
| `read_events` | Read event files into an ObsPy `Catalog` object. |
| `Catalog` | Container for `Event` objects. |
| `Event` | Describes a seismic event, which need not be a tectonic earthquake. |
| `read_inventory` | Function to read inventory files. |
| `Inventory` | The root object of the `Network` → `Station` → `Channel` hierarchy. |

## Modules

| Module | Description |
|--------|-------------|
| `obspy.core.trace` | Module for handling ObsPy `Trace` and `Stats` objects. |
| `obspy.core.stream` | Module for handling ObsPy `Stream` objects. |
| `obspy.core.utcdatetime` | Module containing a UTC-based datetime class. |
| `obspy.core.event` | Module handling event metadata. |
| `obspy.core.inventory` | Module for handling station metadata. |
| `obspy.core.util` | Various utilities for ObsPy. |
| `obspy.core.preview` | Tools for creating and merging previews. |

## Overview

This skill provides an overview of the core data API of ObsPy, a Python framework for processing seismological data. It explains waveform, event, and station metadata objects and the common functions used to read and manipulate them. The content is focused on how to parse common seismological file formats and convert custom data into standard ObsPy objects for downstream processing.

## How this skill works

The skill describes three main object hierarchies: `Stream` → `Trace` for waveform time series, `Catalog` → `Event` for event metadata, and `Inventory` → `Network` → `Station` → `Channel` for station metadata. It explains how `read()`, `read_events()`, and `read_inventory()` parse files into those objects, and how `Trace.data` (NumPy arrays) and `Trace.stats` (metadata with `UTCDateTime`) are used. It also summarizes key methods available on `Trace` and `Stream` for filtering, tapering, resampling, integrating, and removing instrument response.

## When to use it

- Import seismograms from SAC, MiniSEED, GSE2, SEISAN, Q, or HTTP-hosted example files into ObsPy `Stream` objects.
- Prepare waveform arrays and metadata for ObsPy signal processing routines or machine-learning pipelines like SeisBench.
- Parse QuakeML event catalogs and manipulate origins, magnitudes, and picks programmatically.
- Load FDSN StationXML station inventories and inspect channel responses, sample rates, and instrument geometry.
- Convert custom seismic data formats into ObsPy `Trace`/`Stream` or `Catalog`/`Inventory` objects for downstream analysis.

## Best practices

- Keep `Trace.data` as a NumPy array and use `Trace.stats` for all timing and metadata to ensure compatibility across ObsPy functions.
- Validate `sampling_rate`, `delta`, `npts`, `starttime`, and `endtime` after any resampling or trimming operations.
- Use `remove_response()` before amplitude-based analyses to work in physical units, and keep copies of the original data for reproducibility.
- Store station response and channel geometry in `Inventory` objects so that response deconvolution and instrument corrections are reproducible.
- Prefer `read()`, `read_events()`, and `read_inventory()` where possible to leverage ObsPy's format detection and parsing robustness.

## Example use cases

- Read three-component continuous data from a MiniSEED file into a `Stream`, apply bandpass filtering, and resample for model input.
- Load a QuakeML catalog with `read_events()`, extract origin times and magnitudes, and cross-match them with waveform arrivals.
- Import a StationXML inventory, inspect channel `sample_rate` and `response`, then remove the instrument response from traces.
- Convert proprietary sensor output into `Trace` objects with proper `stats` fields and save standardized MiniSEED files for sharing.
- Build training datasets by assembling `Stream` slices aligned to event origin times for use with SeisBench or custom ML pipelines.

## FAQ

**What object holds waveform samples and metadata?**

A `Trace` holds waveform samples in `data` (a NumPy array) and metadata in `stats`; multiple traces form a `Stream`.

**How do I get time information for a trace?**

Use `tr.stats.starttime` and `tr.stats.endtime`, which are `UTCDateTime` objects; `sampling_rate` and `delta` describe the time spacing.