This skill helps you apply SeisBench's WaveformModel interface to annotate and classify seismic streams using pretrained models.
```
npx playbooks add skill benchflow-ai/skillsbench --skill seisbench-model-api
```
---
name: seisbench-model-api
description: An overview of the core model API of SeisBench, a Python framework for training and applying machine learning algorithms to seismic data. It is useful for annotating waveforms with pretrained state-of-the-art ML models for tasks like phase picking, earthquake detection, waveform denoising, and depth estimation. Any waveform can be converted into an obspy Stream object, which works seamlessly with SeisBench models.
---
# SeisBench Model API
## Installing SeisBench
The recommended way is installation through pip. Simply run:
```
pip install seisbench
```
## Overview
SeisBench offers the abstract class `WaveformModel` that every SeisBench model should subclass. This class offers two core functions, `annotate` and `classify`. Both functions are automatically generated based on the configuration and submethods implemented in the specific model.
The `SeisBenchModel` class bridges the gap between the PyTorch interface of the models and the obspy interface common in seismology. It automatically assembles obspy streams into PyTorch tensors and reassembles the results into streams, and it takes care of batch processing. Computations can be run on a GPU by simply moving the model to the GPU.
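Because every SeisBench model is an ordinary PyTorch module, moving it to a GPU follows the standard PyTorch pattern. A minimal sketch using a stand-in module (the `Conv1d` here is only a placeholder; a real SeisBench model would be moved the same way):

```python
import torch

# Stand-in for a SeisBench WaveformModel, which is an ordinary torch.nn.Module
model = torch.nn.Conv1d(3, 3, kernel_size=7, padding=3)

# Move the model to GPU when one is available; annotate/classify then run there
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.randn(8, 3, 3001, device=device)  # batch of 3-component waveforms
with torch.no_grad():
    out = model(x)
print(out.shape)  # torch.Size([8, 3, 3001])
```

With a SeisBench model, the same `model.to(device)` call is all that is needed; `annotate` and `classify` then execute on the GPU automatically.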
The `annotate` function takes an obspy stream object as input and returns annotations as stream again. For example, for picking models the output would be the characteristic functions, i.e., the pick probabilities over time.
```python
import obspy

stream = obspy.read("my_waveforms.mseed")
annotations = model.annotate(stream)  # Returns an obspy stream with annotation traces
```
The `classify` function also takes an obspy stream as input, but in contrast to the `annotate` function returns discrete results. The structure of these results might be model dependent. For example, a pure picking model will return a list of picks, while a picking and detection model might return a list of picks and a list of detections.
```python
import obspy

stream = obspy.read("my_waveforms.mseed")
outputs = model.classify(stream)  # e.g., a list of picks; structure is model dependent
print(outputs)
```
Both `annotate` and `classify` can be supplied with waveforms from multiple stations at once and will automatically handle the correct grouping of the traces. For details on how to build your own model with SeisBench, check the documentation of `WaveformModel`. For details on how to apply models, check out the Examples.
## Loading Pretrained Models
For annotating waveforms in a meaningful way, trained model weights are required. SeisBench offers a range of pretrained model weights through a common interface. Model weights are downloaded on first use and cached locally afterwards. For some models, multiple weight versions are available; for details on accessing these, check the documentation of `from_pretrained`.
```python
import seisbench.models as sbm
sbm.PhaseNet.list_pretrained() # Get available models
model = sbm.PhaseNet.from_pretrained("original") # Load the original model weights released by PhaseNet authors
```
Pretrained models can not only be used for annotating data, but also offer a great starting point for transfer learning.
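The standard PyTorch fine-tuning recipe applies directly, since SeisBench models are torch modules. A sketch with a stand-in network (the layers are placeholders; with SeisBench you would start from, e.g., `sbm.PhaseNet.from_pretrained("original")` and freeze/train the model's actual layers):

```python
import torch

# Stand-in network; in practice this would be a pretrained SeisBench model
model = torch.nn.Sequential(
    torch.nn.Conv1d(3, 8, 7, padding=3),   # "early" feature layers
    torch.nn.ReLU(),
    torch.nn.Conv1d(8, 3, 7, padding=3),   # "head" to fine-tune
)

# Freeze the early layers, fine-tune only the head
for p in model[0].parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

x = torch.randn(4, 3, 3001)
target = torch.randn(4, 3, 3001)
loss = torch.nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()

# Frozen layers received no gradient
print(model[0].weight.grad is None)  # True
```

Freezing the early layers preserves the pretrained features while the small learning rate keeps the fine-tuned head close to the pretrained solution.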
## Speeding Up Model Application
When applying models to large datasets, run time is often a major concern. Here are a few tips to make your model run faster:
- **Run on GPU.** Execution on GPU is usually faster, even though exact speed-ups vary between models. However, we note that running on GPU is not necessarily the most economic option. For example, in cloud applications it might be cheaper (and equally fast) to pay for a handful of CPU machines to annotate a large dataset than for a GPU machine.
- **Use a large `batch_size`.** This parameter can be passed as an optional argument to all models. Especially on GPUs, larger batch sizes lead to faster annotations. As long as the batch fits into (GPU) memory, it might be worth increasing the batch size.
- **Compile your model (torch 2.0+).** If you are using torch version 2.0 or newer, compile your model. It's as simple as running `model = torch.compile(model)`. The compilation takes some time, but if you are annotating large amounts of waveforms, it should pay off quickly. Note that `torch.compile` has many options that can influence the performance gains considerably.
- **Use asyncio interface.** Load data in parallel while executing the model using the asyncio interface, i.e., `annotate_asyncio` and `classify_asyncio`. This is usually substantially faster because data loading is IO-bound while the actual annotation is compute-bound.
- **Manual resampling.** While SeisBench can automatically resample the waveforms, it can be faster to do the resampling manually beforehand. SeisBench uses obspy routines for resampling, which (as of 2023) are not parallelised. Check the required sampling rate with `model.sampling_rate`. Alternative routines are available, e.g., in the Pyrocko library.
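The compile tip works on any torch module, SeisBench models included. A minimal sketch with a stand-in module (the `backend="eager"` option keeps the example portable by skipping code generation; the default backend typically yields the real speed-ups):

```python
import torch

model = torch.nn.Conv1d(3, 3, 7, padding=3)  # stand-in for a SeisBench model

if hasattr(torch, "compile"):  # torch 2.0+
    # backend="eager" skips codegen so this runs anywhere; drop it in production
    model = torch.compile(model, backend="eager")

x = torch.randn(32, 3, 3001)  # a large batch amortizes per-call overhead
with torch.no_grad():
    out = model(x)
print(out.shape)  # torch.Size([32, 3, 3001])
```

The first call after compilation is slow (tracing and optimization happen then); subsequent calls with the same input shapes reuse the compiled graph.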
## Models Integrated into SeisBench
You don't have to build models from scratch if you don't want to. SeisBench integrates the following notable models from the literature. Because they inherit from the common SeisBench model interface, all of these deep learning models are implemented in PyTorch. Where possible, the original trained weights are imported and made available; these can be accessed via the `from_pretrained` method.
| Integrated Model | Task |
|------------------|------|
| `BasicPhaseAE` | Phase Picking |
| `CRED` | Earthquake Detection |
| `DPP` | Phase Picking |
| `DepthPhaseNet` | Depth estimation from depth phases |
| `DepthPhaseTEAM` | Depth estimation from depth phases |
| `DeepDenoiser` | Denoising |
| `SeisDAE` | Denoising |
| `EQTransformer` | Earthquake Detection/Phase Picking |
| `GPD` | Phase Picking |
| `LFEDetect` | Phase Picking (Low-frequency earthquakes) |
| `OBSTransformer` | Earthquake Detection/Phase Picking |
| `PhaseNet` | Phase Picking |
| `PhaseNetLight` | Phase Picking |
| `PickBlue` | Earthquake Detection/Phase Picking |
| `Skynet` | Phase Picking |
| `VariableLengthPhaseNet` | Phase Picking |
Currently integrated models cover earthquake detection and phase picking, waveform denoising, depth estimation, and low-frequency earthquake phase picking. Furthermore, with SeisBench you can build ML models for general seismic tasks such as magnitude and source parameter estimation, hypocentre determination, and more.
## Best Practices
- If the waveform amplitudes are extremely small (e.g., `<= 1e-10`), there is a risk of numerical instability. It is acceptable to rescale the data first (for example, by multiplying by a large constant such as `1e10`) before normalizing or passing it to the model.
- Although the SeisBench model API normalizes waveforms for you, applying normalization yourself is still highly recommended. SeisBench's normalization scheme uses an epsilon, `(waveform - mean(waveform)) / (std(waveform) + epsilon)`, so for extremely small amplitudes (e.g., `<= 1e-10`) the epsilon dominates the denominator and normalization can destroy the signal.
- The SeisBench model API can process a waveform stream of arbitrary length, so it is not necessary to segment the data yourself. In addition, do not assume a stream contains only one P-wave and one S-wave. It is best to treat the stream as what it is: a stream of continuous data.
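The small-amplitude pitfall from the first two points can be demonstrated with plain NumPy (the epsilon value below is illustrative; the framework's actual default may differ):

```python
import numpy as np

EPS = 1e-8  # illustrative epsilon; the framework's default may differ

def normalize(w, eps=EPS):
    # Mean-std normalization with an epsilon guard, as in the scheme above
    return (w - w.mean()) / (w.std() + eps)

rng = np.random.default_rng(0)
signal = rng.standard_normal(3001)

tiny = 1e-12 * signal            # extremely small amplitudes
bad = normalize(tiny)            # std << eps, so the signal is crushed
good = normalize(tiny * 1e10)    # rescale first, then normalize

print(bad.std())   # far below 1: the signal is effectively destroyed
print(good.std())  # close to 1: the signal is preserved
```

Because `tiny.std()` is around `1e-12` while the epsilon is `1e-8`, the denominator is dominated by the epsilon and the normalized output collapses toward zero; rescaling first restores a well-conditioned division.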
## FAQ

**Do I need to convert waveforms to tensors before using SeisBench?**
No. Pass obspy Stream objects directly; SeisBench converts them to PyTorch tensors and back for you.

**How do I get pretrained weights?**
Call the model's `from_pretrained` method (for example, `sbm.PhaseNet.from_pretrained`); weights are downloaded and cached on first use.

**What is the fastest way to process large datasets?**
Use a GPU with a large `batch_size`, compile the model with `torch.compile` if available, and use the asyncio interfaces (`annotate_asyncio`, `classify_asyncio`) to parallelize I/O and compute.