home / skills / benchflow-ai / skillsbench / usgs-data-download
/tasks/flood-risk-analysis/environment/skills/usgs-data-download
This skill downloads USGS water level and discharge data using the dataretrieval nwis module to streamline real-time and historical analyses.
npx playbooks add skill benchflow-ai/skillsbench --skill usgs-data-downloadReview the files below or copy the command above to add this skill to your agents.
---
name: usgs-data-download
description: Download water level data from USGS using the dataretrieval package. Use when accessing real-time or historical streamflow data, downloading gage height or discharge measurements, or working with USGS station IDs.
license: MIT
---
# USGS Data Download Guide
## Overview
This guide covers downloading water level data from USGS using the `dataretrieval` Python package. USGS maintains thousands of stream gages across the United States that record water levels at 15-minute intervals.
## Installation
```bash
pip install dataretrieval
```
## nwis Module (Recommended)
The NWIS module is reliable and straightforward for accessing gage height data.
```python
from dataretrieval import nwis
# Get instantaneous values (15-min intervals)
df, meta = nwis.get_iv(
sites='<station_id>',
start='<start_date>',
end='<end_date>',
parameterCd='00065'
)
# Get daily values
df, meta = nwis.get_dv(
sites='<station_id>',
start='<start_date>',
end='<end_date>',
parameterCd='00060'
)
# Get site information
info, meta = nwis.get_info(sites='<station_id>')
```
## Parameter Codes
| Code | Parameter | Unit | Description |
|------|-----------|------|-------------|
| `00065` | Gage height | feet | Water level above datum |
| `00060` | Discharge | cfs | Streamflow volume |
## nwis Module Functions
| Function | Description | Data Frequency |
|----------|-------------|----------------|
| `nwis.get_iv()` | Instantaneous values | ~15 minutes |
| `nwis.get_dv()` | Daily values | Daily |
| `nwis.get_info()` | Site information | N/A |
| `nwis.get_stats()` | Statistical summaries | N/A |
| `nwis.get_peaks()` | Annual peak discharge | Annual |
## Returned DataFrame Structure
The DataFrame has a datetime index and these columns:
| Column | Description |
|--------|-------------|
| `site_no` | Station ID |
| `00065` | Water level value |
| `00065_cd` | Quality code (can ignore) |
## Downloading Multiple Stations
```python
from dataretrieval import nwis
station_ids = ['<id_1>', '<id_2>', '<id_3>']
all_data = {}
for site_id in station_ids:
try:
df, meta = nwis.get_iv(
sites=site_id,
start='<start_date>',
end='<end_date>',
parameterCd='00065'
)
if len(df) > 0:
all_data[site_id] = df
except Exception as e:
print(f"Failed to download {site_id}: {e}")
print(f"Successfully downloaded: {len(all_data)} stations")
```
## Extracting the Value Column
```python
# Find the gage height column (excludes quality code column)
gage_col = [c for c in df.columns if '00065' in str(c) and '_cd' not in str(c)]
if gage_col:
water_levels = df[gage_col[0]]
print(water_levels.head())
```
## Common Issues
| Issue | Cause | Solution |
|-------|-------|----------|
| Empty DataFrame | Station has no data for date range | Try different dates or use `get_iv()` |
| `get_dv()` returns empty | No daily gage height data | Use `get_iv()` and aggregate |
| Connection error | Network issue | Wrap in try/except, retry |
| Rate limited | Too many requests | Add delays between requests |
## Best Practices
- Always wrap API calls in try/except for failed downloads
- Check `len(df) > 0` before processing
- Station IDs are 8-digit strings with leading zeros (e.g., '04119000')
- Use `get_iv()` for gage height, as daily data is often unavailable
- Filter columns to exclude quality code columns (`_cd`)
- Break up large requests into smaller time periods to avoid timeouts
This skill downloads water level and streamflow data from USGS using the Python dataretrieval package. It focuses on retrieving instantaneous (15-minute) gage height and daily discharge, handling multiple stations, and returning clean pandas DataFrames ready for analysis. It is designed for realtime and historical water-level workflows using USGS station IDs.
The skill uses dataretrieval.nwis functions (get_iv, get_dv, get_info, get_stats, get_peaks) to request data by station ID, date range, and parameter code (e.g., 00065 for gage height, 00060 for discharge). It inspects returned DataFrames for content, filters out quality-code columns (suffix _cd), and collects results for single or multiple sites while handling exceptions and retries. The output is a datetime-indexed pandas DataFrame keyed by site_no containing the requested measurements.
What parameter code should I use for water level?
Use parameter code 00065 for gage height (water level). For discharge use 00060.
Why is the DataFrame empty for a station?
An empty DataFrame usually means no observations exist for the requested date range or the station does not measure that parameter; try a different date range or use get_iv instead of get_dv.