This skill helps map CLIP-seq binding sites to transcript features such as 3'UTR, 5'UTR, CDS, introns, and ncRNAs using R, the command line, or Python.
`npx playbooks add skill gptomics/bioskills --skill binding-site-annotation`
---
name: bio-clip-seq-binding-site-annotation
description: Annotate CLIP-seq binding sites to genomic features including 3'UTR, 5'UTR, CDS, introns, and ncRNAs. Use when characterizing where an RBP binds in transcripts.
tool_type: mixed
primary_tool: ChIPseeker
---
## Version Compatibility
Reference examples tested with: bedtools 2.31+, pandas 2.2+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Binding Site Annotation
**"Annotate where my RBP binds in transcripts"** → Map CLIP-seq peaks to genomic features (3'UTR, 5'UTR, CDS, introns, ncRNAs) to characterize RNA-binding protein target regions.
- R: `ChIPseeker::annotatePeak()` with transcript annotation databases
- CLI: `bedtools intersect` with gene model BED files
## Using ChIPseeker (R)
**Goal:** Classify CLIP-seq binding sites by genomic feature (3'UTR, 5'UTR, CDS, intron).
**Approach:** Load peaks and a TxDb transcript database, annotate with annotatePeak, and visualize the feature distribution with a pie chart.
```r
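library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
# hg38 transcript models; use the TxDb that matches your reference genome
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
# Read peaks from BED into a GRanges object
peaks <- readPeakFile('peaks.bed')
# Assign each peak a feature class (Promoter, 5' UTR, 3' UTR, Exon, Intron, ...);
# coding exons are reported as 'Exon', not as a separate CDS class
anno <- annotatePeak(peaks, TxDb = txdb)
# Pie chart of the feature distribution
plotAnnoPie(anno)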
```
## Using BEDTools
```bash
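# Keep peaks overlapping 3'UTRs; -wa/-wb write both the peak and the matching feature
# For stranded CLIP data with BED6 inputs, consider adding -s (same-strand overlaps only)
bedtools intersect -a peaks.bed -b 3utr.bed -wa -wb > peaks_3utr.bed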
```
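The command above assumes a feature file such as `3utr.bed` already exists. One way to produce it is to filter a GTF by its feature column, as in the minimal sketch below. The function name `gtf_features_to_bed` is hypothetical, and the feature labels are an assumption about your annotation: Ensembl GTFs use `three_prime_utr` and `five_prime_utr`, while other sources may use different labels, so inspect the third column of your file first.

```python
import csv

def gtf_features_to_bed(gtf_path, bed_path, feature_types):
    """Write GTF records whose feature column is in feature_types as BED6 lines.

    Hypothetical helper for illustration. Returns the number of records written.
    """
    written = 0
    with open(gtf_path) as gtf, open(bed_path, "w", newline="") as bed:
        writer = csv.writer(bed, delimiter="\t", lineterminator="\n")
        for line in gtf:
            if line.startswith("#") or not line.strip():
                continue  # skip header comments and blank lines
            fields = line.rstrip("\n").split("\t")
            chrom, _source, feature, start, end, _score, strand = fields[:7]
            if feature not in feature_types:
                continue
            # GTF coordinates are 1-based inclusive; BED is 0-based half-open
            writer.writerow([chrom, int(start) - 1, int(end), feature, 0, strand])
            written += 1
    return written
```

For example, `gtf_features_to_bed('annotation.gtf', '3utr.bed', {'three_prime_utr'})` would write the 3'UTR records of an Ensembl-style GTF. The output is neither sorted nor merged, so overlapping transcripts of the same gene produce overlapping intervals.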
## Python Annotation
```python
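import pandas as pd
def annotate_peaks(peaks_bed, annotation_gtf):
    '''Annotate peaks to genomic features (sketch: BED6 peaks, small inputs)'''
    # Load peaks and annotations, keeping chromosome names as strings
    bed_cols = ['chrom', 'start', 'end', 'name', 'score', 'strand']
    peaks = pd.read_csv(peaks_bed, sep='\t', header=None, usecols=range(6), names=bed_cols, dtype={'chrom': str})
    gtf = pd.read_csv(annotation_gtf, sep='\t', header=None, comment='#', usecols=[0, 2, 3, 4, 6],
                      names=['chrom', 'feature', 'f_start', 'f_end', 'strand'], dtype={'chrom': str})
    # Intersect and categorize: same chromosome and strand (BED is 0-based, GTF 1-based)
    hits = peaks.merge(gtf, on=['chrom', 'strand'])
    hits = hits[(hits['start'] < hits['f_end']) & (hits['end'] >= hits['f_start'])]
    return hits[bed_cols + ['feature']].drop_duplicates()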
```
## Related Skills
- clip-peak-calling - Get peaks
- genome-intervals/interval-arithmetic - Intersect peaks with genomic features
This skill annotates CLIP-seq binding sites to transcript features such as 3'UTR, 5'UTR, CDS, introns, and noncoding RNAs. It provides simple, reproducible patterns for R, Python, and CLI workflows to classify where an RNA-binding protein (RBP) contacts transcripts. Use it to summarize binding distributions, generate feature-specific peak sets, or prepare inputs for downstream motif and enrichment analyses.
The core approach intersects peak coordinates with a gene model or transcript annotation (TxDb/GTF/BED) and assigns each peak to a feature category. In R, ChIPseeker::annotatePeak uses a TxDb object to return feature annotations and offers quick visualizations. On the command line, bedtools intersect pairs peaks with pre-extracted feature BED files (3'UTR, CDS, intron, etc.). A Python pattern loads peak and annotation tables, performs interval joins, and collapses overlapping feature matches into a primary category per peak.
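The interval join described above can be sketched in plain Python. This is an illustration under stated assumptions, not the skill's own implementation: `overlap_pairs` is a hypothetical name, both inputs are assumed to use 0-based half-open coordinates, and the inner loop compares every peak with every feature on the same chromosome and strand, which only suits small inputs.

```python
from collections import defaultdict

def overlap_pairs(peaks, features):
    """Return (peak_name, feature_type, overlap_bp) for same-strand overlaps.

    peaks:    iterable of (chrom, start, end, name, strand), 0-based half-open
    features: iterable of (chrom, start, end, feature_type, strand), same convention
    """
    # Group features by chromosome and strand so each peak sees only candidates
    by_key = defaultdict(list)
    for chrom, start, end, ftype, strand in features:
        by_key[(chrom, strand)].append((start, end, ftype))

    pairs = []
    for chrom, start, end, name, strand in peaks:
        for f_start, f_end, ftype in by_key.get((chrom, strand), []):
            # Length of the shared interval; zero or negative means no overlap
            overlap = min(end, f_end) - max(start, f_start)
            if overlap > 0:
                pairs.append((name, ftype, overlap))
    return pairs

peaks = [("chr1", 150, 180, "p1", "+"), ("chr1", 95, 110, "p2", "+"),
         ("chr1", 150, 180, "p3", "-")]
features = [("chr1", 0, 100, "CDS", "+"), ("chr1", 100, 200, "3UTR", "+")]
print(overlap_pairs(peaks, features))
# [('p1', '3UTR', 30), ('p2', 'CDS', 5), ('p2', '3UTR', 10)]
```

Peak `p3` is on the minus strand and matches nothing, which is the intended behavior for strand-specific CLIP data. For genome-scale inputs, `bedtools intersect` or an interval-tree library is a better fit than this nested loop.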
How do I handle peaks overlapping multiple features?
Choose a consistent resolution rule: hierarchical priority (e.g., CDS > UTR > intron), longest-overlap assignment, or report multi-labels. Document the rule and use it throughout analyses.
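The three rules can be sketched as small functions over a list of `(peak_name, feature_type, overlap_bp)` tuples. The function names, the feature labels, and the priority order used in the example are hypothetical; substitute the labels and order your analysis actually uses.

```python
def resolve_by_priority(pairs, priority):
    """Keep, for each peak, the overlapping feature that ranks first in priority."""
    rank = {ftype: i for i, ftype in enumerate(priority)}
    best = {}
    for name, ftype, _overlap in pairs:
        if ftype not in rank:
            continue  # features missing from the priority list are ignored
        if name not in best or rank[ftype] < rank[best[name]]:
            best[name] = ftype
    return best

def resolve_by_longest_overlap(pairs):
    """Keep, for each peak, the feature with the most overlapping bases.

    Ties keep the feature encountered first.
    """
    best = {}
    for name, ftype, overlap in pairs:
        if name not in best or overlap > best[name][1]:
            best[name] = (ftype, overlap)
    return {name: ftype for name, (ftype, _bp) in best.items()}

def multi_label(pairs):
    """Report every overlapping feature type per peak, sorted alphabetically."""
    labels = {}
    for name, ftype, _overlap in pairs:
        labels.setdefault(name, set()).add(ftype)
    return {name: sorted(types) for name, types in labels.items()}

pairs = [("p2", "CDS", 5), ("p2", "3UTR", 10), ("p1", "3UTR", 30)]
print(resolve_by_priority(pairs, ["CDS", "5UTR", "3UTR", "intron"]))
# {'p2': 'CDS', 'p1': '3UTR'}
print(resolve_by_longest_overlap(pairs))
# {'p2': '3UTR', 'p1': '3UTR'}
print(multi_label(pairs))
# {'p2': ['3UTR', 'CDS'], 'p1': ['3UTR']}
```

Peak `p2` shows why the rule matters: it is labeled CDS under the priority rule but 3'UTR under longest overlap, so feature fractions from the two rules are not directly comparable.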
Which annotation source should I use?
Prefer an annotation matched to your reference genome (TxDb for R, GTF from Ensembl/GENCODE). For reproducibility, record the annotation version and source.
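One lightweight way to record the annotation version and source is to store the GTF's leading comment lines together with a checksum of the file. The sketch below assumes an uncompressed GTF whose provenance lives in `#`-prefixed header lines, which is common but not guaranteed; `annotation_provenance` is a hypothetical helper name.

```python
import hashlib
import json

def annotation_provenance(gtf_path):
    """Collect leading '#' header lines and a SHA-256 checksum for a GTF file."""
    header = []
    digest = hashlib.sha256()
    with open(gtf_path, "rb") as fh:
        in_header = True
        for raw in fh:
            digest.update(raw)  # the checksum covers the whole file
            if in_header and raw.startswith(b"#"):
                header.append(raw.decode("utf-8", "replace").strip())
            else:
                in_header = False  # stop collecting after the first record
    return {"path": gtf_path, "sha256": digest.hexdigest(), "header": header}

def save_provenance(gtf_path, json_path):
    """Write the provenance record next to the analysis outputs."""
    with open(json_path, "w") as out:
        json.dump(annotation_provenance(gtf_path), out, indent=2)
```

Saving this record alongside the annotated peaks makes it possible to confirm later that a rerun used the same annotation file.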