
ribosome-periodicity skill


This skill helps validate Ribo-seq data quality by checking 3-nucleotide periodicity and estimating P-site offsets for accurate downstream analysis.

npx playbooks add skill gptomics/bioskills --skill ribosome-periodicity


---
name: bio-ribo-seq-ribosome-periodicity
description: Validate Ribo-seq data quality by checking 3-nucleotide periodicity and calculating P-site offsets. Use when assessing library quality or determining read offsets for downstream analysis.
tool_type: python
primary_tool: Plastid
---

## Version Compatibility

Reference examples tested with: matplotlib 3.8+, numpy 1.26+, pysam 0.22+, scipy 1.12+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Ribosome Periodicity Analysis

**"Check if my Ribo-seq data shows triplet periodicity"** → Validate Ribo-seq library quality by verifying 3-nucleotide translocation patterns and calculating P-site offsets from metagene profiles.
- Python: `plastid` for P-site offset calculation and metagene analysis

## 3-Nucleotide Periodicity

**Goal:** Verify that Ribo-seq reads exhibit the expected 3-nucleotide translocation pattern characteristic of active translation.

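**Approach:** Load reads mapped to their 5' ends and compute metagene profiles around start codons to check for triplet periodicity. The 5' ends already carry the 3-nt pattern; the P-site offset is applied in a later step.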

Ribosomes move 3 nucleotides per codon. Good Ribo-seq data shows strong periodicity:

```python
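from plastid import BAMGenomeArray, FivePrimeMapFactory
import numpy as np
import matplotlib.pyplot as plt

# Load aligned reads, assigning each read to its 5' end (no P-site offset yet)
# The BAM file must be coordinate-sorted and indexed
alignments = BAMGenomeArray('riboseq.bam', mapping=FivePrimeMapFactory())

# Next step: build a metagene around start codons (see "Calculate P-site Offset")
# Expect strong 3-nt periodicity downstream of the start codon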
```
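One simple way to put a number on the pattern is the fraction of counts falling in each reading frame. The sketch below is illustrative only: `frame_fractions` and the synthetic profile are hypothetical names and data, not part of plastid, and the function assumes you already have a 1-D count profile in which `start_index` marks the first nucleotide of the start codon.

```python
import numpy as np

def frame_fractions(counts, start_index):
    """Fraction of counts in each reading frame, relative to the start codon.

    counts: 1-D array of counts along a metagene window.
    start_index: index in `counts` of the first nucleotide of the start codon.
    """
    counts = np.asarray(counts, dtype=float)
    cds = counts[start_index:]               # keep coding positions only
    usable = len(cds) - len(cds) % 3         # trim to whole codons
    per_frame = cds[:usable].reshape(-1, 3).sum(axis=0)
    total = per_frame.sum()
    if total == 0:
        return np.zeros(3)                   # no signal to score
    return per_frame / total

# Synthetic example (made-up data): 30 codons with most signal in frame 0
profile = np.concatenate([np.zeros(6), np.tile([8.0, 1.0, 1.0], 30)])
fractions = frame_fractions(profile, start_index=6)
```

Fractions near one third each indicate no periodicity. If the counts are 5'-end positions rather than P-site positions, the dominant frame may be 1 or 2 instead of 0, but one frame should still clearly dominate.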

## Calculate P-site Offset

**Goal:** Determine the optimal P-site offset from the 5' end of ribosome footprints for accurate codon-level positioning.

**Approach:** Run metagene analysis around annotated start codons and identify the offset that aligns the signal peak with the AUG position.

```python
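import numpy as np
from plastid import BAMGenomeArray, FivePrimeMapFactory, GTF2_TranscriptAssembler

# The P-site offset varies by read length
# Typically 12-15 nt from 5' end for 28-32 nt reads

def determine_psite_offset(bam_path, annotation_file, upstream=50, downstream=100):
    '''Build a metagene profile of 5'-end counts around annotated start codons.

    plastid provides metagene analysis as the `metagene` and `psite`
    command-line scripts rather than a `metagene_analysis` function, so the
    profile is summed directly here. Verify `cds_start`, `length`, and
    `get_counts` against your installed plastid version.
    '''
    # Load reads, mapped to their 5' ends
    alignments = BAMGenomeArray(bam_path, mapping=FivePrimeMapFactory())

    # Index `upstream` of the profile is the first nucleotide of the start codon
    metagene_data = np.zeros(upstream + downstream)
    for tx in GTF2_TranscriptAssembler(annotation_file):
        start = tx.cds_start  # transcript coordinate of the start codon; None if non-coding
        if start is None or start < upstream or start + downstream > tx.length:
            continue  # skip transcripts too short to fill the window
        counts = tx.get_counts(alignments)  # counts in 5'-to-3' transcript order
        metagene_data += counts[start - upstream:start + downstream]

    # With 5'-end mapping, the peak sits `offset` nt upstream of the start codon
    return metagene_data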
```
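Reading the offset from the profile can be sketched as follows. This assumes a 1-D profile of 5'-end counts in which index `upstream` is the first nucleotide of the start codon; `offset_from_profile` and the 8-20 nt search range are illustrative assumptions, not plastid functionality.

```python
def offset_from_profile(profile, upstream, min_offset=8, max_offset=20):
    """Distance (nt) from the start codon back to the strongest 5'-end peak.

    Initiating ribosomes hold the start codon in the P-site, so their 5' ends
    pile up `offset` nt upstream of the start codon.
    """
    best_offset, best_count = None, -1
    for offset in range(min_offset, max_offset + 1):
        index = upstream - offset            # position `offset` nt upstream
        if 0 <= index < len(profile) and profile[index] > best_count:
            best_offset, best_count = offset, profile[index]
    return best_offset

# Synthetic example (made-up data): flat background with a peak 12 nt upstream
example_profile = [1.0] * 150
example_profile[50 - 12] = 40.0
estimated_offset = offset_from_profile(example_profile, upstream=50)
```

Run this separately on the reads of each length, since the offset generally differs between footprint sizes.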

## Metagene Plots

**Goal:** Visualize the metagene profile around start codons with frame-colored bars and a periodicity power spectrum.

**Approach:** Plot read counts by reading frame and compute FFT to confirm a dominant period of 3 nucleotides.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.fft import rfft, rfftfreq

def plot_metagene(metagene_data, offset=12, upstream=50):
    '''Plot metagene profile around start codon and its power spectrum'''
    metagene_data = np.asarray(metagene_data, dtype=float)
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    # Shift 5'-end positions by the P-site offset so that position 0
    # is a P-site on the first nucleotide of the start codon
    positions = np.arange(len(metagene_data)) - upstream + offset

    # Plot by frame (numpy's % is non-negative, so upstream positions work too)
    for frame in range(3):
        in_frame = positions % 3 == frame
        axes[0].bar(positions[in_frame], metagene_data[in_frame], alpha=0.7, label=f'Frame {frame}')

    axes[0].axvline(0, color='red', linestyle='--', label='Start')
    axes[0].set_xlabel('P-site position relative to start codon')
    axes[0].set_ylabel('Metagene counts')
    axes[0].legend()  # called after axvline so the 'Start' line is listed

    # Periodicity: power spectrum of the mean-centered profile
    power = np.abs(rfft(metagene_data - metagene_data.mean())) ** 2
    freq = rfftfreq(len(metagene_data))

    axes[1].plot(1 / freq[1:], power[1:])  # skip the zero-frequency bin
    axes[1].set_xlabel('Period (nt)')
    axes[1].set_ylabel('Power')
    axes[1].axvline(3, color='red', linestyle='--')

    plt.tight_layout()
    plt.savefig('periodicity.pdf')
```
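The spectrum can also be reduced to a single number for comparing libraries. The sketch below is one possible definition, not a standard metric: `period3_score` is a hypothetical helper that divides the power at a period of exactly 3 nt by the mean power at all other non-zero frequencies.

```python
import numpy as np

def period3_score(counts):
    """Ratio of spectral power at period 3 to the mean power elsewhere."""
    counts = np.asarray(counts, dtype=float)
    # Trim to a multiple of 3 so that frequency 1/3 lands exactly on an FFT bin
    counts = counts[: len(counts) - len(counts) % 3]
    if len(counts) < 6:
        raise ValueError("need at least 6 positions to score periodicity")
    power = np.abs(np.fft.rfft(counts - counts.mean())) ** 2
    k = len(counts) // 3                      # bin for 1/3 cycles per nt
    others = np.delete(power[1:], k - 1)      # drop bin k; bin 0 already skipped
    background = others.mean()
    return power[k] / background if background > 0 else float("inf")

# Synthetic comparison (made-up data): periodic signal plus noise vs. noise only
rng = np.random.default_rng(0)
periodic_score = period3_score(np.tile([8.0, 1.0, 1.0], 40) + rng.random(120))
flat_score = period3_score(rng.random(120))
```

A score close to 1 means the period-3 bin looks like any other frequency; values well above 1 indicate triplet periodicity. Any cutoff for "good" data is dataset-dependent and should be chosen by comparing known-good and known-poor libraries.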

## Assess by Read Length

**Goal:** Evaluate 3-nucleotide periodicity strength for each read length to identify the most informative footprint sizes.

**Approach:** Group reads by query length, compute periodicity score per group, and retain lengths with strong triplet signal.

```python
def periodicity_by_length(bam_path, annotation_file, calculate_periodicity, min_reads=1000):
    '''Calculate periodicity score for each read length.

    calculate_periodicity(reads, annotation_file) is a scoring function that
    you supply; this skill does not define one.
    '''
    from collections import defaultdict
    import pysam

    # Group primary alignments by length
    # Note: this keeps every read in memory; subsample very large BAM files
    reads_by_length = defaultdict(list)
    with pysam.AlignmentFile(bam_path, 'rb') as bam:
        for read in bam:
            if read.is_unmapped or read.is_secondary or read.is_supplementary:
                continue
            reads_by_length[read.query_length].append(read)

    # Calculate periodicity for each length
    # Good lengths show strong 3-nt periodicity
    results = {}
    for length, reads in sorted(reads_by_length.items()):
        if len(reads) > min_reads:  # Need sufficient reads
            results[length] = calculate_periodicity(reads, annotation_file)

    return results
```
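Once each length has a score, retaining the informative lengths is a simple filter. In this sketch the score is assumed to be a value where higher means stronger periodicity (for example, the fraction of counts in the dominant frame); `select_lengths`, the 0.5 threshold, and the example scores are all illustrative assumptions.

```python
def select_lengths(scores, min_score=0.5):
    """Return read lengths whose score meets the threshold, sorted ascending."""
    return sorted(length for length, score in scores.items() if score >= min_score)

# Made-up scores for illustration only
example_scores = {26: 0.36, 28: 0.71, 29: 0.68, 30: 0.55, 33: 0.40}
kept_lengths = select_lengths(example_scores)
```

The threshold should be tuned per dataset; a value just above one third would accept nearly any length, including ones with no real periodicity.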

## P-site Offset Table

Common P-site offsets by read length (5' end mapping):

| Read Length | P-site Offset |
|-------------|---------------|
| 28 nt | 12 |
| 29 nt | 12 |
| 30 nt | 13 |
| 31 nt | 13 |
| 32 nt | 14 |

## Validate with RiboCode

**Goal:** Run an automated periodicity and ORF detection pipeline as an independent validation of data quality.

**Approach:** Execute RiboCode's one-step command, which internally assesses periodicity and generates diagnostic plots.

```bash
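# RiboCode includes periodicity analysis
# Note: RiboCode expects the BAM passed to -r to be aligned to the
# transcriptome (e.g. STAR's Aligned.toTranscriptome.out.bam);
# confirm flags with `RiboCode_onestep --help`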
RiboCode_onestep \
    -g annotation.gtf \
    -r riboseq.bam \
    -f genome.fa \
    -o output_dir

# Check output for periodicity plots
```

## Related Skills

- riboseq-preprocessing - Generate aligned BAM
- orf-detection - Uses P-site offsets
- translation-efficiency - Requires proper positioning

Overview

This skill validates Ribo-seq data quality by checking 3-nucleotide (triplet) periodicity and calculating P-site offsets per read length. It helps you assess whether footprints reflect translating ribosomes and produces metagene profiles, periodicity spectra, and read-length-specific offset recommendations. Use it to guide downstream codon-level analyses and ORF detection.

How this skill works

The skill ingests aligned Ribo-seq reads (BAM) and transcript annotations to compute metagene profiles around annotated start codons. It groups reads by length, computes frame-specific counts, performs an FFT-based periodicity analysis to detect a dominant period of three nucleotides, and identifies P-site offsets that center the metagene peak on the start codon. Outputs include frame-colored metagene plots, periodicity power spectra, and a table of P-site offsets by read length.

When to use it

- After alignment and adapter trimming, to assess library quality before downstream analysis
- When selecting read lengths to retain for codon-level analyses and ORF calling
- To determine P-site offsets for accurate codon assignment in translation-efficiency or ORF-detection workflows
- To validate experimental changes or protocols that may affect footprint length or positioning
- As an independent check alongside automated tools like RiboCode

Best practices

- Verify compatible package versions (matplotlib, numpy, pysam, scipy, plastid) and inspect APIs if errors arise
- Compute metagenes around annotated start codons and require sufficient reads per length (e.g., >1000) before trusting periodicity scores
- Group reads by query length and report offsets per length rather than using a single global offset
- Plot frame-separated counts and an FFT power spectrum; confirm a clear peak at period = 3 nt
- Save diagnostics (plots and offset tables) and use them to filter footprint lengths for downstream pipelines

Example use cases

- Generate metagene plots to visually confirm triplet periodicity and report a quality-status summary for a new library
- Compute a P-site offset table for each read length and export offsets for use in ORF detection or codon-resolution quantification
- Compare periodicity strength across experimental conditions or replicates to detect protocol-induced biases
- Run periodicity analysis per read length to select the subset of footprint sizes with the strongest triplet signal for translation-efficiency calculations

FAQ

What read depth is needed to assess periodicity reliably?

Each read length needs enough reads on its own: this skill uses a minimum of about 1,000 reads per length before trusting its metagene profile and FFT signal, and more reads give more stable results.

How do I choose the P-site offset for a read length?

Identify the offset that aligns the metagene peak with the annotated start codon; common offsets are ~12–15 nt for 28–32 nt footprints, but verify per dataset.

Can I trust automated tools instead of this analysis?

Automated tools like RiboCode provide independent validation, but visual metagenes and per-length checks are recommended to confirm automated results.