home / skills / gptomics / bioskills / ribosome-periodicity

ribosome-periodicity skill

/ribo-seq/ribosome-periodicity

This skill helps validate Ribo-seq data quality by checking 3-nucleotide periodicity and estimating P-site offsets for accurate downstream analysis.

npx playbooks add skill gptomics/bioskills --skill ribosome-periodicity

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
6.0 KB
---
name: bio-ribo-seq-ribosome-periodicity
description: Validate Ribo-seq data quality by checking 3-nucleotide periodicity and calculating P-site offsets. Use when assessing library quality or determining read offsets for downstream analysis.
tool_type: python
primary_tool: Plastid
---

## Version Compatibility

Reference examples tested with: matplotlib 3.8+, numpy 1.26+, pysam 0.22+, scipy 1.12+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Ribosome Periodicity Analysis

**"Check if my Ribo-seq data shows triplet periodicity"** → Validate Ribo-seq library quality by verifying 3-nucleotide translocation patterns and calculating P-site offsets from metagene profiles.
- Python: `plastid` for P-site offset calculation and metagene analysis

## 3-Nucleotide Periodicity

**Goal:** Verify that Ribo-seq reads exhibit the expected 3-nucleotide translocation pattern characteristic of active translation.

**Approach:** Load P-site mapped reads and compute metagene profiles around start codons to check for triplet periodicity.

Ribosomes move 3 nucleotides per codon. Good Ribo-seq data shows strong periodicity:

```python
from plastid import BAMGenomeArray, FivePrimeMapFactory, GenomicSegment
import numpy as np
import matplotlib.pyplot as plt

# Load aligned reads
alignments = BAMGenomeArray('riboseq.bam', mapping=FivePrimeMapFactory())

# Get metagene around start codons
# Expect strong 3-nt periodicity
```

## Calculate P-site Offset

**Goal:** Determine the optimal P-site offset from the 5' end of ribosome footprints for accurate codon-level positioning.

**Approach:** Run metagene analysis around annotated start codons and identify the offset that aligns the signal peak with the AUG position.

```python
from plastid import metagene_analysis

# The P-site offset varies by read length
# Typically 12-15 nt from 5' end for 28-30 nt reads

def determine_psite_offset(bam_path, annotation_file):
    '''Determine optimal P-site offset from metagene analysis'''
    from plastid import GTF2_TranscriptAssembler, BAMGenomeArray

    # Load annotations
    transcripts = list(GTF2_TranscriptAssembler(annotation_file))

    # Load reads
    alignments = BAMGenomeArray(bam_path, mapping=FivePrimeMapFactory())

    # Metagene around start codons
    # Peak should align with start codon position
    metagene_data = metagene_analysis(
        transcripts,
        alignments,
        upstream=50,
        downstream=100
    )

    return metagene_data
```

## Metagene Plots

**Goal:** Visualize the metagene profile around start codons with frame-colored bars and a periodicity power spectrum.

**Approach:** Plot read counts by reading frame and compute FFT to confirm a dominant period of 3 nucleotides.

```python
def plot_metagene(metagene_data, offset=12):
    '''Plot metagene profile around start codon'''
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    # Frame 0, 1, 2 around start codon
    positions = np.arange(-50, 100)

    # Plot by frame
    for frame in range(3):
        frame_positions = positions[positions % 3 == frame]
        counts = metagene_data[positions % 3 == frame]
        axes[0].bar(frame_positions, counts, alpha=0.7, label=f'Frame {frame}')

    axes[0].set_xlabel('Position relative to start codon')
    axes[0].set_ylabel('Normalized counts')
    axes[0].legend()
    axes[0].axvline(0, color='red', linestyle='--', label='Start')

    # Periodicity
    from scipy.fft import fft
    fft_result = np.abs(fft(metagene_data))
    freq = np.fft.fftfreq(len(metagene_data))

    axes[1].plot(1/freq[1:len(freq)//2], fft_result[1:len(freq)//2])
    axes[1].set_xlabel('Period (nt)')
    axes[1].set_ylabel('Power')
    axes[1].axvline(3, color='red', linestyle='--')

    plt.tight_layout()
    plt.savefig('periodicity.pdf')
```

## Assess by Read Length

**Goal:** Evaluate 3-nucleotide periodicity strength for each read length to identify the most informative footprint sizes.

**Approach:** Group reads by query length, compute periodicity score per group, and retain lengths with strong triplet signal.

```python
def periodicity_by_length(bam_path, annotation_file):
    '''Calculate periodicity score for each read length'''
    import pysam

    # Group reads by length
    reads_by_length = {}
    with pysam.AlignmentFile(bam_path, 'rb') as bam:
        for read in bam:
            if not read.is_unmapped:
                length = read.query_length
                if length not in reads_by_length:
                    reads_by_length[length] = []
                reads_by_length[length].append(read)

    # Calculate periodicity for each length
    # Good lengths show strong 3-nt periodicity
    results = {}
    for length, reads in reads_by_length.items():
        if len(reads) > 1000:  # Need sufficient reads
            periodicity = calculate_periodicity(reads, annotation_file)
            results[length] = periodicity

    return results
```

## P-site Offset Table

Common P-site offsets by read length (5' end mapping):

| Read Length | P-site Offset |
|-------------|---------------|
| 28 nt | 12 |
| 29 nt | 12 |
| 30 nt | 13 |
| 31 nt | 13 |
| 32 nt | 14 |

## Validate with RiboCode

**Goal:** Run an automated periodicity and ORF detection pipeline as an independent validation of data quality.

**Approach:** Execute RiboCode's one-step command, which internally assesses periodicity and generates diagnostic plots.

```bash
# RiboCode includes periodicity analysis
RiboCode_onestep \
    -g annotation.gtf \
    -r riboseq.bam \
    -f genome.fa \
    -o output_dir

# Check output for periodicity plots
```

## Related Skills

- riboseq-preprocessing - Generate aligned BAM
- orf-detection - Uses P-site offsets
- translation-efficiency - Requires proper positioning

Overview

This skill validates Ribo-seq data quality by checking 3-nucleotide (triplet) periodicity and calculating P-site offsets per read length. It helps you assess whether footprints reflect translating ribosomes and produces metagene profiles, periodicity spectra, and read-length-specific offset recommendations. Use it to guide downstream codon-level analyses and ORF detection.

How this skill works

The skill ingests aligned Ribo-seq reads (BAM) and transcript annotations to compute metagene profiles around annotated start codons. It groups reads by length, computes frame-specific counts, performs an FFT-based periodicity analysis to detect a dominant period of three nucleotides, and identifies P-site offsets that center the metagene peak on the start codon. Outputs include frame-colored metagene plots, periodicity power spectra, and a table of P-site offsets by read length.

When to use it

  • After alignment and adapter trimming to assess library quality before downstream analysis
  • When selecting read lengths to retain for codon-level analyses and ORF calling
  • To determine P-site offsets for accurate codon assignment in translation-efficiency or ORF-detection workflows
  • To validate experimental changes or protocols that may affect footprint length or positioning
  • As an independent check alongside automated tools like RiboCode

Best practices

  • Verify compatible package versions (matplotlib, numpy, pysam, scipy, plastid) and inspect APIs if errors arise
  • Compute metagenes around annotated start codons and require sufficient reads per length (e.g., >1000) before trusting periodicity scores
  • Group reads by query length and report offsets per length rather than using a single global offset
  • Plot frame-separated counts and an FFT power spectrum; confirm a clear peak at period = 3 nt
  • Save diagnostics (plots and offset tables) and use them to filter footprint lengths for downstream pipelines

Example use cases

  • Generate metagene plots to visually confirm triplet periodicity and report a quality-status summary for a new library
  • Compute P-site offset table for each read length and export offsets for use in ORF detection or codon-resolution quantification
  • Compare periodicity strength across experimental conditions or replicates to detect protocol-induced biases
  • Run periodicity analysis per read length to select the subset of footprint sizes with the strongest triplet signal for translation-efficiency calculations

FAQ

What read depth is needed to assess periodicity reliably?

Per read-length analysis typically requires thousands of reads (commonly >1000) to produce stable metagene profiles and FFT signals.

How do I choose the P-site offset for a read length?

Identify the offset that aligns the metagene peak with the annotated start codon; common offsets are ~12–15 nt for 28–32 nt footprints, but verify per dataset.

Can I trust automated tools instead of this analysis?

Automated tools like RiboCode provide independent validation, but visual metagenes and per-length checks are recommended to confirm automated results.