home / skills / gptomics / bioskills / seq-objects

This skill helps you create and manipulate Biopython Seq, MutableSeq, and SeqRecord objects for in-string construction, in-place edits, and annotated sequence

npx playbooks add skill gptomics/bioskills --skill seq-objects

Review the files below or copy the command above to add this skill to your agents.

Files (4)
SKILL.md
7.1 KB
---
name: bio-seq-objects
description: Create and manipulate Seq, MutableSeq, and SeqRecord objects using Biopython. Use when creating sequences from strings, modifying sequence data in-place, or building annotated sequence records.
tool_type: python
primary_tool: Bio.Seq
---

## Version Compatibility

Reference examples tested with: BioPython 1.83+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Seq Objects

Create and manipulate biological sequence objects using Biopython.

**"Create a sequence object"** → Wrap a raw string in a typed sequence container for biological operations.
- Immutable: `Seq('ATGC')` (BioPython) — string-like, supports complement/translate
- Mutable: `MutableSeq('ATGC')` (BioPython) — supports in-place edits
- Annotated: `SeqRecord(Seq(...), id=...)` (BioPython) — adds metadata for file I/O

## Required Imports

```python
from Bio.Seq import Seq, MutableSeq
from Bio.SeqRecord import SeqRecord
```

## Core Objects

### Seq - Immutable Sequence

The basic sequence object. Immutable like Python strings.

```python
seq = Seq('ATGCGATCGATCG')
```

Seq objects support string-like operations:
```python
len(seq)           # Length
seq[0]             # First base
seq[-1]            # Last base
seq[0:10]          # Slice (returns Seq)
str(seq)           # Convert to string
'ATG' in seq       # Membership test
seq.count('G')     # Count occurrences
seq.find('ATG')    # Find position (-1 if not found)
seq.upper()        # Uppercase
seq.lower()        # Lowercase
seq * 3            # Repeat sequence
seq.strip()        # Remove leading/trailing whitespace
```

### MutableSeq - Mutable Sequence

For in-place modifications when performance matters.

```python
mut_seq = MutableSeq('ATGCGATCG')
mut_seq[0] = 'C'              # Modify single position
mut_seq[0:3] = 'GGG'          # Replace slice
mut_seq.append('A')           # Add to end
mut_seq.insert(0, 'G')        # Insert at position
mut_seq.pop()                 # Remove and return last
mut_seq.remove('G')           # Remove first occurrence
mut_seq.reverse()             # Reverse in place
```

Convert between types:
```python
seq = Seq(mut_seq)            # MutableSeq to Seq
mut_seq = MutableSeq(seq)     # Seq to MutableSeq
```

### SeqRecord - Annotated Sequence

Sequence with metadata for file I/O and analysis.

```python
record = SeqRecord(
    Seq('ATGCGATCG'),
    id='gene1',
    name='example_gene',
    description='An example gene sequence'
)
```

SeqRecord attributes:
```python
record.seq           # The Seq object
record.id            # Identifier string
record.name          # Name string
record.description   # Description string
record.features      # List of SeqFeature objects
record.annotations   # Dict of annotations
record.letter_annotations  # Per-letter annotations (quality scores)
record.dbxrefs       # Database cross-references
```

### SeqRecord Methods

**Goal:** Transform entire records while preserving metadata.

**Approach:** Use SeqRecord methods that return new records with features remapped to new coordinates.

```python
# Reverse complement (preserves ID, updates features)
rc_record = record.reverse_complement(id='gene1_rc', description='reverse complement')

# Translate to protein (creates new SeqRecord with protein)
protein_record = record.translate(id='gene1_protein')

# Quick format output (returns string in file format)
fasta_str = record.format('fasta')
genbank_str = record.format('genbank')
```

Slicing preserves features (adjusted to new coordinates):
```python
# Slice SeqRecord - features are clipped/adjusted automatically
subset = record[10:50]  # Features outside range are dropped
```

## Code Patterns

### Create Seq from String
```python
dna = Seq('ATGCGATCGATCG')
rna = Seq('AUGCGAUCGAUCG')
protein = Seq('MRCRS')
```

### Create SeqRecord for File Output
```python
record = SeqRecord(Seq('ATGCGATCG'), id='seq1', description='My sequence')
```

### Create SeqRecord with Annotations
```python
record = SeqRecord(Seq('ATGCGATCG'), id='gene1', description='Example')
record.annotations['organism'] = 'Homo sapiens'
record.annotations['molecule_type'] = 'DNA'
```

### Build SeqRecord from Parsed Data
```python
from Bio.SeqFeature import SeqFeature, FeatureLocation

record = SeqRecord(Seq('ATGCGATCGATCG'), id='gene1')
feature = SeqFeature(FeatureLocation(0, 9), type='CDS', qualifiers={'product': ['Example protein']})
record.features.append(feature)
```

### Batch Create SeqRecords
```python
sequences = ['ATGC', 'GCTA', 'TTAA']
records = [SeqRecord(Seq(s), id=f'seq_{i}') for i, s in enumerate(sequences)]
```

### Copy a SeqRecord
```python
from copy import deepcopy
new_record = deepcopy(record)
new_record.id = 'modified_copy'
```

### Modify SeqRecord Sequence
```python
record = SeqRecord(Seq('ATGCGATCG'), id='seq1')
record.seq = Seq('GGGGGATCG')  # Replace entire sequence
```

### Join Sequences into One SeqRecord
```python
combined_seq = seq1 + Seq('NNNN') + seq2  # With linker
combined_record = SeqRecord(combined_seq, id='combined')
```

### Transform SeqRecord with reverse_complement
```python
# Reverse complement a gene sequence
record = SeqRecord(Seq('ATGCGATCGATCG'), id='gene1', description='Forward strand')
rc_record = record.reverse_complement(id=f'{record.id}_rc', description='Reverse complement')
# Features are remapped to new coordinates
```

### Translate SeqRecord to Protein
```python
# Translate coding sequence
cds_record = SeqRecord(Seq('ATGCGATCGATCGTAA'), id='cds1', description='Coding sequence')
protein_record = cds_record.translate(id=f'{cds_record.id}_protein', to_stop=True)
```

### Quick Output with format()
```python
record = SeqRecord(Seq('ATGCGATCG'), id='seq1', description='Example sequence')
print(record.format('fasta'))
# >seq1 Example sequence
# ATGCGATCG
```

## Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `TypeError: 'Seq' object does not support item assignment` | Trying to modify immutable Seq | Use MutableSeq instead |
| `TypeError: SeqRecord object argument must be a Seq object` | Passed string instead of Seq | Wrap string in Seq() |
| Missing annotations in output | Didn't set required annotations | Add `molecule_type` to annotations for GenBank output |

## Decision Tree

```
Need to work with sequence data?
├── Just doing string-like operations?
│   └── Use Seq
├── Need to modify sequence in-place?
│   └── Use MutableSeq
├── Need metadata (ID, description, features)?
│   └── Use SeqRecord
└── Need to write to file?
    └── Use SeqRecord with appropriate annotations
```

## Related Skills

- sequence-io/read-sequences - Parse files to get SeqRecord objects
- sequence-io/write-sequences - Write SeqRecord objects to files
- transcription-translation - Transform Seq objects (DNA to protein)
- reverse-complement - Get reverse complement of Seq
- sequence-slicing - Slice and extract from Seq/SeqRecord
- database-access/entrez-fetch - Fetch sequences from NCBI as SeqRecords

Overview

This skill provides concise patterns and helpers for creating and manipulating Biopython Seq, MutableSeq, and SeqRecord objects. It focuses on choosing the right container for string-like operations, in-place edits, and building annotated records suitable for file I/O and analysis. Examples cover creation, conversion, slicing, transformation, and common pitfalls.

How this skill works

The skill wraps raw nucleotide or protein strings into Seq or MutableSeq objects and builds SeqRecord containers that hold metadata, features, and per-letter annotations. It shows APIs for in-place edits with MutableSeq, safe conversions between Seq and MutableSeq, and SeqRecord methods that preserve or remap features during operations like slicing, reverse-complement, and translate. The patterns include creating annotated records, batching records, copying, and formatting output for FASTA/GenBank.

When to use it

  • Wrap raw sequence strings for biological operations (use Seq)
  • Make in-place edits or performance-sensitive modifications (use MutableSeq)
  • Attach ids, features, annotations or prepare records for file I/O (use SeqRecord)
  • Transform whole records while preserving metadata (reverse_complement, translate, slicing)
  • Batch-create or combine sequences for downstream export or analysis

Best practices

  • Prefer Seq for immutable, string-like manipulations and safe API behavior
  • Use MutableSeq when you need many in-place changes and then convert to Seq for downstream operations
  • Always wrap raw strings with Seq before placing them in a SeqRecord
  • Set record.annotations['molecule_type'] when preparing GenBank output to avoid missing-data errors
  • Use deepcopy to clone SeqRecord objects when modifying metadata to avoid accidental shared-state

Example use cases

  • Create a SeqRecord with organism and molecule_type annotations for GenBank export
  • Build many SeqRecord objects from a list of raw sequences for batch file writing
  • Edit a long sequence in-place with MutableSeq for performance, then convert to Seq for translation
  • Slice a SeqRecord to extract a sub-region while keeping adjusted features for visualization
  • Reverse-complement a gene SeqRecord and obtain a remapped feature set and new id/description

FAQ

What if I need to change bases but still use SeqRecord methods?

Make edits on a MutableSeq, convert to Seq, then assign record.seq = Seq(modified) so SeqRecord methods (translate, format) work correctly.

Why does item assignment on Seq fail?

Seq is immutable by design. Use MutableSeq for item or slice assignment, then convert back to Seq when done.