This skill helps align RNA-seq reads using HISAT2, offering memory-efficient, splice-aware alignment for gene expression workflows.
Add this skill to your agents with:
`npx playbooks add skill gptomics/bioskills --skill hisat2-alignment`
---
name: bio-read-alignment-hisat2-alignment
description: Align RNA-seq reads with HISAT2, a memory-efficient splice-aware aligner. Use when STAR's memory requirements are too high or for general RNA-seq alignment.
tool_type: cli
primary_tool: HISAT2
---
## Version Compatibility
Reference examples tested with: samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
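
The version check described above can be scripted. The following is a minimal sketch, assuming GNU `sort -V` is available; `version_ge` and `check_samtools` are hypothetical helper names, and the 1.19 minimum is taken from the samtools version listed above.

```bash
# Hypothetical helper: succeed if version $1 >= version $2 (relies on sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report whether the installed samtools is at least 1.19.
check_samtools() {
    local installed
    # The first line of `samtools --version` looks like "samtools 1.19.2".
    installed=$(samtools --version 2>/dev/null | awk 'NR == 1 {print $2}')
    if [ -z "$installed" ]; then
        echo "samtools not found on PATH" >&2
        return 1
    fi
    if version_ge "$installed" "1.19"; then
        echo "samtools $installed: OK"
    else
        echo "samtools $installed is older than 1.19; confirm flags with --help" >&2
        return 1
    fi
}
```

Calling `check_samtools` before a pipeline run gives an early failure instead of an unexpected flag error partway through alignment.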
# HISAT2 RNA-seq Alignment
**"Align RNA-seq reads with HISAT2"** → Map RNA-seq reads to a reference genome with splice-aware alignment. Suitable for gene expression quantification workflows.
- CLI: `hisat2 -x index -1 R1.fq -2 R2.fq | samtools sort -o aligned.bam`
## Build Index
```bash
# Basic index (no annotation)
hisat2-build -p 8 reference.fa hisat2_index
# Index with splice sites and exons (recommended)
hisat2_extract_splice_sites.py annotation.gtf > splice_sites.txt
hisat2_extract_exons.py annotation.gtf > exons.txt
hisat2-build -p 8 \
--ss splice_sites.txt \
--exon exons.txt \
reference.fa hisat2_index
```
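
Before aligning, it can help to confirm that the index build finished. The sketch below assumes the usual HISAT2 layout of eight files per index (`<basename>.1.ht2` through `<basename>.8.ht2`, or `.ht2l` for large indexes); `hisat2_index_ready` is a hypothetical helper name, not part of HISAT2.

```bash
# Hypothetical helper: succeed if all eight index files exist for basename $1.
# Small indexes use the .ht2 suffix; large indexes use .ht2l.
hisat2_index_ready() {
    local base=$1 suffix i ok
    for suffix in ht2 ht2l; do
        ok=1
        for i in 1 2 3 4 5 6 7 8; do
            [ -f "${base}.${i}.${suffix}" ] || { ok=0; break; }
        done
        [ "$ok" -eq 1 ] && return 0
    done
    echo "incomplete or missing HISAT2 index: ${base}" >&2
    return 1
}
```

For example, `hisat2_index_ready hisat2_index || exit 1` at the top of an alignment script stops early when the build was interrupted.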
## Basic Alignment
```bash
# Paired-end reads
hisat2 -p 8 -x hisat2_index \
-1 reads_1.fq.gz -2 reads_2.fq.gz \
-S aligned.sam
# Single-end reads
hisat2 -p 8 -x hisat2_index \
-U reads.fq.gz \
-S aligned.sam
```
## Direct to Sorted BAM
```bash
# Pipe to samtools
hisat2 -p 8 -x hisat2_index \
-1 r1.fq.gz -2 r2.fq.gz | \
samtools sort -@ 4 -o aligned.sorted.bam -
samtools index aligned.sorted.bam
```
## Stranded Libraries
```bash
# Forward stranded (e.g., ligation-based protocols)
hisat2 -p 8 -x hisat2_index \
--rna-strandness FR \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
# Reverse stranded (e.g., dUTP, TruSeq - most common)
hisat2 -p 8 -x hisat2_index \
--rna-strandness RF \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
# Single-end stranded (F for forward, R for reverse)
hisat2 -p 8 -x hisat2_index \
--rna-strandness F \
-U reads.fq.gz -S aligned.sam
```
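
The mapping between library type and `--rna-strandness` value is easy to get backwards. The sketch below encodes it in one place, using the TopHat-style names `fr-firststrand` (dUTP) and `fr-secondstrand` (ligation); `rna_strandness` is a hypothetical helper name introduced only for illustration.

```bash
# Hypothetical helper: print the --rna-strandness value for a library type.
#   $1 = library type: fr-firststrand (dUTP), fr-secondstrand (ligation), unstranded
#   $2 = layout: paired or single
# Prints nothing for unstranded libraries, where the option is simply omitted.
rna_strandness() {
    case "$1:$2" in
        fr-firststrand:paired)  echo RF ;;
        fr-firststrand:single)  echo R ;;
        fr-secondstrand:paired) echo FR ;;
        fr-secondstrand:single) echo F ;;
        unstranded:paired|unstranded:single) ;;
        *) echo "unknown library type/layout: $1 $2" >&2; return 1 ;;
    esac
}

# Example use inside a script (not run here):
#   strand=$(rna_strandness fr-firststrand paired)
#   hisat2 -p 8 -x hisat2_index ${strand:+--rna-strandness "$strand"} \
#       -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
```

The `${strand:+...}` expansion adds the option only when a value was printed, so unstranded libraries run with the HISAT2 default.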
## Novel Splice Junction Discovery
```bash
# Output novel splice junctions
hisat2 -p 8 -x hisat2_index \
--novel-splicesite-outfile novel_splices.txt \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
# Use known + novel junctions for subsequent alignments
hisat2 -p 8 -x hisat2_index \
--novel-splicesite-infile novel_splices.txt \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
```
## Two-Pass Alignment (Manual)
**Goal:** Improve splice junction sensitivity by discovering novel junctions across all samples in a first pass, then realigning with the combined junction set.
**Approach:** Run HISAT2 on each sample to extract novel splice sites, merge and deduplicate junctions across samples, then realign all samples using the combined junction catalog.
```bash
# Pass 1: Discover junctions from all samples (alignments are discarded)
for r1 in *_R1.fq.gz; do
    r2="${r1/_R1/_R2}"
    base=$(basename "$r1" _R1.fq.gz)
    hisat2 -p 8 -x hisat2_index \
        --novel-splicesite-outfile "${base}_splices.txt" \
        -1 "$r1" -2 "$r2" -S /dev/null
done
# Combine and deduplicate junctions (output name must not match *_splices.txt)
sort -u *_splices.txt > combined_junctions.txt
# Pass 2: Realign with all junctions
for r1 in *_R1.fq.gz; do
    r2="${r1/_R1/_R2}"
    base=$(basename "$r1" _R1.fq.gz)
    hisat2 -p 8 -x hisat2_index \
        --novel-splicesite-infile combined_junctions.txt \
        -1 "$r1" -2 "$r2" | \
        samtools sort -@ 4 -o "${base}.sorted.bam" -
done
```
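
The merge step above only deduplicates. If stricter filtering is wanted, one option is to keep junctions seen in several samples. The sketch below is an illustration under stated assumptions: `filter_junctions` is a hypothetical helper, the minimum sample count is a choice left to the user, and each line of a `--novel-splicesite-outfile` file is treated as one tab-separated junction record, so identical lines mean the same junction.

```bash
# Hypothetical helper: keep junctions reported in at least $1 samples.
#   $1    = minimum number of per-sample files that must contain the junction
#   $2... = per-sample junction files from --novel-splicesite-outfile
filter_junctions() {
    local min_samples=$1 f
    shift
    for f in "$@"; do
        sort -u "$f"    # count each junction at most once per sample
    done | sort | uniq -c |
        awk -v min="$min_samples" '$1 >= min { sub(/^ *[0-9]+ /, ""); print }'
}

# Example use (not run here):
#   filter_junctions 2 *_splices.txt > filtered_junctions.txt
```

The filtered file can then be passed to `--novel-splicesite-infile` in the second pass in place of the unfiltered merge.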
## Read Group Information
```bash
hisat2 -p 8 -x hisat2_index \
--rg-id sample1 \
--rg SM:sample1 \
--rg PL:ILLUMINA \
--rg LB:lib1 \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
```
## Downstream Quantification
```bash
# Output name-sorted BAM for htseq-count
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz | \
samtools sort -n -@ 4 -o aligned.namesorted.bam -
# Or coordinate-sorted for featureCounts
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz | \
samtools sort -@ 4 -o aligned.sorted.bam -
```
## Key Parameters
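| Parameter | Default | Description |
|-----------|---------|-------------|
| -p | 1 | Number of alignment threads |
| -x | - | Index basename (required) |
| --rna-strandness | unstranded | Strand-specific library type: FR or RF (paired-end), F or R (single-end) |
| --dta | off | Report alignments tailored for transcript assemblers such as StringTie |
| --dta-cufflinks | off | Report alignments tailored for Cufflinks; adds the XS:A strand attribute to spliced alignments |
| --min-intronlen | 20 | Minimum intron length (bp) |
| --max-intronlen | 500000 | Maximum intron length (bp) |
| -k | 5 | Maximum number of distinct primary alignments to report per read |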
## For StringTie/Cufflinks
```bash
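# Use --dta for StringTie
hisat2 -p 8 -x hisat2_index \
--dta \
-1 r1.fq.gz -2 r2.fq.gz | \
samtools sort -@ 4 -o aligned.sorted.bam -
# Use --dta-cufflinks for Cufflinks
hisat2 -p 8 -x hisat2_index \
--dta-cufflinks \
-1 r1.fq.gz -2 r2.fq.gz | \
samtools sort -@ 4 -o aligned.sorted.bam -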
```
## Alignment Summary
```bash
# HISAT2 prints summary to stderr
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam 2> summary.txt
```
Example output (abbreviated):
```
50000000 reads; of these:
  50000000 (100.00%) were paired; of these:
    2500000 (5.00%) aligned concordantly 0 times
    45000000 (90.00%) aligned concordantly exactly 1 time
    2500000 (5.00%) aligned concordantly >1 times
95.00% overall alignment rate
```
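
For batch runs, the overall rate can be pulled out of each saved summary and used as a simple QC gate. This is a minimal sketch: `overall_rate` and `check_rate` are hypothetical helper names, and the default threshold of 80 is an arbitrary illustration rather than a HISAT2 recommendation.

```bash
# Hypothetical helper: print the overall alignment rate (without the % sign)
# from a HISAT2 summary file captured from stderr.
overall_rate() {
    awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }' "$1"
}

# Hypothetical QC gate: succeed only if the rate in file $1 is at least $2
# (default 80). A missing rate line is treated as 0 and therefore fails.
check_rate() {
    local rate
    rate=$(overall_rate "$1")
    awk -v r="$rate" -v min="${2:-80}" 'BEGIN { exit !(r + 0 >= min + 0) }'
}

# Example use (not run here):
#   check_rate summary.txt 90 || echo "low alignment rate in summary.txt" >&2
```

A low rate often points to a wrong reference, contamination, or adapter-heavy reads, so it is worth checking before quantification.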
## Memory Comparison
| Aligner | Human Genome Memory |
|---------|-------------------|
| STAR | ~30GB |
| HISAT2 | ~8GB |
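
The table can be turned into a rough pre-flight check. The sketch below is illustrative only: `suggest_aligner` and `total_mem_gb` are hypothetical helpers, the 32 GB and 8 GB cutoffs are assumptions derived from the approximate figures above rather than measured limits, and `total_mem_gb` assumes a Linux host with `/proc/meminfo`.

```bash
# Hypothetical helper: suggest an aligner for a human-genome run given RAM in GB.
# Cutoffs are assumptions based on the approximate table values
# (STAR ~30 GB, HISAT2 ~8 GB).
suggest_aligner() {
    local mem_gb=$1
    if [ "$mem_gb" -ge 32 ]; then
        echo "STAR or HISAT2"
    elif [ "$mem_gb" -ge 8 ]; then
        echo "HISAT2"
    else
        echo "less than the ~8 GB HISAT2 figure for a human index" >&2
        return 1
    fi
}

# Total RAM in whole GB on Linux, read from /proc/meminfo (MemTotal is in kB).
total_mem_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Example use (not run here):
#   suggest_aligner "$(total_mem_gb)"
```

Actual usage depends on the genome, index options, and thread count, so treat the result as a starting point and confirm with a test run.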
## Related Skills
- read-alignment/star-alignment - Alternative with more features
- rna-quantification/featurecounts-counting - Count aligned reads
- rna-quantification/alignment-free-quant - Skip alignment entirely
- differential-expression/deseq2-basics - Downstream DE analysis
This skill aligns RNA-seq reads with HISAT2, a memory-efficient splice-aware aligner suited to gene expression workflows. It provides command patterns for indexing (with optional splice/exon annotation), single- and paired-end alignment, piping to samtools for sorted BAM output, and strategies for stranded libraries and two-pass junction discovery. Use this when STAR’s memory needs are prohibitive or when you need a compact, junction-aware aligner.
The skill shows how to build a HISAT2 index from a reference FASTA, optionally incorporating splice site and exon lists extracted from a GTF to improve alignment of spliced reads. It demonstrates direct alignment commands for paired and single reads, streaming output into samtools to produce sorted and indexed BAMs, and flags for stranded libraries, novel splice discovery, and read-group tagging. It also describes a manual two-pass workflow: discover novel junctions across samples, merge them, and realign using the combined junction set to boost splice sensitivity.
**How much memory does HISAT2 need compared to STAR?**
HISAT2 typically requires around 8 GB for a human genome index, whereas STAR usually needs about 30 GB.
**Should I always build the index with splice and exon files?**
It is recommended when you have a reliable annotation: adding splice sites and exons improves the sensitivity and accuracy of spliced alignment.