home / skills / gptomics / bioskills / modern-tree-inference

modern-tree-inference skill

/phylogenetics/modern-tree-inference

This skill helps you build publication-quality maximum likelihood phylogenetic trees from alignments using IQ-TREE2 or RAxML-ng with automated model selection.

npx playbooks add skill gptomics/bioskills --skill modern-tree-inference

Review the files below or copy the command above to add this skill to your agents.

Files (5)
SKILL.md
7.5 KB
---
name: bio-phylo-modern-tree-inference
description: Build maximum likelihood phylogenetic trees using IQ-TREE2 and RAxML-ng. Use when inferring publication-quality trees with model selection, ultrafast bootstrap, or partitioned analyses from sequence alignments.
tool_type: cli
primary_tool: IQ-TREE2
---

## Version Compatibility

Reference examples tested with: IQ-TREE 2.2+

Before using code patterns, verify installed versions match. If versions differ:
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Modern ML Tree Inference

**"Infer a maximum likelihood tree from my alignment"** → Build publication-quality ML trees with automatic substitution model selection and ultrafast bootstrap support assessment.
- CLI: `iqtree2 -s alignment.fa -m MFP -B 1000` (IQ-TREE2)
- CLI: `raxml-ng --all --msa alignment.fa --model GTR+G --bs-trees 100` (RAxML-ng)

Build maximum likelihood phylogenetic trees with automatic model selection and ultrafast bootstrap.

## IQ-TREE2 Basic Usage

```bash
# Simple ML tree with automatic model selection
iqtree2 -s alignment.fasta -m MFP -B 1000 -T AUTO

# -s: input alignment
# -m MFP: ModelFinder Plus (automatic model selection + tree inference)
# -B 1000: 1000 ultrafast bootstrap replicates (minimum recommended for publication)
# -T AUTO: automatic thread detection
```

## IQ-TREE2 Output Files

| File | Description |
|------|-------------|
| `.treefile` | Best ML tree in Newick format |
| `.iqtree` | Full analysis report with model parameters |
| `.log` | Run log |
| `.contree` | Consensus tree with bootstrap support |
| `.splits.nex` | Bootstrap splits in Nexus format |
| `.model.gz` | Model parameters |
| `.bionj` | Initial BIONJ tree |
| `.mldist` | ML distance matrix |
| `.ckp.gz` | Checkpoint file for resuming |

## Model Selection

```bash
# ModelFinder only (no tree inference)
iqtree2 -s alignment.fasta -m MF

# Use specific model
iqtree2 -s alignment.fasta -m GTR+G4 -B 1000

# Test only specific models
iqtree2 -s alignment.fasta -m MF -mset GTR,HKY,K2P

# Protein models
iqtree2 -s protein.fasta -m MFP -B 1000 -st AA
```

## Common DNA Substitution Models

| Model | Parameters | Use Case |
|-------|------------|----------|
| JC | Equal rates | Very simple, rarely appropriate |
| K2P/K80 | Ti/Tv ratio | Simple, some rate variation |
| HKY | Ti/Tv + base freq | Moderate complexity |
| GTR | 6 rates + base freq | Most general, recommended default |
| +G4 | Gamma rate variation | 4 discrete rate categories |
| +I | Invariant sites | Sites that never change |
| +R4 | FreeRate model | More flexible than Gamma |

## Ultrafast Bootstrap

```bash
# Standard ultrafast bootstrap (UFBoot2)
# B>=1000: Minimum for publication. Use 10000 for final analyses.
iqtree2 -s alignment.fasta -m GTR+G4 -B 1000

# Standard bootstrap (slower but more accurate for small datasets)
iqtree2 -s alignment.fasta -m GTR+G4 -b 100

# SH-aLRT test (fast approximate likelihood ratio test)
iqtree2 -s alignment.fasta -m GTR+G4 -alrt 1000

# Both UFBoot and SH-aLRT
iqtree2 -s alignment.fasta -m GTR+G4 -B 1000 -alrt 1000
```

## Interpreting Bootstrap Values

| UFBoot | SH-aLRT | Interpretation |
|--------|---------|----------------|
| >= 95 | >= 80 | Strong support |
| 80-94 | 70-79 | Moderate support |
| < 80 | < 70 | Weak support |

## Partitioned Analysis

For multi-gene datasets with different evolutionary rates:

```bash
# Create partition file (partitions.nex)
cat > partitions.nex << 'EOF'
#nexus
begin sets;
    charset gene1 = 1-500;
    charset gene2 = 501-1200;
    charset gene3 = 1201-1800;
    charpartition mine = HKY:gene1, GTR:gene2, GTR+G:gene3;
end;
EOF

# Run partitioned analysis
iqtree2 -s concat.fasta -p partitions.nex -m MFP -B 1000

# Edge-linked partition model (proportional branch lengths)
iqtree2 -s concat.fasta -q partitions.nex -m MFP -B 1000

# Edge-unlinked (independent branch lengths per partition)
iqtree2 -s concat.fasta -Q partitions.nex -m MFP -B 1000
```

## RAxML-ng Basic Usage

```bash
# Simple ML tree with GTR+G
raxml-ng --all --msa alignment.fasta --model GTR+G --bs-trees 100

# --all: ML search + bootstrapping
# --msa: input alignment
# --model: substitution model
# --bs-trees: number of bootstrap replicates
```

## RAxML-ng Model Specification

```bash
# DNA models
raxml-ng --msa alignment.fasta --model GTR+G4+I

# Protein models (automatic detection)
raxml-ng --msa protein.fasta --model LG+G8+F

# Check alignment and determine model
raxml-ng --check --msa alignment.fasta --model GTR+G
```

## RAxML-ng Output Files

| File | Description |
|------|-------------|
| `.raxml.bestTree` | Best ML tree |
| `.raxml.support` | Tree with bootstrap support values |
| `.raxml.bootstraps` | All bootstrap trees |
| `.raxml.mlTrees` | All ML trees from search |
| `.raxml.log` | Analysis log |
| `.raxml.rba` | Binary alignment (for restart) |

## RAxML-ng Advanced Options

```bash
# Multiple ML searches (find global optimum)
# --tree pars{10}: 10 starting parsimony trees recommended for thorough search
raxml-ng --msa alignment.fasta --model GTR+G --tree pars{10} --prefix ml_search

# Constrained tree search
raxml-ng --msa alignment.fasta --model GTR+G --tree-constraint constraint.tre

# Site likelihoods for topology tests
raxml-ng --sitelh --msa alignment.fasta --model GTR+G --tree candidate.tre
```

## Comparing IQ-TREE2 vs RAxML-ng

| Feature | IQ-TREE2 | RAxML-ng |
|---------|----------|----------|
| Model selection | Built-in ModelFinder | External (ModelTest-NG) |
| Ultrafast bootstrap | Yes (UFBoot2) | No |
| Standard bootstrap | Yes | Yes |
| Partition models | Extensive | Good |
| Speed | Faster for UFBoot | Faster for standard BS |
| Memory | Lower | Higher |
| Checkpointing | Yes | Yes |

## Large Dataset Strategies

```bash
# IQ-TREE2 with reduced memory
iqtree2 -s large.fasta -m GTR+G -B 1000 -T 4 -mem 8G

# Use approximate NNI search
iqtree2 -s large.fasta -m GTR+G -B 1000 -fast

# RAxML-ng with parsimony starting trees
raxml-ng --msa large.fasta --model GTR+G --tree pars{5} --threads 8
```

## Tree Topology Tests

```bash
# IQ-TREE2: AU test comparing trees
iqtree2 -s alignment.fasta -m GTR+G -z trees.nwk -n 0 -zb 10000 -au

# Output interpretation:
# p-AU < 0.05: Reject tree
# p-AU >= 0.05: Cannot reject tree
```

## Constrained Analysis

```bash
# IQ-TREE2: Enforce monophyly constraint
iqtree2 -s alignment.fasta -m GTR+G -g constraint.tre -B 1000

# Constraint file format (Newick with taxa to constrain):
# ((Human,Chimp),Gorilla);
```

## Complete Workflow Example

```bash
# 1. Check alignment
iqtree2 -s alignment.fasta -m GTR+G -n 0

# 2. Find best model
iqtree2 -s alignment.fasta -m MF -T AUTO

# 3. Full analysis with best model
iqtree2 -s alignment.fasta -m GTR+I+G4 -B 1000 -alrt 1000 -T AUTO

# 4. Visualize result
cat alignment.fasta.treefile
```

## Resuming Interrupted Runs

```bash
# IQ-TREE2: Resume from checkpoint
iqtree2 -s alignment.fasta -m GTR+G -B 1000 --redo-tree

# RAxML-ng: Resume
raxml-ng --msa alignment.fasta --model GTR+G --redo
```

## Reproducibility

```bash
# Set random seed for reproducible results
# seed=12345: Any fixed seed ensures reproducibility across runs
iqtree2 -s alignment.fasta -m GTR+G -B 1000 --seed 12345

raxml-ng --msa alignment.fasta --model GTR+G --seed 12345 --bs-trees 100
```

## Related Skills

- tree-io - Read and convert output tree files
- tree-visualization - Visualize trees with bootstrap support
- distance-calculations - Compare with distance-based methods
- alignment/alignment-io - Prepare alignments for tree inference

Overview

This skill builds publication-quality maximum likelihood phylogenetic trees using IQ-TREE2 and RAxML-ng. It automates model selection, bootstrap support estimation, partitioned analyses, and produces standard output files ready for downstream visualization and tests. Use it to get reproducible ML trees from aligned sequence data with sensible defaults for publication.

How this skill works

The skill runs IQ-TREE2 or RAxML-ng commands on an input alignment, optionally invoking ModelFinder (MFP) for automatic substitution model selection. It supports ultrafast bootstrap (UFBoot2), SH-aLRT, standard bootstrap, partition files, constrained searches, checkpoint resumption, and reproducible seeds. Outputs include best ML trees, consensus/support trees, logs, and model reports for interpretation.

When to use it

  • Inferring a publication-quality ML tree from nucleotide or protein alignments.
  • Running automatic model selection (ModelFinder) before full ML inference.
  • Estimating node support with ultrafast bootstrap or combined UFBoot + SH-aLRT.
  • Performing partitioned analyses for multi-gene/concatenated datasets.
  • Resuming interrupted large analyses or enforcing reproducible seeds.

Best practices

  • Verify installed tool versions (iqtree2 --version, raxml-ng --version) and adapt flags if needed.
  • Run ModelFinder (MFP) for robust model choice, then rerun full inference with the chosen model for final results.
  • Use >=1000 UFBoot replicates for publication; consider 10000 for final confirmation on critical datasets.
  • Include SH-aLRT (e.g., -alrt 1000) alongside UFBoot for complementary support metrics.
  • Set a fixed --seed for reproducible runs and use checkpointing/--redo to resume long jobs.

Example use cases

  • Single-gene ML tree with automatic model selection and 1000 UFBoot: iqtree2 -s alignment.fasta -m MFP -B 1000 -T AUTO.
  • Partitioned concatenated analysis using a partitions.nex file and edge-linked model: iqtree2 -s concat.fasta -p partitions.nex -m MFP -B 1000.
  • Fast ML search and standard bootstrap with RAxML-ng: raxml-ng --all --msa alignment.fasta --model GTR+G --bs-trees 100.
  • Constrained monophyly test and support estimation: iqtree2 -s alignment.fasta -m GTR+G -g constraint.tre -B 1000.
  • Topology tests (AU test) comparing candidate trees: iqtree2 -s alignment.fasta -m GTR+G -z trees.nwk -n 0 -zb 10000 -au.

FAQ

Which tool should I pick, IQ-TREE2 or RAxML-ng?

Use IQ-TREE2 when you want built-in model selection and ultrafast bootstrap. Use RAxML-ng for standard bootstrap workflows or very large datasets where its search heuristics are faster.

How many bootstrap replicates should I run?

For UFBoot, run at least 1000 replicates for publication. Consider 10000 replicates for final confirmation on critical nodes. Standard bootstrap often requires more replicates and more computation time.