
tumor-fraction-estimation skill

Q: What minimum tumor fraction can ichorCNA reliably detect?

Validation indicates ~97–100% detection sensitivity at ≥3% tumor fraction; below 3% is at the limit of detection and requires cautious interpretation.

Q: What input files are required?

You need binned read counts (WIG) generated from BAMs, plus GC and mappability WIGs and a normal panel; ichorCNA expects sWGS data, not targeted panels.


This skill estimates circulating tumor DNA fraction from shallow whole-genome sequencing using ichorCNA, enabling tumor burden assessment and treatment

npx playbooks add skill gptomics/bioskills --skill tumor-fraction-estimation


---
name: bio-tumor-fraction-estimation
description: Estimates circulating tumor DNA fraction from shallow whole-genome sequencing using ichorCNA. Detects copy number alterations via HMM segmentation and calculates ctDNA percentage. Requires 0.1-1x sWGS coverage. Use when quantifying tumor burden from liquid biopsy or monitoring treatment response.
tool_type: r
primary_tool: ichorCNA
---

## Version Compatibility

Reference examples tested with: CNVkit 0.9+, ichorCNA 0.5+, pandas 2.2+

Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Tumor Fraction Estimation

**"Estimate tumor fraction from my cfDNA data"** → Calculate the proportion of tumor-derived DNA in a liquid biopsy sample using copy number aberrations from shallow whole-genome sequencing.
- R: `ichorCNA` for tumor fraction and CNA estimation from sWGS

Estimate ctDNA tumor fraction from shallow whole-genome sequencing.

## ichorCNA Overview

ichorCNA (GavinHaLab fork, v0.5.1+) detects copy number alterations and estimates tumor fraction from sWGS (0.1-1x coverage).

**Sensitivity:** 97-100% detection at >= 3% tumor fraction (2024 validation)

## Input Requirements

| Requirement | Specification |
|-------------|---------------|
| Data type | sWGS (NOT targeted panel) |
| Coverage | 0.1-1x (0.5x recommended) |
| Input | BAM files |
| Output | Tumor fraction, ploidy, CNA segments |
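As a quick sanity check on the coverage requirement, mean depth can be approximated as mapped reads times read length divided by genome size. The sketch below is illustrative only: the function names, the default genome size of 3.1e9 bp, and the example read count are assumptions, not part of ichorCNA.

```python
def estimate_coverage(mapped_reads, read_length, genome_size=3.1e9):
    """Approximate mean depth as total sequenced bases / genome size."""
    return mapped_reads * read_length / genome_size


def coverage_in_range(coverage, low=0.1, high=1.0):
    """Check a depth estimate against the 0.1-1x range in the table above."""
    return low <= coverage <= high


# Hypothetical sample: 10 million mapped reads of 150 bp
cov = estimate_coverage(10_000_000, 150)
print(f'{cov:.2f}x, within range: {coverage_in_range(cov)}')
```

In practice the mapped read count would come from an alignment summary for the BAM file rather than being typed in by hand.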

## Running ichorCNA

```r
library(ichorCNA)

# Step 1: Generate read counts in bins
# Run from command line or use HMMcopy
# readCounter --window 1000000 --quality 20 sample.bam > sample.wig

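# Step 2: Run ichorCNA
# Argument names mirror the options of ichorCNA's runIchorCNA.R script; confirm
# that your installed version exposes this call (see Version Compatibility)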
runIchorCNA(
    WIG = 'sample.wig',
    gcWig = 'gc_hg38_1mb.wig',
    mapWig = 'mappability_hg38_1mb.wig',
    normalPanel = 'pon_median_1mb.rds',
    centromere = 'centromeres_hg38.txt',
    outDir = 'ichor_results/',
    id = 'sample_id',

    # Tumor fraction estimation parameters
    normal = c(0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99),
    ploidy = c(2, 3),
    maxCN = 5,

    # Subclonality
    estimateScPrevalence = TRUE,
    scStates = c(1, 3),

    # Segmentation
    txnE = 0.9999,
    txnStrength = 10000,

    # Chromosomes
    chrs = paste0('chr', c(1:22, 'X'))
)
```
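Step 1 above is shown only as a commented shell command. For pipelines driven from Python, a minimal sketch of the same `readCounter` call could look like the following. The helper names are hypothetical, and the `--chromosome` option is assumed from common HMMcopy utils usage, so check `readCounter --help` for your installation.

```python
import subprocess


def read_counter_cmd(bam_file, window=1_000_000, quality=20, chromosomes=None):
    """Build the readCounter argument list for one BAM file."""
    cmd = ['readCounter', '--window', str(window), '--quality', str(quality)]
    if chromosomes:
        # Assumed option: comma-separated list of chromosomes to count
        cmd += ['--chromosome', ','.join(chromosomes)]
    cmd.append(bam_file)
    return cmd


def write_wig(bam_file, wig_file, **kwargs):
    """Run readCounter and redirect its stdout into wig_file."""
    with open(wig_file, 'w') as out:
        subprocess.run(read_counter_cmd(bam_file, **kwargs), stdout=out, check=True)


# Show the command that would be run (nothing is executed here)
chroms = [f'chr{c}' for c in list(range(1, 23)) + ['X']]
print(' '.join(read_counter_cmd('sample.bam', chromosomes=chroms)))
```

Passing the arguments as a list and redirecting stdout in Python avoids relying on shell redirection and quoting.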

## Batch Processing

**Goal:** Run ichorCNA tumor fraction estimation on a cohort of sWGS samples in parallel, collecting results and handling failures gracefully.

**Approach:** Apply the ichorCNA pipeline to each sample's WIG file using mclapply for parallelization, wrapping each call in tryCatch to report per-sample success or failure.

```r
library(ichorCNA)
library(parallel)

process_sample <- function(wig_file, params) {
    # Strip the directory and the literal .wig extension to get the sample ID
    sample_id <- sub('\\.wig$', '', basename(wig_file))

    tryCatch({
        runIchorCNA(
            WIG = wig_file,
            gcWig = params$gcWig,
            mapWig = params$mapWig,
            normalPanel = params$normalPanel,
            centromere = params$centromere,
            outDir = params$outDir,
            id = sample_id,
            normal = c(0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99),
            ploidy = c(2, 3),
            maxCN = 5
        )
        return(list(sample = sample_id, status = 'success'))
    }, error = function(e) {
        return(list(sample = sample_id, status = 'failed', error = e$message))
    })
}

# Run in parallel (mclapply forks, so mc.cores > 1 is not supported on Windows)
wig_files <- list.files('wig/', pattern = '\\.wig$', full.names = TRUE)
params <- list(
    gcWig = 'gc_hg38_1mb.wig',
    mapWig = 'mappability_hg38_1mb.wig',
    normalPanel = 'pon_median_1mb.rds',
    centromere = 'centromeres_hg38.txt',
    outDir = 'ichor_results/'
)

results <- mclapply(wig_files, process_sample, params = params, mc.cores = 4)
```

## Parsing Results

```r
parse_ichor_results <- function(results_dir) {
    # Find per-sample parameter files (dots escaped so they match literally)
    param_files <- list.files(results_dir, pattern = '\\.params\\.txt$',
                              full.names = TRUE, recursive = TRUE)

    results <- data.frame()

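    for (f in param_files) {
        # Assumes a flat tab-delimited table with columns n, phi and loglik; the
        # .params.txt layout can differ between ichorCNA versions, so inspect one
        # file first and adapt the column access if needed
        params <- read.table(f, header = TRUE, sep = '\t', stringsAsFactors = FALSE)
        sample_id <- sub('\\.params\\.txt$', '', basename(f))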

        results <- rbind(results, data.frame(
            sample = sample_id,
            tumor_fraction = 1 - params$n[1],  # n is normal fraction
            ploidy = params$phi[1],
            log_likelihood = params$loglik[1]
        ))
    }

    return(results)
}

# Parse all results
tf_results <- parse_ichor_results('ichor_results/')
print(tf_results)
```

## Python Wrapper

```python
import subprocess
import pandas as pd
from pathlib import Path


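def run_ichorcna(wig_file, output_dir, gc_wig, map_wig, normal_panel, centromere):
    '''Run ichorCNA from Python via Rscript.'''
    sample_id = Path(wig_file).stem
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    # Paths are interpolated into R string literals, so they must not
    # contain single quotes
    r_code = f'''
    library(ichorCNA)
    runIchorCNA(
        WIG = '{wig_file}',
        gcWig = '{gc_wig}',
        mapWig = '{map_wig}',
        normalPanel = '{normal_panel}',
        centromere = '{centromere}',
        outDir = '{output_dir}',
        id = '{sample_id}',
        normal = c(0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99),
        ploidy = c(2, 3),
        maxCN = 5
    )
    '''

    # Pass the R code as a single argument without a shell, which avoids
    # shell quoting problems; check=True raises CalledProcessError on failure
    subprocess.run(['Rscript', '-e', r_code], check=True)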


def parse_tumor_fraction(params_file):
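    '''Parse tumor fraction from ichorCNA output.

    Assumes a flat tab-delimited table with n, phi and loglik columns; check
    one .params.txt file from your ichorCNA version before relying on this.
    '''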
    df = pd.read_csv(params_file, sep='\t')
    return {
        'tumor_fraction': 1 - df['n'].iloc[0],
        'ploidy': df['phi'].iloc[0],
        'log_likelihood': df['loglik'].iloc[0]
    }
```
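The parsers above assume a flat table. If your `.params.txt` files instead contain labelled lines such as `Tumor Fraction:` followed by a tab and a value (an assumption about the layout that should be checked against real output), a fallback along these lines may help. The function name and the example text are hypothetical.

```python
import re


def parse_labelled_params(text):
    """Extract tumor fraction and ploidy from 'Label:<tab>value' lines."""
    wanted = {'Tumor Fraction': 'tumor_fraction', 'Ploidy': 'ploidy'}
    out = {}
    for line in text.splitlines():
        match = re.match(r'^([^:\t]+):\s*(\S+)\s*$', line)
        if not match:
            continue
        key = wanted.get(match.group(1).strip())
        if key is None or key in out:
            continue  # ignore other labels and keep the first occurrence
        try:
            out[key] = float(match.group(2))
        except ValueError:
            out[key] = None  # non-numeric value such as 'NA'
    return out


# Hypothetical file content, for illustration only
example = 'Gender:\tmale\nTumor Fraction:\t0.12\nPloidy:\t2.05\n'
print(parse_labelled_params(example))
```

Working on the file text rather than a path keeps the function easy to test; call it as `parse_labelled_params(Path(params_file).read_text())`.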

## Interpretation

| Tumor Fraction | Interpretation |
|----------------|----------------|
| >= 10% | High ctDNA, reliable detection |
| 3-10% | Moderate ctDNA, detectable |
| < 3% | Low ctDNA, at detection limit |
| 0% | No detectable ctDNA or below LOD |
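The table above can be applied programmatically when building reports. This is a minimal sketch with a hypothetical function name; it assumes the tumor fraction is on a 0-1 scale (as in `1 - n`), and the cut-offs are the ones from the table, not clinical thresholds.

```python
def interpret_tumor_fraction(tf):
    """Map a tumor fraction (0-1 scale) to the categories in the table above."""
    if not 0.0 <= tf <= 1.0:
        raise ValueError('tumor fraction must be between 0 and 1')
    if tf == 0:
        return 'No detectable ctDNA or below LOD'
    if tf < 0.03:
        return 'Low ctDNA, at detection limit'
    if tf < 0.10:
        return 'Moderate ctDNA, detectable'
    return 'High ctDNA, reliable detection'


for value in (0.0, 0.02, 0.05, 0.25):
    print(value, interpret_tumor_fraction(value))
```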

## Related Skills

- cfdna-preprocessing - Preprocess BAMs before ichorCNA
- fragment-analysis - Complementary fragmentomics analysis
- ctdna-mutation-detection - Mutation detection from panel data
- copy-number/cnvkit-analysis - CNV concepts

Overview

This skill estimates circulating tumor DNA (ctDNA) fraction from shallow whole-genome sequencing (sWGS) using ichorCNA. It detects copy number alterations via HMM segmentation and returns tumor fraction, ploidy, and CNA segments. The pipeline supports single-sample and batch processing and includes parsers to extract results for downstream analysis.

How this skill works

Input is 0.1–1x sWGS read-count data (WIG or binned counts) generated from BAM files. ichorCNA performs GC and mappability correction, HMM-based segmentation, and model selection over candidate normal fractions and ploidies to estimate the tumor fraction. Outputs include parameter files (.params.txt) from which tumor fraction = 1 - normal_fraction is derived, plus segmented CNV calls and quality metrics.

When to use it

- Quantifying tumor burden from a cfDNA liquid biopsy (sWGS, not targeted panels)
- Monitoring treatment response over time (minimal residual disease usually falls below the ~3% detection limit, so interpret such results cautiously)
- Large-cohort CNV-based ctDNA screening with low-pass WGS
- When you have BAMs or binned WIG files at 0.1–1× coverage (0.5× recommended)

Best practices

- Use 0.1–1× sWGS coverage; aim for ~0.5× as a balance between sensitivity and cost.
- Use 1 Mb bins with matching GC and mappability WIG files, and a panel of normals (PON) built for the same reference genome and bin size.
- Run a grid of normal fractions and ploidies (e.g., normal = 0.5–0.99, ploidy = 2,3) to let ichorCNA select the best model.
- Parallelize samples and wrap runs in try/catch (or subprocess error handling) to collect per-sample status and recover from failures.
- Validate results against orthogonal measures or spike-ins for low tumor fractions (<3%), and inspect segmentation plots for unusual artifacts.
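The advice on parallelizing and collecting per-sample status can be sketched in Python as follows. `run_batch` and `fake_run` are hypothetical names; `fake_run` stands in for a real per-sample call such as an Rscript subprocess. Threads are used here on the assumption that the heavy work happens in external processes.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_batch(wig_files, run_one, max_workers=4):
    """Call run_one(wig_file) for each sample and record success or failure."""
    def safe_run(wig_file):
        sample_id = Path(wig_file).stem
        try:
            run_one(wig_file)
            return {'sample': sample_id, 'status': 'success'}
        except Exception as exc:  # one failed sample should not stop the batch
            return {'sample': sample_id, 'status': 'failed', 'error': str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs
        return list(pool.map(safe_run, wig_files))


def fake_run(wig_file):
    """Stand-in for a real ichorCNA call; fails for one hypothetical sample."""
    if 'bad' in wig_file:
        raise RuntimeError('simulated failure')


results = run_batch(['wig/s1.wig', 'wig/bad.wig'], fake_run)
failed = [r['sample'] for r in results if r['status'] == 'failed']
print(failed)
```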

Example use cases

- Run ichorCNA on a single plasma sample WIG to report ctDNA percentage and ploidy for a clinical study.
- Batch-process a cohort of sWGS samples in parallel, aggregate .params.txt files, and build a table of tumor_fraction, ploidy, and log-likelihood per sample.
- Integrate into a longitudinal pipeline to track ctDNA changes across treatment timepoints and flag samples crossing detection thresholds.
- Wrap ichorCNA calls from Python (Rscript subprocess) to embed tumor fraction estimation in automated QC and reporting workflows.
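For the longitudinal use case, flagging timepoints that cross a detection threshold could be sketched as below. The function name, the default 3% threshold (taken from the sensitivity figure quoted in this document), and the example series are illustrative assumptions.

```python
def flag_threshold_crossings(timepoints, threshold=0.03):
    """Report consecutive timepoints whose tumor fraction crosses threshold.

    timepoints: list of (label, tumor_fraction) tuples in chronological order.
    Returns a list of (label, direction) for the later timepoint of each pair.
    """
    crossings = []
    for (_, prev_tf), (label, tf) in zip(timepoints, timepoints[1:]):
        if prev_tf < threshold <= tf:
            crossings.append((label, 'rose above threshold'))
        elif tf < threshold <= prev_tf:
            crossings.append((label, 'fell below threshold'))
    return crossings


# Hypothetical patient series, for illustration only
series = [('baseline', 0.12), ('cycle2', 0.02), ('cycle4', 0.01), ('progression', 0.08)]
print(flag_threshold_crossings(series))
```

Because values near the threshold sit at the limit of detection, a crossing is better treated as a prompt for review than as a definitive call.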
