LongPhase-S: purity estimation and variant recalibration with somatic haplotyping for long-read sequencing

Abstract

Accurate detection of somatic variants is crucial for precision oncology, and long-read sequencing offers unprecedented advantages in resolving complex cancer genomes. However, most long-read somatic callers rely on phasing built for a diploid genome, an assumption violated by contamination, subclonal heterogeneity, and aneuploidy in tumors. We present LongPhase-S, a novel method that jointly reconstructs somatic haplotypes, infers tumor purity, and recalibrates somatic variants in a purity-aware manner for paired tumor-normal long-read sequencing. By anchoring each somatic read to a parental germline lineage, LongPhase-S provides a phase-resolved view in which germline and somatic reads are disentangled across the genome. Building on somatic haplotyping, LongPhase-S trains a phase-aware purity estimator that outperformed existing methods. On eight benchmark datasets comprising six cancer cell lines, including breast, melanoma, and lung cancers, LongPhase-S boosted the accuracy of state-of-the-art somatic callers when they were supplied with the estimated purity and somatic haplotypes. Specifically, mean F1 scores increased by 4.5% and 7.1% for single-nucleotide variants and for insertions and deletions, respectively, with ClairS, and by 1.2% and 0.5% with DeepSomatic. Collectively, these results show that somatic haplotyping is a critical yet missing piece in existing somatic callers, one that enables purity-aware and phase-resolved variant interpretation in heterogeneous tumors.
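
To illustrate why tumor purity matters for somatic calling, the sketch below computes the expected variant allele fraction (VAF) of a somatic variant under a standard textbook mixture model of tumor and normal cells. It is not the LongPhase-S estimator or recalibration procedure, which the abstract does not specify; the function name `expected_vaf` and its parameters are hypothetical, and the model assumes a single clonal variant with known copy numbers.

```python
def expected_vaf(purity, tumor_copies=2, mutated_copies=1, normal_copies=2):
    """Expected VAF of a clonal somatic variant in a tumor/normal mixture.

    Illustrative model only (an assumption, not taken from the preprint):
    a fraction `purity` of cells are tumor cells carrying `mutated_copies`
    mutant alleles out of `tumor_copies`; the remaining cells are normal
    cells with `normal_copies` wild-type alleles.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if mutated_copies > tumor_copies:
        raise ValueError("mutated_copies cannot exceed tumor_copies")

    # Alleles contributed at this locus by each cell population.
    tumor_alleles = purity * tumor_copies
    normal_alleles = (1.0 - purity) * normal_copies
    total_alleles = tumor_alleles + normal_alleles
    if total_alleles == 0:
        raise ValueError("no alleles present at this locus")

    # Only tumor cells carry the mutant allele.
    return purity * mutated_copies / total_alleles


if __name__ == "__main__":
    for p in (1.0, 0.5, 0.2):
        print(f"purity={p:.1f} expected VAF={expected_vaf(p):.3f}")
```

Under these assumptions, a heterozygous somatic variant in a diploid region has an expected VAF of half the purity, so a caller tuned for a pure diploid sample may underweight true variants in a contaminated tumor.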
