Pansoma, a machine learning tool for identifying somatic variants using pangenome graphs

Abstract

Somatic variant calling, the identification of mutations in non-germline cells acquired over an individual’s lifetime, is critical for studying diseases, including cancer, and for developing precision oncology strategies. Traditional somatic variant calling methods rely on linear reference genomes, which do not adequately capture human genetic diversity and result in reference bias, compromising the accuracy of somatic variant detection. The recently developed graph-based human pangenome reference represents diverse genetic variants across human populations and has promised to drive advances in many genetics and genomics studies. In this study, we introduce Pansoma, a novel pangenome-native and machine learning-based tool specifically designed for somatic variant calling using a pangenome graph reference. Pansoma performs somatic variant detection from both short- and long-read sequencing data by learning tensor representations of alignment on graph nodes rather than on a linear reference. Pansoma outputs variant representations anchored to the pangenome graph paths and conventional somatic variant calls remapped to the linear reference. Additionally, we provide a suite of bioinformatics tools tailored for graph-based genomic data management and analysis of variant calling results. Benchmarking shows that Pansoma improves tumor-only somatic variant detection while preserving graph-specific variant representations that are not directly recoverable from linear-reference outputs.
