ANOMALY: A Snakemake pipeline for identifying NuMTs from Long-Read Sequencing Data

Nirmal Singh Mahar
Rachit Singh
Ishaan Gupta
Shweta Ramdas

Abstract

Motivation

Nuclear mitochondrial DNA segments (NuMTs) can significantly affect cellular processes, including cancer development and disease progression. Current methods for calling NuMTs rely on short-read sequencing data and struggle to resolve complex NuMTs. Long-read sequencing data can overcome these limitations, but no workflow currently exists to detect NuMTs from such data.

Results

Here, we introduce ANOMALY, a novel, easy-to-use workflow for detecting NuMTs from long-read sequencing data. The pipeline takes raw or aligned sequencing data as input, then calls and visualizes the NuMTs in each sample. On 50 simulated datasets, the pipeline demonstrated high accuracy, with a precision of 1.000, a recall of 0.989, and an F1-score of 0.994. The pipeline underscores the limitations of short-read data in resolving and capturing complex NuMTs, while demonstrating that long-read data enables their accurate identification.
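
As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The sketch below is illustrative only and is not part of the ANOMALY code base; the function name `f1_score` is an assumption introduced here, and the inputs are the rounded values quoted above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (rounded to three decimals).
reported_precision = 1.000
reported_recall = 0.989

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.994"
```

The computed value agrees with the reported F1-score of 0.994 to three decimal places.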

Availability and Implementation

The Snakemake pipeline is implemented in Python, Bash, and R and is released under the open-source GNU GPL v3 license. The source code and detailed instructions for setting up and running the pipeline are available at https://github.com/Nirmal2310/ANOMALY .

Version published to 10.1101/2025.04.08.647704 on bioRxiv
Apr 15, 2025