ANOMALY: A Snakemake pipeline for identifying NuMTs from Long-Read Sequencing Data

Abstract

Motivation

Nuclear mitochondrial DNA segments (NuMTs) can significantly affect cellular processes, including cancer development and disease progression. Current methods for calling NuMTs rely on short-read sequencing data and struggle to resolve complex NuMTs. Long-read sequencing data can overcome these limitations, but no workflow currently exists for detecting NuMTs from such data.

Results

Here, we introduce ANOMALY, a novel, easy-to-use workflow for detecting NuMTs from long-read sequencing data. The pipeline takes either raw or aligned sequencing data as input, then calls and visualizes the NuMTs in each sample. On 50 simulated datasets, the pipeline achieved a precision of 1.000, a recall of 0.989, and an F1-score of 0.994. The pipeline underscores the limitations of short-read data in resolving and capturing complex NuMTs while demonstrating that long-read data enables their accurate identification.
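
The three reported metrics are related: the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values as a consistency check. The function name `f1_score` is illustrative and is not taken from the ANOMALY source code, and the abstract does not say whether the metrics were pooled or averaged across the 50 datasets.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention: F1 is defined as 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the 50 simulated datasets.
reported_precision = 1.000
reported_recall = 0.989

print(round(f1_score(reported_precision, reported_recall), 3))  # prints 0.994
```

The recomputed value agrees with the reported F1-score of 0.994.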

Availability and Implementation

The Snakemake pipeline is implemented in Python, Bash, and R and is released under the open-source GNU GPL v3 license. Setup and usage instructions, along with the source code, are available at https://github.com/Nirmal2310/ANOMALY.
