ANOMALY: A Snakemake pipeline for identifying NuMTs from Long-Read Sequencing Data
Abstract
Motivation
Nuclear mitochondrial DNA segments (NuMTs) can significantly affect cellular processes, including cancer development and disease progression. Current methods for calling NuMTs rely on short-read sequencing data and struggle to resolve complex NuMTs. Long-read sequencing data can overcome these limitations, but no workflow currently exists for detecting NuMTs from such data.
Results
Here, we introduce ANOMALY, a novel, easy-to-use workflow for detecting NuMTs from long-read sequencing data. The pipeline accepts either raw or aligned sequencing data, then calls and visualizes the NuMTs in each sample. On 50 simulated datasets, the pipeline demonstrated high accuracy, with a precision of 1.000, a recall of 0.989, and an F1-score of 0.994. These results underscore the limitations of short-read data in resolving and capturing complex NuMTs and demonstrate that long-read data enables their accurate identification.
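As a quick consistency check on the reported metrics, the following minimal sketch recomputes the F1-score from the stated precision and recall. It assumes the F1-score is the standard harmonic mean of precision and recall applied to the aggregate values; the abstract does not state how the metrics were aggregated across the 50 simulated datasets, and the helper name `f1_score` is illustrative rather than part of ANOMALY.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (rounded to three decimals).
reported_precision = 1.000
reported_recall = 0.989

# Assumption: F1 is computed from the aggregate precision and recall.
f1 = f1_score(reported_precision, reported_recall)
print(f"F1-score: {f1:.3f}")
```

Under this assumption the recomputed value rounds to the reported F1-score of 0.994, so the three figures are mutually consistent.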
Availability and Implementation
The Snakemake pipeline is implemented in Python, Bash, and R and is released under the open-source GNU GPL v3 license. The source code, along with detailed instructions for setting up and running the pipeline, is available at https://github.com/Nirmal2310/ANOMALY.