CN-RNN: a Deep Learning Framework for Copy Number Variation Detection with Exome Sequencing Data

Abstract

Copy number variations (CNVs) are major structural genomic variants that contribute to a wide range of human diseases. Accurate detection of CNVs from whole-exome sequencing (WES) data has long been a goal of clinical and population genetic studies. Despite recent progress, existing WES-based CNV callers still suffer from high false-positive rates and low recall for short variants, and current deep learning methods do not fully exploit the complementary information in region-level genomic features. Here we present CN-RNN, a deep learning-based CNV caller for WES data. The model combines a bidirectional long short-term memory (BiLSTM) branch, which captures local read-depth changes and contextual dependencies across neighboring exons, with a parallel multi-layer perceptron (MLP) branch that encodes region-level metadata such as GC content, mappability, and exon length. CN-RNN was trained on the Autism Sequencing Consortium (ASC) parent-child trio cohort, using the Mendelian rule of inheritance to construct high-quality training sets. Across three independent evaluation datasets, CN-RNN outperformed existing WES-based CNV callers and deep learning methods. CN-RNN offers a scalable, accurate tool for CNV profiling in WES-based studies and supports broader application of CNV analysis in population and clinical research. CN-RNN is available at https://github.com/FeifeiXiao-lab/CN-RNN .
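The abstract states that trio data and the Mendelian rule of inheritance were used to build high-quality training sets, but it does not give the exact criterion. The sketch below shows one plausible way such a filter could work, under stated assumptions: CNV calls are intervals labeled `DEL` or `DUP`, and a child call is treated as Mendelian-consistent when at least one parent carries a call of the same type with at least 50% reciprocal overlap (a common convention, not necessarily the one CN-RNN uses). Child calls without parental support are set aside, since true de novo CNVs are rare and unsupported calls are more likely to be false positives. The names `CNV`, `reciprocal_overlap`, `mendelian_consistent`, and `split_training_calls` are hypothetical and are not taken from the CN-RNN repository.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CNV:
    """Hypothetical CNV call: half-open interval [start, end) with a type label."""
    chrom: str
    start: int
    end: int
    cnv_type: str  # assumed to be "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two overlap fractions; 0.0 if disjoint or on different chromosomes."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def mendelian_consistent(child_cnv: CNV, parent_cnvs: Iterable[CNV],
                         min_ro: float = 0.5) -> bool:
    """Assumed rule: some parent carries a same-type CNV with reciprocal overlap >= min_ro."""
    return any(
        p.cnv_type == child_cnv.cnv_type and reciprocal_overlap(child_cnv, p) >= min_ro
        for p in parent_cnvs
    )


def split_training_calls(child_calls: Iterable[CNV], father_calls: Iterable[CNV],
                         mother_calls: Iterable[CNV],
                         min_ro: float = 0.5) -> Tuple[List[CNV], List[CNV]]:
    """Partition a child's calls into parent-supported and unsupported lists."""
    parents = list(father_calls) + list(mother_calls)
    supported: List[CNV] = []
    unsupported: List[CNV] = []
    for cnv in child_calls:
        if mendelian_consistent(cnv, parents, min_ro):
            supported.append(cnv)
        else:
            unsupported.append(cnv)
    return supported, unsupported
```

For example, a child deletion at `chr1:1000-5000` overlapping a paternal deletion at `chr1:1200-5200` has a reciprocal overlap of 0.95 and would be kept, while a child duplication with no matching parental call would be set aside. Reciprocal overlap is used here rather than any-overlap so that a small parental call cannot "support" a much larger child call.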