A deep learning framework for building INDEL mutation rate maps
Abstract
Germline short insertions and deletions (INDELs) are pervasive genetic variants that shape genome evolution and contribute to human disease. However, accurately quantifying fine-scale INDEL mutation rates remains challenging because of data limitations and the diversity of INDEL subtypes. Here, we present MuRaL-indel, a deep learning framework that predicts germline INDEL mutation rates by leveraging long-range sequence context through a U-Net architecture. Using extensive rare variant data from large population cohorts, MuRaL-indel generates base-resolution, length-specific mutation rate maps for the human genome and achieves higher accuracy than existing models across multiple genomic scales. We successfully apply MuRaL-indel to three non-human species (Macaca mulatta, Drosophila melanogaster, and Arabidopsis thaliana), demonstrating its broad applicability across taxa. Using the predicted mutation rate maps, we reveal the mutational landscape around human coding genes and show that MuRaL-indel–derived constraint scores prioritize pathogenic INDELs better than previous models. Through deep learning interpretability analyses, we uncover sequence motifs, including both repeat and non-repeat elements, that are associated with elevated INDEL mutability, providing insights into the underlying mutational mechanisms. Together, MuRaL-indel establishes a generalizable and scalable framework for building high-resolution INDEL mutation rate maps, offering a valuable resource for studies of genome evolution, mutational mechanisms, variant interpretation, and genetic disease.
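The abstract describes a U-Net that reads sequence context and returns base-resolution, length-specific predictions. As a rough illustration only, the NumPy sketch below shows the general input/output shape of such a model: one-hot DNA of length L goes in, and a probability vector over a few classes comes out for every base. This is not MuRaL-indel's architecture. The single down/up level, the kernel width, the channel counts, and the three output classes (hypothetically "no INDEL", "short INDEL", "longer INDEL") are all assumptions. The weights are random, so the outputs are untrained relative probabilities and not calibrated mutation rates.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as an (L, 4) array; N and other symbols become all-zero rows."""
    x = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            x[i, j] = 1.0
    return x


def conv1d(x, w, b):
    """'Same'-length 1D convolution. x: (L, C_in), w: (k, C_in, C_out), b: (C_out,)."""
    k = w.shape[0]
    left = k // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))  # zero-pad so output length == L
    out = np.empty((x.shape[0], w.shape[2]), dtype=np.float32)
    for i in range(x.shape[0]):
        out[i] = np.tensordot(xp[i:i + k], w, axes=([0, 1], [0, 1])) + b
    return out


def relu(x):
    return np.maximum(x, 0.0)


def maxpool2(x):
    """Halve the length by taking the max over non-overlapping pairs of positions."""
    if x.shape[0] % 2:
        raise ValueError("sequence length must be even for this toy model")
    return x.reshape(x.shape[0] // 2, 2, x.shape[1]).max(axis=1)


def upsample2(x):
    """Double the length by repeating each position (nearest-neighbour upsampling)."""
    return np.repeat(x, 2, axis=0)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_params(rng, c=16, n_classes=3, k=5):
    """Random weights for a hypothetical one-level U-Net (untrained)."""
    def layer(c_in, c_out):
        return rng.normal(0.0, 0.1, (k, c_in, c_out)), np.zeros(c_out)
    return {
        "enc": layer(4, c),
        "mid": layer(c, 2 * c),
        "dec": layer(2 * c + c, c),
        "out": layer(c, n_classes),
    }


def unet_forward(x, p):
    """Map (L, 4) one-hot DNA to (L, n_classes) per-base class probabilities."""
    e = relu(conv1d(x, *p["enc"]))              # (L, c): fine-scale local features
    m = relu(conv1d(maxpool2(e), *p["mid"]))    # (L/2, 2c): coarser, wider context
    u = upsample2(m)                            # (L, 2c): back to base resolution
    d = relu(conv1d(np.concatenate([u, e], axis=1), *p["dec"]))  # skip connection
    return softmax(conv1d(d, *p["out"]))        # (L, n_classes), rows sum to 1


rng = np.random.default_rng(0)
probs = unet_forward(one_hot("ACGTTTTTTTAC" * 5 + "ACGT"), init_params(rng))
print(probs.shape)  # (64, 3)
```

The skip connection concatenates the full-resolution encoder features with the upsampled coarse features, so each output position keeps base-level detail while also seeing a wider window. A model aiming at truly long-range context would stack several pooling levels. This toy has only one.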