SyMetrics: An Integrated Machine Learning Model for Evaluating the Pathogenicity of Synonymous Variants in the Human Genome
Abstract
Synonymous single nucleotide variants (sSNVs), traditionally seen as neutral, are now recognized for their biological impact. To assess their relevance, we developed SyMetrics, a framework that integrates predictors of splicing, RNA stability, evolutionary conservation, codon usage, synonymous variation effects, sequence properties, and allele frequency. We analyzed all possible sSNVs across the human genome, and our machine-learning model achieved 97% accuracy in distinguishing deleterious from benign variants, with a ROC-AUC of 0.89, outperforming individual predictors. Our estimates indicate that about 1.98 ± 0.17% of sSNVs absent from population databases are damaging (roughly 900,000 sSNVs), with an odds ratio of 3.87 for deleteriousness compared to common sSNVs (p < 0.05). To validate predictions, we performed functional assays on selected sSNVs in the AVPR2 gene. In a clinical cohort, we identified 15 predicted deleterious sSNVs in genes linked to patient phenotypes; 9 were classified as (likely) pathogenic, while 6 were variants of uncertain significance (VUS) per American College of Medical Genetics guidelines. For three VUS, segregation data supported their suspected inheritance patterns (de novo, X-linked). Our findings underscore the functional importance of sSNVs. To support further research and clinical applications, we provide a Python package and web application for evaluating these variants comprehensively.
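
As an illustration of how an odds ratio such as the one reported above can be read, the following minimal sketch computes an odds ratio and an approximate 95% confidence interval from a 2x2 table of predicted deleterious versus benign sSNVs in two groups (absent from population databases versus common). The counts are hypothetical placeholders chosen for illustration and are not taken from this study. The function name `odds_ratio` and the Woolf (log) interval are likewise assumptions, since the abstract does not state which estimator or test was used.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-method) confidence interval for a 2x2 table.

    a: deleterious variants in group 1 (e.g. absent from population databases)
    b: benign variants in group 1
    c: deleterious variants in group 2 (e.g. common variants)
    d: benign variants in group 2
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts for illustration only; NOT data from the paper.
absent_deleterious, absent_benign = 20, 980
common_deleterious, common_benign = 5, 995

or_value, ci_low, ci_high = odds_ratio(
    absent_deleterious, absent_benign, common_deleterious, common_benign
)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An odds ratio above 1 with a confidence interval that excludes 1 indicates that the odds of a predicted deleterious call are higher in the first group than in the second. With the illustrative counts above the estimate is about 4.06.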