Investigating the Origins of SARS-CoV-2: AI-Driven Functional Genomics and Probabilistic Evolutionary Analysis
Abstract
This study combines machine-learning-driven functional genomics with probabilistic evolutionary modeling to investigate the origins of SARS-CoV-2, focusing on nine critical amino acid substitutions in ORF1ab, a gene essential for viral replication. Luellen (2022) reported that these substitutions predict 99% of the difference in infectability between 68 species and humans across 365 SARS-CoV-2 variants from 65 global locations, suggesting pandemic potential. The simultaneous occurrence of all nine substitutions in the highly conserved ORF1ab region, where they enable efficient animal-to-human transmission, underscores their pivotal role in the virus's emergence as a human pathogen. The probabilistic model uses mutation rate estimates of 1.8 × 10⁻³ for moderately conserved sites and 1 × 10⁻⁴ for highly conserved sites, and considers three scenarios: all nine sites moderately conserved, all nine sites highly conserved, and a mix of three moderately conserved and six highly conserved sites. Under these scenarios, the substitutions would require approximately 15,000, 270,000, and 185,000 years, respectively, to arise naturally. These timescales are incompatible with the documented emergence of SARS-CoV-2 during the last four months of 2019. Moreover, unlike previous coronavirus outbreaks such as those involving SARS-CoV and MERS-CoV, extensive investigations have identified no intermediate host and no genetic evidence of stepwise zoonotic evolution. These findings highlight the improbability of a natural origin under any evolutionary scenario and call for investigation into alternative explanations.
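
The abstract does not spell out how the three time estimates are derived. The sketch below shows one arithmetic that reproduces them exactly. It rests on assumptions that are not stated in the abstract: that each amino acid substitution is counted as three nucleotide sites (one codon), that the rates are per site per year, and that the expected waiting times (1 / rate per site) are simply added up. The function and constant names are hypothetical.

```python
# Assumption (not stated in the abstract): rates are substitutions per site per year.
MODERATE_RATE = 1.8e-3   # moderately conserved sites, value from the abstract
HIGH_RATE = 1e-4         # highly conserved sites, value from the abstract

# Assumption: each amino acid substitution is counted as one codon, i.e. 3 nucleotide sites.
NUCLEOTIDES_PER_CODON = 3


def expected_years(n_substitutions: int, rate_per_site_per_year: float) -> float:
    """Sum of expected waiting times (1 / rate) over every nucleotide site counted."""
    n_sites = n_substitutions * NUCLEOTIDES_PER_CODON
    return n_sites / rate_per_site_per_year


# Scenario 1: all nine substitutions at moderately conserved sites.
all_moderate = expected_years(9, MODERATE_RATE)

# Scenario 2: all nine substitutions at highly conserved sites.
all_high = expected_years(9, HIGH_RATE)

# Scenario 3: three moderately conserved plus six highly conserved sites.
mixed = expected_years(3, MODERATE_RATE) + expected_years(6, HIGH_RATE)

print(round(all_moderate), round(all_high), round(mixed))
# prints: 15000 270000 185000
```

Under these assumptions the three results match the 15,000, 270,000, and 185,000 years quoted in the abstract. Other formulations could yield the same numbers, so this should be read as a consistency check on the stated figures rather than as the study's actual model.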