A Genome-Wide Codon-Permissiveness Framework Uncovers Spike-Centric Escape Hotspots and Distal Epistatic Couplings Across SARS-CoV-2 Structural Proteins
Abstract
Immune escape mutations in SARS-CoV-2 are not randomly distributed, yet current methods for prioritizing functionally consequential residues remain heavily biased by prior experimental data or literature-curated escape maps. To overcome this limitation, we introduce a fully de novo, data-driven framework that identifies evolutionarily pivotal sites using only codon usage constraints across 9.4 million high-quality SARS-CoV-2 genomes (2020-2025). We integrated six orthogonal codon bias metrics into a unified CUB6 Metric Suite (C6MS) to compute a Codon-Permissiveness Score (CPS) for every residue in the receptor-binding domain (RBD). By combining CPS with observed mutational frequency, we mapped high-permissiveness, high-mutation residues onto the hACE2 interface (PDB: 6M0J; ≤5 Å cutoff). This revealed a core set of 12 key residues, including F486, L452, and K444, that form a statistically robust intra-Spike epistatic network (χ² p < 1×10⁻¹⁵; mutual information > 0.8) and exhibit accelerated global frequency increases from 2020 to 2025. Notably, N450, a site absent from conventional experimental escape maps, displays high codon-permissiveness (Shannon entropy = 0.19) and has accumulated 13 distinct mutations, predominantly L450N (97.1%) and L450D (2.8%), indicating active, evolutionarily stable diversification. In contrast, residues such as G447 and V483 now show low entropy due to near-fixation (N447G: 99.998%; E483V: 99.95%), yet their rapid global sweeps confirm that they were critical permissive hotspots during earlier immune escape waves. All three surpassed 15% global frequency by early 2025 and continue to shape emerging variant fitness. Strikingly, while immune escape remains predominantly modular and confined to Spike, our analysis detects recurrent co-occurrence between non-RBD Spike variants and the Membrane protein mutation D3G, which likely reflects shared lineage history.
In contrast, high-permissiveness RBD residues (e.g., N450, L452) show no such dependencies, underscoring their evolutionary autonomy. This insight transforms therapeutic strategy: monoclonal antibodies (mAbs) targeting autonomous, codon-permissive sites such as N450 can be engineered based solely on local conformational plasticity and predicted mutational spectra, dramatically simplifying development and extending therapeutic shelf-life. By proactively accommodating evolutionary trajectories (e.g., L450N/D), even with modest affinity trade-offs, we shift mAb design from reactive to predictive, now informed not only by local Spike plasticity but also by emerging signals of genome-wide epistatic constraints. Our framework, which requires no prior experimental annotation, defines a Codon-Permissive Epistatic Backbone (CpEB) that explains variant success, enables evolution-informed surveillance, and is immediately generalizable to other pathogens, including H5N1.
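The abstract reports per-site Shannon entropy and pairwise mutual information without stating how either is computed. As an illustration only, the following minimal sketch shows the textbook definitions of both quantities applied to alignment columns. It is not the authors' C6MS or CPS pipeline: the function names and toy columns are hypothetical, and the log base and any normalization behind the reported values (e.g., mutual information > 0.8) are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy in bits of the symbols observed in one alignment column.

    Assumption: base-2 logarithm, no normalization, no gap handling.
    """
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mutual_information(col_a, col_b):
    """Mutual information in bits between two equal-length alignment columns.

    Uses the identity I(A;B) = H(A) + H(B) - H(A,B).
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same set of genomes")
    joint = list(zip(col_a, col_b))
    return shannon_entropy(col_a) + shannon_entropy(col_b) - shannon_entropy(joint)


# Hypothetical toy columns: one residue per genome at two sites.
site_x = list("LLRRLLRR")
site_y = list("FFVVFFVV")   # changes together with site_x
site_z = list("NKNKNKNK")   # changes independently of site_x

coupled = mutual_information(site_x, site_y)
independent = mutual_information(site_x, site_z)
```

In this toy setting the perfectly co-varying pair yields a higher mutual information than the independent pair. Real genome collections would additionally require handling of gaps and ambiguous calls, as well as a correction for shared lineage history, none of which this sketch attempts.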