Secondary structure of the SARS-CoV-2 genome is predictive of nucleotide substitution frequency

Curation statements for this article:
  • Curated by eLife

    eLife logo

    eLife assessment

    This short manuscript uses mutation counts in phylogenies of millions of SARS-CoV-2 genomes to show that mutation rates systematically differ between regions that are paired or unpaired in the predicted RNA secondary structure of the viral genome. Such an effect of pairing state is not unexpected, but its systematic demonstration using millions of viral genomes is valuable and convincing.

This article has been Reviewed by the following groups

Read the full article

Abstract

Accurate estimation of the effects of mutations on SARS-CoV-2 viral fitness can inform public-health responses such as vaccine development and predicting the impact of a new variant; it can also illuminate biological mechanisms including those underlying the emergence of variants of concern 1 . Recently, Lan et al reported a high-quality model of SARS-CoV-2 secondary structure and its underlying dimethyl sulfate (DMS) reactivity data 2 . I investigated whether secondary structure can explain some variability in the frequency of observing different nucleotide substitutions across millions of patient sequences in the SARS-CoV-2 phylogenetic tree 3 . Nucleotide basepairing was compared to the estimated “mutational fitness” of substitutions, a measurement of the difference between a substitution’s observed and expected frequency that is correlated with other estimates of viral fitness 4 . This comparison revealed that secondary structure is often predictive of substitution frequency, with significant decreases in substitution frequencies at basepaired positions. Focusing on the mutational fitness of C → T, the most common type of substitution, I describe C → T substitutions at basepaired positions that characterize major SARS-CoV-2 variants; such mutations may have a greater impact on fitness than appreciated when considering substitution frequency alone.

Article activity feed

  1. eLife assessment

    This short manuscript uses mutation counts in phylogenies of millions of SARS-CoV-2 genomes to show that mutation rates systematically differ between regions that are paired or unpaired in the predicted RNA secondary structure of the viral genome. Such an effect of pairing state is not unexpected, but its systematic demonstration using millions of viral genomes is valuable and convincing.

  2. Reviewer #1 (Public Review):

    Summary:

    This very short paper shows a greater likelihood of C->U substitutions at sites predicted to be unpaired in the SARS-CoV-2 RNA genome, using previously published observational data on mutation frequencies in SARS-CoV-2 (Bloom and Neher, 2023).

    General comments:

    A preference for unpaired bases as a target for APOBEC-induced mutations has been demonstrated previously in functional studies so the finding is not entirely surprising. This of course assumes that A3A or other APOBEC is actually the cause of the majority of C->U changes observed in SARS-CoV-2 sequences.

    I'm not sure why the authors did not use the published mutation frequency data to investigate other potential influences on editing frequencies, such as 5' and 3' base contexts. The analysis did not contribute any insights into the potential mechanisms underlying the greater frequency of C->U (or G->U) substitutions in the SARS-CoV-2 genome.

  3. Reviewer #2 (Public Review):

    Hensel investigated the implications of SARS-CoV-2 RNA secondary structure in synonymous and nonsynonymous mutation frequency. The analysis integrated estimates of mutational fitness generated by Bloom and Neher (from publicly available patient sequences) and a population-averaged model of RNA basepairing from Lan et al (from DMS mutational profiling with sequencing, DMS-MaPseq).

    The results show that base-pairing limits the frequency of some synonymous substitutions (including the most common CT), but not all: GA and AG substitutions seem unaffected by base-pairing.

    The author then addressed nonsynonymous CT substitutions at base-paired positions. While there is still a generally higher estimated mutational fitness at unpaired positions, they propose a coarse adjustment to disentangle base-pairing from inherent mutational fitness at a given position. This adjustment reveals that nonsynonymous substitutions at base-paired positions, which define major variants, have higher mutational fitness.

    Overall, this manuscript highlights the importance of considering RNA secondary structure in viral evolution studies.

    The conclusions of this work are generally well supported by the data presented. Particularly, the author acknowledges most limitations of the analyses, and addresses them. Even though no new sequencing results were generated, the author used available data generated from the analysis of roughly seven million sequenced patient samples. Finally, the author discusses ways to improve the current available models.

    There are a number of limitations of this work that should be highlighted, specifically in regard to the secondary structure data used in this paper. The Lan et al. dataset was generated using a multiplicity of infection (MOI) of 0.05, 24 hours post-infection (h.p.i.). At such a low MOI and late timepoint, viral replication is not synchronous and sequencing artifacts might be generated by cell debris and viral RNA degradation, therefore impacting the population-averaged results. In addition, the nonsynonymous base-paired positions in Figure 2 have relatively high population-averaged DMS reactivity, which suggests those positions are dynamic. Therefore, the proposed adjustment could result in an incorrect estimation of their inherent mutational fitness.

    Additionally, like all such RNA probing experiments within cells, it remains difficult to deconvolve DMS/SHAPE low reactivity with RNA accessibility (e.g. from protein binding).

    This work presents clear methods and an easy-to-access bioinformatic pipeline, which can be applied to other RNA viruses. Of note, it can be readily implemented in existing datasets. Finally, this study raises novel mechanistic questions on how mutational fitness is not correlated to secondary structure in the same way for every substitution.

    Overall, this work highlights the importance of studying mutational fitness beyond an immune evasion perspective. On the other hand, it also adds to the viral intrinsic constraints to immune evasion.