Recombinant SARS-CoV-2 genomes circulated at low levels over the first year of the pandemic
This article has been reviewed by the following groups:
Listed in
- Evaluated articles (ScreenIT)
- Evaluated articles (Rapid Reviews Infectious Diseases)
Abstract
Viral recombination can generate novel genotypes with unique phenotypic characteristics, including transmissibility and virulence. Although the capacity for recombination among betacoronaviruses is well documented, recombination between strains of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has not been characterized in detail. Here, we present a lightweight approach for detecting potentially recombinant genomes. This approach relies on identifying the mutations that primarily determine SARS-CoV-2 clade structure and then screening for genomes that contain multiple mutational markers from distinct clades. Among the more than 537,000 genomes deposited on GISAID.org before 16 February 2021 that we queried, we detected 1,175 potential recombinant sequences. Using highly conservative criteria to exclude sequences that may have originated through de novo mutation, we find that at least 30 per cent (n = 358) are likely of recombinant origin. Where deep-sequencing data were available for these putative recombinants, analysis indicated that the majority are of high quality. Additional phylogenetic analysis and the observed co-circulation of the predicted parent clades in the geographic regions of exposure further support the plausibility of recombination in this subset of potential recombinants. An analysis of these genomes did not reveal evidence for recombination hotspots in the SARS-CoV-2 genome. While most of the putative recombinant sequences we detected were genetic singletons, a small number of genetically identical or highly similar recombinant sequences were identified in the same geographic region, indicative of locally circulating lineages. Recombinant genomes were also found to have originated from parental lineages with substitutions of concern, including D614G, N501Y, E484K, and L452R.
Adjusting for an unequal probability of detecting recombinants derived from different parent clades and for geographic variation in clade abundance, we estimate that at most 0.2–2.5 per cent of circulating viruses in the USA and UK are recombinant. Our identification of a small number of putative recombinants within the first year of SARS-CoV-2 circulation underscores the need to sustain efforts to monitor the emergence of new genotypes generated through recombination.
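The screening idea described in the abstract (flag genomes that carry marker mutations from more than one clade) can be illustrated with a minimal Python sketch. It assumes each genome has already been reduced to a set of nucleotide substitutions relative to the reference, and it uses a small, entirely hypothetical clade-to-marker table (`HYPOTHETICAL_MARKERS`) with made-up mutation names; the real clade-defining markers come from the authors' phylogenetic analysis. For simplicity the sketch treats clades as non-nested, which real SARS-CoV-2 clades are not. The function names `clades_with_markers` and `is_potential_recombinant` are illustrative, not the authors' code.

```python
from typing import Dict, List, Set

# HYPOTHETICAL clade -> marker-mutation table. The mutation names are made up
# for illustration; real clade-defining SNPs come from a phylogenetic analysis.
HYPOTHETICAL_MARKERS: Dict[str, Set[str]] = {
    "clade_A": {"C100T", "G200A"},
    "clade_B": {"T300C", "A400G"},
    "clade_C": {"G500T", "C600A"},
}


def clades_with_markers(
    mutations: Set[str],
    markers: Dict[str, Set[str]],
    min_markers: int = 1,
) -> List[str]:
    """Return (sorted) the clades for which the genome carries >= min_markers marker mutations."""
    return sorted(
        clade
        for clade, clade_markers in markers.items()
        if len(mutations & clade_markers) >= min_markers
    )


def is_potential_recombinant(
    mutations: Set[str],
    markers: Dict[str, Set[str]],
    min_markers: int = 1,
) -> bool:
    """Flag a genome whose mutations include markers from two or more distinct clades."""
    return len(clades_with_markers(mutations, markers, min_markers)) >= 2


if __name__ == "__main__":
    genomes = {
        "genome_1": {"C100T", "G200A"},           # clade_A markers only
        "genome_2": {"C100T", "G200A", "A400G"},  # clade_A markers plus one clade_B marker
        "genome_3": {"T300C", "A400G", "C999T"},  # clade_B markers plus an unrelated SNP
    }
    for name, muts in genomes.items():
        print(name, clades_with_markers(muts, HYPOTHETICAL_MARKERS),
              is_potential_recombinant(muts, HYPOTHETICAL_MARKERS))
```

In this toy example only `genome_2` is flagged, because it carries markers from both `clade_A` and `clade_B`. Raising `min_markers` is one simple way to make the screen stricter, so that a single shared substitution (which could also arise by de novo mutation) is not enough to count a clade; it is not necessarily the conservative criterion the authors applied.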
Article activity feed
M Hossain, M. Nazmul Hoque
Review 1: "Recombinant SARS-CoV-2 genomes are currently circulating at low levels"
Review of "Recombinant SARS-CoV-2 genomes are currently circulating at low levels"
Reviewers: M Hossain (Dhaka University), M Nazmul Hoque (Sheikh Mujibur Rahman Agricultural University) | 📒📒📒 ◻️◻️
SciScore for 10.1101/2020.08.05.238386:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
- Institutional Review Board Statement: not detected.
- Randomization: not detected.
- Blinding: not detected.
- Power Analysis: not detected.
- Sex as a biological variable: not detected.

Table 2: Resources

| Software and Algorithms (sentences) | Resources |
|---|---|
| Genomes were aligned to the NCBI reference sequence genome using MAFFT v7.464 (Katoh et al., 2013). | MAFFT, suggested: (MAFFT, RRID:SCR_011811) |
| Identifying clade-defining SNPs in SARS-CoV-2 genomes: Clades were identified as monophyletic groups within a maximum likelihood phylogenetic tree built from 9783 unique high quality genome sequences with <1% Ns using PhyML (Guindon et al., 2010). | PhyML, suggested: (PhyML, RRID:SCR_014629) |

Results from OddPub: Thank you for sharing your code.
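The quality filter quoted in the methods sentence above (unique genome sequences with <1% Ns) can be sketched in Python as follows. This is an assumption-laden illustration, not the authors' pipeline: it counts only the ambiguous base `N` (not other IUPAC ambiguity codes or alignment gaps), treats two sequences as identical only when they match exactly (ignoring case), and keeps the first ID seen for each distinct sequence. The names `n_fraction` and `unique_high_quality` are hypothetical.

```python
from typing import Dict, Set


def n_fraction(seq: str) -> float:
    """Fraction of positions that are the ambiguous base 'N' (case-insensitive)."""
    if not seq:
        return 1.0  # treat an empty sequence as entirely missing data
    return seq.upper().count("N") / len(seq)


def unique_high_quality(seqs: Dict[str, str], max_n_fraction: float = 0.01) -> Dict[str, str]:
    """Keep sequences with fewer than max_n_fraction Ns, retaining one ID per identical sequence."""
    kept: Dict[str, str] = {}
    seen: Set[str] = set()
    for seq_id, seq in seqs.items():
        if n_fraction(seq) >= max_n_fraction:
            continue  # fails the "<1% Ns" quality threshold
        key = seq.upper()
        if key in seen:
            continue  # exact duplicate of a sequence that was already kept
        seen.add(key)
        kept[seq_id] = seq
    return kept


if __name__ == "__main__":
    example = {
        "s1": "ACGT" * 50,             # 200 bp, no Ns
        "s2": "ACGT" * 50,             # exact duplicate of s1
        "s3": "N" * 10 + "ACGT" * 50,  # 10 of 210 positions (about 4.8%) are N, so it is dropped
        "s4": "ACGA" * 50,             # distinct sequence, no Ns
    }
    print(sorted(unique_high_quality(example)))
```

Deduplicating before tree building keeps identical genomes from inflating the tree without adding phylogenetic information; a production pipeline would more likely hash sequences and handle ambiguity codes explicitly.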
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.