Unusual SARS-CoV-2 intrahost diversity reveals lineage superinfection

Abstract

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) had infected almost 200 million people worldwide by July 2021, and the pandemic has been characterized by infection waves of viral lineages showing distinct fitness profiles. The simultaneous infection of a single individual by two distinct SARS-CoV-2 lineages may impact COVID-19 disease progression and provides a window of opportunity for viral recombination and the emergence of new lineages with differential phenotypes. Several hundred SARS-CoV-2 lineages are currently phylogenetically well defined, but two main factors have precluded major coinfection/codetection and recombination analyses thus far: (i) the low diversity of SARS-CoV-2 lineages during the first year of the pandemic, which limited the identification of the lineage-defining mutations necessary to distinguish coinfecting/recombining viral lineages; and (ii) the limited availability of raw sequencing data in which the abundance and distribution of intrasample/intrahost variability can be accessed. Here, we assembled a large sequencing dataset from Brazilian samples covering the period from 18 May 2020 to 30 April 2021 and probed it for unexpected patterns of high intrasample/intrahost variability. This approach enabled us to detect nine cases of SARS-CoV-2 coinfection with well-characterized lineage-defining mutations, representing 0.61% of all samples investigated. In addition, we matched these SARS-CoV-2 coinfections with spatio-temporal epidemiological data, confirming their plausibility given the lineages cocirculating in the timeframe investigated. Our data suggest that coinfection with distinct SARS-CoV-2 lineages is a rare phenomenon, although this is certainly a lower-bound estimate considering the difficulty of detecting coinfections with very similar SARS-CoV-2 lineages and the low number of samples sequenced relative to the total number of infections.

Article activity feed

  1. SciScore for 10.1101/2021.09.18.21263755:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethics: IRB: All samples used in this work were approved by the Ethics Committee of the Aggeu Magalhaes Institute (CAAE 32333120.4.0000.5190), the Ethics Committee of Amazonas State University (no. 25430719.6.0000.5016), the Ethics Committee of FIOCRUZ-IOC (68118417.6.0000.5248) and the Brazilian Ministry of Health SISGEN (A1767C3).
    Sex as a biological variable: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentence: Genome assembly and intrahost variant analysis: The FASTQ reads were submitted to an in-house workflow, available at https://github.com/dezordi/IAM_SARSCOV2, that performs the following steps: removal of duplicated reads, adapters and read ends with a Phred quality score below 20, using the fastp tool [19]; a reference-guided genome assembly with BWA [20], mapping reads against the SARS-CoV-2 Wuhan reference genome (NC_045512.2); generation of consensus genomes with samtools mpileup [21] and iVar [22], using a quality score threshold of 30 and calling the SNPs and indels present at major allele frequencies; and, after consensus generation, use of the bam-readcount tool [23] to retrieve the proportion of each base (A, C, T, G) at each position of the BAM file, followed by an in-house Python script (intrahost.py) that identifies positions with putative intrahost variants at minor allele frequencies according to specific rules: the minor variant (MinV) should represent at least 5% of the total position depth, and the MinV should be present in reads of both senses (at least 5% in each sense) with at least 100 reads of depth.
    Resources:
    • BWA, suggested: (BWA, RRID:SCR_010910)
    • samtools, suggested: (SAMTOOLS, RRID:SCR_002105)

    Sentence: Phylogenetic Analysis: A reference alignment was created using MAFFT [27] with 6,167 genomes, comprising the genomes present in the Nextstrain [28] phylogeny on 24 May 2021 with less than 5% N content, and Brazilian genomes obtained through cd-hit-est [29] clustering of the high-quality genomes available on GISAID on 16 March 2021 (more than 99.8% sequence identity and from the same Brazilian state).
    Resources:
    • MAFFT, suggested: (MAFFT, RRID:SCR_011811)
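
    The minor variant rules quoted in the genome assembly sentence can be illustrated with a short Python sketch. This is not the authors' intrahost.py; it is one possible reading of the quoted thresholds, assuming that "at least 5% in each sense" means 5% of the depth on each strand and that "at least 100 reads of depth" refers to the total depth at the position. The names StrandCounts and find_minor_variant are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StrandCounts:
    """Read counts for one base at one position, split by strand (hypothetical type)."""
    forward: int
    reverse: int

    @property
    def total(self) -> int:
        return self.forward + self.reverse


def find_minor_variant(counts, min_frac=0.05, min_depth=100):
    """Return (base, frequency) for the second most frequent base if it passes
    the quoted thresholds, otherwise None.

    Assumed reading of the rules (not confirmed by the source):
      * total depth at the position is at least `min_depth`
      * the minor base makes up at least `min_frac` of the total depth
      * the minor base makes up at least `min_frac` of the depth on each strand
    """
    depth = sum(c.total for c in counts.values())
    fwd_depth = sum(c.forward for c in counts.values())
    rev_depth = sum(c.reverse for c in counts.values())
    if depth < min_depth or fwd_depth == 0 or rev_depth == 0:
        return None

    # Rank bases by total count; the second entry is the candidate minor variant.
    ranked = sorted(counts.items(), key=lambda item: item[1].total, reverse=True)
    if len(ranked) < 2:
        return None
    base, c = ranked[1]

    if c.total / depth < min_frac:
        return None
    if c.forward / fwd_depth < min_frac or c.reverse / rev_depth < min_frac:
        return None
    return base, c.total / depth


# Illustrative counts only, not data from the study.
balanced = {"A": StrandCounts(400, 400), "G": StrandCounts(60, 60),
            "C": StrandCounts(0, 0), "T": StrandCounts(1, 0)}
strand_biased = {"A": StrandCounts(400, 400), "G": StrandCounts(120, 0)}
shallow = {"A": StrandCounts(40, 40), "G": StrandCounts(6, 6)}

print(find_minor_variant(balanced))
print(find_minor_variant(strand_biased))
print(find_minor_variant(shallow))
```

    In this sketch the first position reports G as a minor variant, while the strand-biased and low-depth positions are rejected. Requiring support on both strands is a common way to discard artefacts that appear in only one read orientation.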

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    This is certainly an underestimate, owing to the difficulty of detecting true coinfection events among the earlier, low-divergence SARS-CoV-2 lineages that dominated the first year of the pandemic. Despite that, considering that the lower bound of recorded SARS-CoV-2 cases worldwide until July 2021 was around 190 million (https://coronavirus.jhu.edu/map.html), we can infer that at least 1.1 million patients have been coinfected across the world, which in turn provides a substantial window of opportunity for SARS-CoV-2 recombination events. Moreover, this estimate is certainly downwardly biased because the number of asymptomatic infections is largely not accounted for.
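
    The worldwide figure in the sentences above appears to follow from applying the 0.61% coinfection rate reported in the abstract to the roughly 190 million recorded cases. The sketch below reproduces that arithmetic, assuming (as the extrapolation itself does) that the rate observed in the sequenced Brazilian samples carries over to all recorded cases. The variable names are illustrative.

```python
# Inputs taken from the text: about 190 million recorded cases by July 2021
# and a coinfection rate of 0.61% among the samples investigated.
recorded_cases = 190_000_000
rate_per_10k = 61  # 0.61% expressed per 10,000 to keep the arithmetic in integers

estimated_coinfections = recorded_cases * rate_per_10k // 10_000
print(f"{estimated_coinfections:,}")  # prints 1,159,000
```

    The result, about 1.16 million, is consistent with the "at least 1.1 million" quoted in the limitation statement.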

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.