Intra-host evolution during SARS-CoV-2 prolonged infection

Abstract

Prolonged infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses a challenge for containing viral spread and controlling the coronavirus disease 2019 (COVID-19) pandemic. Why some people develop prolonged infection, and how the virus persists for so long, is still not fully understood. Recent studies suggest that the accumulation of intra-host single nucleotide variants (iSNVs) over the course of infection may play an important role in persistence and in the emergence of mutations of concern. We therefore investigated the intra-host evolution of SARS-CoV-2 during prolonged infection. The study included thirty-three patients who remained reverse transcription polymerase chain reaction (RT-PCR) positive in the nasopharynx for an average of 18 days from symptom onset. Whole-genome sequences were obtained for each patient at two time points. Phylogenetic, population-level, and computational analyses of the viral sequences were consistent with prolonged infection, without evidence of coinfection in our cohort. Within-host genomic diversity was elevated in the second time point samples and correlated positively with cycle threshold (Ct) values (that is, with lower viral load). Direct transmission was also confirmed, by the presence of shared iSNVs, in a small cluster of healthcare professionals who shared the same workplace. We detected a differential accumulation of missense variants between the time points, targeting crucial structural and non-structural proteins such as Spike and helicase. Interestingly, longitudinal acquisition of iSNVs in the Spike protein coincided in many cases with SARS-CoV-2 reactive and predicted T cell epitopes. We observed a distinctive pattern of mutations over the course of infection, driven mainly by increasing A→U and decreasing G→A signatures. G→A mutations may be associated with RNA-editing enzyme activity; the mutational profiles observed in our analysis were therefore suggestive of innate immune mechanisms of host cell defense. Taken together, our results reveal a dynamic and complex landscape of host-pathogen interaction during prolonged SARS-CoV-2 infection and suggest that the host’s innate immunity shapes the increase in intra-host diversity. Our findings may also shed light on possible mechanisms underlying the emergence and spread of new variants resistant to the host immune response, as recently observed in the COVID-19 pandemic.
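
The shift in mutational signatures described in the abstract (more A→U, fewer G→A between time points) can be illustrated with a small sketch. This is not the authors' code: the function names, the `(ref, alt)` input layout, and the `A>U` label style are assumptions made for illustration. It assumes variants are reported in the DNA alphabet (as in a VCF against NC_045512.2), so T is rewritten as U.

```python
from collections import Counter


def to_rna(base: str) -> str:
    """Upper-case a base and rewrite DNA 'T' as RNA 'U'."""
    base = base.upper()
    return "U" if base == "T" else base


def mutation_spectrum(variants):
    """Return the proportion of each substitution class (e.g. 'G>A').

    `variants` is an iterable of (ref, alt) pairs; anything that is not a
    single-base substitution (indels, ref == alt) is ignored.
    """
    counts = Counter(
        f"{to_rna(ref)}>{to_rna(alt)}"
        for ref, alt in variants
        if len(ref) == 1 and len(alt) == 1 and ref.upper() != alt.upper()
    )
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def spectrum_shift(t1_variants, t2_variants):
    """Per-class change in proportion from time point 1 to time point 2."""
    s1 = mutation_spectrum(t1_variants)
    s2 = mutation_spectrum(t2_variants)
    classes = sorted(set(s1) | set(s2))
    return {k: s2.get(k, 0.0) - s1.get(k, 0.0) for k in classes}


# Toy input (hypothetical, not data from the study).
t1 = [("G", "A"), ("G", "A"), ("C", "T"), ("A", "T")]
t2 = [("A", "T"), ("A", "T"), ("G", "A"), ("C", "T")]
for cls, delta in spectrum_shift(t1, t2).items():
    print(f"{cls}: {delta:+.2f}")
```

A positive value means the class makes up a larger share of iSNVs at the second time point. A real analysis would also need to normalize for genome base composition and test significance, which this sketch does not attempt.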

Article activity feed

  1. SciScore for 10.1101/2020.11.13.20231217:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Antibodies
    Sentences | Resources
    Plates were washed five times with PBST then incubated for one hour at room temperature with polyclonal anti-human IgG antibody conjugated to HRP (Promega).
    anti-human IgG
    suggested: None
    Software and Algorithms
    Sentences | Resources
    Sequencing data processing and analysis: Raw read sequences in FASTQ format were first pre-processed using FastQC (v0.11.4) (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for quality control analysis.
    FastQC
    suggested: (FastQC, RRID:SCR_014583)
    Next, we used trimmomatic v0.39 (Bolger et al. 2014) for filtering low-quality reads, keeping those with an average quality ≥ 25.
    trimmomatic
    suggested: (Trimmomatic, RRID:SCR_011848)
    The sequences were then mapped to the Wuhan-Hu-1 reference genome (NC_045512.2) using the BWA 0.7.17 software (Li and Durbin 2009; Martin 2011)
    BWA
    suggested: (BWA, RRID:SCR_010910)
    Post-processing steps were performed with samtools v1.10 (Li et al. 2009) and picard v2.17.0 packages (http://broadinstitute.github.io/picard/).
    samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    http://broadinstitute.github.io/picard/
    suggested: (Picard, RRID:SCR_006525)
    After variant calling, a consensus genome sequence was generated using bcftools v1.10.2 and bedtools v2.29.2 packages (Quinlan and Hall 2010; Li 2011a; Li 2011b).
    bcftools
    suggested: (SAMtools/BCFtools, RRID:SCR_005227)
    bedtools
    suggested: (BEDTools, RRID:SCR_006646)
    Next, we combined the GATK and LoFreq results and performed a pairwise variant filtration analysis using the following criteria: (i) average base quality criteria ≥ 15; (ii) allele frequency ≥ 5% and (iii) minimum coverage ≥ 50 in at least one sample of the pair.
    GATK
    suggested: (GATK, RRID:SCR_001876)
    LoFreq
    suggested: (LoFreq, RRID:SCR_013054)
    All variants were annotated using snpEff v4.5 (Cingolani et al. 2012).
    snpEff
    suggested: (SnpEff, RRID:SCR_005191)
    We gathered 135 genome sequences to compose our phylogenetic dataset (GISAID accession numbers are available in Table S7) and used the MAFFT algorithm to build the multiple sequence alignment from the resulting dataset (Katoh and Standley 2013).
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    We assessed virus lineages for the whole dataset using Pangolin (https://pangolin.cog-uk.io) V 2.0.7 software (Rambaut et al. 2020) and checked our sequences for recombination using the full exploratory recombination method in RDP4 (Martin et al. 2015) and by the Phi-test approach (Bruen et al. 2006) in SplitsTree (Huson and Bryant 2006).
    SplitsTree
    suggested: (SplitsTree, RRID:SCR_014734)
    Spike protein epitope analysis: To identify potential T cell S-reactive epitopes we performed an integrative analysis based on Immunology custom tracks available at UCSC Genome Browser.
    UCSC Genome Browser
    suggested: (UCSC Genome Browser, RRID:SCR_005780)
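
The pairwise variant filtration quoted in the table above (average base quality ≥ 15, allele frequency ≥ 5%, and minimum coverage ≥ 50 in at least one sample of the pair) might be sketched as follows. The quoted sentence does not say whether "in at least one sample of the pair" applies to all three criteria or only to coverage; this sketch assumes a variant is kept when at least one of the two samples meets all three. The `Call` record and the function names are hypothetical, and how the GATK and LoFreq call sets were combined is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the quoted methods sentence.
MIN_BASE_QUALITY = 15
MIN_ALLELE_FREQ = 0.05
MIN_COVERAGE = 50


@dataclass(frozen=True)
class Call:
    """One variant observation in one sample (hypothetical record layout)."""
    mean_base_quality: float
    allele_freq: float  # fraction in [0, 1]
    coverage: int       # read depth at the site


def passes(call: Call) -> bool:
    """True if a single-sample call meets all three thresholds."""
    return (
        call.mean_base_quality >= MIN_BASE_QUALITY
        and call.allele_freq >= MIN_ALLELE_FREQ
        and call.coverage >= MIN_COVERAGE
    )


def keep_variant(t1: Optional[Call], t2: Optional[Call]) -> bool:
    """Keep a variant if either sample of the pair passes (assumed reading).

    `None` means the variant was not called in that sample.
    """
    return any(c is not None and passes(c) for c in (t1, t2))


# Hypothetical example: below 5% at the first time point, passing at the second.
print(keep_variant(Call(30, 0.02, 200), Call(30, 0.10, 120)))
```

Under the alternative reading (quality and frequency required in both samples, coverage in at least one), only `keep_variant` would need to change.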

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    This extremely high frequency of persistent infections may be a caveat of our cohort. Populational studies including asymptomatic individuals in a randomized analysis could lower this proportion. Of note, we have amplified and sequenced the full-length genome of viruses in T2 samples from these patients. The full-length genomes sequenced had all SARS-CoV-2 ORFs intact and consequently, are possibly able to infect cells and transmit to other persons. In fact, the transmission cluster formed by health care workers of the same hospital highlights the preservation of the infectious ability of the virus in T2 samples. Moreover, there is a 13 days interval from the first patient’s symptom onset to that of the last patient, demonstrating the maintenance of the virus’s ability to establish successful prolonged infection events even after large intervals of time. The finding of transmission of intra-host variants between patients of the cluster, meaning that infection doses were large enough to contain low-frequency iSNVs, is essential to understand the nature of SARS-CoV-2 transmission and needs further investigation. Previous studies also reported the spread of the SARS-CoV-2 virus within hospital workers providing evidence for maintaining the prolonged infectious state (Rivett et al. 2020; Sikkema et al. 2020; Suárez-García et al. 2020). From the viewpoint of mutational signatures, our findings may reflect host RNA-editing enzyme activities on the viral genome as a cell defense mec...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.