Putative host-derived insertions in the genomes of circulating SARS-CoV-2 variants



Abstract

Insertions in the SARS-CoV-2 genome have the potential to drive viral evolution, but their source is often unknown. Recent proposals have suggested that human RNAs could be the source of some insertions, but the small size of many insertions makes this difficult to confirm. Through an analysis of available direct RNA sequencing data from SARS-CoV-2-infected cells, we show that viral-host chimeric RNAs are formed, likely through stochastic template-switching events of the RNA-dependent RNA polymerase. By analyzing the publicly available GISAID SARS-CoV-2 genome collection, we identified two genomic insertions in circulating SARS-CoV-2 variants that are identical to regions of the human 18S and 28S rRNAs. These results provide direct evidence of the formation of viral-host chimeric sequences and the integration of host genetic material into the SARS-CoV-2 genome, highlighting the potential importance of host-derived insertions in viral evolution.

IMPORTANCE

Throughout the COVID-19 pandemic, the sequencing of SARS-CoV-2 genomes has revealed the presence of insertions in multiple globally circulating lineages of SARS-CoV-2, including the Omicron variant. The human genome has been suggested as the source of some of the larger insertions, but evidence that such events occur is still lacking. Here, we leverage direct RNA sequencing data and SARS-CoV-2 genomes to show that host-viral chimeric RNAs are generated in infected cells and that two large genomic insertions have likely been formed through the incorporation of host rRNA fragments into the SARS-CoV-2 genome. These host-derived insertions may increase the genetic diversity of SARS-CoV-2 and expand its strategies for acquiring genetic material, potentially enhancing its adaptability, virulence, and spread.


  1. SciScore for 10.1101/2022.01.04.474799:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to this paper type.

    Table 2: Resources

    Experimental Models: Cell Lines
    Sentences | Resources
    Analysis of the expression level of host genes observed in chimeric reads: Gene expression profiles for two SARS-CoV-2 infected Caco-2 cell line samples (GSM4477888, GSM4477889), two SARS-CoV-2 infected Calu-3 cell line samples (GSM4477962, GSM4477963), and three SARS-CoV-2 infected Vero-6 cell line samples (GSM4916368, GSM4916369, GSM4916370) were downloaded from the GEO database.
    Caco-2
    suggested: None
    Calu-3
    suggested: None
    Vero-6
    suggested: None
    Software and Algorithms
    Sentences | Resources
    Identification of host-virus chimeric reads in SARS-CoV-2 direct-RNA seq data: The nanopore direct RNA-seq data from SARS-CoV-2 infected cell lines were downloaded from the NCBI SRA database (Supplementary Table 1).
    NCBI SRA
    suggested: None
    All reads were quality trimmed using NanoFilt v2.8.0 [34], to remove the first 50 nucleotides of each read and require an average quality score of at least 10 over the length of the read.
    NanoFilt
    suggested: (NanoFilt, RRID:SCR_016966)
    The trimmed reads were then mapped using Minimap2 v2.23 [35] to the SARS-CoV-2 reference genome (NCBI GenBank accession: NC_045512.2) [36], and either a reference Chlorocebus sabaeus transcriptome (ftp://ftp.ensembl.org/pub/release-105/fasta/chlorocebus_sabaeus/) or human transcriptome (ftp://ftp.ensembl.org/pub/release-105/fasta/homo_sapiens/).
    Minimap2
    suggested: (Minimap2, RRID:SCR_018550)
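The read-processing steps quoted in the table (head-cropping and quality filtering, then alignment to viral and host references to find chimeric reads) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline and not NanoFilt or Minimap2 themselves. All function names and the thresholds `min_segment` and `max_overlap` are hypothetical. The average quality is assumed to be computed by averaging per-base error probabilities, a common nanopore convention. Alignments are assumed to be in minimap2's PAF format with the SARS-CoV-2 reference named `NC_045512.2`. In PAF, columns 1 to 12 are read name, read length, read start, read end, strand, target name, target length, target start, target end, matching bases, alignment block length, and mapping quality, with 0-based read coordinates. The reads and the transcript name `host_transcript_1` in the demo are made up.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# --- Step 1: head-cropping and average-quality filtering -------------------

Record = Tuple[str, str, List[int]]  # (read_id, sequence, per-base Phred scores)


def mean_read_quality(phred: List[int]) -> float:
    """Average read quality: mean of per-base error probabilities 10^(-Q/10),
    converted back to the Phred scale (assumed convention)."""
    if not phred:
        return 0.0
    mean_error = sum(10 ** (-q / 10) for q in phred) / len(phred)
    return -10 * math.log10(mean_error)


def headcrop_and_filter(
    records: Iterable[Record], headcrop: int = 50, min_quality: float = 10.0
) -> Iterator[Record]:
    """Drop the first `headcrop` bases of each read; keep the read only if the
    retained bases average at least `min_quality` (assumption: quality is
    evaluated after cropping)."""
    for read_id, seq, qual in records:
        seq, qual = seq[headcrop:], qual[headcrop:]
        if seq and mean_read_quality(qual) >= min_quality:
            yield read_id, seq, qual


# --- Step 2: flagging candidate host-virus chimeras from PAF alignments ----

VIRAL_TARGET = "NC_045512.2"  # assumed target name of the SARS-CoV-2 reference
Segment = Tuple[int, int, str]  # (read_start, read_end, target), 0-based half-open


def parse_paf(lines: Iterable[str], min_mapq: int = 0) -> Dict[str, List[Segment]]:
    """Group PAF alignments by read, keeping read coordinates and target name.
    Only the 12 mandatory PAF columns are used; optional tags are ignored."""
    by_read: Dict[str, List[Segment]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if int(cols[11]) >= min_mapq:
            by_read[cols[0]].append((int(cols[2]), int(cols[3]), cols[5]))
    return by_read


def overlap(a: Segment, b: Segment) -> int:
    """Number of read bases covered by both segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def find_chimeric_reads(
    by_read: Dict[str, List[Segment]], min_segment: int = 50, max_overlap: int = 20
) -> List[str]:
    """Reads with a viral and a host segment, each at least `min_segment` bases
    long, that share at most `max_overlap` read bases (hypothetical thresholds)."""
    chimeric = []
    for qname, segments in by_read.items():
        long_segs = [s for s in segments if s[1] - s[0] >= min_segment]
        viral = [s for s in long_segs if s[2] == VIRAL_TARGET]
        host = [s for s in long_segs if s[2] != VIRAL_TARGET]
        if any(overlap(v, h) <= max_overlap for v in viral for h in host):
            chimeric.append(qname)
    return sorted(chimeric)


if __name__ == "__main__":
    # Made-up example records.
    reads = [
        ("good", "A" * 60 + "C" * 40, [5] * 50 + [20] * 50),
        ("bad", "G" * 100, [8] * 100),
    ]
    print([r[0] for r in headcrop_and_filter(reads)])  # ['good']

    paf = [
        "read1\t1000\t0\t600\t+\tNC_045512.2\t29903\t28000\t28600\t580\t600\t60",
        "read1\t1000\t590\t1000\t+\thost_transcript_1\t2000\t100\t510\t400\t410\t60",
        "read2\t800\t0\t800\t+\tNC_045512.2\t29903\t29000\t29800\t790\t800\t60",
    ]
    print(find_chimeric_reads(parse_paf(paf)))  # ['read1']
```

Averaging in probability space means a few low-quality bases pull the score down more than an arithmetic mean of Phred values would: bases at Q10 and Q30 average to about Q13 rather than Q20. Requiring the viral and host segments to occupy largely disjoint parts of the read is what separates a candidate junction from a read that merely aligns to both references over the same bases. A real analysis would likely also need to consider strand, supplementary alignments, and library-preparation artifacts.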

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) and for rigor criteria such as sex and investigator blinding.