SARS-CoV-2-Host Chimeric RNA-Sequencing Reads Do Not Necessarily Arise From Virus Integration Into the Host DNA

Abstract

The human genome bears evidence of extensive invasion by retroviruses and other retroelements, as well as by diverse RNA and DNA viruses. A high frequency of somatic integration of the RNA virus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) into the DNA of infected cells was recently suggested, based on a number of observations. One key observation was the presence of chimeric RNA-sequencing (RNA-seq) reads between SARS-CoV-2 RNA and RNA transcribed from human host DNA. Here, we examined the possible origin of human-SARS-CoV-2 chimeric reads in RNA-seq libraries specifically and provide alternative explanations for their origin. Chimeric reads were also frequently detected between SARS-CoV-2 RNA and RNA transcribed from mitochondrial DNA or from episomal adenoviral DNA present in transfected cell lines, which is unlikely to be the result of SARS-CoV-2 integration. Furthermore, chimeric reads between SARS-CoV-2 RNA and RNA transcribed from nuclear DNA were highly enriched for host exonic, rather than intronic or intergenic, sequences and often involved the same highly expressed host genes. Although these findings do not rule out SARS-CoV-2 somatic integration, they suggest that human-SARS-CoV-2 chimeric reads found in RNA-seq data may arise during library preparation and do not necessarily signify SARS-CoV-2 reverse transcription, integration into host DNA and further transcription.
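The exonic-enrichment argument in the abstract can be illustrated with a small calculation. The sketch below is not the authors' analysis: it assumes a simple baseline in which junctions from random DNA integration would fall in exonic, intronic and intergenic sequence in proportion to each category's share of the genome, and reports observed over expected fractions. The function name `fold_enrichment` and all numbers in the usage example are hypothetical placeholders, not values from the study.

```python
def fold_enrichment(junction_counts, region_bp):
    """Observed/expected ratio of chimeric junctions per region category.

    junction_counts: number of host-side junctions per category.
    region_bp: size of each category (any consistent unit, e.g. bp or percent).
    The expectation is length-proportional, which is an assumed null model.
    """
    total_junctions = sum(junction_counts.values())
    total_bp = sum(region_bp.values())
    if total_junctions <= 0 or total_bp <= 0:
        raise ValueError("counts and region sizes must sum to a positive value")

    ratios = {}
    for region, n in junction_counts.items():
        size = region_bp[region]
        if size <= 0:
            raise ValueError(f"region size for {region!r} must be positive")
        observed = n / total_junctions   # fraction of junctions in this category
        expected = size / total_bp       # fraction of the genome in this category
        ratios[region] = observed / expected
    return ratios


if __name__ == "__main__":
    # Made-up toy numbers, for illustration only.
    toy_counts = {"exonic": 80, "intronic": 15, "intergenic": 5}
    toy_sizes = {"exonic": 5, "intronic": 45, "intergenic": 50}
    for region, ratio in fold_enrichment(toy_counts, toy_sizes).items():
        print(f"{region}: {ratio:.2f}x")
```

A ratio well above 1 for exons alongside ratios below 1 for introns and intergenic sequence is the pattern the abstract describes; a ratio near 1 in every category is what the length-proportional null would predict.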

Article activity feed

  1. SciScore for 10.1101/2021.03.05.434119:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    RNA-seq analysis: Public RNA-seq datasets (Blanco-Melo et al., 2020) under the accession number GSE147507 were downloaded from the NCBI Gene Expression Omnibus (GEO) server.
    Gene Expression Omnibus
    suggested: (Gene Expression Omnibus (GEO), RRID:SCR_005012)
    Adapter and quality trimming were conducted using Trimmomatic v0.36 (Bolger et al., 2014).
    Trimmomatic
    suggested: (Trimmomatic, RRID:SCR_011848)
    Quality of sequencing reads was assessed by FastQC v0.11.5.
    FastQC
    suggested: (FastQC, RRID:SCR_014583)
    The resulting reads were aligned to the merged GRCh38/hg38 genome (including alternative and random chromosome sequences) and SARS-CoV-2 NC_045512v2 genome using the STAR v2.7.1 aligner (Dobin et al., 2013).
    STAR
    suggested: (STAR, RRID:SCR_015899)
    GENCODE v29 basic version and wuhCor1 NCBI genes (http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/bigZips/genes/) were used for human and SARS-CoV-2 gene annotations, respectively.
    GENCODE
    suggested: (GENCODE, RRID:SCR_014966)
    Gene expression was calculated by FeatureCounts (part of the Subread package v1.5.0) (Liao et al., 2014) and normalized with DESeq2 v1.22.1 within R v3.5.1 (Love et al., 2014).
    FeatureCounts
    suggested: (featureCounts, RRID:SCR_012919)
    Subread
    suggested: (Subread, RRID:SCR_009803)
    DESeq2
    suggested: (DESeq, RRID:SCR_000154)
    BLASTN+ v2.3.0 was used to align mtRNA-nRNA chimeric reads to identify mitochondrial and nuclear aligning sequences within the reads (Camacho et al., 2009).
    BLASTN+
    suggested: None
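
The alignment step above places human and SARS-CoV-2 sequences in one merged reference, so each chimeric read can be classified by the contigs its two segments map to. The following is a minimal sketch of that tally, not the authors' pipeline. It assumes a tab-separated junction table whose first and fourth columns name the two contigs (the layout of STAR's `Chimeric.out.junction` file; the source does not state whether that output was used). `NC_045512v2` is the viral contig name quoted above, `chrM` is the usual UCSC-style mitochondrial contig name and is an assumption here, and the function names are hypothetical.

```python
from collections import Counter

VIRAL_CONTIG = "NC_045512v2"  # viral genome name as quoted in the Methods sentence
MITO_CONTIG = "chrM"          # assumed UCSC-style mitochondrial contig name


def partner_class(contig):
    """Label a contig as viral, mitochondrial, or (by default) nuclear."""
    if contig == VIRAL_CONTIG:
        return "viral"
    if contig == MITO_CONTIG:
        return "mito"
    return "nuclear"


def tally_chimeras(lines):
    """Count chimeric junctions by the classes of their two partners.

    Each line is expected to be tab-separated with the donor contig in
    column 1 and the acceptor contig in column 4 (an assumed layout).
    Keys are order-independent, e.g. "nuclear-viral" or "mito-viral".
    """
    counts = Counter()
    for line in lines:
        # Skip blank lines, comment lines, and an optional header row.
        if not line.strip() or line.startswith("#") or line.startswith("chr_donorA"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # malformed row, ignore
        classes = sorted((partner_class(fields[0]), partner_class(fields[3])))
        counts["-".join(classes)] += 1
    return counts


if __name__ == "__main__":
    # Hand-written toy rows, for illustration only.
    toy = [
        "chr1\t100\t+\tNC_045512v2\t200\t+",
        "NC_045512v2\t5\t+\tchrM\t10\t-",
        "chrM\t50\t+\tchr2\t900\t-",
    ]
    print(dict(tally_chimeras(toy)))
```

Comparing the `mito-viral` and `nuclear-viral` counts is one way to check whether chimeras also involve templates where integration is not expected, which is the comparison the abstract relies on.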

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.