Long-Read RNA Sequencing Identifies Polyadenylation Elongation and Differential Transcript Usage of Host Transcripts During SARS-CoV-2 In Vitro Infection

Abstract

Better methods to interrogate host-pathogen interactions during Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infections are imperative to help understand and prevent this disease. Here we implemented RNA-sequencing (RNA-seq) using Oxford Nanopore Technologies (ONT) long reads to measure differential host gene expression, transcript polyadenylation and isoform usage within various epithelial cell lines permissive and non-permissive for SARS-CoV-2 infection. SARS-CoV-2-infected and mock-infected Vero (African green monkey kidney epithelial cells), Calu-3 (human lung adenocarcinoma epithelial cells), Caco-2 (human colorectal adenocarcinoma epithelial cells) and A549 (human lung carcinoma epithelial cells) were analyzed over time (0, 2, 24, 48 hours). Differential polyadenylation was found to occur in both infected Calu-3 and Vero cells during a late time point (48 hpi), with Gene Ontology (GO) terms such as viral transcription and translation shown to be significantly enriched in Calu-3 data. Poly(A) tails showed increased lengths in the majority of the differentially polyadenylated transcripts in Calu-3 and Vero cell lines (up to ~101 nt in mean poly(A) length, padj = 0.029). Of these genes, ribosomal protein genes such as RPS4X and RPS6 also showed downregulation in expression levels, suggesting the importance of ribosomal protein genes during infection. Furthermore, differential transcript usage was identified in Caco-2, Calu-3 and Vero cells, including transcripts of genes such as GSDMB and KPNA2, which have previously been implicated in SARS-CoV-2 infections. Overall, these results highlight the potential role of differential polyadenylation and transcript usage in host immune response or viral manipulation of host mechanisms during infection, and therefore showcase the value of long-read sequencing in identifying less-explored host responses to disease.

Article activity feed

  1. SciScore for 10.1101/2021.12.14.472725:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Cell Lines
    Sentences | Resources
    Additional datasets were generated for this study including PCR cDNA datasets for cell lines (Vero, Caco-2, Calu-3 and A549) and the direct RNA and direct cDNA datasets for A549.
    Vero
    suggested: None
    Calu-3
    suggested: None
    For this current study, we additionally cultured A549 (human lung carcinoma epithelial – ATCC CCL-185) cells to supplement our main data, using similar methods.
    A549
    suggested: None
    Briefly, RNA from mock control and infected cells harvested at 0, 2, 24 and 48 hpi from Caco-2, Calu-3 and Vero cells was sequenced with the ONT Direct cDNA Sequencing Kit (SQK-DCS109) in conjunction with the Native Barcoding Kit (EXP-NBD104)
    Caco-2
    suggested: None
    Software and Algorithms
    Sentences | Resources
    availability: ONT sequencing data (direct RNA and direct cDNA) for this study from cell lines (Vero, Caco-2 and Calu-3) was derived from our previous work (Chang et al., 2021), and is currently publicly available at NCBI repository BioProject PRJNA675370
    BioProject
    suggested: (NCBI BioProject, RRID:SCR_004801)
    The results of individual analyses are available at Figtree DOI: 10.6084/m9.figshare.17139995 (differential expression), 10.6084/m9.figshare.16841794 (differential polyadenylation) and 10.6084/m9.figshare.17140007 (differential transcript usage).
    Figtree
    suggested: (FigTree, RRID:SCR_008515)
    All direct RNA and direct cDNA libraries were loaded onto a R9.4.1 MinION flow cell and sequenced for 72 hrs using an ONT MinION or GridION.
    MinION
    suggested: (MinION, RRID:SCR_017985)
    All resulting FASTQ data were mapped using Minimap2 v2.17 (Heng Li, 2018)
    Minimap2
    suggested: (Minimap2, RRID:SCR_018550)
    Direct RNA-seq data was mapped to the combined genome (consisting of human/African green monkey genome from Ensembl (release 100), SARS-CoV-2 Australia virus (Australia/VIC01/2020, NCBI:MT007544.1) and the RNA sequin decoy chromosome genome (Hardwick et al., 2016)) with the default direct RNA parameters ‘-ax splice -uf -k14 --secondary=no’ and for all cDNA datasets ‘-ax splice --secondary=no’.
    Ensembl
    suggested: (Ensembl, RRID:SCR_002344)
    The resulting BAM files were sorted and indexed using Samtools v1.9 (H.
    Samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    Counts files were generated using Featurecounts v2.0.0 (Liao, Smyth, & Shi, 2014) for genome-mapped cDNA data, and with Salmon v0.13.1 (Patro
    Featurecounts
    suggested: (featureCounts, RRID:SCR_012919)
    Salmon
    suggested: (Salmon, RRID:SCR_017036)
    Differential expression analysis: DESeq2 was used to identify differentially expressed genes/transcripts from direct cDNA data.
    DESeq2
    suggested: (DESeq, RRID:SCR_000154)
    For the nanopolish analysis, all Caco-2, Calu-3 and Vero direct RNA BAM files mapped to the combined reference genome (host, sequin, virus) were indexed with the nanopolish v0.13.2 ‘index’ function with the command ‘nanopolish index -d $FAST5 -s $SEQUENCING_SUMMARY $FASTQ’.
    nanopolish
    suggested: (Nanopolish, RRID:SCR_016157)
    Raincloud plots were generated for median poly(T) lengths of each gene with increased poly(A) length in the Calu-3 48 hpi dataset in both conditions (control and infected) using ggplot2 v3.3.4 (Wickham, 2016) to replicate the raincloud plots generated by the raincloudplots package in R (Allen et al., 2021).
    ggplot2
    suggested: (ggplot2, RRID:SCR_014601)
    GO and KEGG pathway analysis: Significant GO biological terms and KEGG pathways were identified with genes that were found to be significantly differentially expressed and polyadenylated in the analyses above.
    KEGG
    suggested: (KEGG, RRID:SCR_012773)
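
The mapping, sorting and indexing steps quoted in Table 2 can be sketched as command lines. The snippet below is a minimal illustration in Python, not the authors' pipeline: it only assembles argument lists from the flags quoted in the sentences above (reading the cDNA flag as `--secondary=no`), and every file name is a hypothetical placeholder. The `samtools sort -o` and `samtools index` forms are ordinary Samtools usage assumed here, since the quoted sentence does not give the exact invocation.

```python
import shlex

# Minimap2 flags as quoted in the methods sentences (direct RNA vs. cDNA).
DIRECT_RNA_FLAGS = ["-ax", "splice", "-uf", "-k14", "--secondary=no"]
CDNA_FLAGS = ["-ax", "splice", "--secondary=no"]


def minimap2_cmd(reference, fastq, direct_rna):
    """Build a minimap2 argument list; minimap2 writes SAM to stdout."""
    flags = DIRECT_RNA_FLAGS if direct_rna else CDNA_FLAGS
    return ["minimap2", *flags, reference, fastq]


def samtools_cmds(sam, bam):
    """Build sort and index argument lists (assumed, not quoted, usage)."""
    return [
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
    ]


def nanopolish_index_cmd(fast5_dir, sequencing_summary, fastq):
    """Mirror the quoted 'nanopolish index -d ... -s ... FASTQ' command."""
    return ["nanopolish", "index", "-d", fast5_dir, "-s", sequencing_summary, fastq]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    commands = [
        minimap2_cmd("combined_genome.fa", "reads.fastq", direct_rna=True),
        *samtools_cmds("reads.sam", "reads.sorted.bam"),
        nanopolish_index_cmd("fast5/", "sequencing_summary.txt", "reads.fastq"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps the quoted flags intact and makes it easy to pass each list to `subprocess.run` once the tools are installed.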

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.