A direct RNA-seq-based EBV Latency Transcriptome Offers Insights into the Biogenesis of EBV Gene Products
Abstract
Epstein-Barr virus (EBV) ubiquitously infects humans, establishing lifelong persistence in B cells. In vitro, EBV-infected B cells can establish a lymphoblastoid cell line (LCL). EBV transcripts in LCLs (Latency III) encode six nuclear proteins (EBNAs) and two latent membrane proteins (LMPs), along with various microRNAs and putative long non-coding RNAs (BARTs). The BART and EBNA transcription units are characterised by extensive alternative splicing.
We generated LCLs with B95-8 EBV-BACs, including one engineered with “barcodes” in the first and last repeat of internal repeat 1 (IR1), and analysed their EBV transcriptomes using long-read nanopore direct RNA-seq. Our pipeline ensures appropriate mapping of the W promoter (Wp) 5’ exon, and corrects W1-W2 exon counts that misalign to IR1. The corrected data suggest that splicing across IR1 largely includes all W exons, and that Wp-derived transcripts more frequently encode the EBNA-LP start codon than Cp transcripts. Our analysis identified a short variant of exon W2 and a novel polyA site before EBNA2, provided insights into BHRF1 miRNA processing, and suggested co-ordination between polyA and splice site usage, although improved read depth and integrity are required to confirm this. The BAC region disrupts the integrity of BART transcripts through premature polyadenylation and cryptic splice sites in the hygromycin expression cassette. Finally, a few transcripts extended across established gene boundaries, running from the EBNA to the BART to the LMP2 gene regions, sometimes including novel exons between EBNA1 and the BART promoter. We have produced an EBV annotation based on these findings to help others better characterise EBV transcriptomes in future.
Data Summary
Scripts (including the shell scripts used to combine commands into the pipeline) and guidance on their usage are available on GitHub (github.com/robertewhite/ebv-transcriptomics-tools). All of the RNA-seq raw data and analyses relevant to this study are available from the European Nucleotide Archive (ENA) under study accession PRJEB83447 (read files ERR14129300-303), or from the author’s website, ebv.org.uk. Processed and analysed data are presented in Excel format in Supplementary tables ST1-8, and the intermediate data processing conducted in Excel is linked from the front page of ebv.org.uk, alongside the scripts, raw reads and updated B95-8-BAC and prototype EBV transcriptome annotation (gff3) files.
Impact statement
This article showcases the potential of direct RNA-seq to characterise complex transcriptomes like that of Epstein-Barr virus, highlights specific challenges of interpreting RNA-seq data, and presents tools that address the problems of mapping RNA-seq reads to EBV exons. Biologically, the data analysis identifies several new aspects of Epstein-Barr virus transcription, offers insights into miRNA processing, and identifies idiosyncrasies of the transcriptome of the widely used B95-8 BAC that will inform studies using this system. Finally, we provide an improved annotation for EBV RNA-seq studies.