Intronization enhances expression of S-protein and other transgenes challenged by cryptic splicing

Abstract

The natural habitat of SARS-CoV-2 is the cytoplasm of a mammalian cell where it replicates its genome and expresses its proteins. While SARS-CoV-2 genes and hence its codons are presumably well optimized for mammalian protein translation, they have not been sequence optimized for nuclear expression. The cDNA of the Spike protein harbors over a hundred predicted splice sites and produces mostly aberrant mRNA transcripts when expressed in the nucleus. While different codon optimization strategies increase the proportion of full-length mRNA, they do not directly address the underlying splicing issue, and commonly detected cryptic splicing events still hinder the full expression potential. Similar splicing characteristics were also observed in other transgenes. By inserting multiple short introns throughout different transgenes, significant improvement in expression was achieved, including a >7-fold increase for the Spike transgene. Provision of a more natural genomic landscape offers a novel way to achieve multi-fold improvement in transgene expression.

Article activity feed

  1. SciScore for 10.1101/2021.09.15.460454:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethics: not detected.
    Sex as a biological variable: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Cell Line Authentication: Contamination: All cell lines have tested negative for mycoplasma contamination.

    Table 2: Resources

    Experimental Models: Cell Lines
    Sentences | Resources
    Cell lines used in this study: 293FT cells were obtained from Dr. Kosuke Yusa’s Lab. 293FT.Cas9 cell lines were generated through lentiviral integration of an EF1a-Cas9-T2A-BlastR construct at low MOI to achieve single-copy integration.
    293FT
    suggested: (ATCC Cat# PTA-5077, RRID:CVCL_6911)
    To generate cell lines permissive to Spike-Pseudotyped lentiviral infection, 293FT.Cas9 cells were engineered to stably express SARS-CoV-2 receptors ACE2 and TMPRSS2.
    293FT.Cas9
    suggested: None
    For in vitro transcribed mRNA, 293T cells were transfected for functional testing using Lipofectamine messengerMAX (Invitrogen) according to the manufacturer’s instructions.
    293T
    suggested: (CCLV Cat# CCLV-RIE 1018, RRID:CVCL_0063)
    293FT.Cas9.ACE2/TMPRSS2 clonal cell lines were harvested by trypsinization and resuspended at a density of 70,000 cells per 30 μL.
    293FT.Cas9.ACE2/TMPRSS2
    suggested: None
    Recombinant DNA
    Sentences | Resources
    In vitro transcription of S protein mRNA: Templates for in vitro transcription were generated by cloning P1 and P13 between NcoI and NotI sites of pTNT-B18R-6His (Addgene plasmid 58979, a kind gift from Steven Dowdy (30)).
    pTNT-B18R-6His
    suggested: (RRID:Addgene_58979)
    For transfection, 1 μg of lentiviral transfer vector (pCSGW-GFP) was mixed with 0.72 μg of gag-pol expressing plasmid p8.9 and 68.33 fmol of S protein expressing construct in 500 μL of optiMEM media, followed by the addition of 2 μL of PLUS reagent and incubation for 5 minutes at room temperature.
    pCSGW-GFP
    suggested: None
    p8.9
    suggested: None
    These PCR products were both visualized on an agarose gel and TA-cloned using ‘TA Cloning Kit with pCR2.1 vector and OneShot TOP10 Chemically Competent E. coli’ (ThermoFisher) according to kit instructions.
    pCR2.1
    suggested: None
    The template to which the ALKBH5 reads were aligned was constructed manually, by performing in silico Gateway cloning, inserting the ALKBH5 CDS and mRuby3 CDS into the pLIX_403 vector (Addgene plasmid #41395).
    pLIX_403
    suggested: (RRID:Addgene_41395)
    Software and Algorithms
    Sentences | Resources
    Full DNA sequences of these plasmids are available on Zenodo (doi: 10.5281/zenodo.5470001). “Wuhan” in plasmid names refers to the S protein DNA sequence from the Wuhan-Hu-1 isolate (GenBank: MN908947.3) while “18F” refers to the removal of the last 18 amino acids of the S protein C terminus (ER retention sequence) and the addition of a FLAG tag.
    Zenodo
    suggested: (ZENODO, RRID:SCR_004129)
    Data was analysed using FlowJo software (BD Biosciences) and displayed as % cells infected at 1:500 dilution of pseudotyped virus, normalized to the intronless construct infection rates (Figure 7).
    FlowJo
    suggested: (FlowJo, RRID:SCR_008520)
    Staining was developed using 20X LumiGLO® Reagent and 20X Peroxide reagents according to manufacturer’s recommendations (Cell Signaling Technology, #7003). cDNA analysis: RNA was extracted from the frozen cell pellets using RNeasy Mini Kit (Qiagen) and treated with ezDNase (ThermoFisher) before applying oligo(dT) guided 1st strand cDNA synthesis using SuperScript IV reverse transcriptase (ThermoFisher), all according to manufacturers’ recommendations.
    ThermoFisher
    suggested: (ThermoFisher; SL 8; Centrifuge, RRID:SCR_020809)
    All reads were mapped back to the original construct DNA sequence using SnapGene software to assess individual mRNA splicing events.
    SnapGene
    suggested: (SnapGene, RRID:SCR_015052)
    After another round of bead purification, samples were pooled and submitted to Edinburgh Genomics, where they were further processed, barcoded and run on the PromethION platform.
    PromethION
    suggested: (PromethION, RRID:SCR_017987)

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The main limitation lies in quantifying the observed events relative to the reads without events. This quantification can be influenced by the size dependency of some of the sample preparation steps, as the full-size S protein cDNA is over 4 kb long, while many of the observed splicing events can be shorter than 200 bp and are depleted by many sample preparation methods. Reported here are distributions based on the full-length read population alone within each sample, but care should be applied before comparing these frequencies to other datasets with different sample preparation methods, such as the ChAdOx1 Nanopore RNA direct data (Sup Figure 8). Hundreds of randomly spaced splicing events characterized by the cDNA direct sequencing approach for the S protein and similar events seen in other proteins highlight the fact that even when the transgene’s CDS is apparently well designed and expression of full-length protein can be detected, the status quo design is just not optimal for RNA expression, as transcript heterogeneity will inevitably impact both product levels (yield) and homogeneity. The impact of this will be dependent on the nature of the product, whether it is expressed in vitro or in vivo. In vitro expression offers opportunities to improve homogeneity by application of purification methods. Two thirds of cryptically spliced RNA molecules would be out-of-frame and would not just impact protein yield but any translation in vivo would generate novel peptides with potential immuno...
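
The "two thirds out-of-frame" figure quoted above follows from simple arithmetic: a splicing event that removes n nucleotides preserves the reading frame only when n is a multiple of 3. The sketch below illustrates this under the assumption that removed lengths are spread evenly across the three residues modulo 3. That uniformity is an assumption made for illustration, and the function name and length range are hypothetical rather than taken from the study.

```python
def fraction_out_of_frame(deletion_lengths):
    """Return the fraction of deletions whose length is not a multiple of 3."""
    lengths = list(deletion_lengths)
    if not lengths:
        raise ValueError("need at least one deletion length")
    out_of_frame = sum(1 for n in lengths if n % 3 != 0)
    return out_of_frame / len(lengths)


# Hypothetical input: every deletion length from 1 to 300 nt, each counted once.
print(f"{fraction_out_of_frame(range(1, 301)):.3f}")  # prints 0.667
```

Real cryptic splicing events need not be uniformly distributed in length, so the observed fraction in any given dataset could differ from this idealized value.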

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.