High-fidelity, efficient, and reversible labeling of endogenous proteins using CRISPR-based designer exon insertion

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    The reviewers found your description of the new method interesting and potentially useful for the field. They raised concerns about the functionality of the fusion proteins and the accessibility of reagents, among other technical questions.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

Abstract

Precise and efficient insertion of large DNA fragments into somatic cells using gene editing technologies to label or modify endogenous proteins remains challenging. Non-specific insertions/deletions (INDELs) resulting from the non-homologous end joining pathway make the process error-prone. Further, the insert is not readily removable. Here, we describe a method called CRISPR-mediated insertion of exon (CRISPIE) that can precisely and reversibly label endogenous proteins using CRISPR/Cas9-based editing. CRISPIE inserts a designer donor module, which consists of an exon encoding the protein sequence flanked by intron sequences, into an intronic location in the target gene. INDELs at the insertion junction will be spliced out, leaving mRNAs nearly error-free. We used CRISPIE to fluorescently label endogenous proteins in mammalian neurons in vivo with previously unachieved efficiency. We demonstrate that this method is broadly applicable, and that the insert can be readily removed later. CRISPIE permits protein sequence insertion with high fidelity, efficiency, and flexibility.

Article activity feed

  1. Author Response:

    Reviewer #1:

    In this manuscript the authors show that a designer exon containing a Fluorescent Protein insert can be used to edit vertebrate genes using an NHEJ based repair mechanism. The approach utilizes CRISPR to generate DSBs in intronic sequences of a target gene along with excision of a donor fragment from a co-transfected plasmid to initiate insertion of the exon cassette by ligation into the chromosome DSB.

    I like the idea here of inserting FP sequences (and other tags) into introns in this way. Focusing on the N- and C-termini for insertions has always seemed arbitrary to me. In practice these internal sites may even tolerate tag insertions better than the termini. However, this remains to be seen.

    My major reservation with this study is that the concepts here are not particularly novel. The approach is very similar to a concept already well established in gene-therapy circles of using introns as targets for inserting a super-exon preceded by a splice acceptor to correct inborn genetic lesions. The methodology employed is essentially HITI (https://www.nature.com/articles/nature20565).

    What is new is the finding that FP insertions are frequently expressed and at least partly functional as evidenced by their ability to localize to the expected intracellular structures. However, no actual functional data is provided in this study so it remains to be seen how frequently the insertion of FP exons is tolerated. It would help the study substantially to have functional information for a few insertions.

    The value and utility of this study hinges on whether insertions of this type frequently retain function. The authors speculate that "labeling at an internal site of a gene is feasible as long as the insertion does not disrupt the function of the encoded protein. Many introns reside at the junctions of functional domains because introns have evolved in part to facilitate functional domain exchanges (Kaessmann et al., 2002; Patthy, 1999)." Thus an analysis of how often intron tags are tolerated as homozygotes would be helpful for users who will worry that a potentially "quick and dirty" CRISPIE insertion might not accurately report on the function and localization of their protein of interest.

    We thank the reviewer for appreciating our idea. CRISPIE is indeed an improved form of HITI, with the notable differences that the insertion takes place in an intronic region and that a designer intron/exon module is used. This design has a significant benefit in that INDELs in both labeled and unlabeled alleles are unlikely to cause mutations at the mRNA and protein levels. CRISPIE is also different from the super-exon approach, which is now cited (Bednarski et al, 2016). CRISPIE does not involve the 3’ UTR and the poly A signal. This makes the donor template more standardized and smaller. Transcriptional controls embedded in endogenous introns after the editing sites can be retained in CRISPIE, but not when super-exons are used. We also achieve much higher efficiency in vivo than previous editing methods, which we feel is an important advance.

    We now provide three different experiments to address the function of CRISPIEd β-actin and, in one experiment, the function of CRISPIEd α-tubulin 1B. One of the key functions of the cytoskeleton is to support growth. We now show that neither CRISPIE labeling of β-actin (hACTB), at two different intronic loci, nor CRISPIE labeling of α-tubulin 1B (TUBA1B) affects the growth of U2OS cells (New Experiment #1; Figure 1H, and Figure 1-figure supplement 4), suggesting that labeled β-actin and α-tubulin are functional. In addition, as suggested, we now demonstrate that cells homozygous for CRISPIE insertions are viable and able to divide (New Experiment #2; Figure 4-figure supplement 1). We also show that two important neuronal functional parameters – the mEPSC frequency and amplitude – are not altered by CRISPIE labeling of hACTB in neurons in cultured hippocampal slices (New Experiment #3; Figure 5-figure supplement 2).

    Having shown the above results, we also wish to emphasize that, although CRISPIE provides a way to perform FP tagging of endogenous proteins with high efficiency and low error rates, it cannot ensure that FP-tagging itself is benign for all proteins. Numerous studies have overexpressed FP-tagged proteins, which is well documented to have side effects. The CRISPIE method empowers researchers by allowing them to tag endogenous proteins without overexpression. However, if the FP-tagging itself affects protein function, CRISPIE will not be helpful. Each FP-tagging project, whether it is based on CRISPIE or other methods, will require its own systematic characterization. We have now made this clear in the discussion (pg. 17): “… although CRISPIE enables the tagging of endogenous proteins with low error rates, it does not ensure that the tagged protein functions the same as the wild-type protein. Not all tagging is benign, and rigorous characterizations will be needed for each tagging experiment.”

    Other comments:

    1. Were homozygotes identified and were they viable in each instance?

    We now provide data showing that cells homozygous for CRISPIE insertions are viable and able to divide (New Experiment #2; Figure 4-figure supplement 1).

    2. You say: "The CRISPIE method should be broadly applicable for use with different FPs or with other functional domains, different protein targets, and different animal species." I don't know if you optimized your FP to avoid potential reverse strand splice acceptors, but some discussion of this important point should be made so that those trying to apply the approach will make sure that strong acceptors are not included accidentally in reverse oriented inserts.

    Our RT-PCR does not detect reversed inserts at the mRNA level. We now add in the Discussion that donor design needs to eliminate unintended splicing sites in the reverse orientation. We write (pg. 17): “It should also be noted that, when designing the donor template, care should be given to not create unintended splicing acceptor sites in the inverted orientation. Otherwise, inverted insertion events can cause mutations at the mRNA and protein levels.”
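
    The design caution above can be illustrated with a short scan of the donor's reverse complement for acceptor-like motifs. This is only a rough sketch under stated assumptions: the function names, the motif (a run of pyrimidines followed within a few nucleotides by AG), and the `min_tract` default are hypothetical choices for illustration and do not come from the manuscript. A dedicated splice-site predictor would be needed for real donor design.

```python
import re

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_candidate_reverse_acceptors(donor: str, min_tract: int = 10) -> list[int]:
    """Flag acceptor-like motifs on the reverse strand of a donor sequence.

    Heuristic (an assumption, not the authors' method): a run of `min_tract`
    pyrimidines, then 0-3 arbitrary nucleotides, then the AG dinucleotide.
    Returns the positions, in reverse-complement coordinates, just past each AG.
    """
    rc = reverse_complement(donor)
    pattern = re.compile(rf"[CT]{{{min_tract}}}[ACGT]{{0,3}}AG")
    return [match.end() for match in pattern.finditer(rc)]


if __name__ == "__main__":
    # Toy donor whose reverse strand carries a polypyrimidine tract plus CAG.
    toy_donor = "CC" + "CTG" + "A" * 10 + "CC"
    print(find_candidate_reverse_acceptors(toy_donor))
```

    Any hit would only mark a region worth inspecting or recoding with synonymous codons; the absence of hits from such a simple pattern does not rule out cryptic acceptors.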

    3. Would your mRNA sequencing methodologies detect defective transcripts where the splice acceptor and a portion of the upstream FP exon was inserted causing a frame shifted and mispliced mRNA? Such mRNAs would be unstable due to NMD and thus not detected readily in a PCR based approach. Thus disruption of the mRNA by partial insertion of your donor (or fragments of the other co-injected DNA) might be much more widespread than is measured here. This could be tested by recovering clones that partially inserted the donor in the forward orientation and carefully monitoring for defects in mRNA splicing of the inserted allele. Were such clones detected and how frequently?

    Our method should detect defective mRNAs, if they are not degraded. However, if defective mRNAs are quickly degraded, they are not measured in our current RT-PCR and NGS experiments, as described in Figure 2. While we cannot address this question directly, we now provide evidence that the cell growth and neuronal function after CRISPIE labeling of β-actin remain normal.

    We also thank the reviewer for suggesting the cloning approach. This proposed experiment, however, may be confounded if defective mRNAs decrease cell survival or growth. Because cloning cells takes longer than the three-month revision period expected by eLife, we could not perform this experiment here, but we will keep it in mind in our future efforts.

    4. You note that in the case of vinculin the coding sequence of the last exon of hVCL was included in the insertion donor sequence, and a stop codon was introduced at the end of the mEGFP coding sequence. This is essentially the strategy for super-exon insertion into targets for gene therapy, except that instead of a splice donor on the C-terminus you include a stop codon. You should cite these previous studies. Inclusion of a stop codon in frame would be expected to cause NMD. Did you also include transcription termination signals?

    NMD will occur if the stop codon is more than approximately 50 nucleotides upstream of any exon-junction complex (Lewis et al, PNAS 2009). However, NMD will not occur if the stop codon is within 50 nucleotides. For example, synaptophysin – a highly expressed neuronal protein – has its stop codon in its second-to-last exon, within 50 nucleotides of the exon junction. The stop codon we used for labeling hVCL is also within 50 nucleotides (~20 nt) of the exon junction.
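
    The 50-nucleotide rule described above can be written as a small check. This is a minimal sketch under stated assumptions: the function name `predicts_nmd`, the 0-based mRNA coordinate convention, and the example positions are hypothetical and not taken from the manuscript, and real NMD susceptibility depends on more than this single distance.

```python
from typing import Sequence

# Approximate rule-of-thumb distance quoted in the response above.
NMD_THRESHOLD_NT = 50


def predicts_nmd(
    stop_codon_end: int,
    exon_junctions: Sequence[int],
    threshold: int = NMD_THRESHOLD_NT,
) -> bool:
    """Apply the ~50-nt rule to one spliced transcript.

    `stop_codon_end` is the 0-based mRNA position just past the stop codon.
    `exon_junctions` are 0-based mRNA positions of exon-exon junctions.
    Returns True when any junction lies more than `threshold` nucleotides
    downstream of the stop codon, and False otherwise.
    """
    downstream = [j for j in exon_junctions if j >= stop_codon_end]
    if not downstream:
        # Stop codon sits in the last exon: no downstream junction to trigger NMD.
        return False
    # If any junction exceeds the threshold, the farthest one does too.
    return (max(downstream) - stop_codon_end) > threshold


if __name__ == "__main__":
    # Hypothetical coordinates: a junction ~20 nt downstream, as described for hVCL.
    print(predicts_nmd(stop_codon_end=1000, exon_junctions=[300, 1020]))
    # Hypothetical coordinates: a junction 200 nt downstream of the stop codon.
    print(predicts_nmd(stop_codon_end=1000, exon_junctions=[300, 1200]))
```

    Under this rule a stop codon placed about 20 nt before the splice donor of the designer exon falls inside the window and would not be expected to trigger NMD, which is the situation described for the hVCL donor.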

    We now cite Bednarski et al, 2016, which describes the use of super-exons in gene therapy. At the same time, we think that our approach is still different from the super-exon concept. After the stop codon, the 3’ UTR is not included. Instead, a splicing donor is included, allowing the exon to be spliced to the subsequent endogenous exon. This keeps the insert small for high insertion efficiency and makes the template easy to produce (some 3’ UTRs can be several kilobase pairs in length), while utilizing the endogenous translational controls built into the native 3’ UTR.

    Reviewer #2:

    In-frame insertion of fluorescent protein tags into endogenous genes allows observation of protein localization at native expression levels, and is therefore an essential approach for quantitative cell biology. Once limited to unicellular model organisms such as yeast, endogenous gene tagging has become well-established in invertebrate model systems such as C. elegans and Drosophila since the advent of CRISPR technology in the last decade. However, a robust and widely accepted endogenous gene tagging strategy for mammalian cells has remained elusive. This is largely due to the fact that homologous recombination, the method used to create knock-ins in invertebrates, is inefficient (or sometimes doesn't work at all) in mammalian cells, especially those that do not divide rapidly.

    Several studies have attempted to bypass the need for homologous recombination by using a different method, non-homologous end joining (NHEJ) to insert GFP tags into vertebrate genomes (e.g. Auer et al. Genome Res 2014; Suzuki et al. Nature 2016; Artegiani et al. Nature Cell Biol. 2020). Such approaches can be orders of magnitude more efficient than homologous recombination, but the generated alleles require careful validation because of the error-prone nature of NHEJ.

    Here, Zhong and colleagues improve upon the existing NHEJ-based gene tagging approaches by designing synthetic exons (comprising a FP coding sequence with 5' and 3' splice sites) that can be inserted into native introns using NHEJ. The beauty of this approach is that any mutations (indels) created by the error-prone NHEJ repair mechanism are spliced out, and therefore do not affect the sequence of the encoded protein. A limitation is that tags must be inserted internally within a protein of interest and cannot be targeted to the extreme N- or C-terminus, but this limitation is clearly stated and discussed by the authors. Overall, this is a novel (to my knowledge) and powerful strategy that is likely to advance the field.

    We thank the reviewer for the very positive comments regarding our CRISPIE method.

  2. Reviewer #2 (Public Review):

    In-frame insertion of fluorescent protein tags into endogenous genes allows observation of protein localization at native expression levels, and is therefore an essential approach for quantitative cell biology. Once limited to unicellular model organisms such as yeast, endogenous gene tagging has become well-established in invertebrate model systems such as C. elegans and Drosophila since the advent of CRISPR technology in the last decade. However, a robust and widely accepted endogenous gene tagging strategy for mammalian cells has remained elusive. This is largely due to the fact that homologous recombination, the method used to create knock-ins in invertebrates, is inefficient (or sometimes doesn't work at all) in mammalian cells, especially those that do not divide rapidly.

    Several studies have attempted to bypass the need for homologous recombination by using a different method, non-homologous end joining (NHEJ) to insert GFP tags into vertebrate genomes (e.g. Auer et al. Genome Res 2014; Suzuki et al. Nature 2016; Artegiani et al. Nature Cell Biol. 2020). Such approaches can be orders of magnitude more efficient than homologous recombination, but the generated alleles require careful validation because of the error-prone nature of NHEJ.

    Here, Zhong and colleagues improve upon the existing NHEJ-based gene tagging approaches by designing synthetic exons (comprising a FP coding sequence with 5' and 3' splice sites) that can be inserted into native introns using NHEJ. The beauty of this approach is that any mutations (indels) created by the error-prone NHEJ repair mechanism are spliced out, and therefore do not affect the sequence of the encoded protein. A limitation is that tags must be inserted internally within a protein of interest and cannot be targeted to the extreme N- or C-terminus, but this limitation is clearly stated and discussed by the authors. Overall, this is a novel (to my knowledge) and powerful strategy that is likely to advance the field.

  3. Reviewer #1 (Public Review):

    In this manuscript the authors show that a designer exon containing a Fluorescent Protein insert can be used to edit vertebrate genes using an NHEJ based repair mechanism. The approach utilizes CRISPR to generate DSBs in intronic sequences of a target gene along with excision of a donor fragment from a co-transfected plasmid to initiate insertion of the exon cassette by ligation into the chromosome DSB.

    I like the idea here of inserting FP sequences (and other tags) into introns in this way. Focusing on the N- and C-termini for insertions has always seemed arbitrary to me. In practice these internal sites may even tolerate tag insertions better than the termini. However, this remains to be seen.

    My major reservation with this study is that the concepts here are not particularly novel. The approach is very similar to a concept already well established in gene-therapy circles of using introns as targets for inserting a super-exon preceded by a splice acceptor to correct inborn genetic lesions. The methodology employed is essentially HITI (https://www.nature.com/articles/nature20565).

    What is new is the finding that FP insertions are frequently expressed and at least partly functional as evidenced by their ability to localize to the expected intracellular structures. However, no actual functional data is provided in this study so it remains to be seen how frequently the insertion of FP exons is tolerated. It would help the study substantially to have functional information for a few insertions.

    The value and utility of this study hinges on whether insertions of this type frequently retain function. The authors speculate that "labeling at an internal site of a gene is feasible as long as the insertion does not disrupt the function of the encoded protein. Many introns reside at the junctions of functional domains because introns have evolved in part to facilitate functional domain exchanges (Kaessmann et al., 2002; Patthy, 1999)." Thus an analysis of how often intron tags are tolerated as homozygotes would be helpful for users who will worry that a potentially "quick and dirty" CRISPIE insertion might not accurately report on the function and localization of their protein of interest.

    Other comments:

    1. Were homozygotes identified and were they viable in each instance?

    2. You say: "The CRISPIE method should be broadly applicable for use with different FPs or with other functional domains, different protein targets, and different animal species." I don't know if you optimized your FP to avoid potential reverse strand splice acceptors, but some discussion of this important point should be made so that those trying to apply the approach will make sure that strong acceptors are not included accidentally in reverse oriented inserts.

    3. Would your mRNA sequencing methodologies detect defective transcripts where the splice acceptor and a portion of the upstream FP exon was inserted causing a frame shifted and mispliced mRNA? Such mRNAs would be unstable due to NMD and thus not detected readily in a PCR based approach. Thus disruption of the mRNA by partial insertion of your donor (or fragments of the other co-injected DNA) might be much more widespread than is measured here. This could be tested by recovering clones that partially inserted the donor in the forward orientation and carefully monitoring for defects in mRNA splicing of the inserted allele. Were such clones detected and how frequently?

    4. You note that in the case of vinculin the coding sequence of the last exon of hVCL was included in the insertion donor sequence, and a stop codon was introduced at the end of the mEGFP coding sequence. This is essentially the strategy for super-exon insertion into targets for gene therapy, except that instead of a splice donor on the C-terminus you include a stop codon. You should cite these previous studies. Inclusion of a stop codon in frame would be expected to cause NMD. Did you also include transcription termination signals?
