Genomic features underlie the co-option of SVA transposons as cis-regulatory elements in human pluripotent stem cells

Abstract

Domestication of transposable elements (TEs) into functional cis-regulatory elements is a widespread phenomenon. However, the mechanisms that determine why some TEs are co-opted as functional enhancers while others are not remain poorly understood. SINE-VNTR-Alus (SVAs) are the youngest group of transposons in the human genome, with ~3,700 annotated copies, nearly half of which are human-specific. Many studies indicate that SVAs are among the most frequently co-opted TEs in human gene regulation, but the mechanisms underlying this process have not yet been thoroughly investigated. Here, we combined CRISPR interference (CRISPRi) with computational and functional genomics to elucidate the genomic features that underlie SVA domestication into human stem-cell gene regulation. We found that ~750 SVAs are co-opted as functional cis-regulatory elements in human induced pluripotent stem cells. These SVAs are significantly closer to genes and harbor more transcription factor binding sites than non-co-opted SVAs. We show that a long DNA motif composed of adjacent YY1/2 and OCT4 binding sites is enriched in the co-opted SVAs and that these two transcription factors bind consecutively on the TE sequence. We used CRISPRi to epigenetically repress active SVAs in stem cell-like NCCIT cells. Epigenetic perturbation of active SVAs strongly attenuated YY1/OCT4 binding and influenced the expression of neighboring genes. Ultimately, SVA repression resulted in ~3,000 differentially expressed genes, 131 of which were the nearest gene to an annotated SVA. In summary, we demonstrated that SVAs modulate human gene expression and found that location and sequence composition contribute to SVA domestication into gene regulatory networks.

Article activity feed

  1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

    Learn more at Review Commons


    Reply to the reviewers

    We plan to address the minor comments from both reviewers as described below:

    Reviewer 1 – comments:

    1. Figure 1a + 1b: The pattern of H3K9me3 at SVA elements appears to be quite different between the two cell types tested - in iPSCs it appears to be more strongly centered on the SVA with some "spreading" occurring upstream of the element, however in NCCITs it appears to mark downstream of the element and is not clearly seen to occur on the SVA. Is this due to sequencing data from different labs or due to cell-type-specific epigenetic marks? The authors should consider showing H3K9me3 marks at a comparable region for example repressed genes or another family of repressed TEs as an internal control as well as showing profile plots to more clearly show the spread of epigenetic marks over the SVAs.

    __As recommended by the Reviewer, we will show several examples of H3K9me3-marked SVAs in both cell types in Figure 1. We will propose explanations for any discrepancies, but acknowledge the possibility of a sequencing artifact, as suggested by the Reviewer.__

    Figure 1: Given that the data in this paper is mostly from NCCIT cells it would be advisable for conclusions in hiPSCs to be tempered accordingly.

    __We will temper the conclusions as recommended.__

    Figure 1c: Which SVAs have neither H3K9me3 nor H3K27ac, what might these represent? Do they have any other notable features, or are they of a specific age?

    __The (very few) SVAs that carry neither mark are almost certainly an artifact of mappability issues. We will point this out in both the main text and the figure legend.__

    Lines 128-129: Is the assumption that SVAs are enriched in these regions, as opposed to dictating the epigenetic landscape as a consequence of their own sequence? Is this correct?

    __This is correct, and we will clarify this in the text.__

    Figure 2c: Are there any enriched motifs in the repressed SVAs?

    __We performed this analysis, and several motifs are enriched in the repressed SVAs. We will add these data to the revised manuscript (Fig. 2C).__

    Figure 2c: How many of the de-repressed SVAs contain this motif? It would be interesting to know if the YY1/OCT4 binding sites exist within any of the repressed SVAs and then, later, whether they accordingly lack actual binding of the TFs due to the presence of H3K9me3.

    __We will perform this straightforward analysis using the MEME Suite or HOMER, and include it in the revised version.__

    Figure 3d: It would be interesting to assess how close the differentially expressed genes are to an LTR5H element, or how many of the SVA-proximal, differentially expressed genes are also LTR5H-proximal.

    __We will perform this straightforward analysis and include it in the revised version.__

    Figure 3e: Are genes close only to de-repressed SVAs considered here? Worth specifying in the text. Also, what are the 20% of genes which are upregulated?

    __Yes, the Reviewer is correct: only genes close to de-repressed SVAs are considered. We will specify this in the manuscript and add a Results paragraph on the 20% of genes that are upregulated.__

    Figure 4a: Are the binding motifs for YY1 present at both binding sites? Are they the same, and is the surrounding sequence the same? Is either of the binding sites present in repressed SVAs, and is there any detectable binding of YY1/OCT4 in repressed SVAs? Are the YY1/OCT4-bound SVAs also those marked with H3K27ac in the Wysocka Lab dataset?

    __The YY1-OCT4 motif is present only where both factors bind together. We will specify this in the Results section. Regarding the H3K27ac question: yes, these correspond to the regions marked by H3K27ac in the Wysocka dataset. We will clarify this in the manuscript.__

    Figure 4c: The example used is an SVA-D element; are the YY1/OCT4-bound SVAs within the de-repressed group of a specific age?

    __No, they are found in all SVA subfamilies (SVA-A through SVA-F). We will specify this in the Results section.__

    Figure 4d: Are the GO enrichment terms for the genes bound by YY1 and/or OCT4 different?

    __We will perform this straightforward analysis using the Ingenuity Pathway Analysis toolkit, and include it in the Results section of the revised version.__

    Figure 5: Concluding figure should address how the SVAs subset in terms of binding, H3K9me3, gene expression changes and TFBS/TF binding - there are a lot of parameters which are assessed within the de-repressed subclass and it would be useful to show somewhere graphically where and when they co-occur or not.

    __We will edit the model figure to show the mechanism highlighted by the Reviewer, as requested.__

    Finally, it would be helpful to see a discussion about what dictates the absence of H3K9me3 / presence of H3K27ac? Is this due to the TFBS sequence within the element? Further, a discussion on how the TFBS is gained in newer elements / lost in older elements is lacking. While the authors begin by stating that they are going to address what dictates whether an element is co-opted and conclude that it is due to sequence and location, I would suggest that as no conclusion is drawn on how the sequence changes to permit co-option and how the location dictates co-option, it may be worth tempering down the introduction on this point.

    __We will edit the Introduction and Discussion, as requested.__

    Reviewer 2 – comments:

    1. Given the CRISPR sgRNAs also target LTR5Hs, which are also bound by OCT4 in NCCIT cells (PMID: 25896322), it would be helpful to rule out more specifically that the observed effects on gene regulation associated with SVAs are actually due to nearby LTR5Hs copies being similarly repressed by CRISPRi. Depending on those results, it may also be fair to further note in the Discussion that this aspect of sgRNA selection is a potential caveat.

    __We will edit the Discussion as recommended by the Reviewer.__

    2. The approaches taken here provide surprisingly good locus-specific resolution of histone modifications and TF binding to SVAs using only uniquely mapping reads. An example of this, SVA_D_r153 (a heavily 5' truncated SVA), is provided. It could be really useful to convey the central theme of the study by providing another SVA example in a main figure, but this time showing a longer SVA (to demonstrate mappability and perhaps enrichment of reads on specific SVA features) near a protein-coding gene, where the SVA becomes repressed in the CRISPRi approach and the gene is differentially expressed as a result. An IGV-style figure could perhaps demonstrate each key component of the work in one figure.

    __As recommended, we will update Figure 4 by adding a full-length SVA.__

    3. Literature. Line 58 - would suggest adding PMID: 27197217 and PMID: 33186547. Line 60 - would add PMID: 33722937. Line 67 - would add PMID: 22053090.

    __We will add these citations to the manuscript as recommended by the Reviewer.__

    4. Clarifications: Line 108 - the same 751 SVAs came up in both iPSCs and NCCITs? Line 117 - how many (if any) of the SVAs called as repressed by H3K9me3 were also called as de-repressed by H3K27ac? i.e. are the two histone marks giving completely concordant results for calling SVAs as repressed / de-repressed. Line 215 - no evidence for OCT4 or YY1 binding to any SVAs after CRISPRi at all?

    __We will provide the exact numbers in the manuscript to answer these questions.__

    5. Finally, a point perhaps best left to the Discussion. Was there any cellular phenotype identified subsequent to the CRISPRi?

    __We did not identify any obvious cellular phenotype, and we will mention this in the Discussion, as recommended.__

  2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

    Referee #2

    Evidence, reproducibility and clarity

    Barnada et al. explore the regulatory impact of SVA retrotransposons on gene regulation, focusing on NCCIT cells as a workhorse model of pluripotency. They use published and new H3K9me3 and H3K27ac ChIP-seq datasets to identify repressed and de-repressed SVAs, noting that many are in close proximity to protein coding genes. De-repressed SVAs are enriched for YY1/OCT4 binding sites. The authors then use RNA-seq and ChIP-seq to demonstrate YY1 and OCT4 binding are abrogated by SVA CRISPRi, disrupting gene regulation enacted by the SVAs. These findings highlight an important mechanism by which SVA retrotransposons can regulate genes in pluripotent cells.

    This work is well executed. I appreciated the consideration paid to ChIP-seq read mappability and the implementation of CRISPRi followed by additional ChIP-seq. The following comments are intended to clarify the findings, which appear to have been obtained from robust experimental approaches.

    Minor issues:

    1. Given the CRISPR sgRNAs also target LTR5Hs, which are also bound by OCT4 in NCCIT cells (PMID: 25896322), it would be helpful to rule out more specifically that the observed effects on gene regulation associated with SVAs are actually due to nearby LTR5Hs copies being similarly repressed by CRISPRi. Depending on those results, it may also be fair to further note in the Discussion that this aspect of sgRNA selection is a potential caveat.
    2. The approaches taken here provide surprisingly good locus-specific resolution of histone modifications and TF binding to SVAs using only uniquely mapping reads. An example of this, SVA_D_r153 (a heavily 5' truncated SVA), is provided. It could be really useful to convey the central theme of the study by providing another SVA example in a main figure, but this time showing a longer SVA (to demonstrate mappability and perhaps enrichment of reads on specific SVA features) near a protein-coding gene, where the SVA becomes repressed in the CRISPRi approach and the gene is differentially expressed as a result. An IGV-style figure could perhaps demonstrate each key component of the work in one figure.
    3. Literature. Line 58 - would suggest adding PMID: 27197217 and PMID: 33186547. Line 60 - would add PMID: 33722937. Line 67 - would add PMID: 22053090.
    4. Clarifications: Line 108 - the same 751 SVAs came up in both iPSCs and NCCITs? Line 117 - how many (if any) of the SVAs called as repressed by H3K9me3 were also called as de-repressed by H3K27ac? i.e. are the two histone marks giving completely concordant results for calling SVAs as repressed / de-repressed. Line 215 - no evidence for OCT4 or YY1 binding to any SVAs after CRISPRi at all?
    5. Finally, a point perhaps best left to the Discussion. Was there any cellular phenotype identified subsequent to the CRISPRi?

    Significance

    This interesting study systematically demonstrates the importance of SVA-mediated gene regulation, acting through YY1 and OCT4, in a cellular model of pluripotency. It would be of broad interest, particularly to those interested in gene regulation, pluripotency and retrotransposons. I would say most of the findings are new to the literature, and the closest publications I can think of in terms of scope either address SVA regulation differently (e.g. PMID: 33722937) or focus on different retrotransposons (e.g. PMID: 30070637). The significant advance is therefore both technical and conceptual.

    Geoff Faulkner (University of Queensland)

  3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

    Referee #1

    Evidence, reproducibility and clarity

    Genomic features underlie the co-option of SVA transposons as cis-regulatory elements in human pluripotent stem cells Barnada et al.

    Barnada et al. set out to address which factors dictate when and how certain TEs become co-opted to regulate host cellular functions while others do not, citing a lack of global understanding of the mechanisms underpinning this widespread phenomenon. To do so, they use human SVA elements as a model, interrogating the epigenetic regulation of these elements in NCCITs as a proxy for pluripotent cells, and reveal that a subset of younger, human-specific SVAs lack canonical H3K9me3 repressive marks while being enriched for H3K27ac active enhancer marks. The authors show that these SVAs, termed de-repressed SVAs, contain adjacent YY1 and OCT4 binding motifs and are closer to genes and TFBSs than the repressed SVAs, and therefore propose that they may function as active enhancers. To test this, they generate a CRISPRi NCCIT cell line to epigenetically silence de-repressed SVAs using two gRNAs targeting SVAs genome-wide. Upon activation of dCas9 in this system, >3000 genes are differentially expressed; notably, de-repressed SVA-proximal genes, which are enriched for GO terms and TFBSs related to gametogenesis, are broadly repressed. The authors then demonstrate that silencing of naturally de-repressed SVAs disrupts YY1/OCT4 binding, which in wild-type NCCITs occurs adjacently at one site in the SVA element, with solo YY1 binding also occurring at a second site in a subset of SVAs. Disruption of YY1/OCT4 binding by targeting H3K9me3 to de-repressed SVAs leads to dysregulation of proximal genes, providing further evidence that SVAs act as enhancers via YY1/OCT4 binding to regulate nearby cellular genes.

    Overall, the manuscript is clearly written and the computational and experimental approaches are thorough; the findings are compelling and novel and make a helpful contribution to our current understanding of transposon co-option by host genomes. The demonstration of TFBSs located within a subset of newer SVA elements is particularly interesting, as binding is disrupted upon epigenetic silencing of these elements by CRISPRi. I have some minor comments and queries about further discussion points:

    Minor comments:

    1. Figure 1a + 1b: The pattern of H3K9me3 at SVA elements appears to be quite different between the two cell types tested - in iPSCs it appears to be more strongly centred on the SVA with some "spreading" occurring upstream of the element, however in NCCITs it appears to mark downstream of the element and is not clearly seen to occur on the SVA. Is this due to sequencing data from different labs or due to cell-type-specific epigenetic marks? The authors should consider showing H3K9me3 marks at a comparable region for example repressed genes or another family of repressed TEs as an internal control as well as showing profile plots to more clearly show the spread of epigenetic marks over the SVAs.
    2. Figure 1: Given that the data in this paper is mostly from NCCIT cells it would be advisable for conclusions in hiPSCs to be tempered accordingly.
    3. Figure 1c: Which SVAs have neither H3K9me3 nor H3K27ac, what might these represent? Do they have any other notable features, or are they of a specific age?
    4. Lines 128-129: Is the assumption that SVAs are enriched in these regions, as opposed to dictating the epigenetic landscape as a consequence of their own sequence? Is this correct?
    5. Figure 2c: Are there any enriched motifs in the repressed SVAs?
    6. Figure 2c: How many of the de-repressed SVAs contain this motif? It would be interesting to know if the YY1/OCT4 binding sites exist within any of the repressed SVAs and then, later, whether they accordingly lack actual binding of the TFs due to the presence of H3K9me3.
    7. Figure 3d: It would be interesting to assess how close the differentially expressed genes are to an LTR5H element, or how many of the SVA-proximal, differentially expressed genes are also LTR5H-proximal.
    8. Figure 3e: Are genes close only to de-repressed SVAs considered here? Worth specifying in the text. Also, what are the 20% of genes which are upregulated?
    9. Figure 4a: Are the binding motifs for YY1 present at both binding sites? Are they the same, and is the surrounding sequence the same? Is either of the binding sites present in repressed SVAs, and is there any detectable binding of YY1/OCT4 in repressed SVAs? Are the YY1/OCT4-bound SVAs also those marked with H3K27ac in the Wysocka Lab dataset?
    10. Figure 4c: The example used is an SVA-D element; are the YY1/OCT4-bound SVAs within the de-repressed group of a specific age?
    11. Figure 4d: Are the GO enrichment terms for the genes bound by YY1 and/or OCT4 different?
    12. Figure 5: Concluding figure should address how the SVAs subset in terms of binding, H3K9me3, gene expression changes and TFBS/TF binding - there are a lot of parameters which are assessed within the de-repressed subclass and it would be useful to show somewhere graphically where and when they co-occur or not.
    13. Finally, it would be helpful to see a discussion about what dictates the absence of H3K9me3 / presence of H3K27ac? Is this due to the TFBS sequence within the element? Further, a discussion on how the TFBS is gained in newer elements / lost in older elements is lacking. While the authors begin by stating that they are going to address what dictates whether an element is co-opted and conclude that it is due to sequence and location, I would suggest that as no conclusion is drawn on how the sequence changes to permit co-option and how the location dictates co-option, it may be worth tempering down the introduction on this point.

    Significance

    This work represents a novel, comprehensive and significant contribution to the understanding of co-option of transposons into host gene networks and factors underlying these processes.