Neutral evolution of snoRNA Host Gene long non-coding RNA affects cell fate control

This article has been Reviewed by the following groups

Read the full article

Listed in

Log in to save this article

Abstract

A fundamental challenge in molecular biology is to understand how evolving genomes can acquire new functions. Several recent studies have underscored how non-conserved sequences can contribute to organismal diversification in the primate lineage 1–3 . Actively transcribed, non-coding parts of the genome provide a potential platform for the development of new functional sequences 4 , but their biological and evolutionary roles remain largely unexplored. Here we show that a set of neutrally evolving long non-coding RNAs (lncRNA) arising from small nucleolar RNA Host Genes (SNHGs) are highly expressed in skin and dysregulated in inflammatory conditions. SNHGs affect cell fate determination and can behave as evolutionary intermediates to develop new functions 5 . Using SNHG7 and human epidermal keratinocytes as a model, we describe a mechanism by which these lncRNAs can increase self-renewal and inhibit differentiation. SNHG7 lncRNA’s activity has been acquired recently in the primate lineage and depends on a short sequence required for microRNA binding. Taken together, our results highlight the importance of understanding the role of fast-evolving transcripts in normal and diseased epithelia, and inform on how poorly conserved, actively transcribed non-coding sequences can participate in the evolution of genomic functionality.

Article activity feed

  1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

    Learn more at Review Commons


    Reply to the reviewers

    We would like to sincerely thank both reviewers for taking the time to examine our work, for the cogent points they have raised, and their constructive attitude aimed at improving our manuscript.

    Reviewer #1

    • Single-cell RNAseq (Fig. 1B) and in situ (Fig. 3D) results both indicate that SNHG7 is broadly expressed in multiple epidermal layers but more enriched in the spinous layer. Although most assays, such as colony formation and Ki67 staining, did not specifically examine the role of SNHG7 in the spinous layer, the raft culture experiment seemed to indicate specific reduction of the spinous layer (Fig. 3H), which was more prominent than basal defects. The authors should examine the defects more carefully in the raft culture system by using basal, spinous and granular markers. It is possible that SNHG7 functions to maintain limited cell proliferation while restrict premature differentiation. In addition, they should perform serial passage experiments to distinguish whether overexpression of SNGH7 can indeed confer self-renewal in long-term experiments.*

    We have included staining of a range of epidermal markers in the raft cultures (ITGB1, K14, K10 and IVL) in the revised manuscript (Fig. S6B). We do not observe major changes in the distribution of any of the differentiation markers.

    In terms of serial passage, we have cultured SNHG7-overexpressing or control cells for multiple passages until their growth capacity was exhausted. The SNHG7-overexpressing cells grew for approximately seven more passages than the control cells. We have added this information to the revised manuscript (Fig. S6L).

    • The main proposed mechanism is the sequestration of miR34 by SNHG7. While miR34 is well known for its function in inhibiting cell proliferation, the ability of coding or noncoding RNA to sequestrate miRNAs is highly dependent on the stability and copy number of these RNAs. Since they have single-cell data with UMI information, they should estimate the copy number of SNHG7 in epithelial cell populations, and this could provide a range for the "buffering" capacity of SNHG7. They should also examine, ideally by in situ hybridization, the expression patterns of miR34 in human vs mouse skin. While miR34 expression can be induced by p53 activation, it is possible that its expression varies in different species. It'll be interesting to determine whether the lack of miR34 expression in mouse keratinocyte or mouse skin could explain the insensitivity of mouse keratinocytes to SNHG7. Finally, to further demonstrate the competition between SNHG7 and miR34 targets, they can use a heterologous luciferase reporter system with a canonical miR34 targeting site in the 3'UTR and quantify luciferase activities with or without SNHG7 (or SNHG7 mut34 variant). This assay could quantify the impact of SNHG7 on individual miR34 targets.*

    We will include the analysis of the scRNAseq data to estimate the copy number of SNHG7 in the epidermal populations.

    We will also perform in situ hybridization staining for miR-34 in human and mouse epidermis as well as mouse keratinocytes.

    Finally, we will carry out the luciferase reporter experiments.

    Reviewer #2

    Major comments:

    1) The premise of the study relies on the observation that SNHGs have low levels of sequence conservation. Indeed the authors aim to prove that biological functions can be identified even in the absence of evolutionary conservation. However, the extent of evolutionary conservation strongly depends on the phylogenetic scale at which it is analyzed. Here, the authors evaluate sequence conservation using PhastCons and PhyloP scores determined using alignments of human and 99 other vertebrate species. These scores reflect the extent of long-term sequence conservation. At this scale, only a small percentage of the human genome can be considered to be "conserved". It is thus not surprising that SNHG and other lncRNAs are not conserved at this scale, even if they carry some biological functions in the human genome. Here, it would be useful to redo the sequence conservation analyses using PhastCons and PhyloP scores computed on less distant species. Pre-computed scores exist for alignments containing human and other mammalian species, mainly primates (UCSC genome browser). It would also be good to provide comparisons of sequence conservation levels on the snoRNA genes and on the non-snoRNA parts of SNHGs. In addition to protein-coding genes, pseudogenes and lncRNAs, it would be good to add a perfectly neutral control for the sequence conservation in Figure 1C - for example, flanking intergenic regions for SNHGs. It might also be a good idea to analyze the GC content of SNHGs compared to other lncRNAs, since GC content can be correlated with sequence conservation levels, in particular in noncoding regions.

    Importantly, the SNHG selected for detailed investigation (SNHG7) appears to be more conserved than the bulk of human lncRNAs, given that it is found in another primate and in mouse. It would be interesting to analyze in details what sequence features of this lncRNAs are conserved among species - for example, are the SNHG7 splice sites and promoter regions conserved? Are the snoRNA genes always located in the introns?

    The reviewer raises very good points. We have added the conservation data of snoRNA genes (all or SNHG-resident, both significantly more conserved than SNHGs or lncRNA), and a “true neutral” control (we used the introns of 10,000 randomly sampled genes) to our PhastCons analysis (Fig.1C). We also added a conservation analysis of the promoters (defined as the 500 bp upstream of the transcription start site) of coding genes, pseudogenes, lncRNA and SNHGs (Fig, S1B). We have now performed all our conservation analyses using both PhastCons scores generated from the 100-vertebrate alignment and PhastCons scores generated from the 30-mammals (28 primate) alignment (Fig. S1E-G). We do not detect any marked difference between the two alignment sets. We have also added a comparison of the GC content in lncRNA, SNHGs, coding genes and introns (Fig.S1C).

    Due to the nature of the PhyloP scores, the 30-mammal PhyloP track (phyloP30) would be unsuitable to detect additional conservation in the primate lineage using the thresholding analysis we employed in Fig. 1E. The PhyloP track gives base-wise p-values for conservation (positive values) or accelerated evolution (negative values). Using alignments of genomes that are overall more similar to each other (as in the 30-mammal alignment set) makes it more difficult to distinguish between conserved and neutrally evolving regions, because even segments that are not under constraint will look relatively similar due to the evolutionary proximity of the species in the set. For the same reason this alignment set is quite sensitive to accelerated evolution, as it contains many relatively similar genomes it the alignment.

    This causes the PhyloP30 scores to be very asymmetrical around zero: the conservation (positive) scores never reach 2 (p-value of 0.01) in the whole track (not even coding regions of very well-conserved genes), while acceleration scores can reach very significant values, down to -20. Conversely, the PhyloP100 track (used in Fig.1E) is quite symmetrical around 0 and is thus better suited for the purposes of the analysis in Fig.1E, which are to detect both conserved and accelerated portions of SNHGs. We have however also inspected the PhyloP30 track manually and do not observe any clear evidence of presence of additional conserved elements in SNHG7. We have added all conservation tracks for SNHG7 to Fig. 3A.

    While lncRNA orthologs can be identified by using a combination of sequence conservation, conserved synteny with surrounding genes and in some cases conserved gene structure, SNHG orthologs can additionally be identified by the embedded (conserved) intronic snoRNA sequence, which makes them easier to find even when the transcript sequence bears no similarity across species. The mouse SNHG7 sequence, for example, does not match with human SNHG7 even using the least stringent BLAST parameters. The monkey sequence is similar enough to match with the human in the 3’ of the gene, but the intron-exon structure of the 5’ is completely different. We agree with the reviewer’s assessment, however, when it comes to identification of SNHG orthologs in more evolutionary distant species, closer to the root of the vertebrate clade.

    Regarding splice sites, they are often conserved among lncRNA in general (gene structure conservation occurs more frequently than sequence conservation, see Ulitsky, Nat Rev Genet, 2016). In the case of SNHG7 the structure of the gene appears conserved in the mouse (though this annotation has likely not been fully confirmed experimentally), and in the monkey based on genome alignments. However, our RACE experiments show that the 5’ end of SNHG7 in the monkey has a radically different splicing pattern when compared to human, so it is difficult to assess splicing conservation in the absence of full isoform characterization.

    2) I am not perfectly convinced by the enrichment of miRNA target genes among the genes that are downregulated upon SNHG knockdown. The methods do not clearly explain how this enrichment is calculated. What is the background used for this enrichment analysis? In Figure 5C, we see that the genes predicted to be targeted by the top 10 microRNAs tend to have negative fold changes in the differential expression analysis (downregulation following knockdown). However, from Figure 5A it seems that the great majority of significantly differentially expressed (DE) genes have negative fold changes. How do the miRNA target genes differ from all other DE genes? What proportions of all predicted miRNA target genes (expressed in keratinocytes) are DE following knockdown, and how does this compare with the target genes of other miRNAs?

    We have added a description of the statistical test used by MIENTURNET for the enrichment analysis to the methods section. More details can be found in the original publication. The significance of the enrichment is calculated by performing a hypergeometric test using as background the total number of miRNA-target interactions in the database and the total interactions the individual miRNA being tested engages in.

    Figure 5C only includes the genes we used in our enrichment analysis (i.e. the significantly downregulated genes, not all DE genes) and it’s meant to show the extent of downregulation exhibited by the targets of the most significantly enriched miRNAs within this group of genes.

    The reviewer is correct in pointing out that the imbalance between downregulated and upregulated genes (which we now further highlight in the plots in Fig. 5A-B) will tend to skew any group of genes towards having a relatively large number of downregulated genes. However, we found this bias to be particularly strong in the case of our candidate miRNAs. We now show this in volcano plots for validated targets of the candidate miRNAs and a control miRNA (Fig. S8E). In a similar way, when looking at the cumulative distributions of the fold changes of all predicted targets for a certain miRNA and comparing it to the fold change cumulative distribution of all other genes, our candidate miRNAs displayed a more pronounced shift towards downregulation than the control miR-21-5p (which we now added in Fig. S8F).

    3) If keratinocyte RNA-seq data is available for other species (for example mouse), it would be interesting to test whether the high expression levels of SNHG7 and the other analyzed SNHGs are also conserved in the other species.

    We will include the RNA-seq data, if available.

    Minor comments:

    1) The AD (atopic dermatitis) abbreviation should be explained the first time it is mentioned in the text.

    We thank the reviewer for pointing out the missing abbreviation, we have now added it.

    2) More details are needed regarding the MIENTURNET analyses in the methods and in the main text

    We have added more details about the statistics involved to the methods (see above).

    *3) Figure 5C, it is not clear how to interpret the color code for the boxplot. Does this represent the median or mean FDR of the target genes? Are only genes with FDRThe genes included in the enrichment analysis are all downregulated genes with an adjusted p-value (or FDR adjustment after the Wald test) adj”. The color scale in Fig 5C reports the significance of the enrichment for targets of the single miRNAs within the significantly downregulated genes list (FDR adjustment after the hypergeometric test), not the significance of the downregulation itself. FDR values are also used for all other enrichment analyses (GO terms, REACTOME Pathways). We apologise to the reviewer for the confusion, we have now modified the text and figures to make this clearer.

    4) Figure 6D, I am not sure how to the D panel. Do the gray rectangles represent the exonic length of the SNHGs? Do the dots correspond to the positions of the miRNA target sites? Here, a more quantitative comparison with the extent of sequence conservation of miRNA binding sites in SNHGs, other lncRNAs and in protein-coding genes would be perhaps better suited.

    The grey rectangles in Fig. 6D represent the total exonic length of the SNHGs (basically all exons are “stitched together” head to tail irrespective of the actual isoforms) and the dots represent the positions of the miRNA binding sites within this “maximum exonic coverage”. Since not all individual isoforms are analysed it is possible that some additional miRNA sites can be created at alternatively spliced junctions, however we would estimate the number of such sites to be small. We have added this caveat to the methods section.

    The degree of conservation that is estimated by TargetScan to underlie a functional MRE in coding genes is taken into account in this analysis, as sites within SNHGs that pass this threshold are highlighted with yellow borders in the figure. We have now added a plot of the distribution of Branch length scores for MREs in SNHGs and the distribution of Branch length scores for MREs in a random sample of 250 Coding genes UTRs (Fig. S9E). A similar comparison for lncRNA is more challenging as the data is not readily available and is likely to be confounded by the nuclear localisation of a majority of lncRNA species.

  2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

    Learn more at Review Commons


    Referee #2

    Evidence, reproducibility and clarity

    Summary

    The manuscript submitted by M. Vietri Rudan and co-authors presents a functional analysis of a specific class of long non-coding RNA transcripts, namely the small nucleolar RNA host genes (SNHGs). These genes are defined by the fact that snoRNA molecules are embedded within their loci, often in the SNHGs introns. The rationale presented by the authors for studying this specific class of lncRNA genes is the fact that they display very low levels of evolutionary sequence conservation (even compared to the generally low conserved lncRNA transcripts) and that they are expressed at remarkably high levels (contrary to most lncRNA genes, which are very weakly expressed). The overarching goal of this study thus appears to be determining whether functional transcripts can exist in the absence of evolutionary conservation.

    The authors study SNHGs in the human keratinocytes model system. They analyze the expression levels of SNHGs in skin affected by atopic dermatitis and psoriasis, compared to normal skin. They identify several SNHGs that are differentially expressed between diseased and normal skin, and which are also detected at strong expression levels in single cell RNA-seq assays of human keratinocytes. The authors use knockdown assays to investigate the potential roles of these SNHGs in keratinocytes and show that the knockdown of each of 5 selected SNHGs results in a loss of clonogenicity.

    The last part of the manuscript focuses on the functional characterization of SNHG7, chosen because it is dysregulated in both atopic dermatitis and psoriasis and because its knockdown strongly affects clonogenicity. Additional assays showed that SNHG7 knockdown results in a reduced rate of proliferation and an increase in the fraction of differentiating cells. To ensure that the effects of the knockdown are not simply due to the absence of the snoRNA molecules embedded in the SNHG7 locus, the authors overexpressed the spliced form of SNHG7 (which lacks the snoRNA genes), successfully rescuing the cellular phenotype. They also verified that the knockdown does not affect the abundance of the corresponding snoRNA molecules.

    To propose a potential mechanism for the involvement of SNHG7 in keratinocyte proliferation, the authors investigated its capacity to act as a miRNA "decoy". They searched for an enrichment of miRNA binding sites among the genes that are downregulated following SNHG7. They identified several miRNAs which are predicted to target SNHG7 as well as (a substantial fraction of) the genes that are downregulated following SNHG7 knockdown. The transfection of two of these miRNAs (miR-193-3p and miR-34-5p) has a negative effect on keratinocyte proliferation, supporting the authors's hypothesis that SNHG7 may act via "sponging" these miRNAs, thereby positively contributing to the control of the expression of the other miRNA target genes. Consistent with this hypothesis, the authors show that overexpression of a SNHG7 mutant sequence that lacks the miRNA binding sites is not able to rescue the knockdown phenotype.

    However, this hypothesis is not supported (or at leat not further reinforced) by the evolutionary analysis performed by the authors. They were able to identify homologues for SNHG7 in another primate species (night monkey) and in the mouse. Transfection of the human SNHG7 sequence was able to increase clonogenicity in night monkey cells, but not in mouse cells. Given that miR-34 is also present in mouse and was previously shown to affect keratinocyte proliferation, it is not clear why the human SNHG7 sequence is not able to act as a miRNA sponge in this species. Likewise, only a slight effect on clonogenicity is observed upon SNHG7 transfection in night monkey. The authors conclude (correctly, in my view) that further investigations are needed to confirm the potential functions of SNHG7.

    Overall, I find that this study is interesting and carefully conducted. Nevertheless, I have several comments that I hope can improve this manuscript.

    Major comments:

    1. The premise of the study relies on the observation that SNHGs have low levels of sequence conservation. Indeed the authors aim to prove that biological functions can be identified even in the absence of evolutionary conservation. However, the extent of evolutionary conservation strongly depends on the phylogenetic scale at which it is analyzed. Here, the authors evaluate sequence conservation using PhastCons and PhyloP scores determined using alignments of human and 99 other vertebrate species. These scores reflect the extent of long-term sequence conservation. At this scale, only a small percentage of the human genome can be considered to be "conserved". It is thus not surprising that SNHG and other lncRNAs are not conserved at this scale, even if they carry some biological functions in the human genome. Here, it would be useful to redo the sequence conservation analyses using PhastCons and PhyloP scores computed on less distant species. Pre-computed scores exist for alignments containing human and other mammalian species, mainly primates (UCSC genome browser). It would also be good to provide comparisons of sequence conservation levels on the snoRNA genes and on the non-snoRNA parts of SNHGs. In addition to protein-coding genes, pseudogenes and lncRNAs, it would be good to add a perfectly neutral control for the sequence conservation in Figure 1C - for example, flanking intergenic regions for SNHGs. It might also be a good idea to analyze the GC content of SNHGs compared to other lncRNAs, since GC content can be correlated with sequence conservation levels, in particular in noncoding regions.

    Importantly, the SNHG selected for detailed investigation (SNHG7) appears to be more conserved than the bulk of human lncRNAs, given that it is found in another primate and in mouse. It would be interesting to analyze in details what sequence features of this lncRNAs are conserved among species - for example, are the SNHG7 splice sites and promoter regions conserved? Are the snoRNA genes always located in the introns?

    1. I am not perfectly convinced by the enrichment of miRNA target genes among the genes that are downregulated upon SNHG knockdown. The methods do not clearly explain how this enrichment is calculated. What is the background used for this enrichment analysis? In Figure 5C, we see that the genes predicted to be targeted by the top 10 microRNAs tend to have negative fold changes in the differential expression analysis (downregulation following knockdown). However, from Figure 5A it seems that the great majority of significantly differentially expressed (DE) genes have negative fold changes. How do the miRNA target genes differ from all other DE genes? What proportions of all predicted miRNA target genes (expressed in keratinocytes) are DE following knockdown, and how does this compare with the target genes of other miRNAs?
    2. If keratinocyte RNA-seq data is available for other species (for example mouse), it would be interesting to test whether the high expression levels of SNHG7 and the other analyzed SNHGs are also conserved in the other species.

    Minor comments:

    1. The AD (atopic dermatitis) abbreviation should be explained the first time it is mentioned in the text.
    2. More details are needed regarding the MIENTURNET analyses in the methods and in the main text
    3. Figure 5C, it is not clear how to interpret the color code for the boxplot. Does this represent the median or mean FDR of the target genes? Are only genes with FDR<5% included in this analysis?
    4. Figure 6D, I am not sure how to the D panel. Do the gray rectangles represent the exonic length of the SNHGs? Do the dots correspond to the positions of the miRNA target sites? Here, a more quantitative comparison with the extent of sequence conservation of miRNA binding sites in SNHGs, other lncRNAs and in protein-coding genes would be perhaps better suited.

    Significance

    Overall, this study is highly relevant in the field of lncRNA functionality and evolution. It presents evidence for a potential involvement in the regulation of cell proliferation for SNHGs, a role that appears to be independent of the snoRNAs produced by these loci. This study reinforces the current body of work suggesting that lncRNAs and other noncoding transcripts can sometimes function as miRNA "decoy" targets. This study will be of interest for a specialized audience, oriented towards understanding lncRNA biological functions.

  3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

    Learn more at Review Commons


    Referee #1

    Evidence, reproducibility and clarity

    Vietri Rudan and colleagues present an interesting study to examine the function and mechanism of small nucleolar RNA Host Gene 7 (SNHG7) in human keratinocytes. The key findings are that a widely expressed SNHG7, a long noncoding RNA hosting two snoRNAs in the introns, promotes keratinocyte proliferation and inhibits differentiation likely by sequestrating miR34a from its targets. The experiments are generally well-executed with proper controls. Notably, they leveraged human, night monkey and mouse keratinocytes to reveal primate specific functions of SNHG7 in a miR34 dependent manner. I have a few comments and suggestions that should be addressed to further strengthen the study.

    1. Single-cell RNAseq (Fig. 1B) and in situ (Fig. 3D) results both indicate that SNHG7 is broadly expressed in multiple epidermal layers but more enriched in the spinous layer. Although most assays, such as colony formation and Ki67 staining, did not specifically examine the role of SNHG7 in the spinous layer, the raft culture experiment seemed to indicate specific reduction of the spinous layer (Fig. 3H), which was more prominent than basal defects. The authors should examine the defects more carefully in the raft culture system by using basal, spinous and granular markers. It is possible that SNHG7 functions to maintain limited cell proliferation while restrict premature differentiation. In addition, they should perform serial passage experiments to distinguish whether overexpression of SNGH7 can indeed confer self-renewal in long-term experiments. Based on these results, they may need to refine their hypothesis/conclusion whether SNHG7 functions primarily on stem cell self-renewal or transiently maintain proliferation in transitioning cells.
    2. The main proposed mechanism is the sequestration of miR34 by SNHG7. While miR34 is well known for its function in inhibiting cell proliferation, the ability of coding or noncoding RNA to sequestrate miRNAs is highly dependent on the stability and copy number of these RNAs. Since they have single-cell data with UMI information, they should estimate the copy number of SNHG7 in epithelial cell populations, and this could provide a range for the "buffering" capacity of SNHG7. They should also examine, ideally by in situ hybridization, the expression patterns of miR34 in human vs mouse skin. While miR34 expression can be induced by p53 activation, it is possible that its expression varies in different species. It'll be interesting to determine whether the lack of miR34 expression in mouse keratinocyte or mouse skin could explain the insensitivity of mouse keratinocytes to SNHG7. Finally, to further demonstrate the competition between SNHG7 and miR34 targets, they can use a heterologous luciferase reporter system with a canonical miR34 targeting site in the 3'UTR and quantify luciferase activities with or without SNHG7 (or SNHG7 mut34 variant). This assay could quantify the impact of SNHG7 on individual miR34 targets.

    Significance

    This study reveals a potential function and mechanism of primate specific noncoding RNA for its role in modulating gene expression and cellular functions in the skin. It provides a new paradigm for identifying molecular functions of poorly conserved RNAs.