Human-specific lncRNAs contributed critically to human evolution by distinctly regulating gene expression

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This valuable study uses population and functional genomics to examine long non-coding RNAs (lncRNAs) in the context of human evolution. Computational prediction of human-specific lncRNAs and their DNA binding sites and analyses of these loci lead to the development of hypotheses regarding the potential roles of these genetic elements in human biology. The evidence supporting the conclusions is, however, still incomplete, as key details regarding the methodology and analyses are lacking.

This article has been Reviewed by the following groups

Abstract

What genomic sequences make conserved genes generate divergent expression in closely related species, which may have critically driven human evolution, has puzzled researchers for decades. Genomic studies have examined species-specific gene birth, gene loss, and changes in promoters and transcription factor binding sites, but species-specific epigenetic regulation remains barely explored. This study identified human-specific long noncoding RNAs (lncRNAs) from GENCODE-annotated human lncRNAs, predicted their DNA binding sites (DBSs) genome-wide, analyzed these DBSs and their counterparts in modern humans (CEU, CHB, and YRI), archaic humans (Altai Neanderthals, Denisovans, and Vindija Neanderthals), and chimpanzees, and analyzed the impact of DBSs on gene expression in modern and archaic humans. The results suggest that human-specific lncRNAs and their DBSs have substantially rewired gene expression human-specifically and that the rewiring has evolved continuously from archaic to modern humans. Rewired gene expression promotes brain development, makes humans adapt to new environments and lifestyles, and causes differences in modern humans. These results uncover a critical dimension of human evolution and underscore the diverse functions of species-specific lncRNAs.

Article activity feed

  1. Author Response

    The following is the authors’ response to the original reviews.

    We thank the two reviewers very much for their careful review and valuable comments. In response to these comments, we have made the following revisions. First, we performed a new analysis of the human accelerated regions (HARs) recently reported by the Zoonomia Project. Second, we present more data on the experimentally detected and computationally predicted DBSs of MALAT1, NEAT1, and MEG3. Third, we added details on RNA-seq data processing and the subsequent differential expression testing to the Materials and Methods section. Fourth, we clarified some details on the human ancestor sequence and the use of parameters and thresholds. Six new citations have been added. In addition, we have carefully polished the main text. We hope these revisions, together with the Responses-to-Reviewers, will help readers better understand the paper.

    eLife assessment

    In this valuable manuscript, the authors attempt to examine the role of long non-coding RNAs (lncRNAs) in human evolution, through a set of population genetics and functional genomics analyses that leverage existing datasets and tools. Although the methods are at times inadequate - for example, suitable methods and/or relevant controls are lacking at many points, and selection is inferred sometimes too quickly - the results nonetheless point towards a possible contribution of long non-coding RNAs to the evolution of human biology and they suggest clear directions for future, more rigorous study.

    Public Reviews:

    Reviewer #1 (Public Review):

    Summary

    While DNA sequence divergence, differential expression, and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study, the authors computationally predict HSlncRNAs as well as their DNA Binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analysis is straightforward and after identifying these regions/HSlncRNAs the authors examined their effects using different external datasets.

    Strengths/weaknesses

    By and large, the analysis performed is dependent on their ability to identify HSlncRNAs and their DBS. I think that they have done a good job of showing the performance metrics of their methods in previous publications. Thereafter, they perform a series of enrichment-type analyses that have been used in the field for quite a while now to look at tissue-specific enrichment, or region-specific enrichment, or functional enrichment, and I think these have been carried out well. The authors achieved the aims of their work. I think one of the biggest contributions that this paper brings to the field is their annotation of these HSlncRNAs. Thus a major revisionary effort could be spent on applying their method to the latest genomes that have been released so that the community could get a clean annotation of newly identified HSlncRNAs (see comment 2).

    Comments

    1. Though some of their results, such as certain HSlncRNAs having DBSs in all genes, are rather surprising/suspicious, I think that broadly their process to identify and validate DBSs is robust; they have multiple lines of checks to identify such regions, including functional validation. These predictions are bound to have some level of false positive/negative rate and it might be nice to restate those here and on what experiment/validation data these were conducted. However, the rest of their analysis comprises different types of enrichment analysis which shouldn't be affected by outlier HSlncRNAs if indeed their FPR/FNR are low.
    2. There are now several new genomes available as part of the Zoonomia consortium and 240 Primate consortium papers released. These papers have re-examined some annotations such as Human Accelerated Regions (HARs) and found with a larger dataset as well as better reference genomes, that a large fraction of HARs were actually incorrectly annotated - that is that they were also seen in other lineages outside of just the great apes. If these papers have not already examined HSlncRNAs, the authors should try and re-run the computational predictions with this updated set and then identify HSlncRNAs there. This might help to clarify their signal and remove lncRNAs that might be present in other primates but are somehow missing in the great apes. This might also help to mitigate some results that they see in section 3 of their paper in comparing DBS distances between archaics and humans.

    Responses:

    (1) Thanks for the good suggestion. We have checked the Zoonomia-reported genomes and found that the new primate genomes are from monkeys and lemurs, not apes (Zoonomia Consortium. Nature 2023. https://doi.org/10.1038/s41586-020-2876-6), and the phylogenetic relationships between monkeys and humans are much more remote than those between apes and humans. In addition, the Zoonomia project did not target identifying new lncRNA genes.

    (2) We have examined the Zoonomia-reported HARs (Keough et al. Science 2023. DOI: 10.1126/science.abm1696). Of the 312 HARs reported by Keough et al., 8 overlap 26 DBSs of 14 HS lncRNAs; moreover, DBSs greatly outnumber HARs, suggesting that HARs and DBSs are different sequences with different functions.

    (3) In the revised manuscript, a new paragraph (the second one) has been added to the section “HS lncRNAs regulate diverse genes and transcripts” to describe the HAR analysis result.
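    The kind of overlap count reported above (8 of 312 HARs overlapping 26 DBSs) can be sketched as a simple interval comparison. This is a minimal illustration only: the function names and the toy coordinates are hypothetical, and the authors' actual overlap procedure is not described in the response.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end)
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlaps(hars: List[Interval], dbss: List[Interval]) -> Tuple[int, int]:
    """Return (number of HARs overlapping any DBS, number of DBSs overlapping any HAR)."""
    har_hits = sum(any(overlaps(h, d) for d in dbss) for h in hars)
    dbs_hits = sum(any(overlaps(d, h) for h in hars) for d in dbss)
    return har_hits, dbs_hits


# Toy coordinates invented for illustration; they are not real HARs or DBSs.
toy_hars = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
toy_dbss = [("chr1", 150, 180), ("chr1", 190, 250), ("chr2", 300, 400)]

print(count_overlaps(toy_hars, toy_dbss))
```

    One HAR can overlap several DBSs, which is why the two counts differ, as they do in the numbers reported in the response.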

    3. The differences between the archaic hominins in their DBS distances to modern humans are a bit concerning. At some level, we expect these to be roughly similar when examining African modern humans and perhaps the Denisovan being larger when examining Europeans and Asians, but they seem to have distances that aren't expected given the demography. In addition, from their text for section 3, they begin by stating that they are computing two types of distances but then I lost track of which distance they were discussing in paragraph 3 of section 3. Explicitly stating which of the two distances in the text would be helpful for the reader.

    Responses:

    (1) Based on the archaic human genomes, the genomic distances from the three modern human populations are shorter to Denisovans than to Altai Neanderthals; however, according to the related studies we cite, the phylogenetic relationship of the three modern human populations is more remote to Denisovans than to Altai Neanderthals. Thus, the finding that 2514 and 1256 DBSs have distances >0.034 in Denisovans and Altai Neanderthals, respectively, is not unreasonable. The numbers of DBSs, of course, depend on the cutoff of 0.034, which is somewhat subjective but not unreasonable.

    (2) The second paragraph is added to the Discussion, discussing parameters and cutoffs.

    (3) Regarding the two types of distance, the distances computed in the first way were not further analyzed because, as we note, “This anomaly may be caused by that the human ancestor was built using six primates without archaic humans”.

    4. Isn't the correct control to examine whether eQTLs are more enriched in HSlncRNA DBSs a set of transcription factor binding sites? I don't think using just promoter regions is a reasonable control here. This does not take away from the broader point however that eQTLs are found in DBSs and I think they can perform this alternate test.

    Responses:

    Indeed, TFBSs are more comparable to DBSs than promoters. However, many more methods have been developed to predict TFBSs than to predict DBSs, making us concerned about TFBS prediction's reliability. Since most QTLs in DBSs are mQTLs (Supplementary Table 13), but many QTLs in TFBSs are eQTLs (Flynn et al. PLoS Genetics 2021. DOI: 10.1371/journal.pgen.1009719), it is pretty safe to conclude that DBSs are enriched in mQTLs.

    5. In the Discussion, they highlight the evolution of sugar intake, which I'm not sure is appropriate. This comes not from GO enrichment but rather from a few genes that are found at the tail of their distribution. While these signals may be real, the evolution of traits is often highly polygenic and they don't see this signal in their functional enrichment. I suggest removing that line. Moreover, HSlncRNAs are ones that are unique across a much longer time frame than the transition to agriculture, which is when sugar intake rose greatly. Thus, it's unlikely that enrichment for something that arose in the past 6000-7000 years would be seen in an annotation that is designed to detect human-chimp or human-neanderthal level divergence.

    Responses:

    (1) The Discussion on human adaptation to high sugar intake is based on both enriched GO terms (Supplementary Table 4, 7) and a set of genes in modern humans with the most SNP-rich DBSs (Table 2). These glucose-related GO terms are not at the tail of the list because, of the 614 enriched GO terms (enriched in genes with strongest DBSs), glucose metabolism-related ones are ranked 208, 212, 246, 264, 504, 522, 591, and of the 409 enriched GO terms (enriched in the top 1256 genes in Altai Neanderthals), glucose metabolism-related ones are ranked 152 and 217.

    (2) Indeed, there are other top-ranked enriched GO terms; some (e.g., neuron projection development (GO:0031175) and cell projection morphogenesis (GO:0048858)) have a known impact on human evolution, but the impact of others (e.g., cell junction organization (GO:0034330)) remains unclear. We specifically report human adaptation to high sugar intake because the DBSs in related genes show differences in modern humans (Table 2).

    Reviewer #2 (Public Review):

    Lin et al attempt to examine the role of lncRNAs in human evolution in this manuscript. They apply a suite of population genetics and functional genomics analyses that leverage existing data sets and public tools, some of which were previously built by the authors, who clearly have experience with lncRNA binding prediction. However, I worry that there is a lack of suitable methods and/or relevant controls at many points and that the interpretation is too quick to infer selection. While I don't doubt that lnc RNAs contribute to the evolution of modern humans, and certainly agree that this is a question worth asking, I think this paper would benefit from a more rigorous approach to tackling it.

    At this point, my suggestions are mostly focused on tightening and strengthening the methods; it is hard for me to predict the consequence of these changes on the results or their interpretation, but as a general rule I also encourage the authors to not over-interpret their conclusions in terms of what phenotype was selected for when as they do at certain points (eg glucose metabolism).

    Responses:

    (1) Now, we use more cautious wording to describe the results.

    (2) A paragraph (the second one) is added to Discussion to explain parameters and cutoffs.

    (3) We make the caution at the end of the third paragraph that “We note that these are findings instead of conclusions, and they indicate, suggest, or support something revealing the primary question of what genomic differences critically determine the phenotypic differences between humans and apes and between modern and archaic humans”.

    I note some specific points that I think would benefit from more rigorous approaches, and suggest possible ways forward for these.

    1. Much of this work is focused on comparing DNA binding domains in human-unique long-noncoding RNAs and DNA binding sites across the promoters of genes in the human genome, and I think the authors can afford to be a bit more methodical/selective in their processing and filtering steps here. The article begins by searching for orthologues of human lncRNAs to arrive at a set of 66 human-specific lncRNAs, which are then characterised further through the rest of the manuscript. Line 99 describes a binding affinity metric used to separate strong DBS from weak DBS; the methods (line 432) describe this as being the product of the DBS or lncRNA length times the average Identity of the underlying TTSs. This multiplication, in fact, undoes the standardising value of averaging and introduces a clear relationship between the length of a region being tested and its overall score, which in turn is likely to bias all downstream inference, since a long lncRNA with poor average affinity can end up with a higher score than a short one with higher average affinity, and it's not quite clear to me what the biological interpretation of that should be. Why was this metric defined in this way?

    Responses:

    (1) Binding affinity and length of all DBSs of HS lncRNAs are given in Supplementary Table 2 and 3. Since a triplex (say, 100 bp in length) may have 50% or 70% of nucleotides bound, it is necessary to differentiate binding affinity and length, and the two measures can differentiate DBSs of the same length but with different binding affinity and DBSs with the same binding affinity but different length.

    (2) Differentiating DBSs into strong and weak ones is somewhat subjective, accurately differentiating them demands experimental data that are currently unavailable, and it is advisable to separately analyze strong and weak DBSs because they may likely influence different aspects of human evolution.
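    The reviewer's concern about a score defined as length times average identity can be made concrete with a small worked example. This is a sketch under the reviewer's description of the metric; the function name and the example values are hypothetical and are not taken from the Supplementary Tables.

```python
def affinity_score(length_bp: int, mean_identity: float) -> float:
    """Score as described by the reviewer: region length times average TTS identity."""
    return length_bp * mean_identity


# Hypothetical DBSs: a long region with poor average identity
# and a short region with high average identity.
long_weak = affinity_score(length_bp=200, mean_identity=0.5)
short_strong = affinity_score(length_bp=100, mean_identity=0.9)

print(long_weak, short_strong)

# The long, weakly matching region outranks the short, strongly matching one,
# which is the length dependence the reviewer points out.
print(long_weak > short_strong)
```

    Reporting length and average identity separately, as the response describes for Supplementary Tables 2 and 3, avoids this ranking ambiguity because each quantity can then be compared on its own.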

    2. There is also a strong assumption that identified sites will always be bound (line 100), which I disagree is well-supported by additional evidence (lines 109-125). The authors show that predicted NEAT1 and MALAT1 DBS overlap experimentally validated sites for NEAT1, MALAT1, and MEG3, but this is not done systematically, or genome-wide, so it's hard to know if the examples shown are representative, or a best-case scenario.

    Responses:

    (1) We do not assume/think that identified sites will always be bound. Instead, lncRNA/DBS binding is highly context-dependent (including tissue-specific).

    (2) An extra supplementary table (Supplementary Table 15) is added to show what predicted DBSs overlap experimentally detected DBSs for NEAT1, MALAT1, and MEG3. By the way, it is more accurate to say “experimentally detected” than “experimentally validated”, because experimental data have true/false positives and true/false negatives, and different sequencing protocols (for detecting lncRNA/DNA binding) may generate somewhat different results.

    It's also not quite clear how overlapping promoters or TSS are treated - are these collapsed into a single instance when calculating genome-wide significance? If, eg, a gene has five isoforms, and these differ in the 3' UTR but their promoter region contains a DBS, is this counted five times, or one? Since the interaction between the lncRNA and the DBS happens at the DNA level, it seems like not correcting for this uneven distribution of transcripts is likely to skew results, especially when testing against genome-wide distributions, eg in the results presented in sections 5 and 6. I do not think that comparing genes and transcripts putatively bound by the 40 HS lncRNAs to a random draw of 10,000 lncRNA/gene pairs drawn from the remaining ~13500 lncRNAs that are not HS is a fair comparison. Rather, it would be better to do many draws of 40 non-HS lncRNAs and determine an empirical null distribution that way, if possible actively controlling for the overall number of transcripts (also see the following point).

    Responses:

    (1) We analyzed each and every GENCODE-annotated transcript (Supplementary Table 2). For example, if a gene has N TSS and N transcripts, DBSs are predicted in N promoter regions. When analyzing gene expression in tissues, each and every transcript is analyzed.

    (2) Ideally, it would be better to do many draws, but statistically, a huge number is needed due to the number of total genes in the human genome.

    (3) We feel that doing many draws of 40 non-HS lncRNAs and determining an empirical null distribution is not as straightforward as comparing HS lncRNA-target transcript pairs (45% show significant expression correlation) with random lncRNA-random transcript pairs (2.3% show significant expression correlation).
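    The reviewer's suggestion of an empirical null built from many draws of 40 non-HS lncRNAs can be sketched as follows. This is a minimal illustration, assuming each non-HS lncRNA has already been summarised by one number (for example, the fraction of its predicted targets with significant expression correlation); the function name, the toy pool, and the choice of the mean as the test statistic are all assumptions, not the authors' method.

```python
import random
from typing import List


def empirical_p(observed: float, pool: List[float], draw_size: int,
                n_draws: int = 1000, seed: int = 0) -> float:
    """One-sided empirical p-value for `observed` against means of random draws.

    Each draw samples `draw_size` values from `pool` without replacement and
    records their mean. The +1 terms keep the estimate away from exactly zero.
    """
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_draws):
        sample = rng.sample(pool, draw_size)
        if sum(sample) / draw_size >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_draws + 1)


# Toy pool of per-lncRNA summary values for 200 hypothetical non-HS lncRNAs,
# all between 0.00 and 0.09. These numbers are invented for illustration.
toy_pool = [0.01 * (i % 10) for i in range(200)]

# Hypothetical observed value for the set of 40 HS lncRNAs.
p_high = empirical_p(observed=0.45, pool=toy_pool, draw_size=40)
p_low = empirical_p(observed=0.0, pool=toy_pool, draw_size=40)
print(p_high, p_low)
```

    Because every draw has the same size as the HS set, the null distribution reflects set-level variability, which a comparison against individual random lncRNA/transcript pairs does not capture.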

    3. Thresholds for statistical testing are not consistent, or always well justified. For instance, in line 142 GO testing is performed on the top 2000 genes (according to different rankings), but there's no description of the background regions used as controls anywhere, or of why 2000 genes were chosen as a good number to test? Why not 1000, or 500? Are the results overall robust to these (and other) thresholds? Then line 190 the threshold for downstream testing is now the top 20% of genes, etc. I am not opposed to different thresholds in principle, but they should be justified.

    Responses:

    (1) The over-representation analysis using g:Profiler was applied to the top and bottom 2000 genes with the whole genome as the background. The number “2000” was chosen somewhat subjectively. If more or fewer genes were chosen, more or fewer enriched GO terms would be identified, but GO terms with adjusted P-values <0.05 would be quite stable.

    (2) A paragraph (the second one) is added to the Discussion to explain parameters and cutoffs.
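    Over-representation analysis of a top-ranked gene list against a background is commonly framed as a hypergeometric tail test, and the sketch below shows how the size of the selected list enters the calculation. It is a generic illustration with invented counts; it is not a reimplementation of g:Profiler, and the function name is hypothetical.

```python
from math import comb


def ora_pvalue(k: int, n_selected: int, n_annotated: int, n_background: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    n_background: genes in the background set
    n_annotated:  background genes carrying the GO term
    n_selected:   genes in the tested list (e.g. a "top N" list)
    k:            selected genes carrying the GO term
    """
    total = comb(n_background, n_selected)
    upper = min(n_selected, n_annotated)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_selected - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Invented counts: 2000 background genes, 50 annotated with the term,
# 200 selected genes of which 15 carry the term (5 expected by chance).
p_enriched = ora_pvalue(k=15, n_selected=200, n_annotated=50, n_background=2000)
print(p_enriched)
```

    Rerunning such a test with different list sizes (for example 500, 1000, and 2000 genes) is one simple way to check the robustness the reviewer asks about.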

    Likewise, comparing Tajima's D values near promoters to genome-wide values is unfair, because promoters are known to be under strong evolutionary constraints relative to background regions; as such it is not surprising that the results of this comparison are significant. A fairer comparison would attempt to better match controls (eg to promoters without HS lncRNA DBS, which I realise may be nearly impossible), or generate empirical p-values via permutation or simulation.

    Responses:

    We examined Tajima’s D in DBSs (Supplementary Figure 9) and in HS lncRNA genes (Supplementary Figure 18). We compared the Tajima’s D values with the genome-wide background in both cases.

    4. There are huge differences in the comparisons between the Vindija and Altai Neanderthal genomes that to me suggest some sort of technical bias or the like is at play here. e.g. line 190 reports 1256 genes to have a high distance between the Altai Neanderthal and modern humans, but only 134 Vindija genes reach the same cutoff of 0.034. The temporal separation between the two specimens does not seem sufficient to explain this difference, nor the difference between the Altai Denisovan and Neanderthal results (2514 genes for Denisovan), which makes me wonder if it is a technical artefact relating to the quality of the genome builds? It would be worth checking.

    Responses:

    (1) The cutoff of 0.034 was chosen because DBSs in the top 20% (4248) of genes in chimpanzees have distances larger than this cutoff; accordingly, 4248, 1256, 2514, and 134 genes have DBS distances >0.034 in chimpanzees, Altai Neanderthals, Denisovans, and Vindija Neanderthals, respectively. These numbers of genes qualitatively agree with the phylogenetic distances from chimpanzees and archaic humans to modern humans. If a percentage larger or smaller than 20% (e.g., 10% or 30%) were chosen, giving a different cutoff X, the numbers of genes with DBS distances >X would no longer be 4248, 1256, 2514, and 134, but could still qualitatively agree with the phylogenetic distances from chimpanzees and archaic humans to modern humans.

    (2) The second paragraph in the Discussion now explains the parameters and cutoffs.
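    The cutoff logic described in the response (derive a distance cutoff from the top 20% of genes in chimpanzees, then count genes above it in each archaic genome) can be sketched as below. The function names and all distance values are hypothetical, and the sketch assumes one distance per gene with no ties at the cutoff.

```python
from typing import List


def cutoff_for_top_fraction(distances: List[float], fraction: float = 0.2) -> float:
    """Return the largest distance that is NOT in the top `fraction` of values.

    With no ties, exactly the top `fraction` of values is strictly above it.
    """
    ranked = sorted(distances, reverse=True)
    n_top = int(len(ranked) * fraction)
    if not 0 < n_top < len(ranked):
        raise ValueError("fraction must select some, but not all, of the values")
    return ranked[n_top]


def count_above(distances: List[float], cutoff: float) -> int:
    """Number of genes whose distance is strictly above the cutoff."""
    return sum(d > cutoff for d in distances)


# Invented per-gene distances for ten genes; not real data.
toy_chimp = [0.10, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01]
toy_archaic = [0.09, 0.05, 0.02, 0.01, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0]

cutoff = cutoff_for_top_fraction(toy_chimp, fraction=0.2)
print(cutoff, count_above(toy_chimp, cutoff), count_above(toy_archaic, cutoff))
```

    Repeating the count for several fractions (for example 0.1, 0.2, and 0.3) would show directly whether the ordering of the genomes is stable across cutoffs, which is the claim made in the response.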

    5. Inferring evolution: There are some points of the manuscript where the authors are quick to infer positive selection. I would caution that GTEx contains a lot of different brain tissues, thus finding a brain eQTL is a lot easier than finding a liver eQTL, just because there are more opportunities for it. Likewise, claims in the text and in Tables 1 and 2 about the evolutionary pressures underlying specific genes should be more carefully stated. The same is true when the authors observe high Fst between groups (line 515): selection is only one possible cause of high Fst - population differentiation and drift are just as capable of giving rise to it, especially at small sample sizes.

    Responses:

    (1) We analyzed brain tissues separately instead of taking the whole brain as a tissue, see Supplementary Table 12 and Figure 3.

    (2) We make the caution at the end of the third paragraph that “We note that these are findings instead of conclusions, and they indicate, suggest, or support something revealing the primary question of what genomic differences critically determine the phenotypic differences between humans and apes and between modern and archaic humans”.

    Reviewer #1 (Recommendations For The Authors):

    Some figures are impossible to see/read so I wasn't able to evaluate them - Fig, 1B, 1E, 1F are small and blurry.

    Responses:

    High-quality figures are provided.

    Typo in line 178: in these archaic humans, the distances of HS lncRNAs are smaller than the distances of DBSs.

    Responses:

    This is not a typo. We use “distance per base” to measure whether HS lncRNAs or their DBSs have evolved more from archaic humans to modern humans. See also Supplementary Note 4 and 5.

    Reviewer #2 (Recommendations For The Authors):

    1. There's some inconsistency in the genome builds and the database versions used, eg, sometimes panTro4 is used and sometimes panTro5 (line 456). Likewise, the version of GENCODE used is very old (18), the current version is 43. The current version contains 19928 lncRNAs, which is a big difference relative to what is being tested!

    Responses:

    (1) panTro4 was used to search orthologues of human lncRNAs; this time-consuming work started several years ago when the version of GENCODE was V18 (see Lin et al., 2019).

    (2) Regarding “the version of GENCODE used is very old (V18)”, we have later examined the 4396 human lncRNAs reported in GENCODE V36 and found that the set of 66 HS lncRNAs remains the same.

    (3) The counterparts of HS lncRNAs’ DBSs in chimpanzees were predicted recently using panTro5.

    2. Table 1: What does 'mostly' mean in this context? I understand that it refers to sequence differences between humans and the other genomes, but what is the actual threshold, and how is it defined?

    Responses:

    The title of Table 1 is “Genes with strongest DBSs and mostly changed sequence distances from modern humans to archaic humans and chimpanzees”. Instead of using two cutoffs, choosing genes with the two features seems easy and sensible.

    3. Line 117: The methods do not include information on the RNA-seq data processing and subsequent DE testing.

    Responses:

    The details are added to the section “Experimentally validating DBS prediction” (The reads were aligned to the human GRCh38 genome using HISAT2 (Kim et al., 2019), and the resulting SAM files were converted to BAM files using SAMtools (Li et al., 2009). StringTie was used to quantify gene expression levels (Pertea et al., 2015). Fold change of gene expression was computed using the edgeR package (Robinson et al., 2010), and significant up- and down-regulation of target genes after DBD knockout was determined by |log2(fold change)| > 1 with FDR < 0.1).
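    The final thresholding step described above, |log2(fold change)| > 1 with FDR < 0.1, can be sketched as a small classifier applied to per-gene results. This is an illustration only: the function name and the toy values are hypothetical, and the fold changes and FDR values are assumed to come from an upstream tool such as edgeR rather than being computed here.

```python
import math


def classify_gene(fold_change: float, fdr: float,
                  lfc_cutoff: float = 1.0, fdr_cutoff: float = 0.1) -> str:
    """Label a gene as 'up', 'down', or 'ns' (not significant)."""
    log2_fc = math.log2(fold_change)
    if fdr < fdr_cutoff and log2_fc > lfc_cutoff:
        return "up"
    if fdr < fdr_cutoff and log2_fc < -lfc_cutoff:
        return "down"
    return "ns"


# Invented (fold change, FDR) pairs for four hypothetical target genes.
toy_results = {
    "geneA": (4.0, 0.01),   # log2FC = 2, significant
    "geneB": (0.2, 0.05),   # log2FC below -1, significant
    "geneC": (2.0, 0.01),   # log2FC = 1 exactly, fails the strict > 1 test
    "geneD": (8.0, 0.50),   # large change but FDR too high
}

labels = {gene: classify_gene(fc, fdr) for gene, (fc, fdr) in toy_results.items()}
print(labels)
```

    The strict inequalities mirror the thresholds as written in the response; a gene sitting exactly on a cutoff is treated as not significant.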

    4. Line 180: I looked at the EPO alignment and it's not clear to me what 'human ancestor' means, but it may well explain the issues the authors have with calculating distances (I agree those numbers are weird). Is it the reconstructed ancestral state of humans at around 300-200,000 years ago (coalescence of most human uniparental lineages), or the inferred sequence of the human-chimpanzee most recent common ancestor? If it's the former, it's not surprising it skews results towards shorter distances for modern humans, since the tree distance from that point to archaic hominins is significantly larger than to modern humans.

    Responses:

    The “human ancestor” was constructed by the EBI team from the genomes of six primates on the Ensembl website. We find that the reconstructed ancestral state of humans is unlikely to date to around 300,000-200,000 years ago and may be much earlier. We also find that many DNA sequences of the “human ancestor” are low-confidence calls (i.e., the ancestral states are supported by only one primate’s sequence).

    5. Line 221: SNP-rich DBS: Is this claim controlled for the length of the DBS?

    Responses:

    No. Long DBSs tend to have more SNPs. When comparing the same DBS in modern humans, archaic humans, and chimpanzees, both the length and SNP number reflect evolution, so it is not necessary to control for the length.

    6. GTEx is primarily built off short-read data, so it is impossible to link the binding of a lncRNA to a DBS with its impact on a specific transcript.

    Responses:

    As written in the section “Examining the tissue-specific impact of HS lncRNA-regulated gene expression”, we calculated the pairwise Spearman's correlation coefficient between the expression of an HS lncRNA (the representative transcript, median TPM value > 0.1) and the expression of each of its target transcripts (median TPM value > 0.1) using the scipy.stats.spearmanr program in the scipy package. The expression of an HS lncRNA gene and a target transcript was considered to be significantly correlated if the |Spearman's rho| > 0.3, with Benjamini-Hochberg FDR < 0.05.
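    The correlation procedure described in this response can be sketched as follows, using `scipy.stats.spearmanr` as named in the text. Everything else is an assumption made for illustration: the function names, the hand-written Benjamini-Hochberg adjustment, the choice to adjust across the targets of one lncRNA, and the simulated TPM values.

```python
import numpy as np
from scipy.stats import spearmanr


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def correlated_targets(lnc_tpm, targets_tpm, min_median_tpm=0.1,
                       rho_cutoff=0.3, fdr_cutoff=0.05):
    """Names of target transcripts significantly correlated with the lncRNA."""
    lnc_tpm = np.asarray(lnc_tpm, dtype=float)
    if np.median(lnc_tpm) <= min_median_tpm:
        return []
    names, rhos, pvals = [], [], []
    for name, tpm in targets_tpm.items():
        tpm = np.asarray(tpm, dtype=float)
        if np.median(tpm) <= min_median_tpm:
            continue  # skip lowly expressed transcripts
        rho, p = spearmanr(lnc_tpm, tpm)
        names.append(name)
        rhos.append(rho)
        pvals.append(p)
    if not names:
        return []
    qvals = bh_adjust(pvals)
    return [n for n, r, q in zip(names, rhos, qvals)
            if abs(r) > rho_cutoff and q < fdr_cutoff]


# Simulated TPM values across 50 samples; not real GTEx data.
rng = np.random.default_rng(0)
lnc = rng.lognormal(mean=0.0, sigma=1.0, size=50)
toy_targets = {
    "T_pos": lnc ** 2,                 # monotonically increasing with the lncRNA
    "T_neg": 1.0 / lnc,                # monotonically decreasing with the lncRNA
    "T_low": lnc * 0.001,              # correlated but below the TPM filter
    "T_null": rng.lognormal(size=50),  # independent of the lncRNA
}
hits = correlated_targets(lnc, toy_targets)
print(hits)
```

    Note that this tests co-expression only; as the reviewer's comment implies, a significant correlation does not by itself show that binding at a given DBS drives the expression of a specific transcript.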

    7. Line 429: should TTO be TFO?

    Responses:

    Here TTO should be TFO; the typo is corrected.

    8. Methods, section 7: Some of the text in this section should perhaps be moved to the results section?

    Responses:

    Each of the two paragraphs in Methods’ section 7 is quite large, and some contents in Supplementary Notes are also very relevant. Thus, moving them to the Results section could make the Results too lengthy and specific.

    9. Line 587: GTEx is built from samples of primarily European ancestry and has poor representation of African ancestry and negligible representation of Asian ancestry (see the GTEx v8 paper supplement). This means that it is basically impossible to find a non-European population-specific eQTL in GTEx, which in turn impacts these results.

    Responses:

    (1) Indeed, this is a serious issue of data analysis, and this issue cannot be solved until more Africans are sequenced.

    (2) Nevertheless, one can still find a considerable number of African-specific eQTLs in GTEx, such as rs28540058 (with frequencies of 0, 0, and 0.13 in CEU, CHB, and YRI) and rs58772997 (with frequencies of 0, 0, and 0.12 in CEU, CHB, and YRI) (see Supplementary Table 12 and Supplementary Figure 22).

  2. eLife assessment

    This valuable study uses population and functional genomics to examine long non-coding RNAs (lncRNAs) in the context of human evolution. Computational prediction of human-specific lncRNAs and their DNA binding sites and analyses of these loci lead to the development of hypotheses regarding the potential roles of these genetic elements in human biology. The evidence supporting the conclusions is, however, still incomplete, as key details regarding the methodology and analyses are lacking.

  3. Reviewer #1 (Public Review):

    Summary
    While DNA sequence divergence, differential expression, and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study, the authors computationally predict HSlncRNAs as well as their DNA Binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analysis is straightforward and after identifying these regions/HSlncRNAs the authors examined their effects using different external datasets.

    I no longer have any concerns about the manuscript as the authors have addressed my comments in the first round of review.

  4. Reviewer #2 (Public Review):

    Lin et al attempt to examine the role of lncRNAs in human evolution in this manuscript. They apply a suite of population genetics and functional genomics analyses that leverage existing data sets and public tools, some of which were previously built by the authors, who clearly have experience with lncRNA binding prediction. However, I worry that there is a lack of suitable methods and/or relevant controls at many points and that the interpretation is too quick to infer selection. While I don't doubt that lnc RNAs contribute to the evolution of modern humans, and certainly agree that this is a question worth asking, I think this paper would benefit from a more rigorous approach to tackling it.

    I thank the authors for their revisions to the manuscript; however, I find that the bulk of my comments have not been addressed to my satisfaction. As such, I am afraid I cannot say much more than what I said last time, emphasising some of my concerns with regards to the robustness of some of the analyses presented. I appreciate the new data generated to address some questions, but think it could be better incorporated into the text - not in the discussion, but in the results.

  5. Author Response

    Reviewer #1 (Public Review):

    Summary

    While DNA sequence divergence, differential expression, and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study, the authors computationally predict HSlncRNAs as well as their DNA Binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analysis is straightforward and after identifying these regions/HSlncRNAs the authors examined their effects using different external datasets.

    Strengths/weaknesses

    By and large, the analysis performed is dependent on their ability to identify HSlncRNAs and their DBS. I think that they have done a good job of showing the performance metrics of their methods in previous publications. Thereafter, they perform a series of enrichment-type analyses that have been used in the field for quite a while now to look at tissue-specific enrichment, or region-specific enrichment, or functional enrichment, and I think these have been carried out well. The authors achieved the aims of their work. I think one of the biggest contributions that this paper brings to the field is their annotation of these HSlncRNAs. Thus a major revisionary effort could be spent on applying their method to the latest genomes that have been released so that the community could get a clean annotation of newly identified HSlncRNAs (see comment 2).

    Comments

    1. Though some of their results about certain HSlncRNAs having DBSs in all genes is rather surprising/suspicious, I think that broadly their process to identify and validate DBSs is robust, they have multiple lines of checks to identify such regions, including functional validation. These predictions are bound to have some level of false positive/negative rate and it might be nice to restate those here and on what experiment/validation data these were conducted. However, the rest of their analysis comprises different types of enrichment analysis which shouldn't be affected by outlier HSlncRNAs if indeed their FPR/FNR are low.
    2. There are now several new genomes available as part of the Zoonomia consortium and 240 Primate consortium papers released. These papers have re-examined some annotations such as Human Accelerated Regions (HARs) and found with a larger dataset as well as better reference genomes, that a large fraction of HARs were actually incorrectly annotated - that is that they were also seen in other lineages outside of just the great apes. If these papers have not already examined HSlncRNAs, the authors should try and re-run the computational predictions with this updated set and then identify HSlncRNAs there. This might help to clarify their signal and remove lncRNAs that might be present in other primates but are somehow missing in the great apes. This might also help to mitigate some results that they see in section 3 of their paper in comparing DBS distances between archaics and humans.
    3. The differences between the archaic hominins in their DBS distances to modern humans are a bit concerning. At some level, we expect these to be roughly similar when examining African modern humans and perhaps the Denisovan being larger when examining Europeans and Asians, but they seem to have distances that aren't expected given the demography. In addition, from their text for section 3, they begin by stating that they are computing two types of distances but then I lost track of which distance they were discussing in paragraph 3 of section 3. Explicitly stating which of the two distances in the text would be helpful for the reader.

    (1) According to Figure 1A (which follows Meyer et al., 2012; Prüfer et al., 2013; and Prüfer et al., 2017), the phylogenetic distance from modern humans to Denisovans is shorter than the distance to the Altai Neanderthal. However, according to the same studies, the Denisovan branch is more remote from modern humans than the Altai Neanderthal branch. Thus, it is not unreasonable to find that 2514 and 1256 DBSs have distances > 0.034 in genes in Denisovans and Altai Neanderthals, respectively. Both the phylogenetic distances and the DBS distances probably depend considerably on the sampled Altai and Denisovan genomes, which represent populations that existed for a long time. When new samples are obtained, these distances may change somewhat.

    (2) Regarding “they are computing two types of distances but then I lost track of which distance they were discussing in paragraph 3 of section 3”: section 3 discusses the second type of distance. Distances computed in the first way were not analyzed further because “This defect may be caused by that the human ancestor was built using six primates without archaic humans”.

    4. Isn't the correct control to examine whether eQTLs are more enriched in HSlncRNA DBSs a set of transcription factor binding sites? I don't think using just promoter regions is a reasonable control here. This does not take away from the broader point however that eQTLs are found in DBSs and I think they can perform this alternate test.

    Indeed, the TF-TFBS and lncRNA-DBS relationships are comparable, and which set contains more QTLs is an interesting question. In this sense, it is reasonable to use TFBSs as the control. However, we did not perform this comparison for three reasons. First, most TFBSs are predicted by varied methods, which raises concerns about the reliability of comparing two sets of predictions. Second, most QTLs in DBSs are mQTLs, whereas most QTLs in TFBSs are eQTLs. Third, a greater portion of TFBSs than DBSs probably lie outside promoters, and the running time of LongTarget prevented us from predicting DBSs truly genome-wide. Nevertheless, this is an interesting question that deserves further exploration.

    5. In the discussion, they highlight the evolution of sugar intake, which I'm not sure is appropriate. This comes not from GO enrichment but rather from a few genes that are found at the tail of their distribution. While these signals may be real, the evolution of traits is often highly polygenic and they don't see this signal in their functional enrichment. I suggest removing that line. Moreover, HSlncRNAs are ones that are unique across a much longer time frame than the transition to agriculture which is when sugar intake rose greatly. Thus, it's unlikely that enrichment for something that arose in the past 6000-7000 years would be seen in an annotation that is designed to detect human-chimp or human-neanderthal level divergence.

    Multiple sugar metabolism-related pathways, including “glucose homeostasis” and “glucose metabolic process”, are enriched only in the Altai Neanderthal but not in chimpanzees (Figure 2). Indeed, HS lncRNAs span a much longer time frame than the transition to agriculture. However, given that apes and monkeys know how to pick ripe, sugar-rich fruits at the right time and place, we conjecture that archaic humans, as hunter-gatherers, could effectively exploit natural sugars.

    Reviewer #2 (Public Review):

    Lin et al attempt to examine the role of lncRNAs in human evolution in this manuscript. They apply a suite of population genetics and functional genomics analyses that leverage existing data sets and public tools, some of which were previously built by the authors, who clearly have experience with lncRNA binding prediction. However, I worry that there is a lack of suitable methods and/or relevant controls at many points and that the interpretation is too quick to infer selection. While I don't doubt that lncRNAs contribute to the evolution of modern humans, and certainly agree that this is a question worth asking, I think this paper would benefit from a more rigorous approach to tackling it.

    At this point, my suggestions are mostly focused on tightening and strengthening the methods; it is hard for me to predict the consequence of these changes on the results or their interpretation, but as a general rule I also encourage the authors to not over-interpret their conclusions in terms of what phenotype was selected for when as they do at certain points (eg glucose metabolism).

    I note some specific points that I think would benefit from more rigorous approaches, and suggest possible ways forward for these.

    1. Much of this work is focused on comparing DNA binding domains in human-unique long-noncoding RNAs and DNA binding sites across the promoters of genes in the human genome, and I think the authors can afford to be a bit more methodical/selective in their processing and filtering steps here. The article begins by searching for orthologues of human lncRNAs to arrive at a set of 66 human-specific lncRNAs, which are then characterised further through the rest of the manuscript. Line 99 describes a binding affinity metric used to separate strong DBS from weak DBS; the methods (line 432) describe this as being the product of the DBS or lncRNA length times the average Identity of the underlying TTSs. This multiplication, in fact, undoes the standardising value of averaging and introduces a clear relationship between the length of a region being tested and its overall score, which in turn is likely to bias all downstream inference, since a long lncRNA with poor average affinity can end up with a higher score than a short one with higher average affinity, and it's not quite clear to me what the biological interpretation of that should be. Why was this metric defined in this way?

    Length is an important property of a DBS, but length alone has a defect: a 100 bp triplex may have 50% or 70% of its nucleotides bound, and the binding affinity between DBD and DBS differs greatly between the two situations.
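
    The two positions above can be illustrated with a minimal sketch. It assumes, following the reviewer's reading of the methods, that affinity is computed as length times mean identity; the function name and all numbers are hypothetical and chosen only for illustration.

```python
def binding_affinity(length_bp: int, mean_identity: float) -> float:
    """Affinity as length x mean identity (assumed form of the metric)."""
    return length_bp * mean_identity

# Authors' point: at a fixed length of 100 bp, identity separates the sites.
same_len_low = binding_affinity(100, 0.50)
same_len_high = binding_affinity(100, 0.70)

# Reviewer's point: a longer site with lower identity can still outrank
# a shorter site with higher identity, because length enters as a factor.
long_weak = binding_affinity(200, 0.50)
short_strong = binding_affinity(100, 0.70)

print(same_len_low < same_len_high)   # identity matters at equal length
print(long_weak > short_strong)       # length can dominate identity
```

    Under this assumed form, the score ranks sites by the approximate number of matched positions rather than by the fraction matched, which is the length dependence the reviewer describes.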

    2. There is also a strong assumption that identified sites will always be bound (line 100), which I disagree is well-supported by additional evidence (lines 109-125). The authors show that predicted NEAT1 and MALAT1 DBS overlap experimentally validated sites for NEAT1, MALAT1, and MEG3, but this is not done systematically, or genome-wide, so it's hard to know if the examples shown are representative, or a best-case scenario.

    More details are given in Wen et al. 2022. We will put the sites into Supplementary Tables in the revised version.

    It's also not quite clear how overlapping promoters or TSS are treated - are these collapsed into a single instance when calculating genome-wide significance? If, eg, a gene has five isoforms, and these differ in the 3' UTR but their promoter region contains a DBS, is this counted five times, or one? Since the interaction between the lncRNA and the DBS happens at the DNA level, it seems like not correcting for this uneven distribution of transcripts is likely to skew results, especially when testing against genome-wide distributions, eg in the results presented in sections 5 and 6. I do not think that comparing genes and transcripts putatively bound by the 40 HS lncRNAs to a random draw of 10,000 lncRNA/gene pairs drawn from the remaining ~13500 lncRNAs that are not HS is a fair comparison. Rather, it would be better to do many draws of 40 non-HS lncRNAs and determine an empirical null distribution that way, if possible actively controlling for the overall number of transcripts (also see the following point).

    (1) If, say, three transcripts of a gene share the same promoter region (i.e., they have the same TSS) and differ only in the 3’UTR, the promoter region was used to predict DBSs only once. If instead the three transcripts have different TSSs, all three promoter regions were used to predict DBSs.

    (2) A gene may have many DBSs if it has many transcripts, or few if it has just a few transcripts. We did not correct for this uneven distribution of transcripts, because our GTEx analysis was on the transcript level; it is well recognized that transcripts of the same gene can be expressed in different tissues.

    (3) We randomly sampled a pair consisting of a non-HS lncRNA and a transcript 10,000 times (i.e., 10,000 pairs). It is a fair point that multiple draws of 40 non-HS lncRNAs should be made to make the statistics more robust.
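
    The resampling scheme suggested by the reviewer can be sketched as follows. This is a minimal illustration, not the analysis performed in the manuscript: the per-lncRNA scores are simulated, and the function name, the choice of the set mean as the test statistic, and the observed value are all assumptions.

```python
import random

def empirical_p_value(observed, pool, set_size=40, n_draws=1000, seed=0):
    """One-sided empirical p-value from repeated draws of `set_size` items.

    `pool` holds one score per non-HS lncRNA; each draw samples a set of the
    same size as the HS set (without replacement) and records its mean.
    """
    rng = random.Random(seed)
    null_means = []
    for _ in range(n_draws):
        drawn = rng.sample(pool, set_size)
        null_means.append(sum(drawn) / set_size)
    n_extreme = sum(1 for m in null_means if m >= observed)
    # Add-one correction keeps the estimate away from exactly zero.
    return (n_extreme + 1) / (n_draws + 1)

# Hypothetical data: ~13,500 non-HS lncRNA scores and an observed HS mean.
toy_rng = random.Random(1)
non_hs_scores = [toy_rng.gauss(0.0, 1.0) for _ in range(13500)]
hs_mean = 0.8
p = empirical_p_value(hs_mean, non_hs_scores)
print(p)
```

    Controlling for transcript number, as the reviewer also suggests, would require stratifying the draws (for example, sampling within bins of transcript count), which this sketch does not do.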

    3. Thresholds for statistical testing are not consistent, or always well justified. For instance, in line 142 GO testing is performed on the top 2000 genes (according to different rankings), but there's no description of the background regions used as controls anywhere, or of why 2000 genes were chosen as a good number to test? Why not 1000, or 500? Are the results overall robust to these (and other) thresholds? Then line 190 the threshold for downstream testing is now the top 20% of genes, etc. I am not opposed to different thresholds in principle, but they should be justified.

    The over-representation analysis using g:Profiler was performed with the whole genome as the background. Analyzing more DBSs (especially weak DBSs) would generate more results, but the results could be less reliable. Thus, there is a trade-off between analyzing fewer DBSs with relatively high reliability and analyzing more DBSs with relatively low reliability. Inevitably, the handling of this trade-off is somewhat subjective, and a careful comparison of the two classes of DBSs could be an independent question. Although weak DBSs were not systematically analyzed, the results from the strong DBSs undoubtedly suggest that HS lncRNAs have contributed greatly to human evolution.
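
    The robustness question can be examined by repeating an over-representation test at several cutoffs. The sketch below uses a plain hypergeometric tail probability, which is a stand-in and not g:Profiler's own procedure; the ranked gene list, the annotated set, and the cutoffs are hypothetical toy data.

```python
from math import comb

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K are annotated."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical toy data: 2000 ranked genes; every 5th gene among the
# top 500 carries the annotation (100 annotated genes in total).
ranked_genes = list(range(2000))
annotated = set(range(0, 500, 5))

p_by_threshold = {}
for top_n in (100, 200, 400):
    hits = sum(1 for g in ranked_genes[:top_n] if g in annotated)
    p_by_threshold[top_n] = hypergeom_sf(
        hits, len(ranked_genes), len(annotated), top_n
    )

for top_n, p_value in p_by_threshold.items():
    print(top_n, p_value)
```

    If an enrichment is significant at only one cutoff and disappears at neighbouring cutoffs, that would suggest the result depends on the threshold choice.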

    Likewise, comparing Tajima's D values near promoters to genome-wide values is unfair, because promoters are known to be under strong evolutionary constraints relative to background regions; as such it is not surprising that the results of this comparison are significant. A fairer comparison would attempt to better match controls (eg to promoters without HS lncRNA DBS, which I realise may be nearly impossible), or generate empirical p-values via permutation or simulation.

    We examined Tajima’s D in DBSs (Supplementary Figure 9) and in HS lncRNA genes (Supplementary Figure 18). In both cases, we compared the Tajima’s D values with the genome-wide background.
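
    Any matched-control or permutation comparison of the kind the reviewer proposes starts from per-region values of Tajima's D. For reference, a minimal sketch of the statistic using Tajima's standard formula is given below; the function names and the 0/1 haplotype encoding are assumptions made for illustration and do not reflect the software used in the manuscript.

```python
from math import sqrt

def summary_stats(haplotypes):
    """Return (n, S, pi) from equal-length strings of '0'/'1' alleles."""
    n = len(haplotypes)
    seg_sites = 0
    pair_diffs = 0
    for column in zip(*haplotypes):
        k = column.count("1")
        if 0 < k < n:                 # site is polymorphic in the sample
            seg_sites += 1
            pair_diffs += k * (n - k)  # differing pairs at this site
    pi = pair_diffs / (n * (n - 1) / 2)  # mean pairwise differences
    return n, seg_sites, pi

def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size, segregating sites, and pi."""
    if seg_sites == 0:
        return float("nan")           # undefined without polymorphism
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)

n, s, pi = summary_stats(["0011", "0101", "0110", "0000"])
print(n, s, pi, tajimas_d(n, s, pi))
```

    Computing D for DBS-containing promoters and for matched promoters without DBSs, then comparing the two distributions, would address the concern that promoters as a class differ from the genome-wide background.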

    4. There are huge differences in the comparisons between the Vindija and Altai Neanderthal genomes that to me suggest some sort of technical bias or the like is at play here. e.g. line 190 reports 1256 genes to have a high distance between the Altai Neanderthal and modern humans, but only 134 Vindija genes reach the same cutoff of 0.034. The temporal separation between the two specimens does not seem sufficient to explain this difference, nor the difference between the Altai Denisovan and Neanderthal results (2514 genes for Denisovan), which makes me wonder if it is a technical artefact relating to the quality of the genome builds? It would be worth checking.

    We used the same workflow (and the same cutoff of 0.034) to analyze the Vindija Neanderthal, Altai Neanderthal, and Denisovan genomes. If a smaller cutoff were used, one would see more Vindija genes. Again, the issue is a trade-off. Analyzing the epigenome and epigenetic regulation in archaic genomes is an interesting direction, and many more studies are needed before related parameters and cutoffs can be set more reasonably.

    5. Inferring evolution: There are some points of the manuscript where the authors are quick to infer positive selection. I would caution that GTEx contains a lot of different brain tissues, thus finding a brain eQTL is a lot easier than finding a liver eQTL, just because there are more opportunities for it. Likewise, claims in the text and in Tables 1 and 2 about the evolutionary pressures underlying specific genes should be more carefully stated. The same is true when the authors observe high Fst between groups (line 515), which is only one possible cause of high Fst - population differentiation and drift are just as capable of giving rise to it, especially at small sample sizes.

  6. eLife assessment

    In this valuable manuscript, the authors attempt to examine the role of long non-coding RNAs (lncRNAs) in human evolution, through a set of population genetics and functional genomics analyses that leverage existing datasets and tools. Although the methods are at times inadequate - for example, suitable methods and/or relevant controls are lacking at many points, and selection is inferred sometimes too quickly - the results nonetheless point towards a possible contribution of long non-coding RNAs to the evolution of human biology and they suggest clear directions for future, more rigorous study.

  7. Reviewer #1 (Public Review):

    Summary
    While DNA sequence divergence, differential expression, and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study, the authors computationally predict HSlncRNAs as well as their DNA Binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analysis is straightforward and after identifying these regions/HSlncRNAs the authors examined their effects using different external datasets.

    Strengths/weaknesses
    By and large, the analysis performed is dependent on their ability to identify HSlncRNAs and their DBS. I think that they have done a good job of showing the performance metrics of their methods in previous publications. Thereafter, they perform a series of enrichment-type analyses that have been used in the field for quite a while now to look at tissue-specific enrichment, or region-specific enrichment, or functional enrichment, and I think these have been carried out well. The authors achieved the aims of their work. I think one of the biggest contributions that this paper brings to the field is their annotation of these HSlncRNAs. Thus a major revisionary effort could be spent on applying their method to the latest genomes that have been released so that the community could get a clean annotation of newly identified HSlncRNAs (see comment 2).

    Comments

    1. Though some of their results about certain HSlncRNAs having DBSs in all genes is rather surprising/suspicious, I think that broadly their process to identify and validate DBSs is robust, they have multiple lines of checks to identify such regions, including functional validation. These predictions are bound to have some level of false positive/negative rate and it might be nice to restate those here and on what experiment/validation data these were conducted. However, the rest of their analysis comprises different types of enrichment analysis which shouldn't be affected by outlier HSlncRNAs if indeed their FPR/FNR are low.

    2. There are now several new genomes available as part of the Zoonomia consortium and 240 Primate consortium papers released. These papers have re-examined some annotations such as Human Accelerated Regions (HARs) and found with a larger dataset as well as better reference genomes, that a large fraction of HARs were actually incorrectly annotated - that is that they were also seen in other lineages outside of just the great apes. If these papers have not already examined HSlncRNAs, the authors should try and re-run the computational predictions with this updated set and then identify HSlncRNAs there. This might help to clarify their signal and remove lncRNAs that might be present in other primates but are somehow missing in the great apes. This might also help to mitigate some results that they see in section 3 of their paper in comparing DBS distances between archaics and humans.

    3. The differences between the archaic hominins in their DBS distances to modern humans are a bit concerning. At some level, we expect these to be roughly similar when examining African modern humans and perhaps the Denisovan being larger when examining Europeans and Asians, but they seem to have distances that aren't expected given the demography. In addition, from their text for section 3, they begin by stating that they are computing two types of distances but then I lost track of which distance they were discussing in paragraph 3 of section 3. Explicitly stating which of the two distances in the text would be helpful for the reader.

    4. Isn't the correct control to examine whether eQTLs are more enriched in HSlncRNA DBSs a set of transcription factor binding sites? I don't think using just promoter regions is a reasonable control here. This does not take away from the broader point however that eQTLs are found in DBSs and I think they can perform this alternate test.

    5. In the discussion, they highlight the evolution of sugar intake, which I'm not sure is appropriate. This comes not from GO enrichment but rather from a few genes that are found at the tail of their distribution. While these signals may be real, the evolution of traits is often highly polygenic and they don't see this signal in their functional enrichment. I suggest removing that line. Moreover, HSlncRNAs are ones that are unique across a much longer time frame than the transition to agriculture which is when sugar intake rose greatly. Thus, it's unlikely that enrichment for something that arose in the past 6000-7000 years would be seen in an annotation that is designed to detect human-chimp or human-neanderthal level divergence.

  8. Reviewer #2 (Public Review):

    Lin et al attempt to examine the role of lncRNAs in human evolution in this manuscript. They apply a suite of population genetics and functional genomics analyses that leverage existing data sets and public tools, some of which were previously built by the authors, who clearly have experience with lncRNA binding prediction. However, I worry that there is a lack of suitable methods and/or relevant controls at many points and that the interpretation is too quick to infer selection. While I don't doubt that lncRNAs contribute to the evolution of modern humans, and certainly agree that this is a question worth asking, I think this paper would benefit from a more rigorous approach to tackling it.

    At this point, my suggestions are mostly focused on tightening and strengthening the methods; it is hard for me to predict the consequence of these changes on the results or their interpretation, but as a general rule I also encourage the authors to not over-interpret their conclusions in terms of what phenotype was selected for when as they do at certain points (eg glucose metabolism).

    I note some specific points that I think would benefit from more rigorous approaches, and suggest possible ways forward for these.

    1. Much of this work is focused on comparing DNA binding domains in human-unique long-noncoding RNAs and DNA binding sites across the promoters of genes in the human genome, and I think the authors can afford to be a bit more methodical/selective in their processing and filtering steps here. The article begins by searching for orthologues of human lncRNAs to arrive at a set of 66 human-specific lncRNAs, which are then characterised further through the rest of the manuscript. Line 99 describes a binding affinity metric used to separate strong DBS from weak DBS; the methods (line 432) describe this as being the product of the DBS or lncRNA length times the average Identity of the underlying TTSs. This multiplication, in fact, undoes the standardising value of averaging and introduces a clear relationship between the length of a region being tested and its overall score, which in turn is likely to bias all downstream inference, since a long lncRNA with poor average affinity can end up with a higher score than a short one with higher average affinity, and it's not quite clear to me what the biological interpretation of that should be. Why was this metric defined in this way?

    2. There is also a strong assumption that identified sites will always be bound (line 100), which I disagree is well-supported by additional evidence (lines 109-125). The authors show that predicted NEAT1 and MALAT1 DBS overlap experimentally validated sites for NEAT1, MALAT1, and MEG3, but this is not done systematically, or genome-wide, so it's hard to know if the examples shown are representative, or a best-case scenario.

    It's also not quite clear how overlapping promoters or TSS are treated - are these collapsed into a single instance when calculating genome-wide significance? If, eg, a gene has five isoforms, and these differ in the 3' UTR but their promoter region contains a DBS, is this counted five times, or one? Since the interaction between the lncRNA and the DBS happens at the DNA level, it seems like not correcting for this uneven distribution of transcripts is likely to skew results, especially when testing against genome-wide distributions, eg in the results presented in sections 5 and 6. I do not think that comparing genes and transcripts putatively bound by the 40 HS lncRNAs to a random draw of 10,000 lncRNA/gene pairs drawn from the remaining ~13500 lncRNAs that are not HS is a fair comparison. Rather, it would be better to do many draws of 40 non-HS lncRNAs and determine an empirical null distribution that way, if possible actively controlling for the overall number of transcripts (also see the following point).

    3. Thresholds for statistical testing are not consistent, or always well justified. For instance, in line 142 GO testing is performed on the top 2000 genes (according to different rankings), but there's no description of the background regions used as controls anywhere, or of why 2000 genes were chosen as a good number to test? Why not 1000, or 500? Are the results overall robust to these (and other) thresholds? Then line 190 the threshold for downstream testing is now the top 20% of genes, etc. I am not opposed to different thresholds in principle, but they should be justified.

    Likewise, comparing Tajima's D values near promoters to genome-wide values is unfair, because promoters are known to be under strong evolutionary constraints relative to background regions; as such it is not surprising that the results of this comparison are significant. A fairer comparison would attempt to better match controls (eg to promoters without HS lncRNA DBS, which I realise may be nearly impossible), or generate empirical p-values via permutation or simulation.

    4. There are huge differences in the comparisons between the Vindija and Altai Neanderthal genomes that to me suggest some sort of technical bias or the like is at play here. e.g. line 190 reports 1256 genes to have a high distance between the Altai Neanderthal and modern humans, but only 134 Vindija genes reach the same cutoff of 0.034. The temporal separation between the two specimens does not seem sufficient to explain this difference, nor the difference between the Altai Denisovan and Neanderthal results (2514 genes for Denisovan), which makes me wonder if it is a technical artefact relating to the quality of the genome builds? It would be worth checking.

    5. Inferring evolution: There are some points of the manuscript where the authors are quick to infer positive selection. I would caution that GTEx contains a lot of different brain tissues, thus finding a brain eQTL is a lot easier than finding a liver eQTL, just because there are more opportunities for it. Likewise, claims in the text and in Tables 1 and 2 about the evolutionary pressures underlying specific genes should be more carefully stated. The same is true when the authors observe high Fst between groups (line 515), which is only one possible cause of high Fst - population differentiation and drift are just as capable of giving rise to it, especially at small sample sizes.