Article activity feed

  2. What were your criteria for positive protein/ORF-product identification by MS? Where applicable, were utORFs identified by more than one unique peptide? Each additional unique peptide would substantially increase confidence in utORF discovery.
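
One way to make the unique-peptide criterion concrete: a peptide counts as unique only if it maps to a single ORF, and an ORF is called only if it has at least `min_unique` such peptides. The Python sketch below is a minimal illustration, assuming a hypothetical `peptide_to_orfs` dictionary built from search-engine output. The two-peptide threshold and the toy peptide and ORF names are illustrative only, not the authors' criteria.

```python
from collections import defaultdict


def unique_peptides_per_orf(peptide_to_orfs):
    """Map each ORF ID to the peptides that match it and no other ORF.

    peptide_to_orfs: dict of peptide sequence -> iterable of matching ORF IDs
    (hypothetical input, e.g. parsed from a search engine's protein inference).
    """
    unique = defaultdict(set)
    for peptide, orfs in peptide_to_orfs.items():
        orfs = set(orfs)
        if len(orfs) == 1:  # shared peptides cannot discriminate between ORFs
            (orf,) = orfs
            unique[orf].add(peptide)
    return unique


def confidently_identified(peptide_to_orfs, min_unique=2):
    """Return ORF IDs supported by at least `min_unique` unique peptides, sorted."""
    unique = unique_peptides_per_orf(peptide_to_orfs)
    return sorted(orf for orf, peps in unique.items() if len(peps) >= min_unique)


# Toy example with made-up peptides and ORF IDs
example = {
    "PEPTIDEA": ["utORF_1"],
    "PEPTIDEB": ["utORF_1"],
    "PEPTIDEC": ["utORF_2"],
    "PEPTIDED": ["utORF_2", "geneX"],  # shared with an annotated gene, not counted
}
print(confidently_identified(example))  # ['utORF_1']
```

Raising `min_unique` trades sensitivity for confidence. This matters most for short utORFs, which can yield only a few distinct tryptic peptides.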

  3. Did you examine epigenetic modification states (e.g., from modENCODE) in these regions? Do any predictions suggest they might be regulatory regions, or do their epigenetic marks indicate active transcription? Similarly, were SNP or other variation data available for these regions (though that might be a human-genome-biased question!)?

  4. I am curious whether you saw any predicted structures indicative of emerging functional domains, such as transmembrane or DNA-binding domains. Could these be useful for predicting future function?

  5. This isn't critical, but I am curious how many N-terminally modified peptides were observed overall, and how that number compares to the total number of PSMs observed without this variable modification.
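
The comparison asked about here is a simple count over PSMs. As a minimal sketch, assume each PSM record carries a hypothetical boolean flag recording whether an N-terminal variable modification (for example, acetylation) was assigned. The `PSM` class and the toy peptides are illustrative, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    n_term_modified: bool  # hypothetical flag: N-terminal variable mod assigned


def n_term_summary(psms):
    """Return (modified, unmodified, modified/unmodified ratio or None)."""
    modified = sum(1 for p in psms if p.n_term_modified)
    unmodified = len(psms) - modified
    ratio = modified / unmodified if unmodified else None
    return modified, unmodified, ratio


psms = [PSM("MAKLV", True), PSM("GSTRK", False), PSM("LLDEK", False), PSM("MSQEP", True)]
print(n_term_summary(psms))  # (2, 2, 1.0)
```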

  9. Evaluation Summary:

    By integrating in silico predictions and mass spectrometry, this manuscript tackles the problem of annotating the currently nameless stretches of genomic sequence that actually code for proteins. The hundreds of protein-coding fruit fly genes described here offer new inroads for studying some of the youngest functional elements in genomes, particularly those that have recently emerged from non-coding DNA sequences. To clarify the biological significance of the present study, the authors should highlight the genes most likely to encode functional products and compare their findings with published datasets that used different methods to identify such genes in fruit flies.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

  10. Reviewer #1 (Public Review):

    In this study, Zheng and Zhao identified unannotated open reading frames (ORFs) in Drosophila, termed utORFs, mainly based on proteomics datasets. The authors extended their analyses to the birth and evolutionary heterogeneity of utORFs. These analyses uncovered several types of utORFs that differ in features including transcription, age, distribution, and evolutionary conservation.

    The origin of de novo protein-coding genes is an interesting question. The authors' attempt to uncover utORFs from proteomics datasets is much appreciated, but crucial cross-validation is missing. Given the high potential for false positives in MS datasets, it is difficult to evaluate the evolutionary aspects of the identified ORFs. Some experimental validation is needed to confirm the translational potential of utORFs with or without start codons.

  11. Reviewer #2 (Public Review):

    Zheng &amp; Zhao developed an advanced approach that combines a full-reading-frame search with MS-based translation evidence to identify evolutionarily new genes. Several hundred previously unannotated but clearly translated genes were identified and dated for their origination, and their genomic, transcriptional, structural, and age-related properties were characterized. These findings, enabled by technical advances, are a significant addition to the literature on evolutionarily new genes. In addition, this study points out the insufficiency of present-day gene annotation in Drosophila genomes, an issue of broad relevance to the Drosophila community that the manuscript should emphasize more.

  12. Reviewer #3 (Public Review):

    The goal of this work is to understand the role that previously neglected, unannotated ORFs play in the evolution of gene novelty in the Drosophila melanogaster lineage. These are ORFs that mostly code for small proteins, most of them having noncanonical start codons. The authors sought to identify translated ORFs using published MS proteomics datasets, making sure to achieve a balance between false positives and false negatives; they succeed rather convincingly. They then focused on when these ORFs first appeared and how they evolved, mainly aiming to understand whether some of them have emerged de novo and the evolutionary trajectories that they have taken.

    The major strengths of the manuscript lie in its scope, as it takes advantage of recently published data to exhaustively search the entire ORF catalogue of D. melanogaster for translation; in the application of rigorous methodologies for identifying MS-supported ORFs; and in the inference of the phylogenetic age of each ORF using a novel synteny-based approach. On this last point, however, I feel that some methodological details are missing. I understand that the genomic MSA of the D. melanogaster ORF and its orthologous region is extracted and that a search for the optimally aligning segment in the sequence of each species is conducted. Does that search include only ORFs in each orthologous region? I assume this is the case because the similarity cut-off of 2.5 is then calculated from protein alignments. If that is the case, why not use global alignments of entire ORFs? Furthermore, why is no gap penalty used? Finally, I cannot see where the genomic similarity scoring detailed in the methods is used, which adds to my confusion.

    An additional, albeit minor, weakness is the use of Latent Class Analysis to identify subpopulations of ORFs within the larger set and examine their differences. I see why the authors did it, and in theory I have no objection, but given the small number of factors (8, if I'm counting correctly), it's unclear whether it's worth the added complexity. There is also some potential for bias, since the method requires binning continuous variables and hence defining bins. It seems to me that the authors could have achieved more or less the same result by looking for specific subgroups based on criteria set a priori.

    A crucial part of the work is the attribution of de novo origin to utORFs. Here, I find the initial analysis, in which a single outgroup species is sufficient to invoke de novo origination, relatively unnecessary, especially since the authors themselves go on to state that only two or more supporting outgroups can provide convincing evidence. I would add that at least two of the outgroups should be non-monophyletic. It is also unclear why an ORF needs to be present in the outgroups at all (while lacking significant similarity). Is there a limit to how small that ORF can be? If so, and if there happens to be no such ORF in a region, why would that not count as evidence?

    I feel that the authors achieve most of their aims, at least the ones that I perceive as the most important. There are, however, some findings that are not sufficiently well supported.
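
To illustrate Reviewer #3's question about gap penalties and local versus global alignment, the sketch below scores the same pair of toy sequences with a Smith-Waterman (local) and a Needleman-Wunsch (global) recurrence under a linear gap penalty. It uses simple match/mismatch scores rather than a substitution matrix such as BLOSUM62, and it does not reproduce the authors' pipeline or their 2.5 cut-off. The sequences and scoring values are hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (gap <= 0)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can restart
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score with a linear gap penalty (gap <= 0)."""
    prev = [j * gap for j in range(len(b) + 1)]  # leading gaps in a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)  # leading gaps in b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


a, b = "MKTAYIAK", "MKGGGGTAYIAK"  # hypothetical toy sequences
for gap in (0, -2):
    print(gap, smith_waterman(a, b, gap=gap), needleman_wunsch(a, b, gap=gap))
# 0 16 16
# -2 12 8
```

With free gaps, the local score rewards stretching the alignment across the four-residue insertion (16 versus 12). A similarity cut-off calibrated without gap penalties may therefore be more permissive than one calibrated with them.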

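The outgroup criterion discussed in Reviewer #3's comments can be written down as an explicit rule. This is a hypothetical sketch, not the authors' procedure. An outgroup supports de novo origin if a syntenic region was aligned and contains no ORF passing the similarity cut-off, and `require_orf` toggles whether an unrelated ORF must be present there. A per-species `clade` label stands in crudely for the reviewer's non-monophyly requirement; a real check would consult the species tree.

```python
from dataclasses import dataclass


@dataclass
class Outgroup:
    name: str
    clade: str            # hypothetical lineage label, a crude proxy for monophyly
    region_aligned: bool  # a syntenic orthologous region was recovered
    orf_present: bool     # some ORF exists in that region
    similar: bool         # that ORF passes the similarity cut-off


def supporting_outgroups(outgroups, require_orf=True):
    """Outgroups whose aligned region lacks a similar ORF."""
    support = []
    for og in outgroups:
        if not og.region_aligned or og.similar:
            continue  # no evidence either way, or a homolog exists
        if require_orf and not og.orf_present:
            continue  # the stricter rule the reviewer questions
        support.append(og)
    return support


def de_novo_supported(outgroups, min_outgroups=2, min_clades=2, require_orf=True):
    """Require several supporting outgroups drawn from more than one lineage."""
    support = supporting_outgroups(outgroups, require_orf)
    clades = {og.clade for og in support}
    return len(support) >= min_outgroups and len(clades) >= min_clades


outgroups = [
    Outgroup("outgroup_1", "clade_A", True, True, False),
    Outgroup("outgroup_2", "clade_A", True, True, False),
    Outgroup("outgroup_3", "clade_B", True, False, False),
]
print(de_novo_supported(outgroups))                     # False: supporters share one clade
print(de_novo_supported(outgroups, require_orf=False))  # True: outgroup_3 adds clade_B
```

In this toy example, switching `require_orf` off changes the call. This is the situation the reviewer raises when no ORF happens to be present in an outgroup region.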