Evolutionary remodeling of non-canonical ORF translation in mammals
Curation statements for this article:-
Curated by eLife
eLife Assessment
This study presents a large, systematically curated catalog of non-canonical open reading frames (ncORFs) in human and mouse through the reanalysis of nearly 400 Ribo-seq datasets using a standardized pipeline; the resulting atlas consolidates ncORF annotations across tissues and provides a valuable resource for investigating non-canonical translation and ORF emergence. The main conclusions are supported by consistent data processing and multiple computational measures of translation and conservation. While the pipeline is transparent and technically robust, some analytical criteria and dataset limitations could be described more explicitly, and several downstream conclusions would benefit from more cautious interpretation. Some evolutionary inferences are primarily correlative, and dataset heterogeneity, uneven tissue representation, and limited experimental validation also constrain the strength of a subset of the findings. Overall, the evidence is solid, and the resource is likely to be broadly beneficial to the community.
Abstract
Non-canonical open reading frames (ncORFs) are pervasive within transcripts annotated as “non-coding” or “untranslated regions” of mRNAs, yet their landscape under normal physiological conditions remains to be fully resolved, particularly outside humans. Here we applied a stringent and standardized pipeline to hundreds of high-quality ribosome profiling libraries from normal mammalian tissues and cell types, identifying 11,623 human and 16,485 mouse ncORFs. Evolutionary analyses revealed that thousands of ncORFs are subject to coding constraint and exhibit lineage-specific conservation, underscoring their functional potential. Ancient ncORFs are preferentially highly translated, broadly expressed, and enriched for lineage-specific conservation. Co-expression patterns further indicate that many ncORFs, especially ancient ones, are cotranslated with canonical coding sequences, consistent with functions mediated through protein–protein interactions. Together, these findings establish a comprehensive atlas of mammalian ncORFs and provide fundamental insights into their evolutionary dynamics and functional integration within the proteome.
Reviewer #1 (Public review):
This work compiles a comprehensive atlas of ncORFs across mammalian tissues and cell types, derived from reanalysis of ~400 public ribosome profiling datasets. The authors then evaluate cross-species conservation and functional signatures, proposing that evolutionarily ancient ncORFs tend to have higher translation potential, stronger expression, and closer relationships with canonical coding sequences.
Strengths:
In general, the study provides a large-scale and timely resource of annotated ncORFs, which could be broadly useful for the community. The authors collected ~400 public ribosome profiling datasets for annotations of ncORFs, which, to the best of my knowledge, is the largest collection of data for such a purpose. The catalog could facilitate future investigations into ncORF biology and broaden understanding of the coding potential of the "non-coding" genome.
Weaknesses:
Based on the ncORF catalog, some of the analyses were not properly done. Some of the results are descriptive.
(1) Bias and representations of data source. Public ribo-seq datasets are unevenly distributed across tissues and cell lines, raising concerns about heterogeneity and underrepresentation of certain contexts. This may limit the generalizability of the catalog.
(2) The discussion on modular domains of ncORFs is unclear, and the claim that they may originate via TE-related mechanisms is not well supported. Stronger evidence or clearer reasoning is needed.
(3) The conservation comparisons are not fully convincing. Figure S7 shows only mild differences between ncORFs and CDS, and statistical significance is not clearly demonstrated. Comparisons with other non-coding RNAs should be added, and overlapping sequences between ncORFs and CDS should be excluded to avoid bias.
(4) Figure 3 indicates that some ncORFs are subject to evolutionary constraints. This is not surprising. The authors should provide further analyses on more detailed features of these "conserved" ncORFs vs. the "non-conserved" ones; more granular analyses contrasting the two groups would be informative. Some informative work has already been done in Drosophila, worms, mouse, and human. In fact, small ORFs, especially uORFs, have been extensively studied for their functions and cross-species conservation. The authors should explicitly show what is new here in their analyses.
(5) Translation levels are reported using RPF counts. However, translation efficiency (normalized by RNA expression) is a more appropriate measure to account for expression heterogeneity.
(6) The correlation analyses between ncORF translation levels and PhyloCSF are confusing and largely descriptive. These sections need sharper framing and clearer conclusions.
(7) Public ribo-seq datasets, generated by different research labs, are known for their strong batch effects. Representations of tissues and cells are also very unbalanced. Therefore, the co-translation analysis between ncORFs and canonical CDS is not well controlled. This should be done by referring to a recent large-scale ribo-seq meta-analysis (Nat Biotechnol. 2025. doi: 10.1038/s41587-025-02718-5).
Comments on revisions:
The authors have made efforts to address most of the previous concerns, and several points have been clarified or improved in the revision. However, in a number of cases, the responses rely more on acknowledgment and reframing than on substantive analytical strengthening. Overall, the manuscript is improved, particularly in terms of clarity, transparency, and positioning of claims. I support its publication and look forward to seeing how the field engages with and discusses these claims.
-
Reviewer #2 (Public review):
Summary:
Chang et al. attempted to analyze a large number of ribo-seq datasets through a standardized pipeline, identifying novel non-canonical ORFs and elucidating their evolutionary and expression characteristics.
Strengths:
(1) The datasets analyzed by the authors are sufficiently comprehensive, and the use of standardized pipelines ensures excellent analytical consistency.
(2) Their analyses of ORF evolution and co-expression further deepen our understanding of these ORFs.
Weaknesses:
(1) The authors primarily conducted analyses through bioinformatics, lacking sufficient wet-lab experimental evidence.
(2) Some analytical methods and standards were not clearly presented in the manuscript.
-
Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
This work compiles a comprehensive atlas of ncORFs across mammalian tissues and cell types, derived from reanalysis of ~400 public ribosome profiling datasets. The authors then evaluate cross-species conservation and functional signatures, proposing that evolutionarily ancient ncORFs tend to have higher translation potential, stronger expression, and closer relationships with canonical coding sequences.
Strengths:
In general, the study provides a large-scale and timely resource of annotated ncORFs, which could be broadly useful for the community. The authors collected ~400 public ribosome profiling datasets for annotations of ncORFs, which, to the best of my knowledge, is the largest collection of data for such a purpose. The catalog could facilitate future investigations into ncORF biology and broaden understanding of the coding potential of the "non-coding" genome.
We thank the reviewer for the positive evaluation of our manuscript and for recognizing the significance of our contribution.
Weaknesses:
Based on the ncORF catalog, some of the analyses were not properly done. Some of the results are descriptive.
(1) Bias and representations of the data source. Public ribo-seq datasets are unevenly distributed across tissues and cell lines, raising concerns about heterogeneity and underrepresentation of certain contexts. This may limit the generalizability of the catalog.
We agree with the reviewer that the uneven distribution of public Ribo-seq datasets across tissues can inevitably introduce bias in the ncORF composition of our catalog. This bias is likely more pronounced in humans due to the narrower tissue coverage. We have addressed this point in the Discussion section of the revised manuscript.
(2) The discussion on modular domains of ncORFs is unclear, and the claim that they may originate via TE-related mechanisms is not well supported. Stronger evidence or clearer reasoning is needed.
We thank the reviewer for highlighting this point. We have revised the manuscript to more clearly explain the rationale behind our analysis of ncORF modular domains and have adopted more cautious language regarding their potential transposable element–related origins, limiting interpretations to what is directly supported by the data.
(3) The conservation comparisons are not fully convincing. Figure S7 shows only mild differences between ncORFs and CDS, and statistical significance is not clearly demonstrated. Comparisons with other non-coding RNAs should be added, and overlapping sequences between ncORFs and CDS should be excluded to avoid bias.
We thank the reviewer for this comment and apologize for the lack of clarity in the original figure. Both CDSs and ncORFs show significant deviation from zero Gnocchi scores (two-sided Wilcoxon signed-rank tests), which is now stated explicitly in the revised legend and text. CDS-overlapping ncORFs were already excluded in the original analysis; this has been clarified to avoid confusion.
As suggested, we have added lncRNAs for comparison. ncORFs display modestly higher Gnocchi scores than lncRNAs, and this difference persists when restricting the analysis to lncRNA-derived ncORFs and their corresponding full-length lncRNAs (see revised Fig. S7). These additions strengthen the conservation comparison while controlling for transcript context.
(4) Figure 3 indicates that some ncORFs are subject to evolutionary constraints. This is not surprising. The authors should provide further analyses on more detailed features of these "conserved" ncORFs vs. the "non-conserved" ones; more granular analyses contrasting the two groups would be informative. Some informative work has already been done in Drosophila, worms, mice, and humans. In fact, small ORFs, especially uORFs, have been extensively studied for their functions and cross-species conservation. The authors should explicitly show what is new here in their analyses.
We thank the reviewer for this insightful comment. We agree that cross-species conservation of ncORFs (particularly uORFs) has been extensively investigated in prior studies, including our own.
However, most prior analyses have focused on conservation of start codons or overall ORF integrity, which does not distinguish selection acting on translational activity from selection acting on the encoded peptide sequence itself. In contrast, our analysis leverages codon-level periodic PhyloP signals across the full ORF. The observed three-nucleotide periodicity is consistent with selective constraint at the amino acid level, rather than merely preservation of initiation sites or translational potential. Furthermore, our newly developed branch-length statistic uncovers lineage-restricted conservation patterns among ncORFs, enabling resolution of evolutionary dynamics not captured by conventional conservation metrics.
Thus, while the existence of conserved ncORFs is not unexpected, the conceptual advance of our study lies in demonstrating that a subset exhibits coding-like evolutionary constraint consistent with selection on their peptide products, as well as revealing lineage-specific conservation patterns. We have clarified this distinction in the revised Discussion.
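To illustrate the codon-level periodicity argument, the following is a minimal sketch rather than the authors' implementation. It assumes per-nucleotide PhyloP scores for a single ORF are already available as a list ordered 5' to 3', beginning at the first nucleotide of the start codon. The function names and the toy scores are hypothetical. Under amino-acid-level constraint, the third (wobble) codon position is expected to be less conserved than the first two, which shows up as a positive contrast.

```python
from statistics import mean


def phylop_by_codon_position(scores):
    """Average per-nucleotide conservation scores by codon position (1, 2, 3).

    `scores` is assumed to start at the first nucleotide of the ORF;
    trailing nucleotides that do not complete a codon are ignored.
    """
    usable = len(scores) - len(scores) % 3
    if usable == 0:
        raise ValueError("need at least one full codon")
    return tuple(mean(scores[offset:usable:3]) for offset in range(3))


def periodicity_contrast(scores):
    """Mean of codon positions 1 and 2 minus codon position 3."""
    pos1, pos2, pos3 = phylop_by_codon_position(scores)
    return (pos1 + pos2) / 2 - pos3


# Toy scores for three codons (hypothetical values, not real PhyloP data).
toy_scores = [2.0, 2.5, 0.25, 1.5, 2.0, -0.25, 2.5, 3.0, 0.0]
print(phylop_by_codon_position(toy_scores))
print(periodicity_contrast(toy_scores))
```

A contrast near zero would be expected for a region conserved for reasons unrelated to the encoded peptide, such as preservation of the start codon alone.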
(5) Translation levels are reported using RPF counts. However, translation efficiency (normalized by RNA expression) is a more appropriate measure to account for expression heterogeneity.
We agree that translation efficiency (TE), which normalizes ribosome footprint counts by RNA abundance, is in principle an appropriate metric. We initially calculated TE and compared ncORFs with CDSs. However, we found that TE estimates for short ncORFs were substantially inflated by RPF enrichment near start and stop codons, leading to unstable and potentially misleading values.
For CDSs, this bias is commonly addressed by excluding the first and last 10 to 20 codons when quantifying RPF density. This strategy is not feasible for ncORFs because of their short length. We therefore used RPF counts in the final analysis, applying stringent positional filtering. Only RPFs whose P sites fall within the ORF body, excluding start and stop codons, were counted. RPFs overlapping the ORF but with P sites outside the annotated frame, likely derived from adjacent ORFs or initiation or termination pausing, were excluded.
TE and RPF counts both measure translation but capture different aspects. TE reflects ribosome density relative to transcript abundance, whereas RPF counts quantify overall ribosome engagement. Given the short lengths of ncORFs, count-based quantification provides a more robust and conservative estimate of their translational activity.
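The positional filtering described in this response can be sketched as follows. This is an illustrative simplification, not the authors' pipeline: it assumes a single-exon ORF in 0-based transcript coordinates, that each read has already been reduced to the coordinate of the first nucleotide of its P-site codon, and that "within the annotated frame" means in frame with the start codon. All names are hypothetical.

```python
def count_in_frame_body_rpfs(p_sites, orf_start, orf_end):
    """Count RPFs whose P site lies in the ORF body and in the ORF's frame.

    p_sites   : 0-based transcript coordinates, one per read (P-site position)
    orf_start : 0-based coordinate of the first nucleotide of the start codon
    orf_end   : 0-based exclusive end (one past the last stop-codon nucleotide)
    """
    length = orf_end - orf_start
    if length % 3 != 0 or length < 9:
        raise ValueError("ORF must be a multiple of 3 with at least one body codon")
    body_start = orf_start + 3  # skip the start codon
    body_end = orf_end - 3      # skip the stop codon
    return sum(
        1
        for p in p_sites
        if body_start <= p < body_end and (p - orf_start) % 3 == 0
    )


# Toy ORF of 7 codons: start codon at 10-12, body at 13-27, stop codon at 28-30.
toy_p_sites = [10, 13, 14, 16, 19, 28, 40, 22]
print(count_in_frame_body_rpfs(toy_p_sites, orf_start=10, orf_end=31))
```

In the toy input, the reads at 10 (start codon), 28 (stop codon), 40 (outside the ORF), and 14 (out of frame) are dropped, which leaves four counted footprints.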
(6) The correlation analyses between ncORF translation levels and PhyloCSF are confusing and largely descriptive. These sections need sharper framing and clearer conclusions.
We thank the reviewer for this comment. We agree that the original presentation lacked clear framing. The relationship between PhyloCSF scores and mean ncORF translation levels across tissues is influenced by both evolutionary age and tissue specificity. Older ncORFs with higher coding potential tend to exhibit stronger tissue-restricted expression. As a result, their mean translation levels across all tissues appear lower, not because they are weakly translated, but because their translation is concentrated in specific tissues. This point is addressed in the revised manuscript.
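The interaction between tissue specificity and cross-tissue means can be made concrete with a small sketch. The specificity metric used in the manuscript is not stated here, so the tau index below is an assumption chosen for illustration, and the translation levels are hypothetical toy values.

```python
from statistics import mean


def tau_index(levels):
    """Tissue-specificity index: 0 for uniform expression, 1 for a single tissue."""
    n = len(levels)
    peak = max(levels)
    if n < 2 or peak <= 0:
        raise ValueError("need at least two tissues and a positive maximum")
    return sum(1 - x / peak for x in levels) / (n - 1)


# Hypothetical translation levels across four tissues.
broad = [8.0, 8.0, 8.0, 8.0]        # moderate translation everywhere
restricted = [20.0, 0.0, 0.0, 0.0]  # strong translation in one tissue only

for name, levels in (("broad", broad), ("restricted", restricted)):
    print(name, "mean:", mean(levels), "peak:", max(levels), "tau:", tau_index(levels))
```

The restricted ORF has the higher peak level but the lower cross-tissue mean, which is the pattern the response describes for older ncORFs with higher coding potential.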
(7) Public ribo-seq datasets, generated by different research labs, are known for their strong batch effects. Representations of tissues and cells are also very unbalanced. Therefore, the co-translation analysis between ncORFs and canonical CDS is not well controlled. This should be done by referring to a recent large-scale ribo-seq meta-analysis (Nat Biotechnol. 2025. doi: 10.1038/s41587-025-02718-5).
We thank the reviewer for highlighting this important study and for raising concerns regarding batch effects and tissue imbalance in public Ribo-seq datasets. We are aware that public Ribo-seq data generated by different laboratories are subject to substantial batch effects. During the ncORF annotation phase, we applied stringent quality-control criteria to minimize technical variability. For the co-translation analysis, inclusion criteria were relaxed to increase tissue and cell-type coverage. To partially mitigate representation bias, libraries derived from the same tissue or cell type were merged when quantifying ORF translation levels, thereby reducing overrepresentation from heavily sampled contexts.
Nevertheless, we acknowledge that these measures cannot completely eliminate batch effects or imbalance inherent to public datasets. We agree that co-translation analysis would benefit from uniformly processed, high-quality datasets generated under standardized protocols with balanced tissue representation, representing a valuable direction for future research.
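The library-merging step mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions: each library contributes a raw footprint count for one ORF plus its total mapped footprints, and the pooled value is expressed as reads per million. The normalization unit, function name, and numbers are hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict


def pooled_rpm_by_tissue(libraries):
    """Pool libraries from the same tissue, then compute reads per million.

    libraries: iterable of (tissue, orf_reads, total_mapped_reads) tuples.
    Returns a dict mapping tissue -> RPM computed on the pooled counts.
    """
    pooled = defaultdict(lambda: [0, 0])
    for tissue, orf_reads, total_reads in libraries:
        pooled[tissue][0] += orf_reads
        pooled[tissue][1] += total_reads
    return {
        tissue: 1e6 * orf_reads / total_reads
        for tissue, (orf_reads, total_reads) in pooled.items()
        if total_reads > 0
    }


# Three liver libraries and one brain library (toy numbers).
toy_libraries = [
    ("liver", 30, 1_000_000),
    ("liver", 50, 3_000_000),
    ("liver", 20, 1_000_000),
    ("brain", 10, 2_000_000),
]
print(pooled_rpm_by_tissue(toy_libraries))
```

After pooling, each tissue contributes a single value to any downstream correlation, so the three liver libraries no longer outweigh the single brain library. Pooling does not remove between-laboratory batch effects, consistent with the caveat in the response.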
Reviewer #2 (Public review):
Summary:
Chang et al. attempted to analyze a large number of ribo-seq datasets through a standardized pipeline, identifying novel non-canonical ORFs and elucidating their evolutionary and expression characteristics.
Strengths:
(1) The datasets analyzed by the authors are sufficiently comprehensive, and the use of standardized pipelines ensures excellent analytical consistency.
(2) Their analyses of ORF evolution and co-expression further deepen our understanding of these ORFs.
We thank the reviewer for the positive evaluation of our manuscript. It is encouraging to know that the analytical framework was found to be sound and appropriate.
Weaknesses:
(1) The authors primarily conducted analyses through bioinformatics, lacking sufficient wet-lab experimental evidence.
We thank the reviewer for this comment and acknowledge this limitation. We agree that functional validation through wet-lab experiments would provide important mechanistic insight into individual ncORFs. However, this study was designed as a systematic, genome-wide computational analysis to characterize translated ncORFs across species and tissues. Our objective was to define global patterns of translation, conservation, and structural features using large-scale datasets. Given the breadth and scale of these analyses, experimental validation of specific ncORFs falls beyond the scope of the current study. We have clarified this point in the Discussion and noted that our results provide a framework for future targeted experimental investigation.
(2) Regarding the evolution of non-canonical ORFs, a considerable amount of prior work already exists. The authors need to further clarify what new insights and discoveries they have made based on the analysis of such a large dataset.
We thank the reviewer for this suggestion. Similar concerns were also raised by Reviewer #1. In response, we have revised the Discussion to more clearly delineate the conceptual advances enabled by our large-scale dataset.
Recommendations for the authors:
Reviewing Editor Comments:
Several aspects of the downstream analyses would benefit from additional refinement. The heterogeneity and tissue imbalance inherent in public Ribo-seq datasets introduce potential biases in ncORF detection and inferences about co-translation. Given the breadth of the dataset, it would also be informative to quantify how consistently the newly identified ncORFs are detected across samples, distinguishing those observed broadly across tissues, those enriched in specific contexts, and those detected in only a few datasets. Such stratification would help differentiate reproducibly translated ORFs from candidates requiring further validation.
We thank the editor for the helpful comments. We agree that heterogeneity and tissue imbalance in public Ribo-seq datasets can influence ncORF detection and downstream interpretations. We have added discussion of this limitation in the revised manuscript.
Detection of ncORF translation depends not only on biological activity but also on sequencing depth and data quality. Although all ncORFs reported here were reproducibly identified by multiple methods across independent libraries, we agree that those detected in a larger number of datasets represent stronger candidates for functional validation. Accordingly, we now report the number of methods and libraries in which each ncORF was detected in the final catalog (Supplementary Table 3). Overall, 22.3–26.3% of ncORFs were detected in more than 10 libraries, whereas more than half were observed in only two to five libraries (Fig. S1B), enabling clearer stratification of broadly translated versus more context-specific candidates.
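A stratification of this kind can be sketched in a few lines. The cut-offs at "two to five" and "more than 10" libraries follow the figures quoted in the response; the intermediate 6-10 tier, the minimum of two libraries, the function names, and the toy counts are assumptions made for illustration only.

```python
from collections import Counter


def detection_tier(n_libraries):
    """Assign an ncORF to a detection tier from its supporting-library count."""
    if n_libraries < 2:
        raise ValueError("catalog entries are assumed to have >= 2 supporting libraries")
    if n_libraries <= 5:
        return "2-5 libraries"
    if n_libraries <= 10:
        return "6-10 libraries"
    return ">10 libraries"


def tier_fractions(library_counts):
    """Fraction of ncORFs in each tier, given a dict of ORF id -> library count."""
    tiers = Counter(detection_tier(n) for n in library_counts.values())
    total = sum(tiers.values())
    return {tier: count / total for tier, count in tiers.items()}


# Hypothetical ORF identifiers and library counts.
toy_counts = {
    "orf_a": 2, "orf_b": 3, "orf_c": 5, "orf_d": 7,
    "orf_e": 12, "orf_f": 40, "orf_g": 4, "orf_h": 2,
}
print(tier_fractions(toy_counts))
```

Applied to the per-ORF library counts in a table such as Supplementary Table 3, the same binning would separate broadly detected candidates from those supported by only a few libraries.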
Some evolutionary and functional interpretations are largely descriptive or consistent with established findings for small ORFs, and the authors should more clearly articulate what is novel in their analyses. The criteria separating "young," "old," and "ancient" ORFs require clearer definition, and conservation analyses would be strengthened by improved statistical rigor and explicit exclusion of regions overlapping annotated coding sequences. Evidence for modular domain features or transposable element-related origins is limited and warrants either stronger support or more cautious framing. Proteomics validation is currently minimal and could be substantially reinforced using existing public MS resources.
We thank the reviewer for these constructive comments. In the revised manuscript, we more clearly delineate the novel insights derived from our evolutionary analyses of ncORFs, distinguishing them from established findings on small ORFs.
We have clarified the criteria used to classify ORFs by evolutionary age in Fig. 6E and refined the terminology describing “young,” “old,” and “ancient” categories to ensure precise definition. The conservation analyses have been strengthened through more rigorous statistical treatment and by explicitly excluding regions overlapping annotated coding sequences.
With respect to modular domain features and potential transposable element–related origins, we have adopted more cautious language and limited our interpretations to what is directly supported by the data. Finally, we acknowledge that current proteomic validation remains limited and have clarified this point in the manuscript, while outlining in the Discussion the potential for future integration of large-scale public mass spectrometry datasets.
The authors additionally report an interesting observation that many ncORFs on mRNA co-translate with the main CDS of the same gene. Because canonical models often posit that uORF translation suppresses downstream CDS translation, further analysis would be valuable. In particular, it would be useful to determine whether patterns of co-translation differ among ORF types or evolutionary categories and to discuss possible regulatory mechanisms underlying these relationships.
We thank the editor for this thoughtful comment. As noted in our response to Reviewer #2, uORF–CDS co-translation does not contradict the canonical model in which uORFs repress downstream CDS translation. Co-translation reflects concurrent ribosome occupancy, whereas repression concerns the fraction of initiating ribosomes that ultimately reach and translate the CDS. Following the editor’s suggestion, we further examined whether co-translation patterns differ across ORF types or evolutionary categories. We found that ncORFs co-translating with their corresponding main CDSs are predominantly uORFs. However, these uORFs do not show statistically significant differences in conservation metrics or evolutionary age compared with other non-overlapping uORFs. Thus, we did not detect clear subtype- or age-specific distinctions among co-translating ncORFs. We have clarified these analyses in the revised manuscript.
Addressing these points would enhance the precision, interpretability, and robustness of the study's conclusions.
Reviewer #2 (Recommendations for the authors):
(1) The authors developed and refined a standardized pipeline to analyze nearly 400 ribo-seq datasets, identifying over 10,000 novel non-canonical ORFs in both human and mouse samples. Given the scale of this analysis, it is intriguing to consider how many of the newly identified non-canonical ORFs are consistently detected across multiple sample types (conservatively expressed ORFs), how many are restricted to specific tissues (tissue-specific ORFs), and how many were detected in only a single or very few samples (ORFs requiring further validation). Providing these data could offer new insights into understanding ORF translation.
Thanks for this constructive suggestion. This information has been presented in the revised Supplementary Table 3 and in a newly added supplementary figure (Fig. S1B), which together provide a clearer overview of ncORF detection consistency and context specificity.
(2) The authors' validation of MS data lacks specific details in the paper. Regarding the MS-supported ORF mentioned in line 117, which dataset's MS data is being referenced? Or does it refer to the content in Reference 20? At present, substantial research exists in both public general proteomics studies (e.g., CPTAC) and MS investigations targeting non-canonical ORFs. We recommend the authors incorporate additional MS data or public MS-based databases to strengthen validation in this area (PMID: 34129944, 39794466, 37823596, 39413795).
We thank the reviewer for this comment and for the helpful suggestions. The MS-supported ORFs mentioned in line 117 refer to the compilation reported in Reference 20, which integrates evidence from multiple independent proteomics studies. In addition, we examined MS-supported ORFs curated by GENCODE and PeptideAtlas, which are shown in Fig. 1E.
We agree that incorporating additional MS datasets would further strengthen validation of ncORFs. Studies cited by the reviewer and recent community efforts such as the GENCODE and PeptideAtlas analyses (PMID: 39314370) provide valuable examples in this direction. However, performing a comprehensive reanalysis of more than 95,000 public human MS runs is computationally demanding and currently infeasible for our group given resource and funding constraints.
To our knowledge, ongoing community-wide initiatives are working toward more comprehensive catalogs of translated human ncORFs. Large-scale, exhaustive MS searches will be particularly effective once a community consensus annotation framework for ncORFs is established. We have added discussion of these limitations and future directions in the revised manuscript.
(3) The authors classified ncORFs into three groups ("Ancient," "Young," and "Old") based on their origin nodes. However, both the "Young" and "Old" groups appear to be "mammalian-specific," yet the specific criteria for their division remain unclear. It is recommended to more clearly define in the figure legend or main text how "Young" and "Old" are categorized (e.g., based on specific evolutionary nodes or distance thresholds from nodes to the end) to avoid reader confusion.
In Fig. 5, “old” and “young” were intended as qualitative descriptors of relative evolutionary age based on the position of ncORF origination nodes along the phylogeny, as indicated on the x-axis. They were not meant to represent discrete categories. To avoid confusion, we have revised the manuscript to use “older” and “younger” throughout when referring to relative age differences. A binary classification is used only in Fig. 6E, where ncORFs are grouped into ancient (pre-mammalian) and younger (mammalian-specific) categories. This distinction is clearly defined in both the main text and the corresponding figure legend.
(4) The authors observed an intriguing phenomenon: ncORFs on mRNA tend to co-translate with the main CDS of the same gene. However, the conventional view holds that uORF translation often inhibits the translation of the main CDS. I suggest the authors could refine their analysis in this section further. For instance, do different types of ORFs or ORFs at different evolutionary levels exhibit distinct levels of co-translation with the main CDS? Additionally, while observing this phenomenon, the authors should also propose hypotheses regarding the regulatory mechanisms involved in these processes.
We thank the reviewer for these constructive suggestions. After excluding CDS-overlapping ORFs, we identified 258 human and 128 mouse ncORFs that co-translate with their corresponding main CDSs. With the exception of 10 human dORFs, all remaining cases were uORFs. We compared these co-translating ncORFs with other non-overlapping uORFs and dORFs but did not detect statistically significant differences in evolutionary age or conservation metrics. Because no clear distinguishing features emerged, we did not include these results in the manuscript.
Importantly, the observation of uORF–CDS co-translation does not contradict the established repressive role of uORFs. Co-translation reflects concurrent ribosome occupancy, whereas repression concerns the proportion of initiating ribosomes that ultimately translate the CDS. For example, if two ribosomes initiate within a given interval and one translates the uORF while one translates the CDS, CDS output is reduced by 50% relative to a uORF-free transcript. If four ribosomes initiate under the same repressive regime, two may translate the uORF and two the CDS. In this case, absolute translation of both ORFs increases, while the fractional repression remains unchanged. Thus, co-translation is compatible with a regulatory model in which uORFs reduce CDS translation efficiency without abolishing it. This has been clarified in the revised manuscript.
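The arithmetic in this example can be written out as a short sketch. It is a deliberately simplified model in which a fixed fraction of initiating ribosomes translates the uORF and the remainder translates the CDS. The function name and the 0.5 fraction mirror the worked example above and are not fitted to any data.

```python
def partition_initiations(n_initiating, uorf_fraction):
    """Split initiating ribosomes between the uORF and the CDS.

    Returns (uorf_events, cds_events, fractional_repression), where the
    repression is the CDS output lost relative to a uORF-free transcript.
    """
    if n_initiating <= 0 or not 0.0 <= uorf_fraction <= 1.0:
        raise ValueError("need a positive ribosome count and a fraction in [0, 1]")
    uorf_events = n_initiating * uorf_fraction
    cds_events = n_initiating - uorf_events
    repression = 1 - cds_events / n_initiating
    return uorf_events, cds_events, repression


low = partition_initiations(2, 0.5)   # two ribosomes initiate
high = partition_initiations(4, 0.5)  # four ribosomes initiate, same regime
print(low)
print(high)
```

Doubling initiation doubles the absolute translation of both the uORF and the CDS while the fractional repression stays at 0.5, which is why positive co-variation across samples is compatible with a repressive uORF.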
-
older ncORFs with positive coding potential tend to display increased tissue specificity
Is this trend pronounced in any tissues in particular, or is this random/across all tissue types?
-
eLife Assessment
This study presents a large, systematically curated catalog of non-canonical open reading frames (ncORFs) in human and mouse by reanalyzing nearly 400 Ribo-seq datasets using a standardized pipeline; the resulting atlas consolidates ncORF annotations across tissues and provides a valuable reference for understanding non-canonical translation and ORF emergence. The main conclusions are supported by consistent data processing and multiple computational measures of translation and conservation. While the pipeline is transparent and robust, several downstream analyses are descriptive, and some evolutionary interpretations remain correlative; dataset heterogeneity, uneven tissue representation, and limited experimental validation also constrain the strength of a subset of the findings. Overall, the evidence is solid, and the resource will be broadly used by the community.
-
Reviewer #1 (Public review):
This work compiles a comprehensive atlas of ncORFs across mammalian tissues and cell types, derived from reanalysis of ~400 public ribosome profiling datasets. The authors then evaluate cross-species conservation and functional signatures, proposing that evolutionarily ancient ncORFs tend to have higher translation potential, stronger expression, and closer relationships with canonical coding sequences.
Strengths:
In general, the study provides a large-scale and timely resource of annotated ncORFs, which could be broadly useful for the community. The authors collected ~400 public ribosome profiling datasets for annotations of ncORFs, which, to the best of my knowledge, is the largest collection of data for such a purpose. The catalog could facilitate future investigations into ncORF biology and broaden understanding of the coding potential of the "non-coding" genome.
Weaknesses:
Based on the ncORF catalog, some of the analyses were not properly done. Some of the results are descriptive.
(1) Bias and representations of the data source. Public ribo-seq datasets are unevenly distributed across tissues and cell lines, raising concerns about heterogeneity and underrepresentation of certain contexts. This may limit the generalizability of the catalog.
(2) The discussion on modular domains of ncORFs is unclear, and the claim that they may originate via TE-related mechanisms is not well supported. Stronger evidence or clearer reasoning is needed.
(3) The conservation comparisons are not fully convincing. Figure S7 shows only mild differences between ncORFs and CDS, and statistical significance is not clearly demonstrated.
Comparisons with other non-coding RNAs should be added, and overlapping sequences between ncORFs and CDS should be excluded to avoid bias.
(4) Figure 3 indicates that some ncORFs are subject to evolutionary constraints. This is not surprising. The authors should provide further analyses on more detailed features of these "conserved" ncORFs vs. the "non-conserved" ones. Some pretty informative works have been done in Drosophila, worms, mice, and humans. Figure 3 suggests some ncORFs are under evolutionary constraint, but this is not unexpected. More granular analyses contrasting "conserved" versus "non-conserved" ncORFs would be informative. In fact, small ORFs, especially uORFs, have been extensively studied for their functions and cross-species conservation. The authors should explicitly show what is new here in their analyses.
(5) Translation levels are reported using RPF counts. However, translation efficiency (normalized by RNA expression) is a more appropriate measure to account for expression heterogeneity.
(6) The correlation analyses between ncORF translation levels and PhyloCSF are confusing and largely descriptive. These sections need sharper framing and clearer conclusions.
(7) Public ribo-seq datasets, generated by different research labs, are known for their strong batch effects. Representations of tissues and cells are also very unbalanced. Therefore, the co-translation analysis between ncORFs and canonical CDS is not well controlled. This should be done by referring to a recent large-scale ribo-seq meta-analysis (Nat Biotechnol. 2025. doi: 10.1038/s41587-025-02718-5).
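The distinction raised in point (5) can be illustrated with a minimal sketch. It assumes translation efficiency is computed as the ratio of library-size-normalized RPF counts to library-size-normalized RNA-seq counts over the same ORF region. The function names and the counts-per-million normalization are illustrative assumptions, not the authors' pipeline.

```python
import math


def counts_per_million(count, library_size):
    """Scale a raw read count by total mapped reads (assumed normalization)."""
    return count * 1e6 / library_size


def translation_efficiency(rpf_count, rpf_library_size,
                           rna_count, rna_library_size):
    """Ratio of normalized RPF signal to normalized RNA signal for one ORF.

    Both counts are assumed to cover the same ORF region, so ORF length
    cancels in the ratio. Returns NaN when there is no RNA signal.
    """
    rpf_cpm = counts_per_million(rpf_count, rpf_library_size)
    rna_cpm = counts_per_million(rna_count, rna_library_size)
    if rna_cpm == 0:
        return math.nan
    return rpf_cpm / rna_cpm


if __name__ == "__main__":
    # Two hypothetical ORFs with identical RPF counts but different RNA levels.
    print(translation_efficiency(200, 1_000_000, 100, 1_000_000))
    print(translation_efficiency(200, 1_000_000, 400, 1_000_000))
```

In this toy case the two ORFs have the same RPF count, yet their efficiencies differ fourfold because their transcript abundances differ, which is the heterogeneity that raw RPF counts do not capture.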
-
Reviewer #2 (Public review):
Summary:
Chang et al. attempted to analyze a large number of ribo-seq datasets through a standardized pipeline, identifying novel non-canonical ORFs and elucidating their evolutionary and expression characteristics.
Strengths:
(1) The datasets analyzed by the authors are sufficiently comprehensive, and the use of standardized pipelines ensures excellent analytical consistency.
(2) Their analyses of ORF evolution and co-expression further deepen our understanding of these ORFs.
Weaknesses:
(1) The authors primarily conducted analyses through bioinformatics, lacking sufficient wet-lab experimental evidence.
(2) Regarding the evolution of non-canonical ORFs, a considerable amount of prior work already exists. The authors need to further clarify what new insights and discoveries they have made based on the analysis of such a large dataset.
-