Unraveling the influences of sequence and position on yeast uORF activity using massively parallel reporter systems and machine learning

Curation statements for this article:
  • Curated by eLife

Abstract

Upstream open-reading frames (uORFs) are potent cis-acting regulators of mRNA translation and nonsense-mediated decay (NMD). While both AUG- and non-AUG initiated uORFs are ubiquitous in ribosome profiling studies, few uORFs have been experimentally tested. Consequently, the relative influences of sequence, structural, and positional features on uORF activity have not been determined. We quantified thousands of yeast uORFs using massively parallel reporter assays in wildtype and ∆upf1 yeast. While nearly all AUG uORFs were robust repressors, most non-AUG uORFs had relatively weak impacts on expression. Machine learning regression modeling revealed that both uORF sequences and locations within transcript leaders predict their effect on gene expression. Indeed, alternative transcription start sites highly influenced uORF activity. These results define the scope of natural uORF activity, identify features associated with translational repression and NMD, and suggest that the locations of uORFs in transcript leaders are nearly as predictive as uORF sequences.

Article activity feed

  1. eLife assessment

    Protein abundance is the result of many layers of regulation, including at the levels of transcription, mRNA stability, translation and protein degradation. Many transcripts contain short upstream ORFs (uORFs), but their effects on the translation of the main ORFs are difficult to predict because they can be negative or positive and vary in magnitude. Here, the authors identify features of uORFs using massively parallel reporter assays, and these features help predict uORF effects on translation of main ORFs. The results will be an important resource for the community of researchers using this model organism and for the molecular and cell biology community in general, as they allow a better understanding of how genes are regulated. There are also areas in which the authors' claims or conclusions are not fully justified and require either additional statistical analysis or new experimentation.

  2. Reviewer #1 (Public Review):

    The authors' objectives were to identify the features of uORFs that determine their effects on the translation of the main ORF found in the same transcript. The major strengths of the paper are the creative and powerful experimental platforms used to measure translation, the computational approaches used to identify the key features that determine the effect of uORFs on translation, and the comparative analysis of two closely related species to understand how uORF activity evolves. The authors successfully and convincingly identify features associated with the regulatory effects of uORFs, and their results suggest that uORFs with strong repressive effects are selected against. These insights regarding evolution are very interesting and may contribute to our understanding of regulatory evolution at a level that is rarely explored, but this section could benefit from additional analyses of existing data to fully support the conclusions. Another aspect to consider is the possible interaction between the uORFs and the main ORF. Here, all experiments are performed with the same main ORF (YFP) for practical and essential reasons, but it would be useful to know whether the sign and magnitude of some uORF features' effects depend on which main ORF they are associated with. Overall, there are several areas in which the authors' claims or conclusions are not fully justified and require either additional statistical analysis or new experimentation.

  3. Reviewer #2 (Public Review):

    This report uses massively parallel reporter assays to examine the impact on gene expression of >2000 uORFs found in yeast mRNAs with 5'UTR lengths <181 nt, by comparing expression of two YFP reporters for each uORF, one containing the WT 5'UTR and the other with the uORF AUG codon mutated to a near-cognate AAG triplet. All of the mRNAs were expressed from the same promoter, taken from the ENO2 gene, which is expected to produce the predicted 5' ends for all of the mRNAs being sampled. The results indicated that most AUG uORFs are repressive, while most non-AUG (near-cognate) uORFs have little effect on reporter expression, and that a small fraction of AUG uORFs stimulate YFP expression. They corroborated these results by sequencing the reporter library mRNAs in polysome vs monosome fractions and showing a good correlation (R=0.78) between the effects of the uORF AUG mutations on YFP expression and on the fraction of the mRNA in polysomes. The reporter library was assayed in both WT and upf1 mutants to evaluate the impact of NMD on uORF regulation of reporter expression and polysome association, which allowed them to determine that, on average, NMD accounts for ~35% of the uORF-mediated repression of reporter expression, i.e., the magnitude of the repression is blunted in the upf1 mutant. Consistent with this, the reductions in YFP expression are frequently associated with reductions in reporter mRNA levels, measured by RNA-seq. Moreover, the repressive effects of the uORFs calculated from YFP expression versus polysome association of reporter mRNAs are more congruent in the upf1 mutant, where NMD effects are absent, than in the WT. Their bioinformatic analyses provide some evidence that NMD control is lessened by inefficient termination at uORFs with UGAC stop codons, for long vs. short uORFs, and by decreasing the distance of the uORF stop codon from the mRNA cap.

    Their large dataset allowed them to conduct machine learning to identify features of uORFs that are associated with their effects on YFP expression, finding that repression by the uORF is associated about equally with a good Kozak context for the start codon, a shorter distance of the uORF from the cap, and a shorter distance of the uORF stop codon to the downstream CDS, with a somewhat weaker association with a longer uORF CDS. These findings for Kozak context were predictable from prior work, as were the associations with uORF length and distance to the YFP AUG in the context of known effects of these parameters on reinitiation. However, the association with distance of the uORF from the cap is more novel. They provide some additional support for the latter by analyzing the influence of different TSSs/5'UTR lengths on uORF repressive function for a subset of 333 uORFs, finding that the repressive effect can vary depending on the TSS, with several instances in which the uORF was less inhibitory when the TSS is located further upstream from the uORF AUG. Finally, they provide some evidence that uORFs conserved between closely related yeast species are generally less repressive and have poorer AUG contexts, leading to the conclusion that they are under purifying selection to make them less inhibitory.
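
To make the quantities summarized above concrete, the following is a minimal sketch of how a per-uORF effect and the NMD contribution to it might be computed from paired reporter measurements. The function names, the log2-ratio definition of the effect, the definition of the NMD share, and all numbers are illustrative assumptions; they are not taken from the manuscript and may differ from the authors' actual analysis.

```python
import math


def uorf_effect(expr_with_uorf, expr_start_mutant):
    """Hypothetical effect measure: log2(WT-leader reporter / start-codon-mutant reporter).

    Negative values mean the intact uORF lowers expression (repression);
    positive values mean it raises expression.
    """
    if expr_with_uorf <= 0 or expr_start_mutant <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(expr_with_uorf / expr_start_mutant)


def nmd_share(effect_in_wt_strain, effect_in_upf1_strain):
    """Assumed definition: fraction of the log2 repression seen in the WT strain
    that disappears when NMD is disabled (upf1 mutant strain).

    Returns None when the uORF is not repressive in the WT strain,
    because the share is undefined in that case.
    """
    if effect_in_wt_strain >= 0:
        return None
    return 1.0 - effect_in_upf1_strain / effect_in_wt_strain


# Made-up expression values (arbitrary units) for a single uORF.
effect_wt = uorf_effect(25.0, 100.0)    # 4-fold repression in the WT strain
effect_upf1 = uorf_effect(50.0, 100.0)  # 2-fold repression in the upf1 strain
share = nmd_share(effect_wt, effect_upf1)
print(effect_wt, effect_upf1, share)
```

Working in log2 space treats a 2-fold increase and a 2-fold decrease symmetrically, which is one common choice for reporter ratios; a share computed on linear fold changes would give a different number for the same data.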

    This study is valuable in providing an unprecedented, comprehensive analysis of the regulatory effects of naturally occurring AUG and near-cognate uORFs on gene expression in a manner that distinguishes between repression of translation versus repression of mRNA stability via NMD. Owing to the large number of uORFs analyzed in a system that eliminates variations in transcription rate, it was possible to identify certain statistically significant associations between uORF features and the extent to which they repress translation or evoke NMD.
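
The positional features discussed in this review (distance of the uORF from the cap, uORF length, distance from the uORF stop codon to the main ORF, and start-codon context) can be derived directly from a transcript leader sequence. The sketch below shows one hypothetical way to do so. The function name, the feature names, the coordinate conventions, and the use of the -3 base as a simple proxy for start-codon context are assumptions made for illustration, not the authors' feature set.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def uorf_features(leader, start):
    """Extract simple positional features for one uORF.

    Assumptions: `leader` is the transcript leader only (index 0 is the first
    transcribed base, and the main ORF start codon follows the last base), and
    `start` is the 0-based index of the first base of the uORF start codon.
    Returns None if no in-frame stop codon lies entirely within the leader.
    """
    leader = leader.upper().replace("U", "T")
    stop = None
    # Walk codon by codon from the codon after the start codon.
    for i in range(start + 3, len(leader) - 2, 3):
        if leader[i:i + 3] in STOP_CODONS:
            stop = i
            break
    if stop is None:
        return None
    return {
        "start_codon": leader[start:start + 3],
        "cap_to_start": start,                        # nt upstream of the uORF start
        "uorf_length_codons": (stop - start) // 3,    # sense codons, stop excluded
        "stop_to_main_orf": len(leader) - (stop + 3), # nt between uORF stop and main ORF
        "minus3_base": leader[start - 3] if start >= 3 else None,
    }


# Toy leader: GCAAAA | ATG GCT TAA | GCAGCA
features = uorf_features("GCAAAAATGGCTTAAGCAGCA", 6)
print(features)
```

A table of such features, one row per uORF, is the kind of input a regression model could be trained on to predict the measured effect; uORFs that run into the main ORF are simply skipped in this sketch.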

    There are several areas in which the authors' claims or conclusions are not fully justified and require either additional statistical analysis or new experimentation to support the claims. In particular, additional experiments are needed to confirm that the reporter mRNAs initiate at the predicted TSS; to bolster the novel conclusion that moving a uORF farther from the cap reduces its inhibitory effect on translation initiation downstream, independently of the inclusion of other uORFs in the intervening interval; and to test their interpretations concerning the differences in uORF function between S. cerevisiae and S. paradoxus for particular mRNAs.