TISCalling: Leveraging Machine Learning to Identify Translational Initiation Sites in Plants and Viruses

Read the full article

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

The recognition of translational initiation sites (TISs) offers complementary insights into identifying genes encoding novel proteins or small peptides. Conventional computational methods primarily identify Ribo-seq-supported TISs and lack the capacity of systematical and global identification of TIS, especially for non-AUG sites in plants. Additionally, these methods are often unsuitable for evaluating the importance of mRNA sequence features for TIS determination. In this study, we present TISCalling , a robust framework that combines machine learning (ML) models and statistical analysis to identify and rank novel TISs across eukaryotes. TISCalling generalized and ranks important features common to multiple plant and mammalian species while identifying kingdom-specific features such as mRNA secondary structures and G-contents. Furthermore, TISCalling achieved high predictive power for identifying novel viral TISs. Importantly, TISCalling provides prediction scores for putative TIS along plant transcripts, enabling prioritization of those of interest for further validation. We offer TISCalling as a command-line-based package [ https://github.com/yenmr/TIScalling ], capable of generating prediction models and identifying key sequence features. Additionally, we provide web tools [ https://predict.southerngenomics.org/TIScalling ] for visualizing pre-computed potential TISs, making it accessible to users without programming experience. The TISCalling framework offers a sequence-aware and interpretable approach for decoding genome sequences and exploring functional proteins in plants and viruses.

Article activity feed