From Nucleotides to Numbers: A Comprehensive Review of RNA Feature Extraction Methods for Computational Modelling

Abstract

Machine learning is a powerful approach for analysing RNA sequences, particularly for understanding the function and regulation of non-coding RNAs. A critical step in this process is feature extraction, which transforms biological sequences into numerical representations that allow computational models to capture and interpret complex biological patterns. Despite its central role, RNA feature extraction remains a broad and fragmented field, and limited standardization and accessibility hinder its consistent application. In this comprehensive review, we address this fragmentation by systematically organizing more than 25 feature extraction strategies into sequence-based and structure-based approaches. We further conduct a comparative analysis showing how the choice of feature set affects model performance, reinforcing the importance of integrated feature engineering. To facilitate practical adoption, we also provide a curated list of publicly available tools and software packages. By consolidating methodologies and resources, this work seeks to improve reproducibility, scalability, and interpretability in machine learning-driven RNA research.
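
As an illustration of what "transforming a sequence into a numerical representation" can mean in practice, the sketch below computes normalized k-mer frequencies, one commonly used sequence-based feature. It is a minimal example rather than a method taken from the review: the function name `kmer_frequencies`, the default `k=2`, and the choice to convert T to U and to ignore windows containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Return a dict mapping every possible k-mer to its relative frequency.

    Hypothetical helper for illustration only. Windows containing
    characters outside `alphabet` (e.g. N) are ignored.
    """
    # Assumption: DNA-style input is accepted and T is read as U.
    seq = seq.upper().replace("T", "U")

    # Enumerate all 4**k possible k-mers so the vector has a fixed length.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]

    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    # Normalize by the number of valid windows only.
    total = sum(counts[km] for km in kmers)
    return {km: (counts[km] / total if total else 0.0) for km in kmers}


if __name__ == "__main__":
    features = kmer_frequencies("AUGGCUAGCUA", k=2)
    # Fixed-length numeric vector, ordered by k-mer, ready for a model.
    vector = [features[km] for km in sorted(features)]
    print(len(vector), vector)
```

Because every possible k-mer is enumerated up front, sequences of different lengths map to vectors of the same dimension (4^k), which is what most downstream models expect.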
