Predicting RNA:DNA Triplex Structures from Sequence Features Using Deep Learning Architecture

Abstract

Long non-coding RNAs (lncRNAs) can exert their regulatory roles by forming triple helices through RNA–DNA interactions. Although this has been verified by a few in vivo and in vitro methods, robust in silico approaches that predict the potential of lncRNAs and DNA sites to form triplex structures are still required. Tools such as Triplexator have predicted vast numbers of lncRNAs and DNA sites with triplex-forming potential, yet computational methods that can refine and extend these predictions are still needed. In this study, we developed ten deep neural network models that predict the potential of lncRNAs and DNA sites to form triple helices on a genome-wide scale. To prepare our dataset, we first used Triplexator to screen out lncRNAs and DNA sites with low triplex-forming potential. We then trained different deep learning architectures, including two-layer convolutional neural networks (CNNs), residual neural networks (ResNNs), long short-term memory recurrent neural networks (LSTM-RNNs), and multilayer perceptrons (MLPs). Among these architectures, our lncRNA_CNN and LSTM3-RNN both achieved a mean AUC of 0.99 for lncRNA features at a kernel size of 32 and a learning rate of 1e-3. For DNA site features, our DNA_CNN achieved the best performance, with a mean AUC of 0.98 under the same conditions. In conclusion, we demonstrate that deep neural network architectures can effectively learn sequence features of lncRNAs and DNA to accurately predict RNA:DNA triplex formation potential, providing a scalable in silico framework for studying genome-wide triplex biology.
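
As a rough illustration of two ideas the abstract relies on, the sketch below shows one common way to turn a nucleotide sequence into numeric features and how an AUC value can be computed from model scores. The abstract does not state how sequences were encoded or how AUC was calculated, so the one-hot encoding, the handling of U and of unknown bases, and the helper names `one_hot` and `roc_auc` are all assumptions made for illustration, not the authors' pipeline.

```python
from typing import List, Sequence

# Assumption: a four-letter DNA alphabet; RNA input is mapped U -> T first.
ALPHABET = "ACGT"


def one_hot(seq: str, alphabet: str = ALPHABET) -> List[List[int]]:
    """Encode a nucleotide string as a len(seq) x len(alphabet) 0/1 matrix.

    Symbols outside the alphabet (e.g. N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    rows = []
    for base in seq.upper().replace("U", "T"):
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one;
    tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy inputs, not data from the study.
    print(one_hot("ACGU"))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

The pairwise AUC is quadratic in the number of examples, which is fine for a toy check; at genome-wide scale a rank-based implementation from an established library would be the more practical choice.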
