Transposable Elements and piRNAs interaction prediction with Predictive Bi-Clustering Trees

Abstract

PIWI-interacting RNAs (piRNAs) are a class of noncoding RNAs whose functions range from regulating gene expression to silencing Transposable Elements. They are 21 to 35 nucleotides long, display a uracil bias at the 5’ end, and carry a 2’-O-methylation at the 3’ end. Transposable Elements (TEs) are genetic elements that move within host genomes. TE replication can promote harmful recombination events by generating DNA double-strand breaks, in addition to interfering with gene expression. Silencing of these elements by piRNAs occurs in the germ line of most animals and is essential for maintaining genome integrity. In this work, the problem of in silico prediction of piRNA-TE interactions was addressed with a decision-tree-based algorithm, Predictive Bi-Clustering Trees (PBCT). To improve the algorithm’s performance, the piRNA-TE interaction matrix was reconstructed using a Beta-distribution-rescored Neighborhood Regularized Logistic Matrix Factorization (NRLMFβ) algorithm. PBCT was tested in 5-fold and 10-fold cross-validation configurations, both with the original interaction matrix (BICT) and with the interaction matrix reconstructed by NRLMFβ (BICTR). Although PBCT could not predict positive interactions satisfactorily, given the severe class imbalance of the dataset, matrix factorization showed advantages. In comparison, PBCT achieved higher AUROC and AUPRC values with BICT. However, with BICTR, PBCT correctly predicted more positive interactions, which are the primary interest of this study. Potential biological applications and ways to improve the algorithm’s performance are also discussed.
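Because the interaction matrix is heavily imbalanced, AUROC and AUPRC can tell different stories about the same predictor. The sketch below computes both metrics from scratch and splits the (piRNA, TE) pairs into k folds. It is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, AUPRC is estimated as average precision, and the random pair-wise fold split is an assumed scheme (other schemes hold out whole piRNA rows or TE columns instead).

```python
import random


def auroc(labels, scores):
    """AUROC as the Mann-Whitney statistic: the probability that a randomly
    chosen positive pair outscores a randomly chosen negative pair (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(labels, scores):
    """AUPRC estimated as average precision: the mean precision measured at
    the rank of each positive after sorting by decreasing score."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def kfold_pairs(n_pirnas, n_tes, k, seed=0):
    """Shuffle all (piRNA, TE) index pairs and deal them into k disjoint folds."""
    pairs = [(i, j) for i in range(n_pirnas) for j in range(n_tes)]
    random.Random(seed).shuffle(pairs)
    return [pairs[f::k] for f in range(k)]


if __name__ == "__main__":
    # Toy data: about 1% positives, scored by an uninformative random model.
    rng = random.Random(1)
    labels = [1 if rng.random() < 0.01 else 0 for _ in range(5000)]
    scores = [rng.random() for _ in labels]
    print("AUROC", round(auroc(labels, scores), 3))  # expected to be close to 0.5
    print("AUPRC", round(auprc(labels, scores), 3))  # stays low, near the positive rate
```

The toy run illustrates why both metrics are reported: a random scorer keeps AUROC around 0.5 regardless of imbalance, while its AUPRC collapses toward the fraction of positives, so AUPRC is the more sensitive indicator of how well rare positive interactions are recovered.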

Author summary

piRNAs and transposable elements (TEs) are biomolecules that interact in the germ line of most animals, where piRNAs silence TEs to maintain genome integrity. However, determining which piRNA interacts with which TE is laborious and low-yield, because the rules that govern these interactions are not yet fully understood. In this paper, we addressed piRNA-TE interaction prediction using PBCT, a multi-label, decision-tree-like algorithm, applied to interactions known in vivo. Because this is a Positive-Unlabeled Learning problem (the absence of a recorded interaction does not confirm a biological negative), we reconstructed the interaction matrix with an NRLMFβ algorithm. We compared the results obtained with the original and the reconstructed interaction matrices. Although the results with this algorithm and these parameters leave room for improvement, the reconstruction proved beneficial. We also applied other multi-label learning approaches to the problem and briefly compared them. Finally, we discuss potential biological applications and ways to improve the algorithm’s performance.
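The Positive-Unlabeled framing can be illustrated with plain logistic matrix factorization, the model family that NRLMF builds on: observed interactions are up-weighted by an importance factor c, unobserved entries are treated as weak negatives, and sigmoid(u_i · v_j) gives a reconstructed score for every piRNA-TE pair. This is a simplified sketch under stated assumptions, not NRLMFβ itself: it omits the neighborhood regularization and the Beta-distribution rescoring, uses full-batch gradient ascent, and all names and hyperparameters (`rank`, `c`, `lam`, `lr`, `epochs`) are hypothetical choices for a toy example.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def logistic_mf(Y, rank=2, c=5.0, lam=0.1, lr=0.02, epochs=1000, seed=0):
    """Reconstruct a binary piRNA x TE matrix Y as sigmoid(U V^T).

    Maximizes the weighted log-likelihood
        sum_ij [c*y*s - (1 + c*y - y) * log(1 + exp(s))] - lam/2 (|U|^2 + |V|^2)
    with s = U[i].V[j], so observed positives count c times and zeros count once.
    (Simplified: no neighborhood regularization, no Beta rescoring.)
    """
    rng = random.Random(seed)
    m, n = len(Y), len(Y[0])
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(m)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
    for _ in range(epochs):
        # dL/ds for every cell: c*(1 - p) for positives, -p for unlabeled zeros.
        G = [[c * Y[i][j] - (1 + c * Y[i][j] - Y[i][j]) * sigmoid(dot(U[i], V[j]))
              for j in range(n)] for i in range(m)]
        # Simultaneous gradient-ascent step: both right-hand sides use the old U and V.
        U, V = (
            [[U[i][k] + lr * (sum(G[i][j] * V[j][k] for j in range(n)) - lam * U[i][k])
              for k in range(rank)] for i in range(m)],
            [[V[j][k] + lr * (sum(G[i][j] * U[i][k] for i in range(m)) - lam * V[j][k])
              for k in range(rank)] for j in range(n)],
        )
    return [[sigmoid(dot(U[i], V[j])) for j in range(n)] for i in range(m)]


if __name__ == "__main__":
    # Toy matrix: two blocks of interactions; cell (1, 1) is left unlabeled.
    Y = [[1, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
    R = logistic_mf(Y)
    for row in R:
        print(" ".join(f"{p:.2f}" for p in row))
```

Comparing the score of the unlabeled cell (1, 1) with the off-block cells shows whether the factorization has propagated the block structure to that pair. Thresholding `R` would be one hypothetical way to turn it back into a binary matrix for a downstream learner such as PBCT.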