Comparative analysis of sequential and thermodynamic features of pre-miRNA in insects with various organisms and applying XGBoost for one-vs-rest binary classification
Abstract
MicroRNAs, which are produced from precursor microRNAs (pre-miRNAs), regulate various biological processes. Because these sequences are short, homology-based searching is not very useful for identifying them. Hence, various machine learning based tools have been designed to predict the precursor hairpin loops using thermodynamic and sequential features. In this research, we present a comparative statistical analysis of the features used in the development of such machine learning based predictive tools. The sequence features of insect precursor microRNAs were compared with those of precursor microRNAs from other available organisms. We first established, using the Kolmogorov-Smirnov (KS) test, that features such as length, GC content and Minimum Free Energy (MFE) of folding differ in insects as compared to other organisms. We then trained predictive models for one-vs-rest binary classification using XGBoost for insects, human, monocots, aves, ruminants, sauria, dogs and rodents. We performed PCA and retained 14 principal components for classification based on cumulative explained variance. The XGBoost parameters were tuned with 5-fold cross-validation (CV), and the parameter values with the highest CV score were selected. We used independent held-out data to test the models. The accuracies for insects, monocots, rodents, human, ruminants, sauria, aves and dogs were 0.8549, 0.8626, 0.6835, 0.7005, 0.8875, 0.6972, 0.7591 and 0.6588, respectively. This shows that ancestral lineage specific ML models can be developed for the detection of precursor microRNAs in different classes of organisms.
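The feature comparison described above can be illustrated with a minimal sketch that computes GC content for two groups of sequences and the two-sample KS statistic between them. This is not the authors' pipeline: the function names `gc_content` and `ks_statistic` and the toy sequences are assumptions made for illustration, and only the KS statistic (the maximum gap between the two empirical distribution functions) is computed, not the p-value a full test would report.

```python
from bisect import bisect_right


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def ks_statistic(a, b) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in set(a_sorted) | set(b_sorted):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up toy sequences for illustration only (not real pre-miRNAs).
insect_like = ["AUAUGCAUAU", "AAUUGCAUUA", "AUUUACGAUA"]
other_like = ["GCGCGAUGCC", "GGCCAUGCGC", "GCGGCAUCCG"]

d = ks_statistic([gc_content(s) for s in insect_like],
                 [gc_content(s) for s in other_like])
print(f"KS statistic for GC content: {d:.3f}")
```

A statistic near 0 means the two feature distributions largely overlap, while a value near 1 means they are almost completely separated. In practice a library routine such as `scipy.stats.ks_2samp` would typically be used, since it also returns a p-value.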