The utility of single-cell RNA sequencing data in predicting plant metabolic pathway genes
Abstract
Making genome-wide predictions of plant metabolic pathway genes (MPGs), which encode enzymes that catalyze the biosynthesis of plant natural products, remains a challenging task.
Here, starting from 1,130 benchmark MPGs with experimental evidence in Arabidopsis thaliana, we investigate the utility of single-cell RNA sequencing (scRNA-seq) data, a recently developed type of omics data that has been used in several other fields, for predicting MPGs using four machine learning (ML) algorithms that support multi-label tasks.
Compared with traditional bulk RNA-seq data, scRNA-seq data yield different and tighter co-expression networks among MPGs within metabolic classes, but relatively lower accuracy in assigning MPGs to classes. Splitting the scRNA-seq data into tissue-specific subsets can improve both gene co-expression network tightness and prediction accuracy for some classes. Expression features from the same tissue types contribute differently to the prediction of MPGs into classes depending on whether they come from bulk RNA-seq or scRNA-seq data. Models built using the ensemble algorithm AutoGluon outperform those built using the other three classical ML algorithms.
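As a rough illustration of what co-expression "tightness" within a metabolic class could mean, the sketch below scores a class by the mean pairwise Pearson correlation among its member genes. This definition is an assumption made for illustration only; the abstract does not state how tightness was measured. The function names (`pearson`, `class_tightness`), the gene names, and the expression values are all hypothetical toy inputs.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant profile carries no co-expression signal.
        return 0.0
    return cov / sqrt(var_x * var_y)


def class_tightness(expression, genes):
    """Mean pairwise Pearson correlation among genes (illustrative metric only)."""
    pairs = list(combinations(genes, 2))
    if not pairs:
        return 0.0
    total = sum(pearson(expression[a], expression[b]) for a, b in pairs)
    return total / len(pairs)


# Toy expression profiles (gene -> values across cells or samples); not real data.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

# Hypothetical class memberships; a gene may belong to several classes (multi-label).
classes = {
    "class1": ["geneA", "geneB"],
    "class2": ["geneA", "geneC"],
}

tightness = {
    name: class_tightness(expression, members)
    for name, members in classes.items()
}
```

Under this assumed metric, a class whose genes rise and fall together scores near 1, while a class with anti-correlated members scores near -1. The same function could be applied separately to bulk, single-cell, or tissue-specific expression matrices to compare them.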
Our results demonstrate the usefulness and characteristics of scRNA-seq data in predicting MPGs into metabolic classes, and suggest that further work is needed to improve model prediction performance.