The utility of single-cell RNA sequencing data in predicting plant metabolic pathway genes

Abstract

  • Making genome-wide predictions of plant metabolic pathway genes (MPGs), which encode the enzymes that catalyze the biosynthesis of plant natural products, remains a persistent challenge.

  • Here, starting from 1,130 benchmark MPGs with experimental evidence in Arabidopsis thaliana, we investigate the utility of single-cell RNA sequencing (scRNA-seq) data, a recently developed type of omics data that has been used in several other fields, for predicting MPGs using four machine learning (ML) algorithms that support multi-label tasks.

  • Compared with traditional bulk RNA-seq data, scRNA-seq data yield different and tighter co-expression networks among MPGs within metabolic classes, but relatively lower accuracy when assigning MPGs to classes. Splitting the scRNA-seq data into tissue-specific subsets can improve both co-expression network tightness and prediction accuracy for some classes. Expression features from the same tissue types contribute differently to class prediction depending on whether they come from bulk RNA-seq or scRNA-seq data. Models built using the ensemble algorithm AutoGluon outperform those built using the other three classical ML algorithms.

  • Our results demonstrate the usefulness, and describe the characteristics, of scRNA-seq data for assigning MPGs to metabolic classes, and indicate that further work is needed to improve model prediction performance.
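
The abstract does not state how co-expression "tightness" within a metabolic class is measured. As a minimal sketch, assuming it is approximated by the mean pairwise Pearson correlation among genes that share a class label, the idea could look as follows. The gene names, class labels, expression values, and the functions `pearson` and `class_tightness` are all hypothetical and are not taken from the article.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant profile counts as uncorrelated
    return cov / sqrt(var_x * var_y)


def class_tightness(expression, gene_classes):
    """Mean pairwise correlation among the genes of each class.

    expression:   dict mapping gene -> list of expression values
    gene_classes: dict mapping gene -> set of class labels (multi-label)
    """
    # Group genes by class; a gene with several labels joins several groups.
    members = {}
    for gene, labels in gene_classes.items():
        for label in labels:
            members.setdefault(label, []).append(gene)

    tightness = {}
    for label, genes in members.items():
        pairs = list(combinations(genes, 2))
        if pairs:  # classes with a single gene have no pairs and are skipped
            total = sum(pearson(expression[a], expression[b]) for a, b in pairs)
            tightness[label] = total / len(pairs)
    return tightness


# Hypothetical toy data: four genes measured across four cells or samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 5.0],
}
gene_classes = {
    "geneA": {"class1"},
    "geneB": {"class1", "class2"},
    "geneC": {"class2"},
    "geneD": {"class2"},
}

result = class_tightness(expression, gene_classes)
for label in sorted(result):
    print(label, round(result[label], 3))
```

Running the same function on a bulk RNA-seq matrix and on an scRNA-seq matrix for the same genes would give one per-class number for each data type, which is one simple way to compare how tightly the classes are co-expressed. Because a gene can carry several labels, it contributes to the score of every class it belongs to, mirroring the multi-label setting described above.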
