POCALI: Prediction and insight On CAncer LncRNAs by Integrating multi-omics data with machine learning
Abstract
Long non-coding RNAs (lncRNAs) are receiving increasing attention as markers for cancer diagnosis and treatment. Although many computational methods exist to identify cancer lncRNAs, they neither comprehensively integrate multi-omics features for prediction nor systematically evaluate the contribution of each class of omics to the multifaceted landscape of cancer lncRNAs. In this study, we developed an algorithm, POCALI, to identify cancer lncRNAs by integrating 44 omics features across six categories. We explored the contributions of different omics to identifying cancer lncRNAs and, more specifically, how each feature contributes to a single prediction. We also evaluated our model and benchmarked POCALI against existing methods. Finally, we validated the cancer phenotype and genomic characteristics of the novel cancer lncRNAs that were predicted. POCALI identified secondary structure and gene expression-related features as strong predictors of cancer lncRNAs, and epigenomic features as moderate predictors. POCALI performed better than other methods, especially in terms of sensitivity, and predicted more candidates. Novel POCALI-predicted cancer lncRNAs had strong relationships with cancer phenotypes, similar to known cancer lncRNAs. Overall, this study facilitates the identification of previously undetected cancer lncRNAs and the comprehensive exploration of the multifaceted feature contributions to cancer lncRNA prediction.