RNAseq-Based Machine Learning Models for Prognostication of Multiple Myeloma
Abstract
Background
Multiple myeloma (MM) is characterized by abnormal plasma cell proliferation in the bone marrow, leading to clinical manifestations such as osteolytic lesions, anemia, hypercalcemia, and elevated serum creatinine. RNA-sequencing-based prognostic indicators for MM have shown promise in stratifying risk and assessing first-line treatment options. This study applies machine learning techniques to RNA-sequencing, clinical, and biochemical data from the Multiple Myeloma Research Foundation (MMRF) CoMMpass cohort to predict patient prognosis.
Methods
RNAseq data for 60,623 genes from bone marrow samples of 708 MM patients were pre-processed to correct for batch effects and split into training (70%) and testing (30%) sets. Features for predicting progression-free survival (PFS) and overall survival (OS) were selected using MAD filtering, minimum redundancy maximum relevance (mRMR), and iterative permutation importance filtering. Machine learning survival models, including Random Survival Forest (RSF), Gradient Boosted (GB), and Component-wise Gradient Boosted (CGB) models, were developed and optimized. Performance was evaluated using the concordance index (C-index) and the integrated Brier score (IBS).
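As an illustration of two of the steps named above, the following minimal sketch shows a variability filter and a C-index calculation on toy data. It is not the study's pipeline. It assumes that MAD denotes the median absolute deviation, and it uses a simplified form of Harrell's C-index that ignores pairs with tied survival times. The function names, gene labels, and values are hypothetical.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (assumed meaning of MAD)."""
    center = median(values)
    return median(abs(v - center) for v in values)


def top_genes_by_mad(expression, k):
    """Return the k gene names with the largest MAD across samples.

    expression: dict mapping gene name -> list of per-sample expression values.
    """
    ranked = sorted(
        expression,
        key=lambda gene: median_absolute_deviation(expression[gene]),
        reverse=True,
    )
    return ranked[:k]


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter time than subject j. The pair is concordant when the
    subject with the shorter time has the higher predicted risk. Tied risk
    scores count as half; pairs with tied times are ignored.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: three genes measured in four samples.
expression = {
    "GENE_A": [1.0, 1.1, 0.9, 1.0],
    "GENE_B": [0.0, 5.0, 10.0, 2.0],
    "GENE_C": [3.0, 3.5, 2.5, 4.0],
}
selected = top_genes_by_mad(expression, 2)

# Hypothetical follow-up times, event indicators (False = censored), and risks.
times = [5, 10, 15, 20]
events = [True, False, True, True]
risk_scores = [0.9, 0.4, 0.1, 0.6]
c_index = harrell_c_index(times, events, risk_scores)

print(selected)
print(c_index)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk. The IBS, by contrast, measures the accuracy of predicted survival probabilities over time, with lower values indicating better calibration and discrimination.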
Results
The RSF and GB models showed the highest performance for predicting progression-free survival (PFS) and overall survival (OS) on the testing dataset. Significant features for PFS included stem cell transplant status, serum β2-microglobulin levels, germline mutational status, and expression of C12orf75 and ENSG00000256006. For OS, stem cell transplant status, age, serum β2-microglobulin levels, germline mutational status, and expression of NUTM2B-AS1 and ENSG00000287022 were prominent. Gene ontology analyses confirmed the biological relevance of enriched pathways related to cell division, protein localization, and cancer.
Conclusion
Integrating RNAseq and clinical data with advanced machine learning models presents a robust approach for predicting MM prognosis, highlighting gene expression programs, germline mutational status, and clinical markers as significant features. Future research should focus on independent validation to confirm these findings and on exploring additional genomic data to enhance prognostication.