BC-Predict: Mining of signal biomarkers and multilevel validation of cascade classifier for early-stage breast cancer subtyping and prognosis



Abstract

Disease heterogeneity is the hallmark of breast cancer, which remains a scourge and the most common malignancy among women. With a steep increase in breast cancer morbidity and mortality, there is a critical need for effective early-stage theragnostic and prognostic biomarkers. Such biomarkers would help in patient stratification and optimal treatment selection towards better disease management. In this study, we examined four key problems in the characterization of breast cancer heterogeneity, namely: (i) cancer screening; (ii) identification of metastatic cancers; (iii) molecular subtype (TNBC, HER2, or luminal); and (iv) histological subtype (ductal or lobular). We mined the available public-domain transcriptomic data of breast cancer patients from the TCGA and other databases using stage-encoded statistical models of gene expression, and identified stage-salient, monotonically expressed, and problem-specific biomarkers. Next, we trained different classes of machine learning algorithms targeted at the above problems and embedded in these feature spaces. Hyperparameters specific to each algorithm were optimized using 10-fold cross-validation on the training dataset. The optimized models were evaluated on the holdout test set to identify the overall best model for each problem. The best model for each problem was validated against: (i) multi-omics data from the same cohort (miRNA and methylation profiles); (ii) external datasets from out-of-domain cohorts; and (iii) the state of the art, including commercially available breast cancer panels. In external validation, our models matched or bested available benchmarks in the respective problem domains (balanced accuracies of 97.42% for cancer vs normal; 88.22% for metastatic vs non-metastatic; 88.79% for ternary molecular subtyping; and ensemble accuracy of 94.23% for histological subtyping).
We have translated the results into BC-Predict, a freely available web server that forks the best models developed for each problem and provides the cascade annotation of input instance(s) of expression data, along with uncertainty estimates. BC-Predict is meant for academic use and has been deployed at: https://apalania.shinyapps.io/BC-Predict
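
For readers unfamiliar with the metric quoted above, balanced accuracy is conventionally defined as the mean of the per-class recalls, which keeps a majority class from dominating the score. The following is a minimal sketch under that standard definition; the abstract does not state the exact formula or software used, and the function name `balanced_accuracy` and the toy labels are illustrative assumptions, not data or code from the study.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true.

    Illustrative helper (hypothetical name), not the authors' implementation.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Made-up, imbalanced toy labels: 8 non-metastatic and 2 metastatic samples.
y_true = ["non-met"] * 8 + ["met"] * 2
# The hypothetical classifier misses one of the two metastatic samples.
y_pred = ["non-met"] * 8 + ["met", "non-met"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
```

In this toy case the plain accuracy is 9 of 10 samples, while the balanced accuracy is the average of a recall of 8/8 and a recall of 1/2, which illustrates why the balanced form is the more conservative summary for imbalanced problems such as metastatic versus non-metastatic classification.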
