Comprehensive machine-learning survival framework develops a consensus model in large-scale multicenter cohorts for pancreatic cancer
Curation statements for this article:
Curated by eLife
Evaluation Summary:
This work sets out to develop a better machine learning-based predictor of survival/prognosis for patients diagnosed with pancreatic cancer by building a large combinatorial family of machine learning methods on a high-dimensional set of -omics and other patient data features, using ten publicly available data sets. A reduced set of features (giving rise to a signature called AIDPS that involves 9 genes) was identified. Unfortunately, the authors used all ten data sets both in the discovery stage and in the validation stage of their study. There was also a large mismatch between the initial number of covariates (15,288 genes) and the number of samples (n=1280). The combinatorial ensemble of ML models makes for an unwieldy methodology that is difficult to interpret or duplicate.
(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (eLife)
- Computational and Systems Biology (eLife)
Abstract
As one of the most aggressive tumors, pancreatic cancer (PACA) has seen no appreciable improvement in outcome over the last decade. Anatomy-based TNM staging does not reliably identify treatment-sensitive patients, and an ideal biomarker is urgently needed for precision medicine. Based on expression profiles of 1280 patients from 10 multicenter cohorts, we screened 32 consensus prognostic genes. Ten machine-learning algorithms were assembled into 76 combinations, from which we selected the optimal algorithm to construct an artificial intelligence-derived prognostic signature (AIDPS) according to the average C-index in the nine testing cohorts. The results in the training cohort, nine testing cohorts, Meta-Cohort, and three external validation cohorts (290 patients) consistently indicated that AIDPS could accurately predict the prognosis of PACA. After incorporating several vital clinicopathological features and 86 published signatures, AIDPS exhibited robust and markedly superior predictive capability. Moreover, in other prevalent digestive-system tumors, the nine-gene AIDPS could still accurately stratify prognosis. Of note, our AIDPS had important clinical implications for PACA: patients with low AIDPS had a dismal prognosis, more genomic alterations, and denser immune-cell infiltrates, and were more sensitive to immunotherapy. Meanwhile, the high-AIDPS group showed markedly prolonged survival, and panobinostat may be a potential agent for patients with high AIDPS. Overall, our study provides an attractive tool to further guide the clinical management and individualized treatment of PACA.
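The model-selection rule summarized in the abstract, fit many algorithm combinations and keep the one with the best average C-index across the testing cohorts, can be sketched in a few lines. Everything below (candidate names, gene names, toy cohorts) is illustrative and is not the authors' code or data:

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: among comparable patient pairs, the fraction in
    which the patient who failed earlier carries the higher predicted risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

def risk_score(weights, expression):
    """Linear risk score: weighted sum of gene-expression values."""
    return sum(w * expression[g] for g, w in weights.items())

def select_best_model(candidates, testing_cohorts):
    """Pick the candidate signature with the highest C-index averaged
    over the testing cohorts."""
    def avg_c(weights):
        scores = []
        for cohort in testing_cohorts:
            risks = [risk_score(weights, expr) for expr, _, _ in cohort]
            times = [t for _, t, _ in cohort]
            events = [e for _, _, e in cohort]
            scores.append(concordance_index(times, events, risks))
        return sum(scores) / len(scores)
    return max(candidates, key=lambda name: avg_c(candidates[name]))

# Illustrative candidate signatures (gene -> coefficient) and toy cohorts;
# each patient is a tuple of (expression, survival time, event indicator).
candidates = {
    "lasso":    {"GENE_A": 1.0},   # higher GENE_A -> higher risk
    "stepwise": {"GENE_B": 1.0},   # uninformative in this toy data
}
cohort1 = [({"GENE_A": 3.0, "GENE_B": 1.0}, 5.0, 1),
           ({"GENE_A": 1.0, "GENE_B": 1.0}, 20.0, 1),
           ({"GENE_A": 2.0, "GENE_B": 1.0}, 10.0, 0)]
cohort2 = [({"GENE_A": 4.0, "GENE_B": 2.0}, 2.0, 1),
           ({"GENE_A": 0.5, "GENE_B": 2.0}, 30.0, 0)]
best = select_best_model(candidates, [cohort1, cohort2])
print(best)  # the GENE_A signature wins on average C-index
```

The C-index ranges from 0.5 (random ordering) to 1.0 (perfect ranking of failure times), which is why averaging it across cohorts is a natural model-selection criterion for survival signatures.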
Article activity feed
Reviewer #1 (Public Review):
In this work, the authors sought to develop a better predictor of prognosis/survival for pancreatic cancer patients. They did this by beginning with a long list of measured quantities in patients and feeding these features into a combinatorial family of many different multi-part machine learning models built from classic ML tooling, then studying agreement between these models to identify a small subset of features (called the AIDPS) that could serve as a 9-gene biomarker with a companion ML model.
The major strengths of this paper are that it is clear about its goal of potential clinical impact, and that it does identify a biomarker set with improved accuracy for predicting prognosis, in a way that may be useful for future stratification. The paper also carries out various follow-up analyses to characterize possible implications of the AIDPS signal in different biological, immunological, and clinical terms.
The major weaknesses of the paper:
The authors do not make very clear why this giant combinatorial family of ML models was generated. It is clear by implication that this is an approach to the challenge of feature selection. But if the problem here is feature selection, why did the authors take this particular approach to it? Is it known to be a particularly effective one? Is it particularly easy to implement (it does not seem so from the outside)? On the one hand, this is not a machine learning paper, so it might not feel necessary to prove that a different method of feature selection could not just as easily identify the genes that matter for this predictive problem. But if this is not a machine learning paper, then in a sense the important thing is that the 9 genes in the AIDPS have been identified, and what is needed now is to show that these make for a better predictor and to focus on the interpretation of these features and their relationships. If that is the case, this paper takes a sidelong approach to the task, because neither the identities of the genes nor their success as features in ML approaches designed to foreground those features (such as neural network architectures built for greater interpretability) is examined.
The impact of this work on the field is currently unclear. The combinatorial ensemble of ML models makes for an unwieldy methodology that is difficult to interpret or duplicate, so if this is a methods paper about how to do better feature selection, it has not made that case well, since it does not compare against other methods of feature selection. On the other hand, if this paper is about clinical implications and the origin of the AIDPS gene set is beside the point, then staying mired in the original ML methodology used to select the features also seems the wrong path: once a good set of predictive features has been found, other methods can be applied to those features and more can be learned by doing so.
Reviewer #2 (Public Review):
The authors seek to create a reliable prognostic signature for pancreatic cancer from gene expression data. They develop such a signature, which they call AIDPS, and which involves 9 genes. The AIDPS was created and tested using ten publicly available data sets and involved gene selection and regression algorithm selection protocols. The authors study the performance of their signature and claim that 'AIDPS exhibited robust and dramatically superior predictive capability' compared to other signatures in the literature.
The strengths of the paper are the use of multiple data sets and the comparison of multiple regression methods.
The main weakness of this paper is that the authors used all 10 data sets both in the discovery stage and in the validation stage of their study. This is clear when they define their consensus prognostic genes (CPGs) by doing univariate Cox regression in all ten cohorts, and then select those genes that show significant and more or less uniform association with outcome across all these ten data sets (as described on page 8). From this stage onwards, the 32 genes that are taken forward will inevitably show up as prognostic when tested within these same 10 data sets. On page 9, after having carried out further multivariate regressions comparing different models, which reduced the number of selected genes further to 9, the authors use the same 10 data sets used in the discovery stage again (plus the combination of these sets, which they call the 'meta-cohort') for 'validating the prognostic value of AIDPS in 11 datasets'. This, I regret, is not at all a validation, but what would be called quantification of performance on the discovery set. One could simply be looking at spurious association signals caused by overfitting, especially given the large mismatch between the initial number of covariates (15,288 genes) and the number of samples (n=1280).
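The consensus-gene step described in this paragraph (retain only genes that are significant, with a uniform direction of association, in every cohort) reduces to a simple intersection. The (hazard ratio, p-value) summaries and gene names below are illustrative stand-ins for real univariate Cox output, not the manuscript's data:

```python
def consensus_prognostic_genes(cox_results, alpha=0.05):
    """cox_results: one dict per cohort mapping gene -> (hr, p_value),
    where hr is the univariate Cox hazard ratio. A consensus prognostic
    gene must be significant in every cohort, with the hazard ratio on
    the same side of 1 everywhere."""
    shared = set.intersection(*(set(r) for r in cox_results))
    cpgs = []
    for gene in sorted(shared):
        hrs = [r[gene][0] for r in cox_results]
        ps = [r[gene][1] for r in cox_results]
        same_direction = all(h > 1 for h in hrs) or all(h < 1 for h in hrs)
        if same_direction and all(p < alpha for p in ps):
            cpgs.append(gene)
    return cpgs

# Toy univariate Cox summaries for three cohorts: gene -> (hr, p).
cohort_a = {"GENE1": (1.8, 0.001), "GENE2": (0.6, 0.01), "GENE3": (1.4, 0.20)}
cohort_b = {"GENE1": (1.5, 0.004), "GENE2": (1.3, 0.02), "GENE3": (1.2, 0.03)}
cohort_c = {"GENE1": (2.1, 0.0005), "GENE2": (0.7, 0.04), "GENE3": (1.6, 0.01)}
print(consensus_prognostic_genes([cohort_a, cohort_b, cohort_c]))
```

In this toy example GENE2 fails the direction check and GENE3 fails significance in one cohort, so only GENE1 survives. The reviewer's point is that when this filter is run on all ten cohorts, those same cohorts can no longer serve as independent validation for the surviving genes.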
As a further consequence, when the authors test the performance of AIDPS against other existing signatures by again using the original ten data sets (plus their union), they are effectively comparing the performance of their own signature on its associated discovery set to the performance of competitor signatures on a validation set. Of course, the performance of a signature on its own discovery set would nearly always be superior, so this 'test' tells us nothing yet.
I cannot of course claim that the authors' AIDPS is non-reproducible, but on the basis of the material in this manuscript one cannot claim that it is reproducible. Validation on unseen data is really mandatory but has not been done. At the very least, the claim that 'AIDPS exhibited robust and dramatically superior predictive capability' is premature.
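For contrast, a leakage-free design along the lines the reviewer asks for would re-run the entire discovery pipeline with one cohort held out and score only on that unseen cohort. This minimal sketch uses a toy pair-agreement score and a one-gene "model" purely for illustration; the cohort names and data are invented:

```python
def leave_one_cohort_out(cohorts, discover, evaluate):
    """Honest validation: run the full discovery pipeline (gene screening,
    model selection, fitting) on all cohorts except one, then evaluate the
    frozen model on the held-out cohort it has never seen."""
    results = {}
    for held_out in cohorts:
        discovery = {k: v for k, v in cohorts.items() if k != held_out}
        model = discover(discovery)           # everything happens here
        results[held_out] = evaluate(model, cohorts[held_out])
    return results

def pair_agreement(patients, gene):
    """Toy score: fraction of patient pairs in which the patient who died
    earlier had the higher expression of `gene` (a crude C-index proxy)."""
    agree, total = 0, 0
    for i in range(len(patients)):
        for j in range(len(patients)):
            (ei, ti), (ej, tj) = patients[i], patients[j]
            if ti < tj and ei[gene] != ej[gene]:
                total += 1
                agree += ei[gene] > ej[gene]
    return agree / total if total else 0.0

def discover(discovery):
    """Toy pipeline: pick the single gene most associated with early death
    in the pooled discovery cohorts."""
    pooled = [p for cohort in discovery.values() for p in cohort]
    return max(pooled[0][0].keys(), key=lambda g: pair_agreement(pooled, g))

def evaluate(gene, cohort):
    return pair_agreement(cohort, gene)

# Invented cohorts: each patient is (expression dict, survival time).
cohorts = {
    "TCGA": [({"A": 5.0, "B": 1.0}, 2.0), ({"A": 1.0, "B": 1.5}, 10.0)],
    "ICGC": [({"A": 4.0, "B": 2.0}, 3.0), ({"A": 2.0, "B": 2.0}, 8.0)],
    "GEO":  [({"A": 6.0, "B": 0.5}, 1.0), ({"A": 0.5, "B": 0.4}, 12.0)],
}
print(leave_one_cohort_out(cohorts, discover, evaluate))
```

The essential property is that gene selection and model fitting never see the held-out cohort, so its score is an unbiased estimate; running the screening on all cohorts first, as the reviewer describes, breaks exactly this separation.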