Multidimensional proteomics and explainable AI feature selection identify cross-platform lung cancer molecular signature in blood plasma
Abstract
Lung cancer is the leading cause of cancer mortality worldwide, with 70% of cases diagnosed at a late stage despite the availability of low-dose computed tomography (LDCT) screening. We combined data-independent acquisition mass spectrometry (DIA-MS) and proximity extension assay (PEA) with explainable artificial intelligence (XAI)-led machine learning (ML) for plasma-based biomarker discovery. Using a cohort of 490 lung cancer cases and 124 matched controls, we trained ML models to predict lung cancer, achieving an AUROC of 0.91 [95% CI: 0.88–0.93] in DIA-MS and 0.97 [95% CI: 0.92–0.98] in PEA. XAI characterised networks of model-consistent features primarily related to infection and inflammatory responses. We then introduced a DNA-aptamer proteomics method and identified a cross-platform concordance panel, with performances of 0.88 [95% CI: 0.80–0.90] in DIA-MS and 0.88 [95% CI: 0.81–0.95] in PEA. This study demonstrates that combining multi-dimensional proteomics with XAI-ML can characterise robust biomarker signatures.
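The abstract reports each AUROC with a 95% confidence interval but does not say how the intervals were derived. As a minimal sketch, assuming a percentile bootstrap over samples (one common choice, not necessarily the authors' method), the calculation could look like the following. The function names `auroc` and `bootstrap_ci` are hypothetical, and the labels and scores are made-up toy values, not study data.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    stats = []
    while len(stats) < n_boot:
        # Resample individuals with replacement, keeping label/score pairs together.
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # a resample with only one class has no defined AUROC
        stats.append(auroc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example: 1 = case, 0 = control; scores are hypothetical model outputs.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(auroc(toy_labels, toy_scores))  # 15 of 16 case/control pairs are ranked correctly: 0.9375
print(bootstrap_ci(toy_labels, toy_scores, n_boot=200))
```

The pairwise definition is quadratic in sample size, which is fine for a cohort of a few hundred; a rank-based implementation would be preferable for much larger datasets.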