Leveraging proteomics and transfer learning for head and neck cancer detection in saliva
Abstract
Background
Early detection of head and neck cancer (HNC) has the potential to substantially improve patient survival, yet no biomarker tests for early detection are currently used in clinical practice. Case-control studies that could be used to derive diagnostic biomarkers tend to be underpowered. Recent evidence suggests that this challenge may be addressed by applying deep learning to pan-cancer data from large population studies.
Methods
We evaluated a range of machine learning methods and training scenarios that use proteome data to distinguish HNC cases from controls. Models were trained on blood plasma proteomes from the UK Biobank (UKB) with n = 13,208 pan-cancer cases. To assess the models' generalisability across tissue types, we tested them in a cross-tissue comparison on an independent saliva-based proteome dataset from the SensOrPass HNC case-control study (n = 156).
Findings
We obtained the best performance (AUC = 0.88 versus AUC < 0.77 for the other methods) using a transfer learning approach called CNN-Synth. This convolutional neural network was trained on UKB to distinguish between profiles from controls and cases, with the training set including synthetic profiles generated by a pretrained variational autoencoder. Post-hoc model explainability using SHapley Additive exPlanations (SHAP) identified IL6, CXCL17, CXCL13, IGF1R and FASLG as the five proteins contributing most to predictor performance.
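For readers less familiar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from case/control labels and model scores. It uses the pairwise definition: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not taken from the study, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: fraction of (case, control) pairs in which the case
    scores higher than the control; tied pairs count as half a win."""
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy data (1 = case, 0 = control); not study data.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.8, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls. The quadratic loop is chosen for clarity; for larger datasets a rank-based implementation gives the same value more efficiently.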
Interpretation
Our findings underscore the potential for deep learning and explainable AI to leverage large-scale population datasets for advancing early cancer detection and improving clinical outcomes.
Funding
This work was supported by Cancer Research UK (grant numbers EDDISA-Jan22\100003 and C18281/A29019).