Computational screening for lung cancer with protein biomarkers: biomarker selection, longitudinal strategy and evaluation
Abstract
The high mortality rate of lung cancer is primarily attributable to late-stage diagnosis. Although low-dose computed tomography has proven effective for screening, concerns such as radiation exposure have limited its use mostly to smokers. Promising serum protein biomarker candidates exist, but no protein biomarker is currently universally accepted for lung cancer screening. Moreover, while some studies have explored basic (non-longitudinal) machine learning techniques, evaluations of whether longitudinal approaches improve predictive performance remain lacking. We analysed 94 proteins in longitudinal serum samples from 98 lung cancer cases and 150 controls from the UKCTOCS trial. We proposed a new computational biomarker selection strategy, which we used to derive 6 candidate biomarkers each for the whole dataset and for the smoker subgroup. The performance of combinations of the selected biomarkers was evaluated with three machine learning techniques, including two longitudinal approaches. We demonstrated the potential benefit of integrating CEACAM5 and MUC-16 into the current screening framework for smokers, assisted by the Bayesian change-point approach. Additionally, we showed that a logistic regression model using the 6 candidate biomarkers selected in the whole dataset performed exceptionally well, paving the way for introducing routine biomarker screening for non-smokers.
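To make the non-longitudinal baseline concrete, the following is a minimal sketch of fitting a logistic regression on a 6-marker panel and scoring it by AUC. It is not the authors' pipeline: the data are synthetic Gaussian draws, each subject contributes a single sample rather than a longitudinal series, and the function names, the marker shift of 0.8, the learning rate and the number of epochs are all assumptions chosen for illustration. Only the group sizes (98 cases, 150 controls) and the panel size (6) are taken from the abstract.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample drives the log-loss gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    # Predicted probability of being a case for each sample.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(y_true, scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic(n_cases, n_controls, n_markers=6, shift=0.8, seed=0):
    # SYNTHETIC data only: cases have each marker shifted upward by `shift`.
    # This does not reproduce any measurement from the study.
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((1, n_cases), (0, n_controls)):
        for _ in range(n):
            X.append([rng.gauss(shift * label, 1.0) for _ in range(n_markers)])
            y.append(label)
    return X, y


# Train on one synthetic draw and evaluate on an independent one.
X_train, y_train = make_synthetic(98, 150, seed=0)
X_test, y_test = make_synthetic(98, 150, seed=1)
w, b = fit_logistic(X_train, y_train)
test_auc = auc(y_test, predict_proba(X_test, w, b))
print(f"held-out AUC on synthetic data: {test_auc:.3f}")
```

The AUC printed here reflects only the assumed separation between the synthetic groups and says nothing about the performance reported in the study. A longitudinal method such as the Bayesian change-point approach would instead model each subject's marker trajectory over time, which this sketch does not attempt.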