Early detection of non-small cell lung cancer using electronic health record data
Abstract
Rationale
Specific patient characteristics increase the risk of cancer, necessitating personalized healthcare approaches. For high-risk individuals, tailored clinical management enables proactive monitoring and timely interventions. Electronic health record (EHR) data are crucial for supporting these personalized approaches, improving cancer prevention and early diagnosis.
Objectives
We use EHR data to build a prediction model for early detection of non-small cell lung cancer (NSCLC).
Methods
We use data from Mass General Brigham’s EHR and implement a three-stage ensemble learning approach. First, we generate risk scores using multivariate logistic regression under two designs, self-control and case-control, to distinguish between cases and controls. These risk scores are then integrated and calibrated using a prospective Cox model to produce the risk prediction model.
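The structure of this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the cohort is synthetic, the two features (`smoker`, `lab_value`), the window definitions, the ordering of the two logistic stages, and all function names are assumptions made for illustration, and plain gradient descent stands in for whatever fitting and calibration procedure the study actually used.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(w, x):
    """Linear predictor: bias plus weighted sum of features."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def fit_logistic(X, y, lr=0.2, epochs=300):
    """Logistic regression by batch gradient descent; returns [bias, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def fit_cox(Z, time, event, lr=0.2, epochs=200):
    """Cox coefficients by gradient ascent on the partial log-likelihood."""
    beta = [0.0] * len(Z[0])
    cases = [i for i, e in enumerate(event) if e]
    # Risk set of each case: everyone still under follow-up at that event time.
    risk_sets = {i: [j for j in range(len(Z)) if time[j] >= time[i]] for i in cases}
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for i in cases:
            rs = risk_sets[i]
            wts = [math.exp(sum(b * z for b, z in zip(beta, Z[j]))) for j in rs]
            total = sum(wts)
            for k in range(len(beta)):
                mean_k = sum(w * Z[j][k] for w, j in zip(wts, rs)) / total
                grad[k] += Z[i][k] - mean_k
        beta = [b + lr * g / len(cases) for b, g in zip(beta, grad)]
    return beta

# Synthetic cohort (illustration only): features are [smoker, lab_value].
random.seed(0)
late, early, time, event = [], [], [], []
for _ in range(120):
    smoker = 1.0 if random.random() < 0.4 else 0.0
    lab = random.gauss(0.0, 1.0)
    t = random.expovariate(0.05 * math.exp(0.8 * smoker + 0.5 * lab))
    late.append([smoker, lab])  # recent observation window
    early.append([smoker, lab - 1.0 + random.gauss(0.0, 0.5)])  # earlier window
    time.append(min(t, 5.0))  # administrative censoring at 5 years
    event.append(1 if t < 5.0 else 0)

# Case-control score: cases vs. controls on the recent window.
w_cc = fit_logistic(late, event)

# Self-control score: each case's recent window vs. its own earlier window.
case_idx = [i for i, e in enumerate(event) if e]
X_sc = [late[i] for i in case_idx] + [early[i] for i in case_idx]
y_sc = [1] * len(case_idx) + [0] * len(case_idx)
w_sc = fit_logistic(X_sc, y_sc)

# Final stage: combine both scores in a Cox model on time to diagnosis.
Z = [[linear_score(w_cc, x), linear_score(w_sc, x)] for x in late]
beta = fit_cox(Z, time, event)
relative_risk = [math.exp(sum(b * z for b, z in zip(beta, zi))) for zi in Z]
```

Here `relative_risk` is each patient's hazard relative to a patient whose two scores are both zero. Turning it into an absolute one-year risk would additionally require a baseline hazard estimate, which this sketch omits.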
Results
We identified 127 EHR-derived features predictive of early detection of NSCLC. The most predictive features include smoking, relevant lab test results, and chronic lung diseases. The predictive model reached an area under the ROC curve (AUC) of 0.801 (positive predictive value (PPV) 0.0173 with specificity 0.02) for predicting one-year NSCLC risk in a population aged 18 and above, and an AUC of 0.757 (PPV 0.0196 with specificity 0.02) in a population aged 40 and above.
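For readers less familiar with these metrics, the following minimal sketch shows how AUC, PPV, sensitivity, and specificity are computed from predicted risk scores. The labels, scores, and threshold are made-up toy values, not study data, and the helper names `auc` and `metrics_at` are hypothetical.

```python
def auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def metrics_at(labels, scores, threshold):
    """PPV, sensitivity, and specificity when flagging scores >= threshold.

    Assumes at least one patient is flagged and both classes are present.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return {
        "ppv": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Toy data: 3 cases (label 1) and 5 controls (label 0).
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.7, 0.9]

example_auc = auc(labels, scores)          # 13 of 15 case-control pairs ranked correctly
example = metrics_at(labels, scores, 0.5)  # TP=2, FP=1, FN=1, TN=4
```

PPV depends on how common the outcome is in the screened population, so a rare outcome can yield a low PPV even when the AUC is high.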
Conclusions
This study identified EHR-derived features that are predictive of early NSCLC diagnosis. The resulting risk prediction model outperforms a baseline model that relies only on demographic and smoking information, demonstrating the potential of incorporating EHR-derived features into personalized cancer screening recommendations and early detection.