A hybrid computer vision model to predict lung cancer in diverse populations

Abdul J. Zakkar
Nazia Perwaiz
Vikram Harikrishnan
Weiheng Zhong
Vijeth Narra
Alex Krule
Farah Yousef
Daniel Kim
Mason Burrage-Burton
Abdul Afeez Lawal
Vijayakrishna K. Gadi
Mark C. Korpics
Sage J. Kim
Zhengjia Chen
Aly A. Khan
Yamilé Molina
Yang Dai
G. Elisabeta Marai
Hadi Meidani
Ryan H. Nguyen
Ameen A. Salahudeen


Abstract

PURPOSE

Disparities in lung cancer incidence exist in Black populations, and screening criteria underserve Black populations because of disparately elevated risk in the screening-eligible population. Prediction models that integrate clinical and imaging-based features to individualize lung cancer risk are a potential means to mitigate these disparities.

PATIENTS AND METHODS

This multicenter (NLST) and catchment-population-based (UIH, urban and suburban Cook County) cross-sectional study included participants at risk of lung cancer with available lung CT imaging and follow-up between 2015 and 2024. A total of 53,452 participants in NLST and 11,654 in UIH were included based on age- and tobacco-use-based risk factors for lung cancer. The cohorts were used to train and test deep learning and machine learning models using clinical features alone or combined with CT image features (hybrid computer vision).

RESULTS

An optimized seven-feature clinical model achieved ROC-AUC values of 0.64-0.67 in NLST and 0.60-0.65 in UIH cohorts across multiple years. Incorporating imaging features to form a hybrid computer vision model significantly improved ROC-AUC values to 0.78-0.91 in NLST, but performance deteriorated in UIH, with ROC-AUC values of 0.68-0.80. This deterioration was attributable to Black participants, in whom ROC-AUC values ranged from 0.63 to 0.72 across multiple years. Retraining the hybrid computer vision model with Black and other participants from the UIH cohort improved performance, with ROC-AUC values of 0.70-0.87 in a held-out UIH test set.
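The subgroup comparison reported above rests on computing ROC-AUC separately for each group of participants. The following is a minimal sketch of that kind of stratified evaluation, not the authors' pipeline: the function names `roc_auc` and `auc_by_group`, the group labels "A" and "B", and the toy scores are all hypothetical and chosen only for illustration. ROC-AUC is computed here from its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half).

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of cases and non-cases (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """Map each group to its ROC-AUC; records are (group, label, score) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: roc_auc(ys, ss) for g, (ys, ss) in grouped.items()}


# Hypothetical toy data (not from the study): group, outcome (1 = cancer), risk score.
toy = [
    ("A", 1, 0.9), ("A", 0, 0.2), ("A", 1, 0.7), ("A", 0, 0.4),
    ("B", 1, 0.6), ("B", 0, 0.7), ("B", 1, 0.8), ("B", 0, 0.3),
]
result = auc_by_group(toy)
print(result)
```

In this toy data the scores separate cases from non-cases perfectly in group A but not in group B, which is the kind of gap a pooled ROC-AUC can hide. The pairwise loop is quadratic in cohort size, so a rank-based or library implementation would be preferable for cohorts as large as those described here.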

CONCLUSION

Hybrid computer vision predicted risk with improved accuracy compared with clinical risk models alone. However, potential biases in image training data reduced model generalizability in Black participants. Performance improved upon retraining with a subset of the UIH cohort, suggesting that inclusive training and validation datasets can minimize racial disparities. Future studies incorporating vision models trained on representative datasets may demonstrate improved health equity upon clinical use.

Version published to 10.1101/2024.10.07.24315011 on medRxiv
Oct 7, 2024

Smart Diagnosis: AI and ML Powered Breast Cancer Classification

This article has 2 authors:
1. Sagar Verma
2. Vaibhav Sabale
This article has no evaluations
Latest version Jan 28, 2026

Lung Cancer Multimodal Auxiliary Diagnosis Based on Entropy Weight Decision Fusion

This article has 5 authors:
1. Haixiang Zhang
2. Yuhong Tang
3. Peipei Li
4. Weijian Fan
5. Xiangzi Chen
This article has no evaluations
Latest version Jan 28, 2026

Examining the Trajectory of Ground Glass Nodule Patients in China: A Real-World Perspective

This article has 6 authors:
1. Chenlu Yang
2. Xuewen Zhang
3. Lei Sun
4. Shun Xu
5. Rusi Zhang
6. Lanjun Zhang
This article has no evaluations
Latest version Jan 16, 2026
