Risk prediction for lung cancer screening: a systematic review and meta-regression

Ramin Rezaeianzadeh
Crystal Leung
Soo Jeong Kim
Kayly Choy
Kate Johnson
Miranda Kirby
Stephen Lam
Benjamin Smith
Mohsen Sadatsafavi


Abstract

Background

Lung cancer (LC) is the leading cause of cancer mortality and is often diagnosed at advanced stages. Screening reduces mortality in high-risk individuals, but its efficiency can be improved with pre- and post-screening risk stratification. With recent LC screening guideline updates in Europe and the US, numerous novel risk prediction models have emerged since the last systematic review of such models. We reviewed risk prediction models for selecting candidates for CT screening and for post-CT risk stratification.

Methods

We systematically reviewed Embase and MEDLINE (2020–2024), identifying studies proposing new LC risk models for screening selection or nodule classification. Data extraction included study design, population, model type, risk horizon, and internal/external validation metrics. In addition, we performed an exploratory meta-regression of AUCs to assess whether sample size, model class, validation type, and biomarker use were associated with discrimination.
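To illustrate what a meta-regression of AUCs can look like, here is a minimal sketch, not the authors' analysis. It assumes a single moderator (log10 of sample size), logit-transformed AUC as the outcome, and sample size as a crude stand-in for inverse-variance weights. The abstract specifies none of these choices, and the toy numbers are hypothetical, not data from the review.

```python
import math


def logit(p):
    """Map an AUC in (0, 1) to the real line so a linear model is reasonable."""
    return math.log(p / (1.0 - p))


def weighted_meta_regression(x, y, w):
    """Fit y = b0 + b1 * x by weighted least squares; return (b0, b1)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical (sample size, AUC) pairs, invented for illustration only.
toy_studies = [(500, 0.70), (2000, 0.74), (10000, 0.78), (50000, 0.80)]

x = [math.log10(n) for n, _ in toy_studies]   # moderator: log10 sample size
y = [logit(a) for _, a in toy_studies]        # outcome: logit(AUC)
w = [float(n) for n, _ in toy_studies]        # assumed weights: sample size

intercept, slope = weighted_meta_regression(x, y, w)
print(f"change in logit(AUC) per 10-fold increase in sample size: {slope:.3f}")
```

A positive slope in such a fit would correspond to larger studies reporting higher discrimination. A full analysis would typically use each study's AUC standard error for the weights, a random-effects term for between-study heterogeneity, and additional moderators such as model class, validation type, and biomarker use.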

Results

Of 1987 records, 68 were included: 41 models were for screening selection (20 without biomarkers, 21 with), and 27 were for nodule classification. Regression-based models predominated, though machine learning and deep learning approaches were increasingly common. Discrimination ranged from moderate (AUC≈0.70) to excellent (AUC>0.90), with biomarker- and imaging-enhanced models often outperforming traditional ones. Model calibration was inconsistently reported, and fewer than half of the models underwent external validation. Meta-regression suggested that, among pre-screening models, larger sample sizes were modestly associated with higher AUC.
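Discrimination and calibration measure different things, which is why strong AUCs alone do not settle whether a model is usable. The sketch below is an illustration on hypothetical toy data, not results from any reviewed model. It computes AUC as the probability that a case receives a higher predicted risk than a non-case, and the observed-to-expected (O/E) ratio as one simple calibration summary.

```python
def auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def observed_to_expected(probs, labels):
    """Ratio of observed events to the sum of predicted risks (1.0 is ideal)."""
    return sum(labels) / sum(probs)


# Hypothetical predicted risks and outcomes, invented for illustration only.
risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
labels = [0, 0, 1, 0, 1, 1]

# Halving every risk keeps the ranking intact but shifts the absolute risks.
halved = [r / 2 for r in risks]

print("AUC original:", auc(risks, labels))
print("AUC halved:  ", auc(halved, labels))
print("O/E original:", observed_to_expected(risks, labels))
print("O/E halved:  ", observed_to_expected(halved, labels))
```

The two sets of predictions have identical AUC because the ordering of individuals is unchanged, yet the halved risks underestimate the event count by a larger margin. A model can therefore discriminate well and still give misleading absolute risks when calibration is not reported.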

Conclusion

Whereas 75 models had been identified before 2020, we found 68 new models published since then, reflecting growing interest in personalized LC screening. While many demonstrate strong discrimination, inconsistently reported calibration and limited external validation hinder clinical adoption. Future efforts should prioritize improving existing models over developing new ones, along with transparent evaluation, cost-effectiveness analysis, and real-world implementation.

Version published to 10.1101/2025.09.10.25335529 on medRxiv
Sep 12, 2025

Machine Learning–Based Analysis of Factors Associated With Colonoscopy Screening Adherence and Development of a Predictive Model Among High-Risk Individuals for Colorectal Cancer

This article has 7 authors:
1. Jiaoping Zhang
2. Yan Wang
3. Jing Wang
4. Rong Zhao
5. Xuling Gao
6. Xianyan Xu
7. Qiankun Liu
This article has no evaluations. Latest version: Jan 22, 2026

A Hybrid Pharmacovigilance Method for National-Scale Comorbidity Discovery: Association Rules with FDA-Approved PRR/Chi-square and EBGM Validation.

This article has 1 author:
1. Kaossara Osseni
This article has no evaluations. Latest version: Dec 24, 2025

Gastric cancer burden, risk factors, and early detection strategies in Latin America and Colombia

This article has 7 authors:
1. Omaira Valencia
2. Gustavo Reyes
3. Maria Carolina Lopez-Mateus
4. Angelica Otalora-Bernal
5. David Suarez
6. Luisa Fernanda Cardona
7. Gabriel Herrera
This article has no evaluations. Latest version: Jan 20, 2026
