Curating retrospective multimodal and longitudinal data for community cohorts at risk for lung cancer

Thomas Z. Li
Kaiwen Xu
Neil C. Chada
Heidi Chen
Michael Knight
Sanja Antic
Kim L. Sandler
Fabien Maldonado
Bennett A. Landman
Thomas A. Lasko

Abstract

Large community cohorts are useful for lung cancer research, allowing for the analysis of risk factors and development of predictive models.

Objective

A robust methodology is needed for (1) identifying lung cancer and pulmonary nodule diagnoses and (2) associating multimodal longitudinal data with these events from electronic health records (EHRs), so that cohorts can be curated at scale.

Methods

In this study, we used (1) SNOMED concepts to develop ICD-based decision rules for building a cohort that captured lung cancer and pulmonary nodules and (2) clinical knowledge to define time windows for collecting longitudinal imaging and clinical concepts. We curated three cohorts with clinical data and repeated imaging for subjects with pulmonary nodules from Vanderbilt University Medical Center.
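As a rough illustration of how an ICD-based decision rule and a time window could fit together, here is a minimal Python sketch. It is not the study's rule set: the code prefixes (C34 for malignant neoplasm of bronchus and lung, R91.1 for solitary pulmonary nodule), the "cancer takes precedence" rule, the 365-day window, and the names `label_subject` and `in_window` are all assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical prefixes chosen for illustration; the actual rules are not stated here.
LUNG_CANCER_PREFIXES = ("C34",)   # ICD-10-CM: malignant neoplasm of bronchus and lung
NODULE_PREFIXES = ("R91.1",)      # ICD-10-CM: solitary pulmonary nodule


def label_subject(codes):
    """Label one subject from a list of (icd_code, date) pairs.

    Returns (label, index_date), where index_date is the earliest date of
    the code family that determined the label. Cancer codes take precedence
    over nodule codes (an assumption of this sketch).
    """
    cancer_dates = sorted(d for c, d in codes if c.startswith(LUNG_CANCER_PREFIXES))
    nodule_dates = sorted(d for c, d in codes if c.startswith(NODULE_PREFIXES))
    if cancer_dates:
        return "lung_cancer", cancer_dates[0]
    if nodule_dates:
        return "nodule", nodule_dates[0]
    return "neither", None


def in_window(event_date, index_date, days_before=365, days_after=0):
    """True if event_date lies inside the collection window around index_date."""
    start = index_date - timedelta(days=days_before)
    end = index_date + timedelta(days=days_after)
    return start <= event_date <= end


# Example subject: a nodule code followed later by a lung cancer code.
subject_codes = [("R91.1", date(2019, 1, 10)), ("C34.90", date(2020, 3, 1))]
label, index_date = label_subject(subject_codes)

# Keep only imaging dates that fall inside the window before the index date.
scan_dates = [date(2018, 1, 1), date(2019, 6, 1), date(2020, 2, 15)]
kept_scans = [d for d in scan_dates if in_window(d, index_date)]
```

In practice the window bounds would come from clinical knowledge about how long before and after a diagnosis the imaging and clinical concepts remain relevant, so `days_before` and `days_after` are left as parameters.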

Results

Our approach achieved an estimated sensitivity of 0.930 (95% CI: [0.879, 0.969]), specificity of 0.996 (95% CI: [0.989, 1.000]), positive predictive value of 0.979 (95% CI: [0.959, 1.000]), and negative predictive value of 0.987 (95% CI: [0.976, 0.994]) for distinguishing subjects with lung cancer from subjects with solitary pulmonary nodules (SPNs).
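For readers who want to see how these four quantities relate, the sketch below computes them from a confusion matrix. The counts are invented for the example and are not the study's data. The abstract does not say how the confidence intervals were estimated, so the Wilson score interval shown here is a stand-in chosen for simplicity, and the function names are hypothetical.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all actual positives
        "specificity": tn / (tn + fp),  # true negatives among all actual negatives
        "ppv": tp / (tp + fp),          # true positives among all predicted positives
        "npv": tn / (tn + fn),          # true negatives among all predicted negatives
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Made-up counts for illustration only.
counts = dict(tp=90, fp=2, fn=10, tn=898)
metrics = diagnostic_metrics(**counts)

# Interval for sensitivity: 90 detected out of 100 actual positives.
sens_low, sens_high = wilson_interval(counts["tp"], counts["tp"] + counts["fn"])
```

Each metric is a proportion with its own denominator, so the interval for each one is computed from the matching pair of counts, for example `tn` and `tn + fp` for specificity.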

Conclusions

This work represents a general strategy for high-throughput curation of multimodal longitudinal cohorts at risk for lung cancer from routinely collected EHRs.

Version published to 10.3233/cbm-230340
Mar 7, 2024
Version published to 10.1101/2023.11.03.23298020 on medRxiv
Nov 4, 2023

