Adaptive Transfer Learning for Time-to-Event Modeling with Applications in Disease Risk Assessment
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
To address the challenges for modeling time-to-event outcomes in small-sample settings, we propose a novel transfer learning approach, termed CoxTL, based on the widely used Cox proportional hazards model, accounting for potential covariate and concept shifts between source and target datasets. CoxTL utilizes a combination of density ratio weighting and importance weighting techniques to address multi-level data heterogeneity, including covariate and coefficient shifts between source and target datasets. Additionally, it accounts for potential model misspecification, ensuring robustness across a wide range of settings. We assess the performance of CoxTL through extensive simulation studies, considering data under various types of distributional shifts. Additionally, we apply CoxTL to predict End-Stage Renal Disease (ESRD) in the Hispanic population using electronic health record-derived features from the All of Us Research Program. Data from non-Hispanic White and non-Hispanic Black populations are leveraged as source cohorts. Model performance is evaluated using the C-index and Integrated Brier Score (IBS). In simulation studies, CoxTL demonstrates higher predictive accuracy, particularly in scenarios involving multi-level heterogeneity between target and source datasets. In other scenarios, CoxTL performs comparably to alternative methods specifically designed to address only a single type of distributional shift. For predicting the 2-year risk of ESRD in the Hispanic population, CoxTL achieves an increase in C-index up to 6.76% compared to the model trained exclusively on target data. Furthermore, it demonstrates up to 17.94% increase in the C-index compared to the state-of-the-art transfer learning method based on Cox model. The proposed method effectively utilizes source data to enhance time-to-event predictions in target populations with limited samples. Its ability to handle various sources and levels of data heterogeneity ensures robustness, making it particularly well-suited for real-world applications involving target populations with small sample sizes, where traditional Cox models often struggle.