Transfer Learning for Survival-based Clustering of Predictors with an Application to TP53 Mutation Annotation
Abstract
TP53 is the most frequently mutated gene in human cancers, and germline mutations in TP53 cause Li-Fraumeni syndrome (LFS), a hereditary predisposition to diverse cancers. Accurate annotation of TP53 mutations based on their survival effects is critical for informed LFS patient management. Motivated by this need, we develop a new approach for Survival-based Clustering of Predictors (SCP) by identifying homogeneous coefficients in Cox regression. We formulate this task as a fusion-penalized Cox regression problem and provide an efficient computational algorithm. A nonconvex distance-to-set penalty is adopted to facilitate parameter tuning and improve estimation accuracy. To overcome data limitations, we further develop TL-SCP, a transfer learning extension that borrows coefficient ranking information from a source dataset under the assumption of similar ranking patterns between source and target. TL-SCP integrates ranking information through weighted rank averaging, allowing flexibility in accommodating cohort heterogeneity while maintaining model simplicity. Simulation studies demonstrate TL-SCP’s superior performance over SCP in clustering recovery and coefficient estimation. In the TP53 mutation annotation application, where we use non-LFS germline TP53 mutation carriers as a source cohort for the target LFS cohort, TL-SCP identifies biologically meaningful TP53 mutation clusters and offers improved clinical interpretability compared to experiment-based annotations.
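To make the idea of weighted rank averaging concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the weighting scheme, so the single `weight` parameter (the share given to the source cohort), the function names, and the example coefficients are all illustrative assumptions.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def weighted_rank_average(target_coefs, source_coefs, weight):
    """Blend the coefficient ranks of two cohorts.

    `weight` in [0, 1] is a hypothetical tuning parameter: 0 uses only the
    target ranking, 1 uses only the source ranking.
    """
    if len(target_coefs) != len(source_coefs):
        raise ValueError("both cohorts must cover the same predictors")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    target_ranks = average_ranks(target_coefs)
    source_ranks = average_ranks(source_coefs)
    return [
        (1 - weight) * t + weight * s
        for t, s in zip(target_ranks, source_ranks)
    ]


# Made-up initial coefficient estimates for four predictors (illustration only).
target = [0.9, 0.1, 0.5, 0.4]
source = [1.2, 0.2, 0.3, 0.8]
blended = weighted_rank_average(target, source, weight=0.5)
```

In this sketch, a small `weight` keeps the ordering close to what the target data alone suggest, which is one way to accommodate heterogeneity between cohorts; a larger `weight` leans more heavily on the source ranking.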