Prospective Germline Exome and Machine Learning-Based Risk Score Identify Predictive and Prognostic Biomarkers of Immunotherapy Outcomes in Advanced Non-Small Cell Lung Cancer
Abstract
Background: Immune checkpoint inhibitors (ICIs) have transformed the treatment landscape of advanced non-small cell lung cancer (NSCLC). However, a substantial proportion of patients do not experience durable clinical benefit, and established tumour biomarkers, such as PD-L1 expression and tumour mutational burden, often show limited predictive value. The potential of inherited germline variants to predict immunotherapy outcomes in NSCLC remains a critical and underexplored area.

Methods: We prospectively enrolled 117 patients with advanced NSCLC treated with ICI-based regimens at two centres in Spain. Germline whole-exome sequencing (WES) was performed on pretreatment blood samples. Exonic and intronic variants were annotated and integrated with comprehensive clinical data. We applied XGBoost and LASSO machine learning models to identify predictive germline variants and clinical features, and subsequently trained them to predict treatment response and progression-free survival (PFS). This approach produced a novel clinical–germline risk score, generating both a global model and a specific model for the lung adenocarcinoma (LUAD) histological subtype.

Results: XGBoost significantly outperformed penalised regression (LASSO), achieving a robust cross-validated area under the curve (AUC) of 0.845 for predicting treatment response in the validation cohort. Our models identified several novel germline loci that were significantly associated with immunotherapy outcomes. Variants in SLC6A16, SIGLEC11, PDE4E, and OR10H5 were associated with reduced PFS, whereas variants in CCZ1B and PHLDB1 were associated with extended PFS. Lymph node metastasis was confirmed as the sole independent clinical predictor of poor response (OR 2.07, P = 0.008). A predictive algorithm combining these variables generated a clinical–germline risk score that stratified patients into high- and low-risk groups with markedly different median PFS (low-risk: 18 months vs. high-risk: 7 months, log-rank P < 0.001), retaining discriminatory power across histological subgroups.

Conclusions: Integrating specific germline variants, identified through machine learning analysis of the exome, with key clinical features provides accurate and novel predictive information for immunotherapy in NSCLC. This approach not only uncovers new genetic biomarkers but also supports the clinical adoption of composite risk scores for personalised immunotherapy, paving the way for improved patient selection and stratification.
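The abstract reports discrimination as an AUC and stratifies patients into high- and low-risk groups by a composite score. The following is a minimal sketch of those two calculations in plain Python, not the authors' pipeline. The function names, the toy data, and in particular the median cut-off are assumptions made for illustration; the abstract does not state how the risk threshold was chosen.

```python
from statistics import median


def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def split_by_median(risk_scores):
    """Label each patient 'high' or 'low' risk relative to the cohort median.
    ASSUMPTION: a median split is used here only for illustration."""
    cut = median(risk_scores)
    return ["high" if s > cut else "low" for s in risk_scores]


# Hypothetical toy data: six patients, model scores and binary outcome labels.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]

print(roc_auc(toy_scores, toy_labels))
print(split_by_median(toy_scores))
```

In practice the per-group PFS curves would then be compared with a log-rank test, which this sketch leaves out.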