GeneLLM: A large language model-based framework reveals age-dependent transcriptomic heterogeneity in sepsis diagnosis

Abstract

Background: Sepsis is a life-threatening organ dysfunction caused by a dysregulated host response to infection (Sepsis-3 definition). It is highly heterogeneous, with marked age-dependent differences in host immune responses. However, it remains unclear whether this biological heterogeneity limits the performance and generalizability of transcriptomics-based diagnostic models.

Methods: We developed GeneLLM, a large language model (LLM)-based framework for sepsis classification from whole-blood transcriptomic data. By reformulating gene expression profiles as tokenized representations, the framework enables parameter-efficient transfer learning from a pre-trained LLaMA-7B backbone, with only ~0.01% of parameters updated. Five publicly available GEO datasets were analyzed, comprising three pediatric and two adult cohorts. Model performance was evaluated using standard classification metrics and representation learning analyses, and biological interpretability was assessed using SHAP.

Results: The model achieved near-perfect performance in pediatric cohorts (AUC = 0.9965, AP = 0.9997) but lower performance in adult cohorts (AUC = 0.9011, AP = 0.9873). Representation analysis showed clearer class separation in pediatric samples and substantial overlap in adult samples, indicating greater transcriptomic heterogeneity in adults. Consistent with this, adult predictions showed higher uncertainty and more dispersed confidence distributions. Interpretability analysis identified distinct gene signatures: pediatric sepsis was dominated by innate immune responses, whereas adult sepsis involved more diverse regulatory pathways. Notably, the key genes identified in the pediatric and adult cohorts did not overlap.

Conclusions: These findings demonstrate that age-dependent transcriptomic heterogeneity fundamentally constrains the performance of sepsis diagnostic models. Pediatric and adult sepsis are biologically distinct entities that require stratified modeling.
GeneLLM provides a scalable LLM-based framework for precision transcriptomic diagnostics.
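
The abstract reports AUC and AP separately for pediatric and adult cohorts. The sketch below shows one way such an age-stratified evaluation could be computed. It is illustrative only: the abstract does not say how the metrics were implemented, so the helper names (roc_auc, average_precision, evaluate_by_cohort) and the toy scores are hypothetical and are not taken from the study. It assumes binary labels (1 = sepsis) and uses the standard definitions: AUC as the Mann-Whitney probability that a positive sample outscores a negative one, and AP as the mean precision at the rank of each positive sample. In practice, scikit-learn's roc_auc_score and average_precision_score compute the same quantities.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic: probability that a randomly
    chosen positive (sepsis) sample scores higher than a randomly chosen
    negative sample, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision values at the rank of each
    positive sample after sorting by descending score. Tied scores are not
    pooled, so results can differ from scikit-learn when ties occur."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AP needs at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def evaluate_by_cohort(
    samples: Sequence[Tuple[str, int, float]]
) -> Dict[str, Tuple[float, float]]:
    """Group (cohort, label, score) triples by cohort and return
    {cohort: (AUC, AP)} so each age group is scored on its own."""
    groups: Dict[str, Tuple[List[int], List[float]]] = {}
    for cohort, y, s in samples:
        ys, ss = groups.setdefault(cohort, ([], []))
        ys.append(y)
        ss.append(s)
    return {c: (roc_auc(ys, ss), average_precision(ys, ss)) for c, (ys, ss) in groups.items()}


if __name__ == "__main__":
    # Hypothetical toy predictions; not data from the study.
    toy = [
        ("pediatric", 1, 0.95), ("pediatric", 1, 0.90),
        ("pediatric", 0, 0.20), ("pediatric", 0, 0.10),
        ("adult", 1, 0.80), ("adult", 0, 0.70),
        ("adult", 1, 0.60), ("adult", 0, 0.30),
    ]
    for cohort, (auc, ap) in evaluate_by_cohort(toy).items():
        print(f"{cohort}: AUC={auc:.4f}, AP={ap:.4f}")
    # pediatric: AUC=1.0000, AP=1.0000
    # adult: AUC=0.7500, AP=0.8333
```

Scoring each cohort separately, rather than pooling all samples, keeps a strong pediatric result from masking weaker adult performance. Because a random classifier's AP roughly equals the positive-class prevalence, AUC and AP are best read together within each cohort.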