Machine learning–driven analysis of celiac disease to elucidate shared transcriptomic signatures with diffuse large B-cell lymphoma

Abstract

Celiac disease is an immune-mediated disorder primarily affecting the small intestine, while diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin lymphoma. Although clinical and epidemiological observations suggest an increased risk of lymphoma among patients with autoimmune diseases, the molecular mechanisms linking celiac disease to DLBCL remain poorly defined. In this study, we investigated shared transcriptomic features between these two conditions. Publicly available gene expression datasets were analyzed independently for celiac disease and DLBCL. Differential gene expression analyses were performed for each disease, followed by principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) to assess sample-level structure. Disease-specific pathway enrichment analyses were conducted prior to identifying overlapping differentially expressed genes, which were subsequently evaluated using a machine learning–based classification framework. Five genes were consistently identified as shared top-ranking candidates across machine learning models. Four genes (GBP1, GBP2, CCL5, and CD3G) exhibited concordant upregulation in both celiac disease and DLBCL, whereas DHRS7 showed consistent downregulation. Pathway enrichment analyses highlighted prominent immune and inflammatory signaling pathways shared between the two diseases. Using these top-ranking genes, an XGBoost classifier demonstrated strong predictive performance, achieving a mean accuracy of 0.96 (SD = 0.049) and a ROC–AUC of 0.991 under five-fold cross-validation. Collectively, these findings indicate a shared immune-related transcriptional signature linking celiac disease and DLBCL and highlight a small set of genes with potential biomarker relevance. This work provides molecular insight into the connection between chronic immune activation and lymphoid malignancy and offers a foundation for future mechanistic and translational studies.
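The overlap step described in the abstract (keeping genes that are differentially expressed in both diseases and change in the same direction) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `concordant_shared_genes`, the `min_abs_lfc` cutoff, and the toy gene labels and log fold-change values are all assumptions made for illustration, since the abstract does not state the thresholds or tools that were used.

```python
from typing import Dict, List, Tuple


def concordant_shared_genes(
    de_a: Dict[str, float],
    de_b: Dict[str, float],
    min_abs_lfc: float = 1.0,  # hypothetical cutoff, not taken from the study
) -> List[Tuple[str, str]]:
    """Return (gene, direction) pairs for genes that pass the cutoff in
    both result sets and change in the same direction in both."""
    shared = []
    # Only genes reported in both analyses can be shared candidates.
    for gene in sorted(de_a.keys() & de_b.keys()):
        lfc_a, lfc_b = de_a[gene], de_b[gene]
        # Skip genes whose change is too small in either disease.
        if abs(lfc_a) < min_abs_lfc or abs(lfc_b) < min_abs_lfc:
            continue
        # Skip genes that go up in one disease and down in the other.
        if (lfc_a > 0) != (lfc_b > 0):
            continue
        shared.append((gene, "up" if lfc_a > 0 else "down"))
    return shared


# Toy log2 fold changes with placeholder gene labels (illustrative only,
# not values from the study).
celiac_toy = {"GENE_A": 2.1, "GENE_B": -1.8, "GENE_C": 1.5, "GENE_D": 0.3}
dlbcl_toy = {"GENE_A": 1.7, "GENE_B": -2.4, "GENE_C": -1.2, "GENE_E": 3.0}

result = concordant_shared_genes(celiac_toy, dlbcl_toy)
print(result)
```

In this toy input, `GENE_C` is dropped because its direction differs between the two result sets, while `GENE_D` and `GENE_E` are dropped because each appears in only one of them. A list produced this way would be the candidate set handed to a downstream classifier.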
