Construction of a Diagnostic Model for Nasopharyngeal Carcinoma Using a Consensus Machine Learning Approach and Study of Immune Infiltration Characteristics

Abstract

Background: Nasopharyngeal carcinoma (NPC) requires reliable diagnostic biomarkers because of its occult location and poor outcomes.

Methods: We analyzed single-cell RNA sequencing (scRNA-seq) data from NPC and control tissues to resolve the tumor microenvironment (TME), and used CellChat to infer cell-cell communication. Candidate genes were screened by integrating marker genes from cell clusters, differentially expressed genes (DEGs) from bulk RNA-seq, and key module genes identified by WGCNA. Feature selection was then performed with four machine learning algorithms (LASSO, SVM-RFE, Boruta, and XGBoost) to build a robust diagnostic model, whose performance was evaluated by ROC curve analysis. An interactive web application for model visualization was developed with the R Shiny package. We further investigated the prognostic value, immune infiltration associations, and functional pathways of the core genes. Potential therapeutic compounds were predicted via the CMAP database and validated by molecular docking.

Results: Single-cell analysis of 67,535 cells revealed a heterogeneous TME in NPC. All seven identified cell subpopulations contributed highly and evenly to the tumor. Four core genes (COL4A2, LAMB1, ACTA2, and CCL2) were consistently identified by all four machine learning algorithms. A diagnostic model based on these genes demonstrated high accuracy (validation set AUC = 0.933; independent external validation set AUC = 0.966). ACTA2 and COL4A2 showed strong positive correlations with activated dendritic cells and various T-cell subsets, while CCL2 and LAMB1 were strongly associated with M1 macrophages and neutrophils. Functional enrichment analysis indicated that LAMB1, COL4A2, and ACTA2 primarily drive tumor invasion and remodeling processes such as epithelial-mesenchymal transition and angiogenesis, whereas CCL2 predominantly activates the immune-inflammatory microenvironment. High expression of each of the four genes was associated with poor prognosis. Computational prediction and molecular docking suggested that parthenolide may specifically target the CCL2-mediated immune-inflammatory axis and panobinostat the ACTA2-driven invasive/fibrotic pathway, offering a potential combination strategy against multiple pathogenic networks in NPC.

Conclusion: This study integrated multi-omics data with machine learning to develop a robust four-gene diagnostic model for NPC. The core genes (COL4A2, LAMB1, ACTA2, and CCL2) are associated with tumor progression, prognosis, immune regulation, and distinct biological pathways. Our findings provide a valuable tool for the diagnosis and risk stratification of NPC and reveal potential therapeutic targets that warrant further investigation.
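The consensus step described in the abstract (retaining only genes selected by all four algorithms, then scoring the resulting model by AUC) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which the abstract suggests was built in R; the function names `consensus_genes` and `roc_auc`, the placeholder genes `GENE_A` to `GENE_C`, and the labels and scores are all hypothetical and chosen only for illustration. The AUC here is computed as the probability that a randomly chosen positive sample outscores a randomly chosen negative one, which is equivalent to the area under the ROC curve.

```python
from typing import Dict, List, Sequence, Set


def consensus_genes(selections: Dict[str, Set[str]]) -> List[str]:
    """Return the genes selected by every method, sorted alphabetically."""
    if not selections:
        return []
    shared = set.intersection(*selections.values())
    return sorted(shared)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical per-algorithm selections (GENE_A..GENE_C are placeholders,
# not genes reported in the study).
selections = {
    "LASSO": {"COL4A2", "LAMB1", "ACTA2", "CCL2", "GENE_A"},
    "SVM-RFE": {"COL4A2", "LAMB1", "ACTA2", "CCL2", "GENE_B"},
    "Boruta": {"COL4A2", "LAMB1", "ACTA2", "CCL2"},
    "XGBoost": {"COL4A2", "LAMB1", "ACTA2", "CCL2", "GENE_A", "GENE_C"},
}
core = consensus_genes(selections)
print(core)

# Hypothetical labels (1 = tumor, 0 = control) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Taking the strict intersection is the most conservative way to combine selectors: a gene survives only if every method agrees, which trades sensitivity for stability of the final panel.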
