A machine learning and SHAP-based systems analysis of post-translational modification–related genes in nonalcoholic steatohepatitis
Abstract
Nonalcoholic steatohepatitis (NASH) is a multifactorial metabolic liver disease characterized by marked clinical and molecular heterogeneity, which complicates disease characterization and patient stratification. Post-translational modification (PTM)–related genes are known to participate in metabolic and inflammatory regulation; however, their system-level relevance in NASH remains insufficiently defined. In this study, transcriptomic data from liver tissues of patients with NASH were analyzed across multiple independent cohorts. Machine learning models incorporating SHapley Additive exPlanations (SHAP) were applied to evaluate PTM-related gene patterns associated with disease status, with model performance assessed through cross-validation and external datasets. Among the evaluated approaches, the combination of least absolute shrinkage and selection operator (LASSO) and linear discriminant analysis (LDA) showed the most consistent performance. This analysis repeatedly highlighted three PTM-related genes—PELI2, DUSP2, and TRIM56—that were associated with NASH across cohorts. Expression of these genes was related to inflammatory gene programs, lipid metabolism–associated pathways, fibrosis-related markers, and variations in immune cell infiltration. Stratification based on their expression profiles further delineated molecular subgroups of NASH with distinct immune and metabolic characteristics. Overall, this study provides a system-oriented characterization of PTM-related gene alterations in NASH and illustrates the utility of integrative analytical approaches for exploring molecular heterogeneity in complex metabolic diseases.
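To make the SHAP step more concrete, the following is a minimal sketch of exact Shapley attribution for a linear discriminant-style score over the three highlighted genes. It is not the authors' pipeline: the abstract does not give model coefficients or expression values, so `WEIGHTS`, `BACKGROUND`, and `sample` are hypothetical, synthetic numbers chosen only for illustration, and the function names are likewise assumptions. The sketch enumerates all gene coalitions directly instead of calling the `shap` library, which is feasible only because there are three features.

```python
from itertools import combinations
from math import factorial

GENES = ["PELI2", "DUSP2", "TRIM56"]

# Hypothetical coefficients of a linear score (NOT values from the study).
WEIGHTS = {"PELI2": 0.8, "DUSP2": -0.5, "TRIM56": 1.2}

# Synthetic background expression profiles used as the reference distribution.
BACKGROUND = [
    {"PELI2": 1.0, "DUSP2": 2.0, "TRIM56": 0.5},
    {"PELI2": 2.0, "DUSP2": 1.0, "TRIM56": 1.5},
    {"PELI2": 3.0, "DUSP2": 3.0, "TRIM56": 1.0},
]


def score(profile):
    """Linear score: weighted sum of gene expression values."""
    return sum(WEIGHTS[g] * profile[g] for g in GENES)


def coalition_value(sample, known):
    """Mean score when genes in `known` keep the sample's values and all
    other genes are replaced by each background profile in turn."""
    total = 0.0
    for ref in BACKGROUND:
        mixed = {g: (sample[g] if g in known else ref[g]) for g in GENES}
        total += score(mixed)
    return total / len(BACKGROUND)


def shapley_values(sample):
    """Exact Shapley value per gene by enumerating every coalition."""
    n = len(GENES)
    phi = {}
    for g in GENES:
        others = [h for h in GENES if h != g]
        contribution = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude g.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                known = set(subset)
                gain = (coalition_value(sample, known | {g})
                        - coalition_value(sample, known))
                contribution += weight * gain
        phi[g] = contribution
    return phi


# Synthetic sample to explain.
sample = {"PELI2": 3.0, "DUSP2": 1.0, "TRIM56": 2.0}
phi = shapley_values(sample)
baseline = coalition_value(sample, set())  # mean score over the background

for gene in GENES:
    print(f"{gene}: {phi[gene]:+.3f}")
print(f"baseline + sum(phi) = {baseline + sum(phi.values()):.3f}")
print(f"score(sample)       = {score(sample):.3f}")
```

For a linear score, each attribution reduces to the coefficient times the sample's deviation from the background mean, and the attributions plus the baseline add up to the sample's score. That additivity is what allows per-gene contributions to be compared across samples.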