Fecal microbiota-based: An interpretable GBM-SHAP machine learning model for the diagnosis of ankylosing spondylitis
Abstract
Background: Previous studies have shown a significant correlation between ankylosing spondylitis (AS) and the human gut microbiome, and emerging machine learning (ML) techniques can provide a reliable bridge between the two. This paper aims to achieve early diagnosis of AS from the human gut microbiota with the help of ML.

Methods: AS-related fecal metagenomic sequencing data were obtained from NCBI and processed with two filtering tools: Trimmomatic (sequence trimming) and Bowtie2 (decontamination). The processed reads were taxonomically classified with Kraken2, and species abundance was estimated with Bracken. Species diversity was then analyzed and species abundance was visualized. Next, univariate logistic regression and least absolute shrinkage and selection operator (LASSO) regression were used to screen the fecal microbes, and the screened biomarkers were used for model construction. Six models (LR, AB, HLP, BAG, GBM, and XGB) were built as candidates, and the best-performing one was adopted as the tool for diagnosing AS. Model performance was compared using ten-fold cross-validation, ROC curves, precision-recall curves, calibration curves, and radar plots. A confusion matrix and five-fold ROC analysis were additionally used to further demonstrate the superiority of the selected model. Finally, Shapley Additive Explanations (SHAP) was used to visualize the contribution of individual biomarkers to the model.

Results: A total of 211 samples were included in the study and randomly split into a training set and a test set at a ratio of 3:1, used to build and validate the ML models, respectively. Alpha diversity analysis showed a significant difference between the healthy population and patients with AS (P = 0.0237). PCA of beta diversity likewise showed a significant difference in community structure between the two groups (P = 0.001). Based on the LASSO regression analysis, 82 fecal microbes were selected for model construction. According to the performance metrics, the GBM model performed best. In addition, SHAP identified the top ten fecal microbes contributing to the model, which further explains the model.

Conclusion: An ensemble ML model, GBM, built on fecal microbes is meaningful for the early diagnosis of AS and for targeted clinical treatment.
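To make the SHAP step more concrete, the sketch below computes exact Shapley values for a single sample with a brute-force enumeration of feature coalitions. It is only an illustration of the underlying idea: the abstract does not say which SHAP implementation or explainer was used, and the function `shapley_values`, the toy scoring function `toy_model`, and the three abundance values are all hypothetical and not taken from the study. Features absent from a coalition are replaced by a baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample (hypothetical helper).

    Features outside a coalition take their baseline value.
    The cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight shared by every coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def toy_model(features):
    """Made-up score over three taxa abundances (not the paper's GBM)."""
    a, b, c = features
    return 0.5 * a - 0.2 * b + 0.3 * a * c


sample = [0.8, 0.1, 0.5]      # hypothetical abundances for one subject
baseline = [0.2, 0.3, 0.4]    # hypothetical reference abundances
phi = shapley_values(toy_model, sample, baseline)
```

The contributions in `phi` sum to the difference between the model output for `sample` and for `baseline`, which is the additivity property that lets SHAP rank biomarkers by their contribution. For a model with 82 features, exhaustive enumeration is infeasible, and tree-specific or sampling-based approximations would be needed instead.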