Data Mining of Public Databases to Identify TCM Syndrome Patterns in Gout: A Retrospective Study


Abstract

Objective: This study aims to analyze TCM syndrome patterns in gout patients by integrating multiple data-driven methods (factor analysis, hierarchical clustering, association rule mining, and machine learning) applied to a large-scale, structured dataset of clinical gout case records. The goals are to identify core symptom clusters, objectively classify patient subtypes, uncover symptom association patterns, and construct a predictive model for syndrome differentiation.

Materials and Methods: This study was a retrospective data mining analysis. The data were derived from published Traditional Chinese Medicine (TCM) case reports on gout that met the inclusion criteria, retrieved from the China National Knowledge Infrastructure (CNKI) database and the "Ancient and Modern Medical Case Cloud Platform (V3.0)" between 2020 and 2023. A total of 295 cases were included. Demographic characteristics, TCM four-examination data, 41 binary symptom variables, and syndrome classification information were collected. Statistical analyses included exploratory factor analysis (with varimax rotation), hierarchical cluster analysis (Ward's method), association rule mining (Apriori algorithm), and five machine learning classifiers (logistic regression, random forest, gradient boosting, support vector machine, and naive Bayes). The analyses were performed using Python 3.11.0 and SPSS 26.0.

Results: The cohort consisted of 277 males (93.9%) and 18 females (6.1%), with a mean age of 48.5 ± 12.2 years. The syndrome distribution was: damp-heat accumulation in 169 cases (57.3%), spleen deficiency and dampness obstruction in 38 cases (12.9%), damp-heat combined with phlegm and blood stasis in 37 cases (12.5%), phlegm and blood stasis obstruction in 28 cases (9.5%), and liver and kidney deficiency in 23 cases (7.8%). The most frequent symptoms were joint pain (86.4%), red tongue (75.9%), yellow tongue coating (66.8%), and joint swelling (63.7%).
Factor analysis extracted 14 symptom factors (KMO = 0.5896, Bartlett's χ² = 4083.74, p < 0.001), with the main factor (eigenvalue = 6.42) representing the toxic heat dimension. Cluster analysis identified five patient groups, indicating internal heterogeneity in damp-heat syndrome. Association rule mining found 31 significant associations; the strongest rule was {red tongue, slippery pulse, rapid pulse} → {slippery-rapid pulse} (confidence 100%, lift 5.566). Among the machine learning models, logistic regression performed best (accuracy 62.92%, weighted AUC = 0.7634).

Conclusion: This study provides objective evidence for TCM syndrome differentiation of gout by integrating multiple data-driven methods. The prevalence of damp-heat syndrome supports the theoretical framework of TCM. Factor analysis validated the concept of syndrome elements from the symptom dimension, while cluster analysis highlighted the need for refined classification. The moderate performance of the machine learning models suggests potential for clinical decision support. This study advances the standardization of syndrome differentiation by merging traditional wisdom with modern computational methods, aiding the diagnosis and treatment of gout in TCM.
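
For readers less familiar with association rule metrics, the following is a minimal sketch of how the support, confidence, and lift of a single rule can be computed from binary symptom records. It is not the authors' code: the records are invented toy data, and the names `support`, `rule_metrics`, and the symptom labels are hypothetical. It assumes the standard definitions, confidence = support(A ∪ C) / support(A) and lift = confidence / support(C). Under those definitions, a rule with confidence 100% and the value 5.566 read as lift would imply that its consequent occurs in roughly 18% of cases (1 / 5.566 ≈ 0.18).

```python
from typing import Dict, FrozenSet, List

Record = FrozenSet[str]


def support(records: List[Record], items: FrozenSet[str]) -> float:
    """Fraction of records that contain every item in `items`."""
    if not records:
        raise ValueError("records must not be empty")
    return sum(1 for r in records if items <= r) / len(records)


def rule_metrics(
    records: List[Record],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Dict[str, float]:
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    sup_a = support(records, antecedent)
    sup_c = support(records, consequent)
    if sup_a == 0 or sup_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    sup_ac = support(records, antecedent | consequent)
    confidence = sup_ac / sup_a   # P(consequent | antecedent)
    lift = confidence / sup_c     # ratio to the consequent's base rate
    return {"support": sup_ac, "confidence": confidence, "lift": lift}


# Toy records (invented for illustration; NOT data from the study).
records: List[Record] = [
    frozenset({"red_tongue", "slippery_pulse", "rapid_pulse", "slippery_rapid_pulse"}),
    frozenset({"red_tongue", "slippery_pulse", "rapid_pulse", "slippery_rapid_pulse"}),
    frozenset({"red_tongue", "joint_pain"}),
    frozenset({"joint_pain", "joint_swelling"}),
    frozenset({"red_tongue", "slippery_pulse"}),
]

metrics = rule_metrics(
    records,
    antecedent=frozenset({"red_tongue", "slippery_pulse", "rapid_pulse"}),
    consequent=frozenset({"slippery_rapid_pulse"}),
)
print(metrics)
```

On this toy data the rule holds in every record where the antecedent appears, so its confidence is 1.0, and because the consequent occurs in 2 of 5 records its lift is 2.5. A lift well above 1 only says the consequent is more common when the antecedent is present than overall; when the consequent is itself a combination of the antecedent items, as in a composite pulse label, a confidence of 100% may partly reflect how the variables were coded.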