Topic and sentiment in comments on diabetes-related Douyin short videos: a cross-sectional text-mining study
Abstract
Background: Short-form video platforms are increasingly used for diabetes-related health information, and comment sections may capture users’ information needs and affective responses.

Methods: We analysed publicly visible top-level comments on diabetes-related Douyin (TikTok China) videos using a cross-sectional text-mining design. Videos were drawn from a previously evaluated dataset (n = 276) and stratified by information quality (final consensus modified DISCERN score) and diffusion (Douyin Communication Index) into four quadrants; six videos were selected from each quadrant (24 total). All retrieved comments (raw, n = 3,933) were used for descriptive temporal summaries, while text-based analyses were conducted on valid comments after rule-based cleaning (n = 2,007). We performed Chinese word segmentation (jieba), stop-word removal, term-frequency analysis, keyword co-occurrence network analysis (co-occurrence threshold ≥ 6), LDA topic modelling (K = 5), and SnowNLP sentiment classification (negative < 0.35; neutral 0.35–0.65; positive > 0.65).

Results: High-frequency terms were concentrated on diabetes, blood glucose, fasting, doctors, and insulin. The most frequent co-occurring pairs included fasting–blood glucose (25) and diabetes–blood glucose (16). Topic modelling identified five topics; Topic 2 accounted for 89.0% of valid comments (1,786/2,007). Sentiment was predominantly neutral (92.18%, 1,850/2,007), with 6.83% positive (137/2,007) and 1.00% negative (20/2,007). In the raw corpus, commenting activity peaked on Fridays (16.5%) and during 18:00–22:00 (29.4%), with a single hourly peak at 20:00 (254 comments).

Conclusions: Comment discourse centred on practical diabetes self-management, particularly the reporting and interpretation of glycaemic readings and related action-oriented questions. Although negative sentiment was relatively uncommon, such comments often described concrete confusion, worries, or difficulties in disease management. These findings may inform platform-level governance of health-related content and more targeted communication strategies for populations affected by diabetes.
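
The two numeric rules stated in the Methods (the sentiment cut-offs and the co-occurrence threshold of 6) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the sentiment score is assumed to be a value in [0, 1] such as the one SnowNLP exposes through its `sentiments` attribute, and co-occurrence is assumed to be counted once per comment for each unordered pair of distinct terms, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations

# Cut-offs as reported in the abstract.
NEG_CUTOFF = 0.35
POS_CUTOFF = 0.65


def label_sentiment(score: float) -> str:
    """Map a score in [0, 1] to negative / neutral / positive.

    Scores exactly at 0.35 or 0.65 fall in the neutral band, matching
    "negative < 0.35; neutral 0.35-0.65; positive > 0.65".
    """
    if score < NEG_CUTOFF:
        return "negative"
    if score > POS_CUTOFF:
        return "positive"
    return "neutral"


def cooccurrence_edges(tokenised_comments, min_count=6):
    """Return term pairs that co-occur in at least `min_count` comments.

    Assumption (not stated in the source): a pair is counted at most
    once per comment, regardless of how often each term repeats.
    """
    pair_counts = Counter()
    for tokens in tokenised_comments:
        unique_terms = sorted(set(tokens))  # sort so each pair has one canonical order
        pair_counts.update(combinations(unique_terms, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


if __name__ == "__main__":
    # Toy, already-segmented comments (illustrative only, not study data).
    toy = [["fasting", "glucose"]] * 6 + [["insulin", "glucose"]] * 2
    print(label_sentiment(0.5))
    print(cooccurrence_edges(toy))
```

In a full pipeline the token lists would come from jieba segmentation after stop-word removal, and the retained pairs would become weighted edges of the keyword network.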