Leveraging social media for public health: NLP implementations for blood donation data analysis in Japan
Abstract
Blood donation is crucial for healthcare systems, yet maintaining an adequate supply is a persistent challenge. Traditional methods for understanding public sentiment and donor behavior are often limited. Social media, particularly “X” (formerly Twitter), offers a promising alternative for real-time insights. This study explores the viability of using “X” data to analyze blood donation sentiment in Japan, considering the evolving perspectives of younger generations. We replicated the results of a previous study using the Tohoku BERT model, then tested a refined blood donation tweets for user classification (BDT-UC) dataset and a customized version of the model for better classification. We also compared various topic modeling methods, including latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), and BERT-based models, using two different preprocessing techniques. Finally, we integrated the classification into the topic modeling process to explore how the earlier steps affect it, as a final evaluation. Our findings indicate that although the refined dataset yields lower overall classification performance, with a small reduction in overall precision (from 78.4% in the best evaluated model to 75.8% in the refined model), it improved the implementation results, ensuring more balanced labeling across the data. For topic modeling, BERT-based topic models, particularly those preprocessed with the MeCab library, achieved higher coherence and diversity scores than traditional methods. Additionally, results differed significantly when the dataset was processed following the categories of the BDT-UC study, which reflect each tweet's role in blood donation: coherence and diversity increased for one of the categories, but coherence values were notably lower for the others.
This study underscores the significance of initial classification and preprocessing for an effective topic modeling approach when working with Japanese text, which affects the viability of extracting insights from Japanese social media data. The developed methodologies could support more effective analysis of blood donation groups and better-targeted donation campaigns in Japan.
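To make the "diversity" score mentioned above more concrete, the following is a minimal sketch of one common definition of topic diversity: the fraction of unique words among the top-k words of all topics. The abstract does not state which formula the study used, so this definition, the function name `topic_diversity`, and the romanized example words are assumptions for illustration only.

```python
def topic_diversity(topics, top_k=10):
    """Return the fraction of unique words among the top_k words of every topic.

    `topics` is a list of topics, each given as a list of words ordered from
    most to least representative. A value near 1.0 means the topics share few
    top words; a value near 0.0 means they are largely redundant.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("no topic words supplied")
    return len(set(top_words)) / len(top_words)


# Hypothetical topics (placeholder words, not taken from the study's data).
example_topics = [
    ["kenketsu", "yoyaku", "room", "kyou"],
    ["kenketsu", "fusoku", "kyouryoku", "onegai"],
]

diversity = topic_diversity(example_topics, top_k=4)
print(diversity)
```

In this toy case the two topics share one of their eight top words, so the score falls slightly below 1. Coherence is typically computed separately from word co-occurrence statistics in a reference corpus.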