Benchmarking Multilingual Sentiment Analysis Models for the Hospitality Industry: A Case Study of Hotel Reviews in Vietnam
Abstract
The explosive growth of online tourism platforms has generated massive volumes of multilingual hotel reviews, creating both opportunities and challenges for hospitality businesses in emerging markets. Vietnamese hotels receive feedback in multiple languages, making manual sentiment analysis impractical. This study benchmarks five state-of-the-art Transformer models (XLM-RoBERTa, mBERT, mDeBERTa, DistilBERT, multilingual-E5) for automated sentiment classification using 59,377 authentic hotel reviews spanning 14 languages and five sentiment categories with highly imbalanced distributions reflecting realistic business scenarios. Results reveal that multilingual-E5 achieves the highest overall performance (82% accuracy, macro F1=0.62) with superior minority class handling, while DistilBERT provides comparable accuracy (80%) with significantly reduced computational requirements. Critically, XLM-RoBERTa exhibits catastrophic failure on minority classes despite strong overall accuracy (77%), achieving only 0.03 recall on negative reviews, demonstrating that standard NLP benchmarks do not predict domain-specific effectiveness. We provide evidence-based model selection guidelines linking business characteristics to appropriate choices, a quantitative cost-benefit analysis demonstrating 710% ROI for typical deployments, and actionable implementation strategies. These findings enable small and medium hospitality enterprises to adopt AI-powered sentiment analysis sustainably, supporting UN Sustainable Development Goals 8, 9, and 12 by democratizing access to sophisticated NLP capabilities while promoting responsible computational practices.
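The gap the abstract highlights, that a model can post high accuracy yet fail on minority classes, is exactly what the accuracy/macro-F1 split measures. The following sketch uses a hypothetical toy dataset (not the paper's data) to show how a classifier that almost never predicts the minority "negative" class keeps high accuracy while macro F1 collapses:

```python
# Hypothetical illustration (not the study's data): why high accuracy can
# coexist with low macro F1 when a minority class is almost never predicted.

def per_class_f1(y_true, y_pred, label):
    """Precision, recall, and F1 for a single class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores: every class counts equally."""
    labels = sorted(set(y_true))
    return sum(per_class_f1(y_true, y_pred, lab)[2] for lab in labels) / len(labels)

# Imbalanced toy set: 90 positive and 10 negative reviews. The model
# predicts "positive" for everything except one negative it catches.
y_true = ["pos"] * 90 + ["neg"] * 10
y_pred = ["pos"] * 90 + ["neg"] + ["pos"] * 9

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}")                   # 0.91 (looks strong)
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")   # 0.57 (dragged down by "neg")
```

Because macro F1 averages per-class F1 without weighting by class frequency, the near-zero recall on the minority class cannot be hidden by the majority class, which is why the abstract reports both metrics side by side.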