Are all Generative AI Chatbots the Same? Analysing the Reliability of Hotel Recommendations

Abstract

The rise of generative AI tools such as ChatGPT, Claude, Gemini, DeepSeek and Grok is transforming the way users interact with digital information, particularly in the global hospitality industry. This study evaluates the hotel recommendations these AI chatbots generate for the 10 most visited cities worldwide. A comparative analysis assesses whether the tools provide reliable and unbiased suggestions by checking their outputs against verified hotel data, including price, hotel category, and scores from Booking.com and TripAdvisor. The findings reveal a significant difference between AI-generated data and real-world values, especially in pricing. ChatGPT consistently recommends higher-category hotels but often underestimates scores and prices. Gemini achieves the closest alignment with star ratings. DeepSeek and Grok present increasingly promising multimodal capabilities. The study highlights both the potential and the current limitations of AI-driven hotel recommendations, offering strategic insights for hospitality businesses adapting to rapidly changing AI-driven search behaviour.
