When AI Tells the Truth? Evaluating Different LLM Approaches to Reliable Trip Planning
Abstract
This paper presents a hybrid architecture for the automatic generation of multi-day, personalized travel itineraries that balance strict logistical constraints with individual traveler preferences. The system combines Large Language Models (LLMs) with retrieval-augmented generation, web search, and geospatial APIs in an iterative plan-execute-reflect loop. Three agent configurations (LLM-only, LLM + Search, and geospatially-aware LLM + Maps) were evaluated across 371 scenarios differing in location, trip length, and transport mode. Five quantitative metrics captured temporal realism, preference alignment, hallucination rate, and computational efficiency. Results show that grounding LLMs in verified, real-time data sources, especially via Google Maps, virtually eliminates hallucinations and unrealistic timing, producing feasible itineraries. The best overall performance was achieved by a geospatially-grounded agent using Claude 3.5 Sonnet v2, highlighting the role of LLMs as high-level semantic orchestrators rather than autonomous planners.
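To make the plan-execute-reflect idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a draft itinerary already exists (in the paper's setting, an LLM would produce it), uses a small hard-coded travel-time table as a stand-in for a geospatial API, and replaces the LLM's revision step with a simple rule that delays stops. All names (`Stop`, `find_conflicts`, `repair`, `plan_execute_reflect`, `TRAVEL_MIN`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stop:
    name: str
    start_min: int     # planned arrival, in minutes after midnight
    duration_min: int  # planned time spent at the stop


# Assumption: a fixed lookup table stands in for a real geospatial API call.
TRAVEL_MIN = {("Museum", "Castle"): 30, ("Castle", "Harbor"): 20}


def lookup_travel_time(origin: str, destination: str) -> int:
    """Return the travel time in minutes between two stops."""
    return TRAVEL_MIN[(origin, destination)]


def find_conflicts(
    plan: List[Stop], travel_time: Callable[[str, str], int]
) -> List[Tuple[int, int]]:
    """Reflect: list (stop index, shortfall in minutes) for unreachable stops."""
    conflicts = []
    for i in range(1, len(plan)):
        prev, cur = plan[i - 1], plan[i]
        earliest = prev.start_min + prev.duration_min + travel_time(prev.name, cur.name)
        if cur.start_min < earliest:
            conflicts.append((i, earliest - cur.start_min))
    return conflicts


def repair(plan: List[Stop], conflicts: List[Tuple[int, int]]) -> List[Stop]:
    """Replan: delay the first conflicting stop and every later stop.

    Assumption: this rule stands in for an LLM revising the itinerary.
    """
    index, shortfall = conflicts[0]
    return [
        Stop(s.name, s.start_min + shortfall, s.duration_min) if i >= index else s
        for i, s in enumerate(plan)
    ]


def plan_execute_reflect(
    draft: List[Stop], travel_time: Callable[[str, str], int], max_rounds: int = 5
) -> List[Stop]:
    """Check the plan against travel times and revise until it is feasible."""
    plan = draft
    for _ in range(max_rounds):
        conflicts = find_conflicts(plan, travel_time)
        if not conflicts:
            break
        plan = repair(plan, conflicts)
    return plan


# A draft with unrealistic timing: the Castle start leaves only 10 minutes of travel.
draft = [Stop("Museum", 540, 90), Stop("Castle", 640, 60), Stop("Harbor", 710, 45)]
final = plan_execute_reflect(draft, lookup_travel_time)
print([(s.name, s.start_min) for s in final])
```

In this toy run the loop needs two revisions before every stop is reachable on time. The point of the sketch is the division of labor the abstract describes: the feasibility check relies on externally supplied travel times, and the planner only revises what that check flags.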