Cultural Faithfulness in Tourism Chatbots: A Structured Human Adjudication Framework for Traceable Retrieval-Augmented Generation

Abstract

The integration of chatbots based on large language models into tourism contexts has prompted critical discussion of Cultural Faithfulness, especially the accurate representation of intangible heritage. While Retrieval-Augmented Generation (RAG) improves factual accuracy, current evaluation methods rely predominantly on automated similarity metrics and seldom incorporate structured human adjudication; as a result, they often conflate semantic coherence with epistemic validity. To address these limitations, this study proposes a Human-in-the-Loop evaluation framework for traceable RAG systems in tourism chatbots. The framework integrates four components into a single protocol: chunk-level retrieval traceability, a granular narrative error taxonomy (E0–E10) designed to capture varying degrees of attribution erosion, automated similarity filtering, and documented human consensus adjudication. By treating retrieval as an epistemic constraint on generation and operationalizing abstention as a measure of boundary awareness, the framework establishes explicit evaluation criteria. Empirical validation used 61 structured questions derived from a corpus of Indonesian cultural narratives, yielding 183 independent annotations. After artifact-level review, 73.8% of responses met the acceptance criteria, while 26.2% were excluded for grounding violations. Notably, high-severity errors occurred only in rejected cases, and iterative testing showed no progression into the hallucination or contradiction categories. These results indicate that Cultural Faithfulness can be achieved systematically through traceable retrieval, structured human validation, and governance-aligned artifact preservation. This research extends evaluation methodology beyond surface-level similarity metrics toward a unified model of epistemic accountability for generative systems in tourism applications.
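The abstract names the components of the acceptance decision but not their parameters. The Python sketch below shows one way automated similarity filtering and human consensus adjudication could combine into an accept/reject decision; it is not the authors' implementation. Every parameter is a hypothetical assumption: error codes are integers 0 to 10 (E0 to E10) ordered by increasing severity, codes up to E2 count as acceptable, the similarity cut-off is 0.75, and consensus means a strict majority with ties resolved toward the most severe code. The 183 annotations over 61 questions are arithmetically consistent with three annotations per question, so the toy data also assumes three annotators.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical parameters: the abstract does not report these values.
SIMILARITY_THRESHOLD = 0.75  # assumed cut-off for the automated similarity filter
MAX_ACCEPTABLE_CODE = 2      # assumed: E0..E2 acceptable, E3..E10 grounding violations


@dataclass
class Response:
    question_id: str
    similarity: float       # automated similarity score in [0, 1]
    annotations: list[int]  # one error code per annotator, 0..10 for E0..E10


def consensus_code(codes: list[int]) -> int:
    """Strict-majority code; without a majority, fall back to the most
    severe code (hypothetical tie-break rule)."""
    code, count = Counter(codes).most_common(1)[0]
    if count > len(codes) / 2:
        return code
    return max(codes)


def is_accepted(r: Response) -> bool:
    """Screen by similarity first, then apply the human consensus code."""
    if r.similarity < SIMILARITY_THRESHOLD:
        return False
    return consensus_code(r.annotations) <= MAX_ACCEPTABLE_CODE


def acceptance_rate(responses: list[Response]) -> float:
    """Percentage of responses that pass both checks."""
    return 100 * sum(is_accepted(r) for r in responses) / len(responses)


# Toy data (invented for illustration), three annotations per question.
toy = [
    Response("q1", 0.91, [0, 0, 1]),  # majority E0 -> accepted
    Response("q2", 0.88, [5, 5, 0]),  # majority E5 -> rejected
    Response("q3", 0.62, [0, 0, 0]),  # below similarity cut-off -> rejected
    Response("q4", 0.80, [0, 1, 7]),  # no majority, worst code E7 -> rejected
]

print(f"{acceptance_rate(toy):.1f}%")  # 25.0%
```

Resolving ties toward the most severe code keeps disagreement conservative: a response can be accepted only if a majority of annotators assign acceptable codes. The abstract describes "documented consensus," which may instead mean discussion until annotators agree; this sketch does not model that step.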
