Towards Trustworthy and Effective AI for Academic Policy Navigation: Human Evaluation of a Source-Aware, Domain-Optimized RAG-Based Chatbot
Abstract
Navigating institutional policies remains a challenge for students and staff due to complex legalistic language, hierarchical structures, and dispersed documentation. While Large Language Models (LLMs) such as GPT-4o offer fluent natural language capabilities, their susceptibility to hallucination limits their perceived trustworthiness in academic contexts, where factual accuracy and traceability are critical. This study investigates how combining transparency-enhancing tactics (specifically, source citation and human-centered evaluation) with domain-specific performance strategies can support the development of more trustworthy and effective AI systems. We present a source-aware, Retrieval-Augmented Generation (RAG)-based chatbot designed to help users interpret Bournemouth University's Code of Practice for Research Degrees. The system integrates trust-building interventions with performance-enhancing techniques tailored to policy documents, including layout-aware chunking, hybrid self-reranking, and semantic vector search using Pinecone. Quantitative evaluation with the RAGAS framework and BERTScore shows a high faithfulness score (0.9597), outperforming baseline LLM responses. In a pilot user study with doctoral students, participants reported strong satisfaction with response clarity (mean score: 3.60/4.0), and source attribution was accurate in 92% of cases. While not a complete solution for trustworthy AI, this work demonstrates how targeted design interventions that combine transparency and domain optimization can enhance both trust and effectiveness in AI-assisted academic policy navigation.
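To make the source-aware retrieval step concrete, the sketch below shows what a semantic vector query against a Pinecone index might look like when each stored chunk carries the policy section it came from. This is a minimal sketch under stated assumptions: the index name (`code-of-practice`), the embedding model, and the metadata fields (`text`, `section`) are illustrative placeholders rather than the authors' implementation, and the layout-aware chunking and hybrid self-reranking stages are omitted.

```python
# Minimal sketch of source-aware retrieval over a Pinecone index.
# Assumptions (not from the paper): index name "code-of-practice",
# all-MiniLM-L6-v2 embeddings (384-dim, so the index must be 384-dim),
# and chunk metadata with "text" and "section" fields.
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key="YOUR_API_KEY")               # hypothetical credentials
index = pc.Index("code-of-practice")                # hypothetical index name
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model


def retrieve_with_sources(question: str, top_k: int = 5) -> list[dict]:
    """Embed the question, fetch the top-k policy chunks, and keep each
    chunk's section label so the generated answer can cite its sources."""
    query_vec = embedder.encode(question).tolist()
    results = index.query(vector=query_vec, top_k=top_k, include_metadata=True)
    return [
        {"text": m.metadata["text"], "source": m.metadata["section"]}
        for m in results.matches
    ]


context = retrieve_with_sources("What is the maximum registration period?")
for chunk in context:
    print(f"[{chunk['source']}] {chunk['text'][:80]}...")
```

The faithfulness-style comparison of generated answers against references can likewise be approximated with the `bert-score` package, as sketched below; the candidate/reference pair is a placeholder, not study data (the paper's reported 0.9597 faithfulness comes from its own RAGAS and BERTScore evaluation set).

```python
# Sketch of answer-vs-reference scoring with the bert-score package.
# The sentence pair is a placeholder example, not data from the study.
from bert_score import score

candidates = ["The maximum registration period is four years."]  # chatbot answer
references = ["Registration may last at most four years."]       # reference answer

P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.4f}")
```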