Evaluating AI-Generated Summaries: A Human vs. Machine Study in the Context of Employee Feedback
Abstract
This study investigates the effectiveness of an automatic text summarisation (ATS) model compared to subject-matter expert (SME) summaries in the context of employee feedback analysis. Drawing on qualitative data from 113 employee responses, three SMEs manually produced comprehensive summaries using established qualitative techniques, while the ATS model generated an alternative, machine-created summary. A total of 200 participants evaluated these four summaries on linguistic quality, adequacy, overall quality, and perceived likelihood of AI authorship. Results revealed that participants could not reliably distinguish the AI-generated summary from those crafted by SMEs, so the ATS summary effectively passed the Turing test. Notably, participants showed a clear preference for the ATS-generated summary, which matched, or in some cases surpassed, certain SME summaries in ratings of linguistic quality and adequacy. This suggests strong potential for leveraging automated techniques in large-scale feedback processing. Further analyses indicated that trust in AI moderated perceptions: participants more inclined to trust AI rated suspected AI text more favourably, while sceptics devalued it. Conversely, higher education and extensive AI experience did not consistently enhance the ability to detect machine authorship. These findings highlight the promise of AI-driven summarisation for reducing manual workloads, although challenges around transparency and user acceptance remain. Future research should expand across diverse contexts and models, incorporating ethical considerations to optimise the integration of ATS tools in organisational settings.