Development and validation of a scale assessing perceived trustworthiness in large language models


Abstract

Large language models (LLMs) are increasingly part of everyday life, yet there is no established way to measure how users evaluate their trustworthiness. This study introduces the Perceived Trustworthiness of LLMs scale (PT-LLM-8), developed from the TrustLLM framework and adapted as a human-centred measure. The scale was designed to measure the perceived trustworthiness of a user’s primary LLM and assesses eight dimensions: truthfulness, safety, fairness, robustness, privacy, transparency, accountability, and compliance with laws. Psychometric properties of the scale were tested with 752 LLM users in the United Kingdom (mean age = 28.58, SD = 6.11; 50.3% male, 48.8% female). The PT-LLM-8 functions as a unidimensional measure, with high internal consistency (Cronbach’s alpha = 0.90, composite reliability = 0.91), strong item-total correlations (0.62–0.75), and measurement invariance across gender. The scale yields an overall score of perceived trustworthiness of LLMs, and item-level responses can be examined when insight into specific dimensions is needed. For researchers, practitioners, and developers, the PT-LLM-8 offers a practical instrument for evaluating interventions, comparing groups and contexts, and examining whether technical safeguards are reflected in users’ perceived trustworthiness of LLMs. The scale can also be applied to guide system design, support policy development, and help organisations monitor shifts in user trust toward LLMs over time, making it applicable across research, practice, and governance.