Sociodemographic Bias in Large Language Model Clinical Trial Screening


Abstract

Background

Large language models (LLMs) are increasingly used in randomized clinical trial (RCT) screening, but their potential for sociodemographic bias remains unclear.

Objective

To determine whether LLM-based trial screening judgments vary with patient sociodemographic characteristics when clinical details and eligibility criteria are held constant.

Design, Setting, and Participants

Cross-sectional evaluation of Phase II–III RCT protocols from ClinicalTrials.gov (U.S. adult populations; 2023–2024). For each protocol, we created 15 physician-validated clinical vignettes, each rendered in 34 versions: one control (no sociodemographic identifiers) and 33 identity variants spanning gender, race/ethnicity, socioeconomic status, homelessness, unemployment, and sexual orientation.
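To illustrate the design, a minimal sketch of how one vignette might be rendered into a control version plus identity variants is shown below. The template text, the label subset, and the function name are hypothetical stand-ins, not the study's materials.

```python
# Illustrative sketch only: template text, labels, and function names are
# hypothetical, not the study's materials.

# Small subset of the 33 identity labels described in the design;
# None denotes the control version with no sociodemographic identifier.
IDENTITY_LABELS = [
    None,
    "a transgender woman",
    "a White male",
    "a patient experiencing homelessness",
    "an unemployed patient",
]

# Hypothetical vignette template; clinical details stay fixed across versions.
VIGNETTE_TEMPLATE = (
    "The patient is {identity}a 54-year-old presenting with stable, "
    "well-controlled hypertension, to be screened against the protocol's "
    "eligibility criteria."
)

def render_variants(template: str) -> list[str]:
    """Render the control plus one otherwise identical variant per label."""
    variants = []
    for label in IDENTITY_LABELS:
        identity = "" if label is None else f"{label}, "
        variants.append(template.format(identity=identity))
    return variants

for vignette in render_variants(VIGNETTE_TEMPLATE):
    print(vignette)
```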

Exposures

Identity labels applied to otherwise identical vignettes, evaluated by nine contemporary LLMs.

Main Outcomes and Measures

The primary outcome was the eligibility domain score (1–5 Likert scale), comparing identity variants with the control. Secondary outcomes were the adherence, resources, risk–benefit, and trust/attitude domain scores. Mixed-effects models estimated adjusted mean differences with multiplicity-corrected P values; absolute differences <.10 were considered trivial.
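Because the abstract does not name the software, the random-effects structure, or the correction method, the following is a minimal sketch of one plausible analysis in Python with statsmodels. The random intercept per protocol, the Holm adjustment, the file, and the column names are all illustrative assumptions, not the authors' specification.

```python
# Illustrative analysis sketch: statsmodels, a random intercept per protocol,
# and Holm adjustment are assumptions; the abstract does not specify them.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

# Long-format data (hypothetical file and column names): one row per model
# evaluation, with the 1-5 eligibility score, the identity variant label
# ("control" as reference), and the protocol identifier.
df = pd.read_csv("evaluations.csv")

# Fixed effects for the identity variants (control as reference) estimate
# adjusted mean differences vs. control; the random intercept absorbs
# protocol-level shifts in baseline scores.
model = smf.mixedlm(
    "eligibility_score ~ C(variant, Treatment(reference='control'))",
    df,
    groups=df["protocol_id"],
)
result = model.fit()

# Holm correction across the identity-variant contrasts (illustrative choice),
# then flag each estimate against the <.10 triviality threshold above.
contrasts = [name for name in result.params.index if name.startswith("C(variant")]
_, p_adjusted, _, _ = multipletests(result.pvalues[contrasts], method="holm")
for name, estimate, p in zip(contrasts, result.params[contrasts], p_adjusted):
    label = "non-trivial" if abs(estimate) >= 0.10 else "trivial"
    print(f"{name}: diff = {estimate:+.3f}, adjusted P = {p:.4f} ({label})")
```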

Results

Of 69 protocols, 58 met inclusion criteria. Analysis of 5,324,400 model evaluations showed that eligibility judgments were largely stable: most identity-related differences fell within ±.05 (transgender woman, −.008 [95% CI, −.04 to .02]; White male, .036 [.01 to .07]). Only homelessness exceeded the threshold for triviality (−.121 [−.15 to −.09]; P<.001). Secondary domains revealed socioeconomic gradients, particularly for adherence (homeless, −.595; P<.001) and resources (homeless, −.715; P<.001), with smaller trust/attitude effects and negligible risk–benefit differences.

Conclusions and Relevance

Bias in LLM-assisted trial screening is conditional. Within explicit eligibility criteria, models reason consistently; in open-ended domains such as adherence and resources, they echo the inequities of their training data. Responsible deployment in clinical research depends on preserving that boundary so that automation strengthens fairness in trial access rather than reproducing existing disparities.
