AutoPsychDx: An LLM Agent Framework for Automated Psychometric Diagnosis Using Multi-Method Classification

Jihong Zhang

Abstract

Clinical screening with self-report instruments such as the PHQ-9 typically relies on a single psychometric method, most commonly a sum-score cut-off. Yet this "one-size-fits-all" approach treats all items as equally informative, ignores measurement error, and can yield prevalence estimates that diverge substantially from those produced by latent-variable alternatives such as Item Response Theory (IRT) and Diagnostic Classification Models (DCMs). Reconciling results across methods demands expertise in multiple analytic frameworks and substantial manual effort, limiting methodological triangulation in practice. This paper introduces AutoPsychDx, an open-source framework that integrates a large language model (LLM) agent with psychometric software to automate the entire diagnostic pipeline: data validation, model fitting, cross-method comparison, and report generation. Given a project folder containing item metadata and response data, a single terminal command triggers a Claude Code agent that generates and executes R scripts for sum-score cut-off, IRT (Graded Response Model), and DCM (General Diagnostic Model), computes a consensus diagnosis (majority vote across methods), and writes a structured markdown report with prevalence tables, method agreement statistics, and plain-language clinical interpretation. The framework is demonstrated with the PHQ-9 depression screener from the Forbes et al. (2018) community sample (*N* = 403). Results show that the moderate cut-off ($\geq$ 10) classified 29.8% as positive, the mild cut-off ($\geq$ 5) classified 69.2%, and the latent-variable methods fell between these extremes (IRT: 52.1%; DCM: 59.6%), with consensus classification at 51.4% and 123 ambiguous cases flagged for review. Every cut-off-positive case ($\geq$ 10) was also DCM-positive, but the DCM identified an additional 120 individuals below the sum-score threshold, consistent with prior findings that DCM-based screening captures cases missed by traditional cut-offs. AutoPsychDx is instrument-agnostic, pip-installable, and extensible to any polytomous or binary self-report scale.
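
The sum-score cut-off and majority-vote consensus steps described in the abstract can be illustrated with a minimal Python sketch. This is not the framework's own code (the abstract states that the agent generates R scripts). The function names, the choice of three voting methods, and the rule that any disagreement among methods counts as "ambiguous" are assumptions made for illustration, since the abstract does not specify them. The IRT and DCM votes are placeholders rather than fitted model output.

```python
def sum_score_positive(responses, cutoff=10):
    """Flag a respondent as positive when the item sum reaches the cut-off."""
    return sum(responses) >= cutoff


def consensus(votes):
    """Majority vote over per-method binary classifications.

    Returns (label, ambiguous). Treating any disagreement between
    methods as ambiguous is an assumption, not the paper's stated rule.
    """
    positives = sum(votes.values())
    label = positives * 2 > len(votes)       # strict majority of methods
    ambiguous = 0 < positives < len(votes)   # methods do not all agree
    return label, ambiguous


# Hypothetical respondent: nine PHQ-9 items, each scored 0-3.
items = [1, 1, 0, 2, 1, 0, 1, 1, 0]

votes = {
    "cutoff": sum_score_positive(items),  # sum-score rule at >= 10
    "irt": True,                          # placeholder for an IRT-based call
    "dcm": True,                          # placeholder for a DCM-based call
}

label, ambiguous = consensus(votes)
print(label, ambiguous)
```

A respondent like this one, below the sum-score threshold but positive under both latent-variable placeholders, corresponds to the kind of case the abstract reports the DCM identifying beyond the traditional cut-off.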

Version published to 10.31234/osf.io/6fw74_v2 on OSF Preprints
Mar 25, 2026
Version published to 10.31234/osf.io/6fw74_v1 on OSF Preprints
Mar 24, 2026
