AutoPsychDx: An LLM Agent Framework for Automated Psychometric Diagnosis Using Multi-Method Classification
Abstract
Clinical screening with self-report instruments such as the PHQ-9 typically relies on a single psychometric method — most commonly, a sum-score cut-off. Yet this "one-size-fits-all" approach treats all items as equally informative, ignores measurement error, and can yield prevalence estimates that diverge substantially from those produced by latent-variable alternatives such as Item Response Theory (IRT) and Diagnostic Classification Models (DCMs). Reconciling results across methods demands expertise in multiple analytic frameworks and substantial manual effort, limiting methodological triangulation in practice. This paper introduces AutoPsychDx, an open-source framework that integrates a large language model (LLM) agent with psychometric software to automate the entire diagnostic pipeline: data validation, model fitting, cross-method comparison, and report generation. Given a project folder containing item metadata and response data, a single terminal command triggers a Claude Code agent that generates and executes R scripts for sum-score cut-off, IRT (Graded Response Model), and DCM (General Diagnostic Model), computes a consensus diagnosis (majority vote across methods), and writes a structured markdown report with prevalence tables, method agreement statistics, and plain-language clinical interpretation. The framework is demonstrated with the PHQ-9 depression screener from the Forbes et al. (2018) community sample (*N* = 403). Results show that the moderate cut-off ($\geq$ 10) classified 29.8% as positive, the mild cut-off ($\geq$ 5) classified 69.2%, and the latent-variable methods fell between these extremes (IRT: 52.1%; DCM: 59.6%), with consensus classification at 51.4% and 123 ambiguous cases flagged for review. 
Every cut-off-positive case ($\geq$ 10) was also DCM-positive, but the DCM identified an additional 120 individuals below the sum-score threshold — consistent with prior findings that DCM-based screening captures cases missed by traditional cut-offs. AutoPsychDx is instrument-agnostic, pip-installable, and extensible to any polytomous or binary self-report scale.
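To make the sum-score and consensus steps concrete, the following is a minimal Python sketch and not the AutoPsychDx implementation (the abstract states that the agent generates R scripts). It assumes the standard PHQ-9 scoring of nine items rated 0 to 3, a strict majority vote over three binary method flags, and that a case is "ambiguous" whenever the methods are not unanimous; the abstract does not define the ambiguity rule, so that criterion is an assumption. The names `sum_score_positive`, `consensus_diagnosis`, and `Consensus` are hypothetical, and the IRT and DCM flags in the usage lines are placeholder values rather than fitted model output.

```python
from dataclasses import dataclass
from typing import Dict, List

PHQ9_N_ITEMS = 9        # the PHQ-9 has nine items
PHQ9_ITEM_RANGE = range(0, 4)  # each item is scored 0, 1, 2, or 3


def sum_score_positive(responses: List[int], cutoff: int = 10) -> bool:
    """Classify one respondent by the sum-score cut-off (positive if sum >= cutoff)."""
    if len(responses) != PHQ9_N_ITEMS:
        raise ValueError(f"expected {PHQ9_N_ITEMS} item responses, got {len(responses)}")
    if any(r not in PHQ9_ITEM_RANGE for r in responses):
        raise ValueError("item responses must be integers from 0 to 3")
    return sum(responses) >= cutoff


@dataclass
class Consensus:
    positive: bool   # True if a strict majority of methods voted positive
    votes: int       # number of methods that voted positive
    ambiguous: bool  # ASSUMPTION: ambiguous means the methods are not unanimous


def consensus_diagnosis(flags: Dict[str, bool]) -> Consensus:
    """Combine per-method binary classifications by strict majority vote."""
    n_methods = len(flags)
    if n_methods == 0 or n_methods % 2 == 0:
        raise ValueError("an odd, non-zero number of methods is needed to avoid ties")
    votes = sum(1 for flag in flags.values() if flag)
    return Consensus(
        positive=votes > n_methods / 2,
        votes=votes,
        ambiguous=0 < votes < n_methods,
    )


# Hypothetical respondent with a total score of 7 (below the >= 10 cut-off).
responses = [1, 1, 2, 1, 0, 1, 1, 0, 0]
flags = {
    "cutoff": sum_score_positive(responses, cutoff=10),
    "irt": True,  # placeholder, not a fitted Graded Response Model result
    "dcm": True,  # placeholder, not a fitted General Diagnostic Model result
}
result = consensus_diagnosis(flags)
print(result)
```

In this illustrative case the respondent falls below the moderate cut-off but is positive under both placeholder latent-variable flags, so the majority vote is positive and the case is marked for review under the assumed non-unanimity rule.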