Unsupervised Statisticians Ignoring Orders: An Investigation into Methodological Assumptions for Biological Psychiatry with UK Biobank Data

Abstract

Ordinal data, especially from Likert-scale instruments, are common in biological psychiatry and related fields but pose challenges for statistical modeling due to their discrete and ordered nature. This study investigates the implications of model specification in unsupervised learning for ordinal data, focusing on clustering, exploratory factor analysis (EFA), and network analysis applied to two UK Biobank data sets: the PHQ-9 and a depression-focused instrument derived from the CIDI-SF. We compare the effects of likelihood specification (Gaussian, multinomial, ordinal) in model-based clustering and of correlation measure choice (Pearson, Spearman, Kendall, polychoric/mixed) in EFA and network analysis. We find that theoretically optimal methods (e.g., ordinal likelihoods and polychoric correlations) often underperform or become unstable in large-scale data, while pragmatic alternatives (e.g., Spearman's rho or multinomial likelihoods) offer superior computational stability and interpretability. Our results highlight the trade-offs between statistical fidelity and computational feasibility and caution against uncritical use of either convenience-based or theory-driven assumptions. These findings underscore the need for more robust, scalable tools for unsupervised learning with ordinal data in psychiatric research.
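To make the correlation-measure comparison concrete, the following minimal sketch computes Pearson, Spearman, and Kendall (tau-b) coefficients on two simulated four-level items. It is an illustration under stated assumptions, not the study's pipeline: the data are synthetic (a latent bivariate normal cut at arbitrary thresholds), the function names and the values of `rho` and `thresholds` are hypothetical, and the polychoric correlation is omitted because it requires maximum-likelihood estimation.

```python
import random
from bisect import bisect_right
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, the tie-corrected variant, via an O(n^2) pair scan."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both items: excluded from both factors
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_x = concordant + discordant + only_y_tied
    untied_y = concordant + discordant + only_x_tied
    return (concordant - discordant) / sqrt(untied_x * untied_y)


def simulate_items(n=500, rho=0.6, thresholds=(0.3, 1.0, 1.6), seed=0):
    """Two synthetic items scored 0-3 (assumed setup, not UK Biobank data)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        x.append(bisect_right(thresholds, z1))  # category = thresholds passed
        y.append(bisect_right(thresholds, z2))
    return x, y


if __name__ == "__main__":
    item_a, item_b = simulate_items()
    for name, fn in [("Pearson", pearson), ("Spearman", spearman),
                     ("Kendall tau-b", kendall_tau_b)]:
        print(f"{name}: {fn(item_a, item_b):.3f}")
```

The three coefficients generally differ on the same item pair, which is the kind of sensitivity to correlation choice that carries through to the correlation matrix used as input for EFA and network estimation. The pairwise Kendall scan here is quadratic in sample size and would need a faster algorithm at biobank scale.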
