Unsupervised Statisticians Ignoring Orders: An Investigation into Methodological Assumptions for Biological Psychiatry with UK Biobank Data

Owen Matthew Truscott Thomas
Guro Pauck Bernhardsen
Valeria Vitelli
Soili Lehto


Abstract

Ordinal data, especially from Likert-scale instruments, are common in biological psychiatry and related fields but pose challenges for statistical modeling due to their discrete and ordered nature. This study investigates the implications of model specification in unsupervised learning for ordinal data, focusing on clustering, exploratory factor analysis (EFA), and network analysis applied to two UK Biobank data sets: the PHQ-9 and a depression-focused instrument derived from the CIDI-SF. We compare the effects of likelihood specification (Gaussian, Multinomial, Ordinal) in model-based clustering and correlation measure choice (Pearson, Spearman, Kendall, Polychoric/Mixed) in EFA and network analysis. We find that theoretically optimal methods (e.g., ordinal likelihoods and polychoric correlations) often underperform or become unstable in large-scale data, while pragmatic alternatives (e.g., Spearman's rho or multinomial likelihoods) offer superior computational stability and interpretability. Our results highlight the trade-offs between statistical fidelity and computational feasibility and caution against uncritical use of either convenience-based or theory-driven assumptions. These findings underscore the need for more robust, scalable tools for unsupervised learning with ordinal data in psychiatric research.
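The comparison of correlation measures described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' code and uses no UK Biobank data: it simulates two correlated latent Gaussian variables, cuts them into four ordered categories (scored 0 to 3, as PHQ-9 items are), and computes Pearson, Spearman, and Kendall tau-b coefficients on the discretized responses. The thresholds, sample size, latent correlation, and all function names are assumptions chosen purely for illustration. Polychoric correlation is omitted because it requires numerical likelihood optimization.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties (O(n^2) pair scan)."""
    conc = disc = only_x = only_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: contributes nothing
            if dx == 0:
                only_x += 1
            elif dy == 0:
                only_y += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    return (conc - disc) / sqrt((conc + disc + only_x) * (conc + disc + only_y))


def to_likert(z, cuts=(0.0, 0.8, 1.5)):
    """Map a latent score to an ordinal category 0..3 (illustrative cut points)."""
    return sum(z > c for c in cuts)


def simulate(n=500, rho=0.6, seed=0):
    """Draw n pairs of ordinal items whose latent Gaussians correlate at rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(to_likert(u))
        ys.append(to_likert(rho * u + sqrt(1 - rho ** 2) * v))
    return xs, ys


x, y = simulate()
for name, fn in [("Pearson", pearson), ("Spearman", spearman), ("Kendall tau-b", kendall_tau_b)]:
    print(f"{name}: {fn(x, y):.3f}")
```

All three coefficients are computed on the same discretized items, so any differences between them reflect only how each measure treats the category scores and ties, not differences in the underlying data.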

Version published to 10.31234/osf.io/24eus_v1 on OSF Preprints
Jul 25, 2025

Regression-based Modeling of Spearman’s Rho for Longitudinal Metabolomics and Mental Wellness in Breast Cancer Patients

This article has 12 authors:
1. Y. Chen
2. T.T. Gui
3. Z. Huang
4. N.E. Quach
5. S. Tu
6. J. Liu
7. T.J. Garrett
8. A.R. Starkweather
9. D.E. Lyon
10. B.E. Shepherd
11. X.M. Tu
12. T. Lin
This article has no evaluations. Latest version: Apr 16, 2026

Direct and mediated effects (DME) SLCMA: a novel method for life course modelling with time-varying covariates

This article has 7 authors:
1. Solomon Beer
2. Andrew J. Simpkin
3. Sherief Y. Eldeeb
4. Heather J Zar
5. Dan J Stein
6. Erin C. Dunn
7. Andrew D.A.C. Smith
This article has no evaluations. Latest version: Jun 6, 2026

A Beta-Binomial Model for Estimating Zero- or One-inflated Pain Trajectories

This article has 7 authors:
1. Yanxi Liu
2. Richard E. Harris
3. Daniel Clauw
4. Emine Bayman
5. Andrew Leroux
6. Martin A. Lindquist
7. the A2CPS Consortium
This article has no evaluations. Latest version: May 11, 2026
