Encoding of pretrained large language models mirrors the genetic architectures of human psychological traits
Abstract
Recent advances in large language models (LLMs) have spurred widespread interest in using them as universal translators for biomedical terms. However, the black-box nature of LLMs has forced researchers to rely on artificially designed benchmarks without understanding what LLMs actually encode. We demonstrate that pretrained LLMs, without any fine-tuning, can already explain up to 51% of the genetic correlation between items on a psychometrically validated neuroticism questionnaire. For psychiatric diagnoses, we found that disorder names aligned better with genetic relationships than diagnostic descriptions did. Our results indicate that pretrained LLMs carry encodings that mirror genetic architectures. These findings highlight LLMs’ potential for validating phenotypes, refining taxonomies, and integrating textual and genetic data in mental health research.
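The comparison the abstract describes can be sketched as follows. This is an illustrative outline, not the authors' pipeline: the embedding matrix and genetic-correlation matrix below are random stand-ins for, respectively, LLM embeddings of questionnaire-item text and item-level genetic correlations estimated from genomic data. The analysis correlates pairwise embedding similarity with pairwise genetic correlation and reports the squared correlation as the fraction of variance explained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 12, 64

# Placeholder for embeddings a pretrained LLM would produce
# for each questionnaire item's text (one row per item).
embeddings = rng.normal(size=(n_items, dim))

# Placeholder for the item-by-item genetic correlation matrix
# (in practice estimated from GWAS summary statistics).
genetic_corr = np.corrcoef(rng.normal(size=(n_items, 30)))

def cosine_sim_matrix(X):
    """Pairwise cosine similarity between row vectors of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

semantic_sim = cosine_sim_matrix(embeddings)

# Compare only off-diagonal entries: one value per item pair.
iu = np.triu_indices(n_items, k=1)
r = np.corrcoef(semantic_sim[iu], genetic_corr[iu])[0, 1]
variance_explained = r ** 2  # fraction of genetic correlation explained
print(f"variance explained: {variance_explained:.3f}")
```

With random placeholders the number is near zero; the paper's 51% figure refers to real neuroticism items and their estimated genetic correlations.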