The Synthetic Nomological Net: A search engine to identify conceptual overlap in measures in the behavioral sciences

Abstract

Every month, scores of new measures and constructs enter the behavioral science literature. Researchers struggle to keep an overview of their own subfields, let alone the wider field, in which the estimated number of allegedly different constructs now exceeds 30,000. Proposing constructs is easy; finding redundancies and overlaps is not. The effort of finding redundancies from participant responses grows quadratically with the number of measures, because every measure must be compared with every other. This cost motivates the use of language models, which can simplify and speed up the process without requiring new data collection. Earlier work employed latent semantic analysis (LSA) to exploit the textual nature of questionnaires in the social and behavioral sciences. LSA's reliance on word co-occurrences, however, can produce inaccurate similarity estimates and may even reproduce the very jingle-jangle fallacies it aims to detect. Recent advances demonstrate that transformer-based sentence embedding models offer a more viable solution for predicting empirical relatedness between survey items. In the present work, we use a fine-tuned language model, the SurveyBot3000, to introduce the Synthetic Nomological Net, an open-access web application for the systematic detection of conceptual overlap. The application indexes over 470,000 items across 74,000 scales from approximately 31,500 instruments in the APA PsycTests database.
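
To make the quadratic growth mentioned in the abstract concrete, the short sketch below counts the unordered pairs among n measures, n(n-1)/2. It is only an illustration: the function name `pairwise_comparisons` is hypothetical, and applying the count to the 470,000 indexed items is an assumption made for scale, not a description of how the application itself works.

```python
from math import comb


def pairwise_comparisons(n_measures: int) -> int:
    """Return the number of unordered pairs among n_measures, i.e. n(n-1)/2."""
    return comb(n_measures, 2)


# Illustrative sizes only; 470_000 echoes the item count given in the abstract.
for n in (10, 100, 1_000, 470_000):
    print(f"{n:>7,} measures -> {pairwise_comparisons(n):,} pairs")
```

Multiplying the number of measures by ten multiplies the number of pairs by roughly one hundred, which is why checking every pair with fresh participant data quickly becomes impractical.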
