Global metadata catalogue of youth cohorts with genetic and aggression-related data
Abstract
Aggressive behavior in youth is partially heritable and is clinically and socially important. Nevertheless, research is fragmented across disciplines and cohorts, which limits large-scale genetically informed analyses. We therefore systematically mapped existing cohorts that include both measures of aggression and genetic data from participants under 18 years of age. We searched PubMed systematically using 5,400 automatically generated queries that combined terms from five controlled vocabularies (age, mental health, aggression, data type, study type). From the 5,254 unique records retrieved (January 20, 2025), a large language model pipeline extracted explicitly named cohorts, followed by semi-automated alias deduplication. Human reviewers then screened 188 studies against standardized inclusion criteria: documented accessibility, available genetic data, a sample size above 1,000, inclusion of minors (younger than 18 years), and validated aggression phenotypes. Forty-four cohorts met all criteria, representing approximately 890,000 individuals worldwide. Technical validation assessed the reproducibility of the automated name extraction and the inter-rater agreement. The catalogue supports discovery of cohorts and facilitates future cross-cohort studies of aggression-related outcomes in youth.
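The query generation and screening steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. The vocabulary terms, the `Cohort` fields, and the function names `build_queries` and `meets_inclusion_criteria` are all hypothetical, and only the five vocabulary categories and the five inclusion criteria are taken from the text. The sketch assumes that each query combines one term from every vocabulary, which the abstract does not state explicitly.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical terms for illustration only; the abstract names the five
# vocabulary categories but does not list their contents.
VOCABULARIES = {
    "age": ["child", "adolescent"],
    "mental_health": ["psychopathology"],
    "aggression": ["aggression", "conduct problems"],
    "data_type": ["genotype", "GWAS"],
    "study_type": ["cohort", "longitudinal"],
}


def build_queries(vocabularies):
    """Build one AND-joined query per combination of one term from each vocabulary."""
    return [
        " AND ".join(f'"{term}"' for term in combo)
        for combo in product(*vocabularies.values())
    ]


@dataclass
class Cohort:
    """Hypothetical screening record for one candidate cohort."""
    name: str
    accessible: bool                # documented accessibility
    has_genetic_data: bool          # available genetic data
    sample_size: int
    min_age_years: float            # youngest age at which participants were assessed
    validated_aggression_measure: bool


def meets_inclusion_criteria(cohort):
    """Apply the five inclusion criteria listed in the abstract."""
    return (
        cohort.accessible
        and cohort.has_genetic_data
        and cohort.sample_size > 1000      # strictly more than 1,000
        and cohort.min_age_years < 18      # includes minors
        and cohort.validated_aggression_measure
    )


queries = build_queries(VOCABULARIES)
example = Cohort("Example Cohort", True, True, 2500, 7, True)
print(len(queries), meets_inclusion_criteria(example))
```

With these toy vocabularies the Cartesian product yields 2 × 1 × 2 × 2 × 2 = 16 queries. The 5,400 queries reported in the abstract would follow from larger term lists under the same assumption.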