Development and validation of electronic health record-based ascertainment of obsessive-compulsive disorder cases and controls

Bo Wang
Tyne W. Miller-Fleming
Dongmei Yu
Donald Hucks
Emily Gantz
Rebecca Johnston
Angela Maxwell-Horn
Nancy Cox
James Sutcliffe
Carol A. Mathews
Evonne McArthur
Helen Hatfield
Dia Kabir
Evan J. Giangrande
Rebecca G. Fortgang
Shirley B. Wang
Rakesh Karmacharya
Joshua L. Roffman
Jeremiah M. Scharf
Jordan W. Smoller
Takahiro Soda
James J. Crowley
Lea K. Davis

Abstract

Objectives

Obsessive-compulsive disorder (OCD) is a common psychiatric disorder, with two-thirds of affected individuals reporting severe impairment. Despite its substantial burden and moderate heritability, its etiology remains poorly understood, and treatments are often suboptimal. Although recent genome-wide association studies (GWAS) have identified some risk loci, OCD remains in the linear phase of discovery, in which additional samples continue to yield new variant associations, and many more OCD-associated variants are left to discover. This study aimed to develop and validate an electronic health record (EHR)-based algorithm to identify OCD cases and facilitate large-scale genetic studies.

Methods

We leveraged EHR-linked biobank data from two large hospital systems, namely Vanderbilt University Medical Center (VUMC) and Mass General Brigham (MGB), to develop a high-throughput phenotyping algorithm integrating diagnostic codes, medication records, and natural language processing (NLP) of clinical notes. Algorithm performance was evaluated through expert chart review, and genetic validation was performed using OCD polygenic risk scores (PRS).

Results

Expert chart review showed that our algorithm combining International Classification of Diseases (ICD) codes and NLP achieved a higher positive predictive value (PPV) for OCD cases (0.84 at VUMC and 0.91 at MGB) than either ICD codes or NLP alone, albeit with a lower case yield.
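The trade-off between PPV and case yield can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Patient` fields, the three rules, and the toy cohort are hypothetical and are not the study's algorithm or data. PPV is computed in the usual way, as the fraction of flagged patients whom chart review confirms as true cases.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Patient:
    has_icd: bool    # hypothetical flag: an OCD diagnostic code is on record
    has_nlp: bool    # hypothetical flag: NLP found an OCD mention in notes
    true_case: bool  # hypothetical chart-review verdict (gold standard)


def ppv_and_yield(patients: List[Patient],
                  rule: Callable[[Patient], bool]) -> Tuple[float, int]:
    """Return (PPV, number flagged) for a case-definition rule."""
    flagged = [p for p in patients if rule(p)]
    if not flagged:
        raise ValueError("rule flagged no patients; PPV is undefined")
    true_positives = sum(p.true_case for p in flagged)
    return true_positives / len(flagged), len(flagged)


def make_toy_cohort() -> List[Patient]:
    """Invented cohort of 14 flagged patients; NOT data from the study."""
    cohort: List[Patient] = []
    # Both ICD and NLP evidence: 5 confirmed, 1 not confirmed.
    cohort += [Patient(True, True, True)] * 5 + [Patient(True, True, False)]
    # ICD evidence only: 2 confirmed, 2 not confirmed.
    cohort += [Patient(True, False, True)] * 2 + [Patient(True, False, False)] * 2
    # NLP evidence only: 1 confirmed, 3 not confirmed.
    cohort += [Patient(False, True, True)] + [Patient(False, True, False)] * 3
    return cohort


# Each single-source rule flags anyone with that source, including
# patients who also have the other source.
RULES = {
    "icd_only": lambda p: p.has_icd,
    "nlp_only": lambda p: p.has_nlp,
    "icd_and_nlp": lambda p: p.has_icd and p.has_nlp,
}

if __name__ == "__main__":
    cohort = make_toy_cohort()
    for name, rule in RULES.items():
        value, n_flagged = ppv_and_yield(cohort, rule)
        print(f"{name}: PPV={value:.2f}, flagged={n_flagged}")
```

In this toy cohort, requiring both sources flags fewer patients than either source alone but a larger share of them are confirmed, which is the same qualitative pattern the chart review reports.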

Furthermore, at both sites, algorithm-determined cases exhibited significantly elevated PRS derived from the latest OCD GWAS, providing genetic validation of our phenotyping approach.

Conclusion

Our study demonstrates a scalable and cost-efficient approach for EHR-based ascertainment of OCD cases, facilitating large-scale genetic studies and advancing understanding of the disorder’s complex etiology.

Version published to 10.1101/2025.08.05.25332874 on medRxiv
Aug 7, 2025

Related articles

Large Language Models for Psychiatric Phenotype Extraction from Electronic Health Records

This article has 10 authors:
1. Clara Frydman-Gani
2. Alejandro Arias
3. Maria Perez Vallejo
4. John Daniel Londoño Martínez
5. Johanna Valencia-Echeverry
6. Mauricio Castaño
7. Alex A. T. Bui
8. Nelson B. Freimer
9. Carlos Lopez-Jaramillo
10. Loes M. Olde Loohuis
This article has no evaluations
Latest version Aug 12, 2025

Polygenic Risk Scores for Pediatric Obsessive-Compulsive Symptoms and their Mediating Effect in Clinically Diagnosed Samples of Obsessive-Compulsive Disorder, Attention-Deficit/Hyperactivity Disorder, Anxiety, Depression, Autism and Tourette syndrome

This article has 11 authors:
1. Lilit Antonyan
2. S-M Shaheen
3. Christie Burton
4. William Gehring
5. Noam Soreni
6. Pamela Falzarano Szura
7. Julia Bellamy
8. Usha Rajan
9. David Rosenberg
10. Gregory Hanna
11. Paul Arnold
This article has no evaluations
Latest version Aug 6, 2025

Cross-sectional and longitudinal comparison of commonly used screening tools for bipolar disorders

This article has 6 authors:
1. Anna Tröger
2. Yiqi Zeng
3. Thomas Richardson
4. Emma Claire Palmer-Cooper
5. Allan H Young
6. Becci Strawbridge
This article has no evaluations
Latest version Sep 6, 2025
