A Systematic Process for Assessing Fitness-for-Purpose of Health Outcomes for Computable Phenotyping with Electronic Health Record Data

Nicole M. Gatto
David J. Cronkite
Paige D. Wartko
Robert Ball
David S. Carrell
Rhoda Eniafe
Rishi Desai
James S. Floyd
Terrence Lee
Jennifer C. Nelson
Fatma M. Shebl
Ryan Schoeplein
Sengwee Toh
Mingfeng Zhang
Sascha Dublin
José J. Hernández-Muñoz

Abstract

Purpose

Information from electronic health records (EHRs) may be incorporated into computable phenotype algorithms in efforts to overcome inaccuracies of algorithms based on administrative claims data alone. However, such efforts can be resource-intensive and unsuccessful. Assessing the feasibility of computable phenotyping for a health outcome of interest (HOI) before proceeding is therefore recommended.

Methods

We developed a systematic fitness-for-purpose (FFP) assessment process to implement concepts outlined in a previously described general framework for computable phenotyping incorporating EHR data. Our process includes verifying the HOI is well-defined, reviewing clinical information about the HOI, identifying existing algorithms and their performance, evaluating HOI clinical and data complexity, and determining an overall FFP conclusion and recommendation. We applied this process to ten HOIs lacking high-performing claims-based algorithms, selecting HOIs of public health importance that varied in clinical and data complexity, including neutropenia, pericardial effusion and drug-induced liver injury.

Results

HOIs assessed as having moderate (vs. easy) overall difficulty had characteristics such as the need for natural language processing, integration of multiple laboratory test results, or longitudinal EHR data. HOIs assessed as having high difficulty required using data from multiple EHR sources, ruling out many other potential causes, or relying on low-sensitivity diagnostic tests. Input from experts in EHR data and clinical care was crucial.
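The difficulty tiers described above can be sketched as a simple rule, purely as an illustration. This is not the authors' tool: the assessment in the source relied on expert judgment, and the class name, field names, and the "highest tier wins" logic below are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class HoiTraits:
    """Hypothetical flags for the characteristics named in the Results."""
    needs_nlp: bool = False                # natural language processing required
    multiple_lab_results: bool = False     # integration of multiple laboratory test results
    longitudinal_data: bool = False        # longitudinal EHR data required
    multiple_ehr_sources: bool = False     # data from multiple EHR sources required
    many_causes_to_rule_out: bool = False  # many other potential causes to exclude
    low_sensitivity_tests: bool = False    # relies on low-sensitivity diagnostic tests


def overall_difficulty(traits: HoiTraits) -> str:
    """Return 'easy', 'moderate', or 'high' (assumed rule: highest tier wins)."""
    if (traits.multiple_ehr_sources
            or traits.many_causes_to_rule_out
            or traits.low_sensitivity_tests):
        return "high"
    if (traits.needs_nlp
            or traits.multiple_lab_results
            or traits.longitudinal_data):
        return "moderate"
    return "easy"


if __name__ == "__main__":
    # A made-up HOI profile, not one of the ten outcomes assessed in the source.
    example = HoiTraits(needs_nlp=True, many_causes_to_rule_out=True)
    print(overall_difficulty(example))
```

A rule like this could at most serve as a first-pass checklist. The source stresses that input from experts in EHR data and clinical care was crucial to reaching an overall conclusion.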

Conclusion

EHR data have potential to enhance accuracy of defining certain HOIs for research and surveillance compared to administrative claims data. The process and tools we created will support others in assessing FFP of HOIs for computable phenotyping.

Five key points

Incorporating electronic health record (EHR) data into computable phenotypes could improve the identification of health outcomes of interest (HOIs), but such work can be resource-intensive.
We developed a systematic fitness-for-purpose (FFP) process and tools to assess the feasibility of computable phenotyping for HOIs.
Steps include identifying existing algorithms and their performance, ensuring the HOI is well-defined, evaluating clinical and data complexity, and determining a feasibility recommendation.
Difficulty increased with a need for natural language processing, multiple laboratory tests, longitudinal EHR data, multiple EHR sources or ruling out other potential causes.
Input from EHR data and clinical care experts was crucial to the FFP assessment process.

Plain Language Summary (PLS)

Computer programs that try to identify diseases and health conditions from the insurance claims of tens of thousands of patients, as in FDA’s ongoing safety monitoring of approved drugs and medical products, are often unsuccessful because claims data lack nuance. Incorporating information from electronic health records (EHRs) and patient chart notes may improve accurate identification of health outcomes. Because this can be resource-intensive, we designed a process and tools to assess the feasibility of including EHR data in computer algorithms to identify health outcomes. Steps included identifying existing algorithms and their performance, building familiarity with the outcome and making sure it is well-defined, evaluating clinical and data complexity, and determining a conclusion about feasibility. We applied our process to ten health outcomes of public health importance. Health outcomes were considered moderately difficult for computerized algorithms if they required natural language processing, integration of multiple laboratory tests, or EHR data from multiple timepoints. Health outcomes were considered highly difficult if they required using multiple EHR data types, ruling out many alternative causes of the health outcome (other than medications), or relying on diagnostic tests of low accuracy. Input from EHR data and clinical care experts was crucial for the assessment process.

Version published to medRxiv (10.1101/2025.08.29.25334394) on Sep 4, 2025
