Predicting Annotation Yield in Artificial Intelligence-Ranked Electronic Health Record Cohorts: A Regression-Based Framework for Efficient Manual Review
Abstract
Background: Painstaking manual chart review of electronic health records (EHRs) remains the chief bottleneck in retrospective studies, especially when rare-disease cohorts demand high specificity. Automated NLP rankers help, yet when trained on dated data they leave teams guessing how long to keep reviewing charts. We therefore present a regression-based 'screening-saturation' model that predicts residual yield at every point along the ranked list.

Methods: Leveraging a previously validated SVM that ranks notes for pediatric status epilepticus, we trained four predictive models (linear regression, polynomial regression, support-vector regression, and a lightweight neural network) on notes from 2013 and tested them on data from 2020. The prediction target was the proportion of true positives (ESE or RSE) expected below any score threshold.

Results: Polynomial regression offered the best balance of generalizability and interpretability, maintaining strong predictive performance even under temporal data shift. Regression outputs were used to simulate stopping rules for manual review; a simulated rule captured 80% of positives after reviewing just 16.6% of notes, an 83% workload reduction.

Conclusion: Our scalable, model-agnostic framework turns AI scores into actionable staffing decisions in clinical workflows. The screening-saturation model integrates with clinician-in-the-loop tools and adapts readily across medical domains that need lean chart review.
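As a minimal sketch of the screening-saturation idea described above, the snippet below fits a polynomial regression to the cumulative fraction of true positives captured as a function of review depth in a ranked list, then reads off the depth at which a stopping rule would be predicted to capture 80% of positives. The simulated data, the degree-3 polynomial, and all variable names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# --- Hypothetical stand-in for an AI-ranked cohort ---
# Notes sorted by descending ranker score; positives concentrate near the top,
# so the hit rate decays as review moves down the list.
n_notes = 5000
ranks = np.arange(n_notes)
p_positive = 0.6 * np.exp(-ranks / 400)        # decaying probability of a true positive
labels = rng.random(n_notes) < p_positive      # 1 = confirmed positive on manual review

# Saturation curve: cumulative fraction of all positives captured
# versus the fraction of the ranked list reviewed so far.
frac_reviewed = (ranks + 1) / n_notes
frac_captured = np.cumsum(labels) / labels.sum()

# Fit a polynomial regression (the best-performing model class in the abstract)
# mapping review depth to expected capture rate.
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(frac_reviewed.reshape(-1, 1), frac_captured)

# Simulated stopping rule: stop once the predicted capture rate reaches 80%.
grid = np.linspace(0.01, 1.0, 200).reshape(-1, 1)
pred = model.predict(grid)
stop_depth = grid[np.argmax(pred >= 0.80)][0]
print(f"Stop after reviewing ~{stop_depth:.1%} of ranked notes "
      f"(predicted to capture 80% of positives)")
```

In practice the curve would be fit on an annotated development set (e.g., the 2013 notes) and applied to the score distribution of a new cohort, which is what makes the residual-yield prediction sensitive to temporal data shift.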