Combination AI-Machine Learning to Diagnose Pulmonary Hypertension: A Real-World Evidence Cohort Study
Abstract
BACKGROUND
Pulmonary hypertension (PH) is a highly morbid disease, but underdiagnosis is common outside of expert referral centers. Consequently, there may be opportunities to automate PH diagnosis using artificial intelligence (AI) clinical decision support tools. Analysis of patient-level right heart catheterization (RHC) data is required to optimize AI-based PH diagnosis but has not been reported previously.
METHODS
We performed a retrospective cohort analysis of all RHC studies (January 1, 2016 to December 31, 2024) performed at the University of Maryland Medical System (UMMS), a statewide clinical network in Maryland of 12 hospitals serving >2 million patients. We developed an automated large language model (LLM)-driven Pattern Repository (LDPR) method, featuring three task-specific LLM agents, to extract unstructured RHC data; the extracted values were manually and independently cross-validated by two PH experts. To address data missingness, we used machine learning (ML) to develop formulae for calculating mean pulmonary artery pressure (mPAP) from systolic (sPAP) and diastolic (dPAP) pulmonary artery pressure, using an 80/20 train-test split.
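The formula-fitting step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic (the generating coefficients 2.0, 0.4, and 0.5 are arbitrary assumptions), the helper names `fit_linear`, `mean_squared_error`, and `make_synthetic` are hypothetical, and ordinary least squares stands in for whatever ML procedure was actually used.

```python
import random

def fit_linear(rows):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 via the normal equations."""
    # Accumulate X^T X (3x3) and X^T y (3), with an intercept column of ones.
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for x1, x2, y in rows:
        feats = (1.0, x1, x2)
        for i in range(3):
            xty[i] += feats[i] * y
            for j in range(3):
                xtx[i][j] += feats[i] * feats[j]
    # Solve the 3x3 system by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]

def mean_squared_error(rows, coef):
    """Mean squared prediction error of the fitted equation on the given rows."""
    errs = [(y - (coef[0] + coef[1] * x1 + coef[2] * x2)) ** 2 for x1, x2, y in rows]
    return sum(errs) / len(errs)

def make_synthetic(n, seed=0):
    """Synthetic (sPAP, dPAP, mPAP) rows in mmHg; NOT real patient data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        dpap = rng.uniform(5, 40)
        spap = dpap + rng.uniform(10, 50)
        # Arbitrary generating equation plus noise, chosen only for illustration.
        mpap = 2.0 + 0.4 * spap + 0.5 * dpap + rng.gauss(0, 1.0)
        rows.append((spap, dpap, mpap))
    return rows

rows = make_synthetic(1000)
random.Random(1).shuffle(rows)

# 80/20 train-test split: fit on the first 80%, evaluate on the held-out 20%.
split = int(0.8 * len(rows))
train, test = rows[:split], rows[split:]

coef = fit_linear(train)            # [intercept, sPAP weight, dPAP weight]
test_mse = mean_squared_error(test, coef)
print(coef, test_mse)
```

Evaluating the error only on the held-out 20% is what makes the reported fit an estimate of performance on unseen reports rather than on the data used to derive the coefficients.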
RESULTS
The study cohort included N=11,029 unique patients and 17,292 RHC reports (age 66±13.5 years; 43% female; 65% White, 30% Black or African American; mPAP, 28±11 mmHg; 26% congestive heart failure). The precision of mPAP, sPAP, and dPAP extraction by the LLM was 99.6%, 99.4%, and 99.4%, respectively, with a detection failure rate of 0.4%. A missing mPAP was noted in N=548 reports and N=507 unique patients (3.2% and 4.6%, respectively). When ML was applied to the dataset, the simple linear equation mPAP = 1.51 + 0.43*sPAP + 0.45*dPAP returned the highest R² (0.94) and the lowest mean squared error (8.3 mmHg), outperforming the linear equations in current use (all p<0.001). The ML-derived formula was then applied to the patients with missing mPAP (N=507) and identified N=382 patients (75.3%) with mPAP >20 mmHg, thereby reclassifying these patients from no diagnosis to a diagnosis of PH.
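As a worked illustration, the reported equation and the mPAP >20 mmHg threshold can be applied to a single report as below. The coefficients and threshold come from the text; the function names and the example pressures are hypothetical, and this is a sketch of the arithmetic rather than a clinical tool.

```python
PH_THRESHOLD_MMHG = 20.0  # mPAP threshold stated in the abstract

def estimate_mpap(spap, dpap):
    """Estimate mPAP (mmHg) from sPAP and dPAP using the reported linear equation."""
    return 1.51 + 0.43 * spap + 0.45 * dpap

def suggests_ph(spap, dpap):
    """True when the estimated mPAP exceeds the 20 mmHg threshold."""
    return estimate_mpap(spap, dpap) > PH_THRESHOLD_MMHG

# Hypothetical report with sPAP 40 mmHg and dPAP 20 mmHg:
# 1.51 + 0.43*40 + 0.45*20 = 1.51 + 17.2 + 9.0 = 27.71 mmHg
example = estimate_mpap(40.0, 20.0)
print(round(example, 2), suggests_ph(40.0, 20.0))
```

For a hypothetical report with sPAP 20 mmHg and dPAP 8 mmHg, the same arithmetic gives 1.51 + 8.6 + 3.6 = 13.71 mmHg, which falls below the threshold.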
CONCLUSION
In this retrospective cohort analysis, combined LLM- and ML-based extraction and interpretation of RHC data was used to automate PH diagnosis in a large and heterogeneous patient population. This approach is an efficient and scalable solution for preventing underdiagnosis of PH and demonstrates the feasibility of generative AI for advancing clinically actionable tools that can improve cardiovascular disease phenotyping and diagnosis in real-world settings.