Application of Machine Learning Models Integrating Clinical and Echocardiography in the Prediction of Mean Pulmonary Artery Pressure Grading

Xinpeng Dai
Qiumeng Xi
Jiayi He
Rui Fan
Xinyuan Zhang
Dichen Guo
Juanni Gong
Suqiao Yang
Yuanhua Yang
Yidan Li


Abstract

Background: As a progressive cardiopulmonary disorder, pulmonary hypertension (PH) requires precise assessment of mean pulmonary arterial pressure (mPAP) for clinical staging, treatment planning, and prognostic evaluation.

Methods: We retrospectively included patients who underwent right heart catheterization (RHC) at our institution between January 2017 and October 2025. The cohort was divided temporally into a training cohort (January 2017 to December 2023) and a validation cohort (January 2024 to October 2025). Echocardiographic parameters and clinical data were collected. A four-category label was constructed based on mPAP grading (0–20 mmHg, 21–35 mmHg, 36–45 mmHg, > 45 mmHg). Key features were selected using Lasso combined with the Boruta method, and the Synthetic Minority Over-sampling Technique (SMOTE) was applied to balance the class distribution of the training cohort. Eight machine learning (ML) models were then constructed, and their performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, F1 score, and Brier score. Feature importance in the predictive models was interpreted using SHapley Additive exPlanations (SHAP) values.

Results: A total of 495 patients were included in model construction. Six of 29 candidate variables were selected for model training: 6-minute walk distance (6MWD), eccentricity index (EI), left ventricular diameter (LVD), right ventricular diameter (RVD), the ratio of tricuspid annular plane systolic excursion to pulmonary artery systolic pressure (TAPSE/PASP), and PASP. Among the eight ML models, the Naive Bayes model achieved the highest classification accuracy, with an AUC of 0.886, accuracy of 0.736, Brier score of 0.106, and F1 score of 0.736; its AUC in the training cohort was 0.894. The mean AUC values for the four mPAP grades were 0.994, 0.878, 0.779, and 0.892, respectively.
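Several of the steps listed in the Methods are standard and can be illustrated compactly. The sketch below, using only the Python standard library, shows three of them under stated assumptions: mapping a measured mPAP value to the four grades, SMOTE-style oversampling of a minority grade, and per-grade AUC plus a multiclass Brier score. All function names are hypothetical and this is not the authors' pipeline. The abstract does not specify how non-integer mPAP values are binned, which SMOTE variant was used, or how AUC and Brier score were aggregated across the four grades, so upper-inclusive cut-offs, one-vs-rest AUC, and a class-averaged Brier score are assumptions here.

```python
import random
from typing import List, Sequence


def mpap_grade(mpap_mmhg: float) -> int:
    """Map mPAP (mmHg) to grades 0-3 for 0-20, 21-35, 36-45, >45 mmHg.
    Assumption: cut-offs are upper-inclusive (<=20, <=35, <=45)."""
    if mpap_mmhg <= 20:
        return 0
    if mpap_mmhg <= 35:
        return 1
    if mpap_mmhg <= 45:
        return 2
    return 3


def smote(minority: List[List[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Basic SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbours
    (Euclidean distance). Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared distance to x.
        others = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)),
        )
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def auc_binary(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive scores higher than a
    random negative, counting ties as 1/2 (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_one_vs_rest(y: Sequence[int],
                    proba: Sequence[Sequence[float]]) -> List[float]:
    """Per-grade AUC: grade c versus all other grades."""
    n_classes = len(proba[0])
    return [auc_binary([1 if t == c else 0 for t in y], [p[c] for p in proba])
            for c in range(n_classes)]


def brier_multiclass(y: Sequence[int],
                     proba: Sequence[Sequence[float]]) -> float:
    """Squared error between predicted probabilities and one-hot labels,
    averaged over samples and classes (one of several normalisations in use)."""
    n_classes = len(proba[0])
    total = 0.0
    for t, p in zip(y, proba):
        total += sum((p[c] - (1.0 if c == t else 0.0)) ** 2
                     for c in range(n_classes))
    return total / (len(y) * n_classes)


if __name__ == "__main__":
    # Toy example: four patients, one per grade, with confident predictions.
    y_true = [mpap_grade(v) for v in (18, 30, 40, 52)]
    proba = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1],
             [0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.1, 0.7]]
    print(auc_one_vs_rest(y_true, proba))  # [1.0, 1.0, 1.0, 1.0]
    print(brier_multiclass(y_true, proba))
```

In practice, oversampling would be applied only to the training cohort, as the abstract describes, so that the temporally later validation cohort keeps its natural class distribution.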
SHAP value analysis confirmed that TAPSE/PASP was the most important predictive feature for mPAP classification, followed by PASP and EI. These three features showed consistent performance across all subgroups.

Conclusions: The non-invasive predictive model developed in this study provides a reliable tool for the precise classification of mPAP in patients with PH, thereby helping clinicians reduce reliance on invasive RHC.
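SHAP attributes each prediction to its input features using Shapley values. Because the final model uses only six features, exact Shapley values are cheap to enumerate (2^5 = 32 coalitions per feature), so the idea can be shown directly with the standard library. This is a minimal sketch under assumptions: the hypothetical function `exact_shapley` evaluates a coalition by replacing absent features with a single baseline vector, whereas the shap library typically uses a background dataset and model-specific or sampling-based estimators. The toy score in the usage block is illustrative only and is not the study's Naive Bayes model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(f: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x. A coalition S is evaluated by keeping
    the features in S at their values in x and setting the rest to baseline
    (an assumed value function; SHAP implementations differ)."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical 3-feature score with one interaction term; weights are
    # illustrative, not estimates from the study.
    def score(z: Sequence[float]) -> float:
        return 0.8 * z[0] - 0.5 * z[1] + 0.3 * z[0] * z[2]

    x = [1.0, 2.0, 1.0]
    base = [0.0, 0.0, 0.0]
    phi = exact_shapley(score, x, base)
    # Efficiency property: contributions sum to score(x) - score(base).
    print(phi, sum(phi), score(x) - score(base))
```

The efficiency property checked in the usage block is a convenient sanity test for any attribution code: per-feature contributions should always add up to the difference between the model output for the patient and for the baseline.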

Version published to 10.21203/rs.3.rs-8424355/v1 on Research Square
Mar 31, 2026
