Enhancing Predictive Modeling for Respiratory Support with LLM-Driven Guideline Adherence
Abstract
Background: The optimal choice of respiratory support between high-flow nasal cannula (HFNC) and noninvasive ventilation (NIV) for intensive care unit (ICU) patients at risk of invasive mechanical ventilation (IMV) remains unclear, particularly for patients not represented in prior clinical trials. We previously developed RepFlow-CFR, a deep counterfactual model that estimates individualized treatment effects (ITE) of HFNC versus NIV. However, limited interpretability and guideline alignment remain barriers to clinical adoption. This study describes the development and integration of a clinical guideline-driven large language model (LLM) to enhance the deep counterfactual model's recommendations for NIV versus HFNC in patients at high risk for IMV.

Methods: We enhanced RepFlow-CFR by incorporating an LLM (Claude 3.5 Sonnet) to enforce clinical guideline adherence and generate explainable treatment recommendations. The LLM was configured in a HIPAA-compliant AWS environment and prompted with structured patient data, clinical notes, and formal guideline criteria. Recommendations from RepFlow-CFR and from the LLM were compared with actual treatment decisions to assess concordance, and IMV and mortality/hospice rates were evaluated across concordant and discordant groups. Additionally, we conducted a structured chart review of 20 cases to assess the clinical validity and safety of the LLM-driven recommendations.

Results: Among 1,261 ICU encounters, treatments concordant with the LLM-enhanced recommendations were associated with significantly lower IMV rates (e.g., 24.47% when concordant versus 52.94% when discordant with the HFNC recommendation, corresponding to a 97.33% relative risk increase when discordant) and reduced odds of mortality or hospice discharge (odds ratio = 0.670, p = 0.046). In the chart review, 95% of LLM recommendations aligned with clinical guidelines, and physicians agreed with 65% of the final recommendations. Errors were noted in 11 of 20 cases, most of which were deemed low or moderate risk; only 2 were rated as potentially causing severe harm.

Conclusions: Integrating LLMs for guideline enforcement improves the interpretability and clinical alignment of counterfactual models for respiratory support decision-making. This hybrid framework enhances concordance with real-world practice and may also improve patient outcomes. Future work will refine contraindication detection and extend validation to prospective clinical trials.
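To make the concordance comparison concrete, the following Python sketch groups encounters by the recommended therapy and by whether the therapy actually delivered matched that recommendation, then computes the crude IMV rate in each group. The `Encounter` record, its field names, the `imv_rates_by_concordance` helper, and the toy data are all hypothetical illustrations, not the study's data schema, code, or results. The sketch reports only unadjusted rates and does not attempt to reproduce the study's relative-risk or odds-ratio analyses.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical label set: the two therapies compared in the study.
THERAPIES = {"HFNC", "NIV"}


@dataclass(frozen=True)
class Encounter:
    """One ICU encounter (hypothetical schema for illustration only)."""
    recommended: str  # therapy recommended by the model pipeline: "HFNC" or "NIV"
    received: str     # therapy the patient actually received: "HFNC" or "NIV"
    imv: bool         # True if the patient later required invasive mechanical ventilation


def imv_rates_by_concordance(encounters):
    """Crude IMV rate per (recommended therapy, concordant?) group.

    Returns a dict mapping (recommended, concordant) -> (n_encounters, imv_rate).
    No adjustment for confounders is performed.
    """
    tallies = defaultdict(lambda: [0, 0])  # key -> [n_encounters, n_imv]
    for e in encounters:
        if e.recommended not in THERAPIES or e.received not in THERAPIES:
            raise ValueError(f"unexpected therapy label in {e}")
        # An encounter is concordant when the delivered therapy matches the recommendation.
        key = (e.recommended, e.received == e.recommended)
        tallies[key][0] += 1
        tallies[key][1] += int(e.imv)
    return {key: (n, n_imv / n) for key, (n, n_imv) in tallies.items()}


if __name__ == "__main__":
    # Toy records only; these are NOT data from the study.
    toy = [
        Encounter("HFNC", "HFNC", False),
        Encounter("HFNC", "HFNC", True),
        Encounter("HFNC", "NIV", True),
        Encounter("NIV", "NIV", False),
    ]
    for (rec, concordant), (n, rate) in sorted(imv_rates_by_concordance(toy).items()):
        label = "concordant" if concordant else "discordant"
        print(f"{rec} recommended, {label}: n={n}, IMV rate={rate:.2%}")
```

Run as a script, the toy data prints one line per group, for example `HFNC recommended, concordant: n=2, IMV rate=50.00%`. The sketch keys groups by the recommended therapy because the abstract reports rates relative to a specific recommendation (for example, concordant versus discordant with the HFNC recommendation).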