Towards Inpatient Discharge Summary Automation via Large Language Models: A Multidimensional Evaluation with a HIPAA-Compliant Instance of GPT-4o and Clinical Expert Assessment
Abstract
Large language models (LLMs) have shown potential to automate clinical documentation tasks that could reduce clinician burden, such as generating hospital discharge summaries. Prior research used older LLMs and limited data, raising concerns about fabrications and omissions. In this study, we evaluated the automatic generation of inpatient Internal Medicine discharge summaries using a HIPAA-compliant Microsoft Azure instance of OpenAI's GPT-4o. Internal Medicine hospital faculty scored both the actual human-written discharge summaries and the AI-generated summaries for quality, readability/conciseness, factuality and completeness, and the presence of hallucinations and omissions along with their impact on safety. Our results showed that the AI-generated discharge summaries significantly outperformed the actual human-written summaries in both quality and readability/conciseness, were comparable to them in factuality and completeness, and were produced at minimal cost.