Retrospective Evaluation of a Generative AI-Enabled Electronic Medical Record System in Primary Health Care Facilities in Kenya

Abstract

We conducted a retrospective evaluation of an electronic medical record-embedded large language model (LLM) clinical decision support system deployed across 16 primary care clinics in Kenya between July and September 2024. A panel of trained physicians reviewed 1,469 records. Hallucinations were uncommon (50/1,469; 3.4%), most often involving mis-expanded acronyms or drug names. Clinical management guidance aligned with local guidelines in nearly all cases (approximately 100%). Nevertheless, clinicians did not modify their documentation in 62% of encounters. Safety assessments identified actively harmful LLM recommendations in 7.8% of encounters, with 67 such recommendations appearing in the final documentation. Conversely, risk present in clinicians' initial notes was fully mitigated in 118 encounters (8.0% overall; 12.1% of amended cases). Overall, the tool showed strong potential to support quality improvement, but the asymmetric adoption of harmful versus beneficial outputs underscores the need for usability optimization, local guardrails, and prospective trials to confirm patient-level benefit.
