Integrating Agentic AI to Automate ICD-10 Medical Coding

Abstract

Automating ICD-10 coding from discharge summaries remains demanding because human coders must analyze clinical narratives while justifying each decision. This study compares three automation patterns: PLM-ICD as a standalone deep learning system emitting 15 codes per case, LLM-only generation with full autonomy, and a hybrid approach in which PLM-ICD drafts candidates for an agentic LLM filter to accept or reject. All strategies were evaluated on 19,801 MIMIC-IV discharge summaries using four LLMs spanning compact models (Qwen2.5-3B, Llama-3.2-3B, Phi-4-mini) to a large-scale model (Sonnet-4.5). Precision guided the evaluation because human coders can still supply any missing diagnoses. PLM-ICD alone reached 55.8% precision while always surfacing 15 suggestions. LLM-only generation lagged severely (1.5–34.6% precision) and produced inconsistent output sizes. The agentic filter delivered the best trade-off: compact LLMs reviewed the 15 candidates, discarded weakly supported codes, and returned 2–8 high-confidence codes. Llama-3.2-3B, for example, improved from 1.5% precision as a generator to 55.1% as a verifier while trimming false positives by 73%. These results show that positioning LLMs as quality controllers rather than primary generators yields reliable support for clinical coding teams; formal recall/F1 reporting remains future work for fully autonomous implementations.
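The abstract does not give implementation details, but the candidate-filtering pattern it describes can be sketched minimally. Below is an illustrative Python sketch, assuming a generic chat-completion backend: the `complete` callable, the `VERIFY_PROMPT` wording, and the `filter_candidates` helper are all hypothetical stand-ins, not the authors' actual prompt or code.

```python
from typing import Callable

# Illustrative verifier prompt; the authors' real prompt is not published
# in this abstract, so this wording is an assumption.
VERIFY_PROMPT = """You are a clinical coding auditor.
Discharge summary:
{summary}

Candidate ICD-10 code: {code}

Answer ACCEPT only if the summary contains explicit evidence for this
diagnosis; otherwise answer REJECT. Answer with one word."""


def filter_candidates(
    summary: str,
    candidates: list[str],           # e.g. the 15 codes drafted by PLM-ICD
    complete: Callable[[str], str],  # hypothetical LLM call: prompt -> text
) -> list[str]:
    """Return only the candidate codes the LLM verifier accepts."""
    accepted = []
    for code in candidates:
        reply = complete(VERIFY_PROMPT.format(summary=summary, code=code))
        if reply.strip().upper().startswith("ACCEPT"):
            accepted.append(code)
    return accepted
```

In this sketch the LLM never proposes codes of its own; it only prunes the fixed 15-code slate, which matches the reported behavior of returning a 2–8 code subset rather than free-form generations.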
