Medical Diagnosis Coding Automation: Similarity Search vs. Generative AI
Abstract
Objective
This study aims to predict ICD-10-CM codes for medical diagnoses from short diagnosis descriptions and to compare two distinct approaches: similarity search and a generative model with few-shot learning.
Materials and Methods
The text-embedding-ada-002 model was used to embed textual descriptions of the 2023 ICD-10-CM diagnosis codes, provided by the Centers for Medicare & Medicaid Services. GPT-4 was used with few-shot learning. Both models were tested on 666 data points from the eICU Collaborative Research Database.
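The similarity-search approach can be illustrated with a minimal sketch: each code description is represented as a vector, the query diagnosis is embedded the same way, and the codes are ranked by cosine similarity. The function names, the three-dimensional toy vectors, and the choice of k below are assumptions made for illustration only; they are not taken from the study, and real text-embedding-ada-002 embeddings have 1536 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_codes(query_vec, code_vecs, k=3):
    """Return the k codes whose vectors are most similar to the query."""
    scored = [(code, cosine_similarity(query_vec, vec))
              for code, vec in code_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [code for code, _ in scored[:k]]

# Hypothetical toy vectors standing in for embedded code descriptions.
CODE_VECS = {
    "I10": [0.9, 0.1, 0.0],
    "E11.9": [0.1, 0.9, 0.1],
    "J18.9": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a short diagnosis description.
query = [0.8, 0.2, 0.1]

print(top_k_codes(query, CODE_VECS, k=2))  # ['I10', 'E11.9']
```

In a real pipeline the vectors would come from the embedding model rather than being typed by hand, and the candidate set would cover the full ICD-10-CM code list.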
Results
The text-embedding-ada-002 model successfully identified the relevant code from a set of similar codes 80% of the time, while GPT-4 achieved 50% accuracy in predicting the correct code.
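The two figures measure slightly different things: one counts a success when the correct code appears anywhere in a retrieved set, the other when a single predicted code matches. The sketch below shows how such metrics could be computed. The function names and the toy data are hypothetical, and the abstract does not specify how the study computed its scores.

```python
def hit_rate_in_set(candidate_sets, gold_codes):
    """Fraction of cases where the gold code appears among the candidates."""
    hits = sum(1 for cands, gold in zip(candidate_sets, gold_codes)
               if gold in cands)
    return hits / len(gold_codes)

def exact_match_accuracy(predictions, gold_codes):
    """Fraction of cases where the single predicted code equals the gold code."""
    correct = sum(1 for pred, gold in zip(predictions, gold_codes)
                  if pred == gold)
    return correct / len(gold_codes)

# Hypothetical toy data, invented for illustration only.
gold = ["I10", "E11.9", "J18.9", "N17.9"]
retrieved = [["I10", "I11.9"], ["E11.9", "E10.9"],
             ["J18.9", "J15.9"], ["N18.9", "N19"]]
predicted = ["I10", "E10.9", "J18.9", "N19"]

print(hit_rate_in_set(retrieved, gold))       # 0.75
print(exact_match_accuracy(predicted, gold))  # 0.5
```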
Discussion
The work implies that text-embedding-ada-002 could automate medical coding better than GPT-4, highlighting potential limitations of generative language models for complicated tasks like this.
Conclusion
The research shows that text-embedding-ada-002 outperforms GPT-4 in medical coding, highlighting the usefulness of embedding models in this domain.