MIRACLE - Medical Information Retrieval using Clinical Language Embeddings for Retrieval Augmented Generation at the point of care

Abstract

Most sentence transformer models have been trained on publicly accessible English-language datasets. When integrated into Retrieval Augmented Generation (RAG) systems, these models are limited in their ability to retrieve relevant patient-related information. In this study, multiple embedding models were fine-tuned on approximately eleven million question and chunk pairs derived from 400,000 documents spanning diverse medical categories. The questions and corresponding answers were generated by prompting a large language model. The fine-tuned model outperformed the state-of-the-art multilingual-e5-large model on real-world German evaluation datasets and on their English translations. In addition, models were trained on a pseudonymized dataset and made publicly available for use by other healthcare institutions.
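
The abstract does not state which training objective was used. As an illustration only, the sketch below shows one common way to fine-tune an embedding model on question and chunk pairs: an in-batch contrastive loss, in which the chunk paired with each question is the positive and every other chunk in the batch serves as a negative. The function names, the `scale` value, and the toy two-dimensional vectors are assumptions made for this example and are not taken from the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(question_vecs, chunk_vecs, scale=20.0):
    """Mean cross-entropy over a batch of (question, chunk) pairs.

    chunk_vecs[i] is treated as the positive for question_vecs[i];
    all other chunks in the batch act as negatives.
    The scale factor is an assumed hyperparameter, not a value from the study.
    """
    losses = []
    for i, q in enumerate(question_vecs):
        logits = [scale * cosine(q, c) for c in chunk_vecs]
        # Numerically stable log-sum-exp over all chunks in the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the correct chunk.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical): each question points at "its" chunk.
questions = [[1.0, 0.0], [0.0, 1.0]]
aligned_chunks = [[0.9, 0.1], [0.1, 0.9]]
swapped_chunks = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = in_batch_contrastive_loss(questions, aligned_chunks)
loss_swapped = in_batch_contrastive_loss(questions, swapped_chunks)
```

In this toy setup the loss is lower when each question embedding lies closest to its own chunk than when the pairs are swapped, which is the signal a fine-tuning procedure of this kind would exploit. In practice the vectors would come from the embedding model being trained and the loss would be minimized by gradient descent.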
