Med-gte-hybrid: A contextual embedding model for extracting narrative information from clinical texts
Abstract
The extraction of actionable information from unstructured clinical narratives is a crucial step toward advancing predictive healthcare, particularly for chronic diseases such as Chronic Kidney Disease (CKD). We present an approach to extract narrative information using a sentence transformer that generates robust embeddings for clinical text, enabling a range of downstream tasks, including prognosis, mortality prediction, and estimation of kidney function using estimated glomerular filtration rate (eGFR). We selected gte-large, a high-performing sentence transformer, as the base model. Through a novel fine-tuning strategy that combines contrastive learning with a denoising autoencoder-based approach, we significantly enhance the model’s ability to capture subtle patterns in clinical text data. The fine-tuned model, med-gte-hybrid, enables improved patient stratification, clustering, and prediction, outperforming current state-of-the-art models in key tasks. While CKD is the current focus, the approach is designed to be generalizable across other medical domains, offering the potential to improve clinical decision-making and personalized treatment pathways in various healthcare contexts.
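
To make the two ingredients named in the abstract more concrete, the following is a minimal, dependency-free sketch of a generic in-batch contrastive loss and of the input corruption step that a denoising autoencoder objective typically relies on. It is not the authors' implementation: the abstract does not say how the two objectives are combined, and the function names (`info_nce_loss`, `delete_tokens`), the temperature of 0.05, the deletion ratio of 0.6, and the toy two-dimensional vectors standing in for sentence embeddings are all assumptions chosen for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.05):
    """In-batch contrastive loss (hypothetical helper, not from the paper).

    Row i of `positives` is the positive for row i of `anchors`; every other
    row in the batch acts as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the matching positive as the target class.
        total += log_sum - logits[i]
    return total / len(anchors)


def delete_tokens(text, ratio=0.6, rng=None):
    """Corrupt a sentence by dropping each whitespace token with probability `ratio`.

    A denoising autoencoder objective would train the model to reconstruct
    `text` from the embedding of this corrupted version (assumed noise type).
    """
    rng = rng or random.Random()
    tokens = text.split()
    kept = [t for t in tokens if rng.random() >= ratio]
    # Keep at least one token so the encoder never sees an empty input.
    if not kept and tokens:
        kept = [rng.choice(tokens)]
    return " ".join(kept)


# Toy vectors standing in for embeddings of paired clinical sentences.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(anchors, [[0.9, 0.1], [0.1, 0.9]])
swapped = info_nce_loss(anchors, [[0.1, 0.9], [0.9, 0.1]])

sentence = "eGFR declined over the past six months"
noisy = delete_tokens(sentence, rng=random.Random(0))
```

In this toy setup the loss is small when each anchor sits closest to its own positive and large when the pairs are swapped, which is the behavior a contrastive objective rewards. The corrupted sentence contains only tokens from the original, in their original order.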