Development of a Claims-Based Computable Phenotype for Ulcerative Colitis Flares
Abstract
Background and Aims
Several conditions lack their own unique diagnosis code in widely used clinical terminologies, which makes them difficult to track and study. Acute severe ulcerative colitis (ASUC) is one such condition. There is no specific billing or diagnosis code for ASUC, and no automated method to identify patients admitted for ASUC from observational data. Accurate, automated, large-scale identification of hospital admissions for non-coded conditions like ASUC may enable further research into them.
Methods
We performed a retrospective cohort study of patients with a history of ulcerative colitis (UC) admitted to a single academic institution from 2014 to 2019. Clinicians at our institution reviewed the charts of these admissions to determine whether each was due to a true episode of ASUC. Logistic regression, random forest (RF), and support vector machine (SVM) models were trained on administrative claims data for all admissions.
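The abstract does not specify how each admission's claims were encoded as model inputs. A minimal sketch, assuming each admission is represented by a set of billed procedure codes plus numeric fields such as age and length of stay (two of the most important features named in the Results), might build a fixed-width feature vector as follows. The `Admission` schema, the helper functions, and the `PROC_*` code strings are hypothetical placeholders, not the authors' encoding or real billing codes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Admission:
    """One hospital admission from a claims extract (hypothetical schema)."""
    admission_id: str
    age: int
    length_of_stay_days: int
    procedure_codes: set[str] = field(default_factory=set)
    is_asuc: bool | None = None  # chart-review label; None if not yet reviewed


def build_vocabulary(admissions):
    """Sorted list of every procedure code seen in the training admissions."""
    return sorted({code for adm in admissions for code in adm.procedure_codes})


def to_feature_vector(adm, vocabulary):
    """Numeric fields first, then one 0/1 indicator per code in the vocabulary.

    Codes absent from the vocabulary (e.g. first seen at inference time) are ignored.
    """
    indicators = [1 if code in adm.procedure_codes else 0 for code in vocabulary]
    return [adm.age, adm.length_of_stay_days] + indicators


# Toy usage with made-up admissions and placeholder code names.
admissions = [
    Admission("a1", 34, 6, {"PROC_FLEX_SIGMOIDOSCOPY", "PROC_ABD_XRAY"}, True),
    Admission("a2", 58, 2, {"PROC_CHEST_XRAY"}, False),
]
vocab = build_vocabulary(admissions)
X = [to_feature_vector(a, vocab) for a in admissions]  # rows for LR / RF / SVM
y = [int(a.is_asuc) for a in admissions]               # 1 = ASUC, 0 = non-ASUC
```

Fixing the vocabulary on the training admissions keeps every row the same width, which all three model families require; in practice the numeric columns would typically be scaled before fitting the logistic regression and SVM, whereas a random forest does not need it.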
Results
A total of 268 ASUC admissions and 3,725 non-ASUC admissions among UC patients were included. Our RF model performed best, correctly classifying 95.5% of admissions as either ASUC or non-ASUC (accuracy), with a validation AUROC of 0.96 (95% CI 0.94-0.98) and an AUPRC of 0.73. The model had a sensitivity of 81.5% and a specificity of 96.5%. The five most important features in the model were endoscopy of the sigmoid colon, length of stay, age, endoscopy of the rectum, and abdominal x-ray.
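The threshold-dependent figures above are linked by simple arithmetic: accuracy is the prevalence-weighted average of sensitivity and specificity. The sketch below checks that the reported sensitivity and specificity are consistent with the reported 95.5% accuracy, assuming (the abstract does not say) that the validation set has the same class mix as the full cohort of 268 ASUC and 3,725 non-ASUC admissions. It also shows the textbook rank-based definition of AUROC and average precision, one common estimator of AUPRC; these are illustrative definitions, not necessarily the authors' implementation, and the toy scores are made up.

```python
def accuracy_from_sens_spec(sensitivity, specificity, n_pos, n_neg):
    """Accuracy = (sensitivity * P + specificity * N) / (P + N)."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean precision at the rank of each positive, ranking by descending score.

    Tied scores keep their input order (sorted() is stable); no tie correction.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Consistency check using the cohort counts (assumed validation class mix).
acc = accuracy_from_sens_spec(0.815, 0.965, 268, 3725)
print(round(acc, 3))  # 0.955

# No-skill AUPRC baseline equals positive-class prevalence.
baseline_auprc = 268 / (268 + 3725)  # about 0.067

# Toy scores and labels (made up) to exercise the ranking metrics.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0, 0]
toy_auc = auroc(toy_scores, toy_labels)             # 7 of 8 pairs ordered correctly
toy_ap = average_precision(toy_scores, toy_labels)  # (1/1 + 2/3) / 2
```

Because a classifier with no skill has an expected AUPRC equal to the prevalence of the positive class, roughly 0.067 in this cohort, an AUPRC of 0.73 is much lower than the AUROC yet still far above chance; AUPRC is the more demanding metric when positives are rare.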
Conclusions
There is currently no method by which ASUC, which does not have its own unique diagnosis code, can be identified from claims databases in a scalable fashion for research or clinical purposes. We have developed a machine learning-based model that identifies admissions for clinically significant ASUC and reliably distinguishes them from admissions for other reasons among UC patients. The ability to automatically curate large, accurate datasets of non-coded conditions such as ASUC episodes can serve as the basis for large-scale analyses that maximize our ability to learn from real-world data, enable future research, and improve our understanding of these diseases.
Summary
There is currently no accurate way to identify, track, or study acute severe ulcerative colitis (ASUC) using administrative claims datasets. We have built a machine learning model to identify ASUC from claims data to enable large-scale studies on this condition.