Identification of primary sclerosing cholangitis: ICD-10 code validation and comparison with a large language model approach

Melinda Wang
Mai Dao
Molly Delk
Cynthia Fenton
Michelle Y. Li
Jessica B. Rubin
Jin Ge
Jennifer C. Lai
Michael Li


Abstract

Background: Retrospective studies investigating primary sclerosing cholangitis (PSC) have been limited by the absence of a PSC-specific diagnostic code. In 2018, a new PSC-specific ICD-10 code was introduced.

Aims: We aimed to validate the new ICD-10 code and compare it to other methods of identifying patients with PSC.

Methods: All gastroenterology/hepatology and primary clinic notes and discharge summaries were extracted from the UCSF Epic Clarity database, and potential PSC patients were identified using natural language processing (NLP). PSC diagnosis was determined by physician adjudication through chart review. LASSO regression was used to develop and internally validate a PSC prediction model. Separately, we tested a large language model's (LLM) ability to distinguish PSC from non-PSC patients.

Results: Among 867 patients identified using NLP, 226 (26%) were adjudicated to have a true PSC diagnosis. The LASSO model selected ICD-10 code, alkaline phosphatase > 120 IU/L, ursodiol use, inflammatory bowel disease, and history of cholangitis. The ICD-10 code alone had a c-statistic of 0.87, sensitivity of 87.6%, and PPV of 68.8%. The LASSO model had a c-statistic of 0.92, sensitivity of 87.4%, and PPV of 70.7%. The LLM had a c-statistic of 0.77, sensitivity of 91.7%, and PPV of 51.0%.

Conclusions: The PSC-specific ICD-10 code had excellent discriminatory capacity for identifying patients with PSC. While an optimized PSC prediction algorithm had slightly improved test characteristics, the ICD-10 code alone was sufficient for identifying patients with PSC, supporting its use in future database studies of PSC. In contrast, the LLM had inferior discrimination compared with either the ICD-10 code or the prediction model.
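To make the reported test characteristics concrete, the sketch below back-calculates an approximate 2x2 table for the ICD-10 code from the figures in the abstract (867 patients, 226 adjudicated PSC, sensitivity 87.6%, PPV 68.8%). The cell counts are reconstructions by rounding, not data reported by the authors, and the function names `approx_counts` and `metrics` are hypothetical. It also relies on the standard result that, for a single yes/no predictor, the c-statistic equals the mean of sensitivity and specificity.

```python
def approx_counts(n_total, n_positive, sensitivity, ppv):
    """Reconstruct approximate 2x2 cell counts from summary statistics.

    These are rounded estimates, not the study's actual cell counts.
    """
    tp = round(sensitivity * n_positive)   # true positives among adjudicated PSC
    fn = n_positive - tp                   # PSC patients missed by the code
    flagged = round(tp / ppv)              # everyone carrying the code
    fp = flagged - tp                      # flagged patients without PSC
    tn = n_total - n_positive - fp         # non-PSC patients not flagged
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and the c-statistic of a binary test."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    # For one binary predictor the ROC curve has a single operating point,
    # so the area under it is the mean of sensitivity and specificity.
    auc_binary = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "auc_binary": auc_binary,
    }


# Figures taken from the abstract for the ICD-10 code alone.
tp, fp, fn, tn = approx_counts(n_total=867, n_positive=226,
                               sensitivity=0.876, ppv=0.688)
m = metrics(tp, fp, fn, tn)

if __name__ == "__main__":
    print(f"approximate cells: TP={tp} FP={fp} FN={fn} TN={tn}")
    for name, value in m.items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the reconstructed table is consistent with the reported sensitivity, PPV, and c-statistic for the ICD-10 code; the same helper can be pointed at the LASSO or LLM figures, although those approaches may have been evaluated at thresholds that this simple binary formula does not capture.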

Version published to 10.21203/rs.3.rs-9152968/v1 on Research Square
Apr 1, 2026

