Improving the accuracy and precision of disease identification when utilizing EHR data for research: the case for hepatocellular carcinoma
Abstract
Objective
We assessed how well International Classification of Diseases (ICD) codes identify patients with hepatocellular carcinoma (HCC) in a large academic health system, and determined whether an algorithm using a combination of ICD codes could deliver higher accuracy and precision than single ICD codes when identifying HCC cases from electronic health record (EHR) data.
Results
In our cohort of 1,007 established ambulatory care patients with potential HCC, relying on a single ICD code entry for HCC (ICD-9-CM 155.0 or ICD-10-CM C22.0) yielded 58% false positives (not true HCC cases) based on chart reviews. We developed an ICD code-based algorithm that prioritized positive predictive value (PPV), F-score, and accuracy to minimize false positives and false negatives. Using manual chart reviews as the gold standard, the highest performing algorithm required at least 10 ICD code entries for HCC and required the sum of ICD code entries for HCC to exceed the sum of ICD code entries for non-HCC malignancies. The algorithm demonstrated high performance (PPV 97.4%, F-score 0.92, accuracy 94%) and was internally validated (PPV 92.3%, F-score 0.90, accuracy 91%) using a separate sample of 285 cancer registry cases. Our findings support the need to assess the accuracy and precision of ICD codes before using EHR data to study HCC more broadly.
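The decision rule and the reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, the F-score is assumed to be the balanced F1 score, and the count of non-HCC malignancy code entries is taken as an input because the abstract does not list which codes fall into that group.

```python
# Illustrative sketch only; names and structure are assumptions, not the study's code.

# HCC codes named in the abstract (ICD-9-CM and ICD-10-CM).
HCC_CODES = {"155.0", "C22.0"}
# Threshold reported for the highest performing algorithm.
MIN_HCC_ENTRIES = 10


def count_hcc_entries(codes):
    """Count how many ICD code entries in a patient's record are HCC codes."""
    return sum(1 for code in codes if code in HCC_CODES)


def is_probable_hcc(hcc_entries, non_hcc_malignancy_entries,
                    min_entries=MIN_HCC_ENTRIES):
    """Apply the two-part rule: at least `min_entries` HCC code entries, and
    more HCC code entries than non-HCC malignancy code entries."""
    return (hcc_entries >= min_entries
            and hcc_entries > non_hcc_malignancy_entries)


def evaluate(predicted, actual):
    """Compare rule output with chart-review labels (both lists of booleans).

    Returns PPV, F-score (assumed here to be F1), and accuracy.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)

    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (2 * ppv * sensitivity / (ppv + sensitivity)
               if (ppv + sensitivity) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"ppv": ppv, "f_score": f_score, "accuracy": accuracy}
```

For example, a patient with 12 HCC code entries and 3 entries for other malignancies would be flagged as a probable case, while a patient with 12 HCC entries and 15 entries for other malignancies would not.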