Improving the Accuracy and Precision of Disease Identification When Utilizing EHR Data for Research: The Case for Hepatocellular Carcinoma
Abstract
Objective: We assessed the performance of ICD codes in identifying patients with hepatocellular carcinoma (HCC) in a large academic health system, and determined whether an algorithm using a combination of ICD codes could deliver higher accuracy and precision than single ICD codes when identifying HCC cases from electronic health record (EHR) data.

Results: Using a single ICD code entry for HCC (ICD-9-CM 155.0 or ICD-10-CM C22.0) in our cohort of 1,007 established ambulatory care patients with potential HCC yielded 58% false positives (not true HCC cases) based on chart reviews. We developed an ICD code-based algorithm that prioritized positive predictive value (PPV), F-score, and accuracy to minimize false positives and negatives. The highest-performing algorithm required at least 10 ICD code entries for HCC and required the sum of ICD code entries for HCC to exceed the sum of ICD code entries for non-HCC malignancies. The algorithm demonstrated high performance (PPV 97.4%, F-score 0.92, accuracy 94%), which was internally validated (PPV 92.3%, F-score 0.90, accuracy 91%) using a separate sample of potential HCC cases. Our findings support the need to assess the accuracy and precision of ICD codes before using EHR data to study HCC more broadly.
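
The decision rule described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`count_entries`, `flag_hcc`, `evaluate`) are hypothetical, and the abstract does not say how non-HCC malignancy codes were defined, so the caller supplies that definition as a predicate. The sketch also assumes that "F-score" means the F1 score (the harmonic mean of PPV and sensitivity) and that chart review supplies the true labels.

```python
from typing import Callable, Iterable, Sequence

# HCC codes named in the abstract: ICD-9-CM 155.0 and ICD-10-CM C22.0.
HCC_CODES = {"155.0", "C22.0"}


def count_entries(
    codes: Iterable[str],
    is_other_malignancy: Callable[[str], bool],
) -> tuple[int, int]:
    """Count HCC entries and non-HCC malignancy entries for one patient.

    `is_other_malignancy` is an assumption of this sketch: the abstract
    does not list which codes counted as non-HCC malignancies.
    """
    hcc = other = 0
    for code in codes:
        normalized = code.strip().upper()
        if normalized in HCC_CODES:
            hcc += 1
        elif is_other_malignancy(normalized):
            other += 1
    return hcc, other


def flag_hcc(hcc_entries: int, other_entries: int, min_hcc_entries: int = 10) -> bool:
    """Apply the two conditions stated in the abstract:
    at least 10 HCC entries, and more HCC entries than non-HCC malignancy entries.
    """
    return hcc_entries >= min_hcc_entries and hcc_entries > other_entries


def evaluate(predicted: Sequence[bool], actual: Sequence[bool]) -> dict[str, float]:
    """Compute PPV, F-score (assumed to be F1), and accuracy against chart review labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (
        2 * ppv * sensitivity / (ppv + sensitivity) if (ppv + sensitivity) else 0.0
    )
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"ppv": ppv, "f_score": f_score, "accuracy": accuracy}
```

In use, each patient's list of coded entries would be passed through `count_entries` and `flag_hcc`, and the resulting flags compared with chart review results via `evaluate`. Raising `min_hcc_entries` trades sensitivity for PPV, which is consistent with the abstract's emphasis on reducing false positives.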