Validation of a Composite Mortality Endpoint in a Large, Clinico-Genomic Real-World Database of Patients with Advanced Cancer

Abstract

Purpose

Real-world data (RWD) from electronic health records (EHRs) and next-generation sequencing are increasingly used to study treatment effectiveness in molecularly refined patient populations. Incomplete mortality data in EHRs can lead to overestimated survival in RWD studies. While the National Death Index (NDI) is the gold standard for mortality data in the United States, its limited accessibility and reporting delays hinder timely research. EHR datasets are therefore often supplemented with external mortality data sources to improve death capture. This study evaluated a composite mortality variable against NDI records in a large cohort of patients with advanced cancer from a real-world oncology database.

Methods

De-identified clinical and molecular data from patients with advanced solid tumors were linked with third-party mortality and claims datasets using deterministic tokenization. Vital status and death dates were harmonized across sources. Patient identifiers were submitted to NDI, and true matches were de-identified and joined for analysis. Performance metrics (sensitivity, specificity, positive predictive value [PPV], negative predictive value [NPV]) were calculated using NDI as ground truth. Death date agreement was assessed at exact-match (0-day), ±15-day, and ±30-day tolerances. Subgroup analyses and a cumulative cases/dynamic controls (CC/DC) approach were also performed.
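
As an illustration only, the performance metrics and date-tolerance checks described above can be sketched in Python. The function names, input layout, and toy data are hypothetical and are not taken from the study. The sketch assumes one record per patient, with NDI treated as ground truth, and it assumes that date agreement is computed only among patients who have a death date in both sources.

```python
from datetime import date
from typing import Dict, Optional, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(composite_dead: Sequence[bool],
                      ndi_dead: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV with NDI as the reference."""
    pairs = list(zip(composite_dead, ndi_dead))
    tp = sum(1 for c, n in pairs if c and n)          # death in both sources
    fp = sum(1 for c, n in pairs if c and not n)      # composite only
    fn = sum(1 for c, n in pairs if not c and n)      # NDI only
    tn = sum(1 for c, n in pairs if not c and not n)  # death in neither
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def date_agreement(composite_dates: Sequence[Optional[date]],
                   ndi_dates: Sequence[Optional[date]],
                   tolerance_days: int) -> float:
    """Share of patients with a death date in both sources (an assumption
    of this sketch) whose dates differ by at most tolerance_days."""
    both = [(c, n) for c, n in zip(composite_dates, ndi_dates)
            if c is not None and n is not None]
    agree = sum(1 for c, n in both if abs((c - n).days) <= tolerance_days)
    return _ratio(agree, len(both))


# Toy data, purely illustrative.
composite_flags = [True, True, False, False, True]
ndi_flags = [True, True, True, False, False]
metrics = confusion_metrics(composite_flags, ndi_flags)

composite_death_dates = [date(2020, 1, 1), date(2020, 1, 10), date(2020, 3, 1), None]
ndi_death_dates = [date(2020, 1, 1), date(2020, 1, 20), date(2020, 1, 1), date(2020, 5, 5)]
agreement = {tol: date_agreement(composite_death_dates, ndi_death_dates, tol)
             for tol in (0, 15, 30)}
```

Returning NaN for an empty denominator keeps small subgroup analyses from failing when a stratum contains no NDI-confirmed deaths or no survivors.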

Results

Among 17,597 patients, the composite mortality variable demonstrated 82% sensitivity and 95% specificity against NDI. PPV was 96%, and NPV was 77%. Exact date agreement was 86%, increasing to 94% within a ±15-day tolerance and 96% within a ±30-day tolerance. Incorporating third-party mortality and claims data raised sensitivity from 17% (EHR alone) to 82%. Sensitivity was broadly stable across subgroups, with some variation by age, cancer type, geographic region, and race. With the CC/DC approach, sensitivity was 96% at 6 months, 97% at 12 months, and 98% at 24 months, with specificity above 98% across these timeframes.
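
For readers unfamiliar with the CC/DC definition, a minimal sketch follows. It is one common reading of the method and not the study's implementation: at a horizon t, cases are patients with an NDI death on or before t, controls are all other patients, and the composite variable counts as positive when it records a death on or before t. The function name, the days-from-index-date inputs, and the toy data are hypothetical, and the sketch assumes every patient has follow-up beyond the horizon, so censoring is ignored.

```python
from typing import Dict, Optional, Sequence


def ccdc_metrics(composite_days: Sequence[Optional[int]],
                 ndi_days: Sequence[Optional[int]],
                 horizon_days: int) -> Dict[str, float]:
    """Time-dependent sensitivity and specificity at one horizon.

    Inputs are days from index date to death, or None when the source
    records no death. Censoring is ignored (an assumption of this sketch).
    """
    tp = fn = tn = fp = 0
    for comp, ndi in zip(composite_days, ndi_days):
        is_case = ndi is not None and ndi <= horizon_days    # cumulative case
        flagged = comp is not None and comp <= horizon_days  # composite death by t
        if is_case and flagged:
            tp += 1
        elif is_case:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Toy data, purely illustrative (days from index date to death).
ndi_days = [100, 400, None, 50, 800]
composite_days = [100, None, None, 60, 300]
at_12_months = ccdc_metrics(composite_days, ndi_days, horizon_days=365)
at_24_months = ccdc_metrics(composite_days, ndi_days, horizon_days=730)
```

Because the case and control sets are redefined at each horizon, a patient who is a control at 12 months can become a case at 24 months, which is why these estimates are reported per timeframe.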

Conclusions

The composite mortality variable is a robust, reliable endpoint for real-world evidence analyses. Its high accuracy for identified deaths and appropriate censoring of lost-to-follow-up patients support its use in overall survival analyses. This validation is a foundational step towards high-quality research to improve patient outcomes and advance cancer drug development using this multimodal dataset.

Clinical trial number: not applicable
