Linking the NETSARC+ national sarcoma database with the SNDS to evaluate adjuvant and/or neoadjuvant therapy: report on the linkage process and result (Health Data Hub’s DEEPSARC pilot project)
Abstract
Background
DEEPSARC, one of the first projects to run on the Health Data Hub, aimed to identify real-life treatment regimens that could improve overall survival. The project is based on linking the national database of the sarcoma reference network (NETSARC+) with the SNDS.
Objectives
We aimed to provide a transparent description of the linkage process and its results.
Methods
The sarcoma database encompasses 33,548 patients matching the selection criteria, divided into three subsets: 13,507 patients with a complete dataset combining clinical and pathological data; 5,844 patients with clinical data alone; and 14,197 patients with pathological data alone. As no ICD-10 code reliably identifies patients with sarcoma, the subpopulation extracted from the SNDS was extended to 3 million patients who underwent surgery for their cancer. An indirect record linkage process used a combination of so-called chaining variables (called a signature) to uniquely identify a pair of patients, one from each database. Two metrics (signature robustness and overall quality) were calculated for ease of interpretation.
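The signature-based linkage described above can be illustrated with a minimal sketch. It is not the project's actual implementation: the chaining variables (`birth_year`, `sex`, `surgery_month`), the record layout, and the function names are hypothetical, chosen only to show how a signature can be accepted as a link when it identifies exactly one patient in each database.

```python
from collections import defaultdict

# Hypothetical chaining variables; the abstract does not list the real ones.
CHAINING_VARIABLES = ("birth_year", "sex", "surgery_month")


def signature(record, variables=CHAINING_VARIABLES):
    """Build the signature: the tuple of chaining-variable values."""
    return tuple(record[v] for v in variables)


def index_by_signature(records, variables=CHAINING_VARIABLES):
    """Group record ids by signature."""
    index = defaultdict(list)
    for rec in records:
        index[signature(rec, variables)].append(rec["id"])
    return index


def link_by_signature(registry, claims, variables=CHAINING_VARIABLES):
    """Keep only pairs whose signature is unique in both sources."""
    reg_index = index_by_signature(registry, variables)
    claims_index = index_by_signature(claims, variables)
    pairs = []
    for sig, reg_ids in reg_index.items():
        claim_ids = claims_index.get(sig, [])
        if len(reg_ids) == 1 and len(claim_ids) == 1:
            pairs.append((reg_ids[0], claim_ids[0]))
    return pairs


# Toy data (invented for illustration only).
registry = [
    {"id": "R1", "birth_year": 1960, "sex": "F", "surgery_month": "2015-03"},
    {"id": "R2", "birth_year": 1972, "sex": "M", "surgery_month": "2016-07"},
    {"id": "R3", "birth_year": 1980, "sex": "F", "surgery_month": "2017-01"},
]
claims = [
    {"id": "S1", "birth_year": 1960, "sex": "F", "surgery_month": "2015-03"},
    {"id": "S2", "birth_year": 1972, "sex": "M", "surgery_month": "2016-07"},
    {"id": "S3", "birth_year": 1972, "sex": "M", "surgery_month": "2016-07"},
    {"id": "S4", "birth_year": 1955, "sex": "M", "surgery_month": "2014-11"},
]

pairs = link_by_signature(registry, claims)
print(pairs)
```

In this toy example only R1 is linked: R2 shares its signature with two SNDS-side records and is left unmatched rather than guessed, and R3 has no candidate at all. A real process would typically try several signatures in turn, which is where a robustness metric for each signature becomes useful.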
Results
The overall matching rate was 73.1% (24,539 pairs out of 33,548 observations), reaching 90.5% in the intersection of the sarcoma databases (patients with the complete dataset: 12,225 pairs out of 13,507 observations).
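The reported rates follow directly from the counts given in the abstract. The short check below reproduces them; the function name `matching_rate` is an illustrative choice, not a term from the study.

```python
def matching_rate(pairs, observations):
    """Percentage of observations for which a linked pair was found."""
    return 100 * pairs / observations


# Counts taken from the abstract.
overall = matching_rate(24_539, 33_548)
complete = matching_rate(12_225, 13_507)

print(f"overall: {overall:.1f}%, complete dataset: {complete:.1f}%")
```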
Conclusion
Detailed reporting, along with dedicated metrics, contributes to the transparency of the process, as discussion and interpretation of the linkage results are crucial for the validity of the main results of the study.