A critical review of clinical prediction models in oncology: is this SEERious research?


Abstract

Objectives: The National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) cancer registry covers almost half of the United States population. Clinical prediction models developed using the SEER cancer registry have rapidly grown in number. We sought to characterise this body of work and examine the extent to which it reflects formulaic research practices, marked by a lack of clear objectives, substandard methods, and limited scientific value.

Study Design and Setting: We conducted a systematic review of clinical prediction model development studies using the SEER data. MEDLINE via PubMed was searched for studies published between January 1 and August 10, 2023. Studies were included if they developed a clinical prediction model using two or more predictors from the SEER database. Data extraction covered study context, intended use, predictor characteristics, reporting guideline adherence, external validation practices, and the rationale provided for using SEER.

Results: Of 436 eligible studies, 94% (n=412) had first authors affiliated with institutions in China, while 3% (n=14) were from the United States. Developed models mostly targeted lung (11%) and breast (10%) cancers, and most focussed on prognosis (87%). Cox regression was the predominant modelling approach (59%), and nomograms were the most common presentation format (77%). Reporting guideline adherence was low (13%), and study protocols were rarely mentioned (3%). Nearly one third of models included patients’ race as a predictor (32%). Of the China-affiliated studies, only 6% reported an intended clinical setting, and 3% specified the intended country of use. Discussion of the suitability of SEER was absent in 87% of included studies. Among China-affiliated studies, 22% carried out external validation using data from China. When we expanded our search strategy to cover a ten-year period, it identified more than 4,000 SEER-based prediction model studies indexed in MEDLINE, with annual output growing from 37 in 2016 to over 700 in 2023.

Conclusions: Access to data alone does not justify model development. Most prediction models developed using SEER lack a clearly reported rationale, a defined target population, and an intended clinical purpose, and fail to address whether models produced with the SEER data are appropriate for populations outside the United States. Many studies have limited scientific value and exhibit hallmarks of formulaic research created solely to pad researchers’ résumés. Researchers should rigorously justify their choice of model development data source, define the target population for model use, adhere to the TRIPOD+AI reporting guideline, and refrain from concluding that SEER-based models are applicable outside the US without appropriate external validation. Journals, peer reviewers, and data custodians all share responsibility for ensuring high research standards in studies exploiting widely available datasets.
