A critical review of clinical prediction models in oncology: is this SEERious research?


Abstract

Objectives: The National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) cancer registry covers almost half of the United States population. Clinical prediction models developed using the SEER cancer registry have rapidly grown in number. We sought to characterise this body of work and examine the extent to which it reflects formulaic research practices, marked by a lack of clear objectives, substandard methods, and limited scientific value.

Study Design and Setting: We conducted a systematic review of clinical prediction model development studies using the SEER data. MEDLINE via PubMed was searched for studies published between January 1 and August 10, 2023. Studies were included if they developed a clinical prediction model using two or more predictors from the SEER database. Data extraction covered study context, intended use, predictor characteristics, reporting guideline adherence, external validation practices, and the rationale provided for using SEER.

Results: Of 436 eligible studies, 94% (n=412) had first authors affiliated with institutions in China, while 3% (n=14) were from the United States. Developed models mostly targeted lung (11%) and breast (10%) cancers, and most focussed on prognosis (87%). Cox regression was the predominant modelling approach (59%), and nomograms were the most common presentation format (77%). Reporting guideline adherence was low (13%), and study protocols were rarely mentioned (3%). Nearly one third of models included patients’ race as a predictor (32%). Of the China-affiliated studies, only 6% reported an intended clinical setting, and 3% specified the intended country of use. Discussion of the suitability of SEER was absent in 87% of included studies. Among China-affiliated studies, 22% carried out external validation using data from China. When we expanded our search strategy to cover a ten-year period, it identified more than 4,000 SEER-based prediction model studies indexed in MEDLINE, with annual output growing from 37 in 2016 to over 700 in 2023.

Conclusions: Access to data alone does not justify model development. Most prediction models developed using SEER lack a clearly reported rationale, a defined target population, and an intended clinical purpose, and fail to address whether models produced with the SEER data are appropriate for populations outside the United States. Many studies have limited scientific value and exhibit hallmarks of formulaic research created solely to pad researchers’ résumés. Researchers should rigorously justify their choice of model development data source, define the target population for model use, adhere to the TRIPOD+AI reporting guideline, and refrain from concluding that SEER-based models are applicable outside the US without appropriate external validation. Journals, peer reviewers, and data custodians all share responsibility for ensuring high research standards in studies exploiting widely available datasets.
