Constructing multicancer risk cohorts using national data from medical helplines and secondary care

Abstract

Identifying cohorts at higher risk of cancer can enable earlier diagnosis of the disease, which significantly improves patient outcomes. In this study, we use machine learning to predict a cancer diagnosis within the next year. We select nine cancer sites that have a high incidence of late-stage diagnosis or worsening survival rates, and for which there are currently no national screening programmes. We use National Health Service (NHS) data from medical helplines (NHS 111) and secondary care appointments from all hospitals in England. We show that features based on information captured in NHS 111 calls are among the most influential in driving predictions of a future cancer diagnosis. Our predictive models exhibit good discrimination (mean AUC 0.78, SD 0.04), with AUCs ranging from 0.69 (ovarian cancer) to 0.83 (oesophageal cancer). While our predictive modelling provides patient-level risk predictions, our emphasis is on constructing cohorts of patients who may be at risk of cancer rather than on individual risk scores. We present an approach to constructing cohorts at higher risk of cancer that is based on feature importance and considers possible bias in model results. These outputs can be used to develop highly targeted case-finding services, which could help increase earlier detection rates and reduce health disparities. The approach is flexible and can be tailored to the group the intervention targets (i.e. symptomatic or asymptomatic patients) and to the data available to those charged with administering the intervention.
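
As a rough illustration of the quantities mentioned in the abstract, the sketch below computes an AUC from labels and risk scores, summarises per-site AUCs as a mean and standard deviation, and picks a high-risk cohort from patient-level scores. It is a minimal sketch under stated assumptions, not the authors' method: the function names, the toy values, and the top-fraction cohort rule are all hypothetical, and the study itself builds cohorts from feature importance rather than from a score cut-off.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for a cancer diagnosis in the following year, 0 otherwise.
    scores: predicted risk for each patient, same order as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarise(auc_by_site):
    """Mean and sample standard deviation of per-site AUCs."""
    values = list(auc_by_site.values())
    return mean(values), stdev(values)


def top_risk_cohort(patient_scores, fraction):
    """Return ids of the highest-scoring fraction of patients.

    Hypothetical cohort rule for illustration only.
    """
    ranked = sorted(patient_scores, key=patient_scores.get, reverse=True)
    size = max(1, round(len(ranked) * fraction))
    return ranked[:size]


# Toy values, invented purely to exercise the functions.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
toy_cohort = top_risk_cohort({"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.7}, 0.5)
```

Working at cohort level means the scores are used only to rank patients, so discrimination (AUC) matters more here than the calibration of any individual score.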
