Generation of machine-learning derived Cancer Vulnerability Indicator to determine the spatial burden of cancer outcomes
Abstract
Background
Due to the difficulty of obtaining population-based individual-level data, ecological studies are often used to explore factors related to geographic variations in health outcomes. This study proposes a novel framework to identify area-level predictors of spatial variations in lung cancer outcomes and generate a lung cancer vulnerability index (LcVI) based on these predictors.
Methods
Data on 11,313 persons diagnosed with invasive lung cancer in Queensland, Australia (2016-2019) were sourced from the population-based Queensland Cancer Register. Bayesian spatial models estimated smoothed standardised incidence ratios (SIRs) for 519 geographic areas. Area-level variables (n = 911) were extracted from multiple data collections. Random forest models were fitted to identify important predictors of lung cancer incidence rates. A novel non-parametric dimensionality reduction approach incorporating the final random forest model results was developed to generate the LcVI, which ranged from 0 to 10.
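To make the quantities above concrete, the following is a minimal sketch and not the authors' pipeline. It computes an unsmoothed SIR by indirect standardisation (the Bayesian spatial smoothing is omitted) and then builds a toy 0 to 10 index from importance-weighted percentile ranks, which is only a stand-in for the study's non-parametric dimensionality reduction step. All function names, variable names, age bands, rates, and weights are hypothetical, and the toy index assumes every predictor is oriented so that higher values mean higher vulnerability.

```python
def expected_cases(area_pop_by_age, ref_rate_by_age):
    """Expected count if the area experienced the reference age-specific rates."""
    return sum(pop * ref_rate_by_age[age] for age, pop in area_pop_by_age.items())


def raw_sir(observed, expected):
    """Unsmoothed standardised incidence ratio: observed / expected cases."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def percentile_ranks(values):
    """Map values to ranks in [0, 1]; tied values share their average rank."""
    n = len(values)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_position = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg_position / (n - 1)
        i = j + 1
    return ranks


def toy_index(predictors, weights, scale=10.0):
    """Importance-weighted mean of percentile ranks, rescaled to [0, scale].

    Hypothetical stand-in for the LcVI construction, which the abstract
    does not specify in detail.
    """
    total = sum(weights.values())
    n_areas = len(next(iter(predictors.values())))
    scores = [0.0] * n_areas
    for name, values in predictors.items():
        ranks = percentile_ranks(values)
        for a in range(n_areas):
            scores[a] += (weights[name] / total) * ranks[a]
    return [scale * s for s in scores]


# Hypothetical reference rates (cases per person) and one area's population.
ref_rates = {"0-49": 0.0001, "50-69": 0.001, "70+": 0.004}
area_pop = {"0-49": 10000, "50-69": 4000, "70+": 1000}
exp_count = expected_cases(area_pop, ref_rates)
sir = raw_sir(12, exp_count)

# Hypothetical predictor values for three areas and made-up importance weights.
predictors = {"pred_a": [4.0, 6.0, 8.0], "pred_b": [40.0, 55.0, 50.0]}
weights = {"pred_a": 0.6, "pred_b": 0.4}
index = toy_index(predictors, weights)
```

In practice the weights might come from a fitted random forest's variable importance scores, but how the study combines those results into the LcVI is not described in this abstract.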
Results
Eight variables were identified as predictors of lung cancer incidence, with the top two being the prevalence of diabetes and adequate fruit intake. Areas with incidence rates below the Queensland average had a significantly lower LcVI than those with average incidence rates (mean difference = 2.80, 95% CI: 2.34-3.25, p < 0.001), while areas with above-average incidence rates had a significantly higher LcVI than those with average incidence rates (mean difference = 2.70, 95% CI: 2.20-3.19, p < 0.001). The LcVI was strongly associated with the continuous SIR, explaining 57% of the variation (R² = 0.57, p < 0.001).
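For readers unfamiliar with the "variation explained" statistic, the sketch below shows how an R² could be computed when an outcome such as the SIR is regressed on a single index with an intercept. In that setting R² equals the squared Pearson correlation. The function name and the data are hypothetical, and the abstract does not state the exact model form behind the reported value.

```python
def r_squared(x, y):
    """R² of a simple linear regression of y on x (with intercept).

    For one predictor this equals the squared Pearson correlation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return (sxy ** 2) / (sxx * syy)


# Hypothetical index values and SIRs for four areas.
toy_lcvi = [0.0, 1.0, 2.0, 3.0]
toy_sir = [1.0, 3.0, 2.0, 4.0]
r2 = r_squared(toy_lcvi, toy_sir)
```

An R² of 0.57, as reported above, would mean that a straight-line fit on the index accounts for 57% of the between-area variance in the SIR.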
Conclusion
This novel approach identified a small number of important predictors for lung cancer incidence from a high-dimensional dataset. The lung cancer vulnerability index based on these predictors effectively explained the geographic variations in incidence, potentially offering insights into the underlying drivers of these variations. The favourable performance of this approach may promote further ecological studies on other cancer outcomes.