Machine learning predicts liver cancer risk from routine clinical data: a large population-based multicentric study

Jan Clusmann
Paul-Henry Koop
David Y. Zhang
Felix van Haag
Omar S. M. El Nahhas
Tobias Seibel
Laura Žigutytė
Apichat Kaewdech
Julien Calderaro
Frank Tacke
Tom Luedde
Daniel Truhn
Tony Bruns
Kai Markus Schneider
Jakob N. Kather
Carolin V. Schneider

Abstract

Background and aims

Hepatocellular carcinoma (HCC) is a highly fatal tumor for which early detection and risk stratification are crucial, yet remain challenging. We aimed to develop an interpretable machine-learning framework for HCC risk stratification based on routinely collected clinical data.

Methods

We leverage data from over 900,000 individuals, including 983 cases of HCC, across two large-scale population-based cohorts: the UK Biobank study and the “All of Us Research Program”. For all of these patients, clinical data from timepoints years before the diagnosis of HCC were available. We integrate data modalities including demographics, electronic health records, lifestyle, routine blood tests, genomics and metabolomics to offer a unique, multi-modal perspective on HCC risk.

Results

Our random-forest-based model significantly outperforms all publicly available state-of-the-art risk scores, with an AUROC of 0.88 for both the internal and external test sets. We demonstrate the robustness of our model across ethnic subgroups, a major advance over previous models, whose performance varied by ethnicity. Further, we perform extensive feature-importance analysis, showcasing our approach as an interpretable framework. We provide all model weights and an open-source web calculator to facilitate further validation of our model.
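For readers unfamiliar with the metric, the following minimal sketch shows how an AUROC can be computed overall and per subgroup. It is not the authors' code: the function names `auroc` and `auroc_by_subgroup`, the toy labels, scores and group codes are all illustrative assumptions, and the pairwise comparison is chosen for readability rather than speed.

```python
from collections import defaultdict


def auroc(labels, scores):
    """Probability that a randomly chosen case receives a higher score
    than a randomly chosen non-case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_by_subgroup(labels, scores, groups):
    """Compute the AUROC separately within each subgroup label."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auroc(ys, ss) for g, (ys, ss) in buckets.items()}


# Toy data (hypothetical): 1 = developed HCC, 0 = did not.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8, 0.3, 0.4]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall = auroc(labels, scores)
per_group = auroc_by_subgroup(labels, scores, groups)
print(round(overall, 3), per_group)
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from non-cases; comparing the per-subgroup values is one simple way to check whether performance holds up across groups.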

Conclusion

Our study presents a robust and interpretable machine-learning framework for HCC risk stratification, which offers the potential to improve early detection and could ultimately reduce disease burden through targeted interventions.

Lay summary

Finding liver cancer early is crucial for successful treatment, which is why screening with abdominal ultrasound can be performed. However, it is not clear who should receive ultrasound screening: under the current standard, only patients with liver cirrhosis, a severe liver disease, are screened, and many patients are diagnosed with liver cancer at a late stage. We therefore trained a machine learning model, which acts like many decision trees working at the same time, to detect patients at high risk of liver cancer by learning patterns from almost 1,000 cases of liver cancer in a population of 900,000 individuals. In a separate set of patients that the model had not seen during training, our model worked better than all available models. Additionally, we investigated (1) how the model arrives at its predictions, (2) whether it works equally well in males and females, and (3) which data are most relevant for the model. In this way, our model can help sort patients into categories such as “high-risk”, “medium-risk” and “low-risk”, which can then guide screening strategies and help improve early detection of liver cancer.
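As an illustration of the sorting step described above, the sketch below maps a predicted probability to one of the three categories. The cutoffs `low_cutoff` and `high_cutoff` and the function name `risk_category` are hypothetical placeholders, not values or names reported by the study; real thresholds would need to be chosen and validated clinically.

```python
def risk_category(probability, low_cutoff=0.05, high_cutoff=0.20):
    """Map a predicted risk (0..1) to a coarse category.

    The default cutoffs are illustrative assumptions only.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if low_cutoff > high_cutoff:
        raise ValueError("low_cutoff must not exceed high_cutoff")
    if probability >= high_cutoff:
        return "high-risk"
    if probability >= low_cutoff:
        return "medium-risk"
    return "low-risk"


# Hypothetical predicted risks for three patients.
predicted = [0.01, 0.08, 0.35]
categories = [risk_category(p) for p in predicted]
print(categories)
```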

Version published to 10.1101/2024.11.03.24316662 on medRxiv
Nov 4, 2024

A Multicenter Machine Learning Model Incorporating Circulating Tumor Cells for Postoperative Recurrence Prediction in Localized Renal Cell Carcinoma

This article has 21 authors:
1. Zihao Li
2. Chunzhi Qi
3. Yue Chong
4. Qiang Wei
5. Shaogang Wang
6. Jianbin Bi
7. Jinkai Shao
8. Xiaoping Zhang
9. Xin Gou
10. Wenhao Shen
11. Weiyang He
12. Xiaoming Cao
13. Wei Xiong
14. Guojun Chen
15. Xiaojian Yang
16. Jianxin Qiu
17. Yingyi Li
18. Jianzhou Liu
19. Yuan Shen
20. Tie Chong
21. Zhenlong Wang
This article has no evaluations. Latest version Jan 23, 2026
An Ensemble-Base Machine Learning Approach to Predict 2- and 10-Year Breast Cancer

This article has 10 authors:
1. Patricia Honorato Moreira
2. Arthur Shuzo Owtake Cardoso
3. Rafael de Oliveira
4. Joaquim Gasparini
5. Renata Colombo Bonadio
6. Bruna Salani Mota
7. Alexandre Ferreira Ramos
8. Flavia Santoro
9. Roger Chammas
10. Luciana Rodrigues Carvalho Barros
This article has no evaluations. Latest version Feb 22, 2026
Machine Learning-Based Survival Time Prediction in Colorectal Cancer with Peritoneal Metastasis: A Multi-Institutional Registry-Based Study

This article has 32 authors:
1. Yoshiko Bamba
2. Michio Itabashi
3. Hirotoshi Kobayashi
4. Kenjiro Kotake
5. Masayasu Kawasaki
6. Yukihide Kanemitsu
7. Yusuke Kinurgasa
8. Hideki Ueno
9. Kotaro Maeda
10. Takeshi Suto
11. Kimihiko Funahashi
12. Heita Ozawa
13. Fumikazu Koyama
14. Shingo Noura
15. Hideyuki Ishida
16. Masayuki Ohue
17. Tomomichi Kiyomatsu
18. Soichiro Ishihara
19. Keiji Koda
20. Hideo Baba
21. Kenji Kawada
22. Yojiro Hashiguchi
23. Takanori Goi
24. Yuji Toiyama
25. Naohiro Tomita
26. Eiji Sunami
27. Yoshito Akagi
28. Jun Watanabe
29. Kenichi Hakamada
30. Goro Nakayama
31. Kenichi Sugihara
32. Yoichi Ajioka
This article has no evaluations. Latest version Jan 21, 2026
