Non-invasive Prediction of Ki-67 Status in Hepatocellular Carcinoma Using Interpretable Machine Learning with Clinical and CT Radiomics Features

Abstract

Objective: To investigate the value of a CT radiomics-based machine learning model for the preoperative prediction of Ki-67 proliferation status in hepatocellular carcinoma (HCC). The SHapley Additive exPlanations (SHAP) method is additionally used to visualize the prediction process of the combined model, improving its interpretability and supporting its clinical application.

Methods: Preoperative contrast-enhanced CT images were retrospectively collected from 172 patients with pathologically confirmed HCC. The patients were randomly divided into training and validation cohorts (7:3 ratio) and categorized into low-expression (≤15%) and high-expression (>15%) groups based on the Ki-67 labeling index. Radiomic features were extracted using the PyRadiomics library in Python. Feature selection was performed using mutual information, LASSO regression, and feature selection networks. Four machine learning models were developed: logistic regression, support vector machine, random forest, and XGBoost. The optimal radiomics model was selected based on ROC curve, AUC, and calibration curve analysis, and an integrated model was built by combining its features with clinical features. Model interpretability was visualized using SHAP.

Results: Among the radiomics models built from CT imaging features, the random forest model based on the portal venous phase (PVP) showed the most stable performance, with six radiomic features selected. This model achieved an average AUC of 0.91 in the training set, with a sensitivity of 85%, specificity of 82%, and accuracy of 83%. In the test set, it yielded an AUC of 0.80, sensitivity of 73%, specificity of 78%, and accuracy of 76%. These six radiomic features were then combined with the clinical variable AFP to construct logistic regression, support vector machine, random forest, and extreme gradient boosting (XGBoost) models. Of these, the combined random forest model exhibited the highest discriminative performance, with an average AUC of 0.92 in the training set (sensitivity 82%, specificity 87%, accuracy 85%) and an AUC of 0.82 in the test set (sensitivity 72%, specificity 76%, accuracy 74%). The calibration curve also demonstrated good model fit, suggesting that the combined random forest model provides higher accuracy and stability in predicting Ki-67 expression status.

Conclusions: The radiomics-based interpretable machine learning model offers a non-invasive method for the reliable prediction of Ki-67 expression status in patients with HCC, with robust discriminative performance.
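
The modelling steps described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the feature matrix is synthetic random data standing in for PyRadiomics output, the Ki-67 values are simulated, and every hyperparameter (`k=30`, `alpha=0.02`, `max_features=6`, `n_estimators=200`) is an arbitrary placeholder. The feature selection network, the clinical variable, and the SHAP step are omitted.

```python
from functools import partial

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import Lasso
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 172 patients, 100 "radiomic" features.
rng = np.random.default_rng(0)
X = rng.normal(size=(172, 100))

# Simulated Ki-67 labeling index in percent, loosely driven by three features.
ki67 = np.clip(20 + 8 * X[:, :3].sum(axis=1) + rng.normal(scale=5, size=172), 0, 100)

# Binarize at the 15% cutoff: 0 = low expression (<=15%), 1 = high (>15%).
y = (ki67 > 15).astype(int)

# Random 7:3 split, stratified so both cohorts keep a similar class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Mutual-information filter, then LASSO-based selection, then a random forest.
# Fitting everything inside one Pipeline keeps the selection steps from
# seeing the held-out cohort.
pipe = Pipeline(
    steps=[
        ("scale", StandardScaler()),
        ("mi", SelectKBest(partial(mutual_info_classif, random_state=0), k=30)),
        ("lasso", SelectFromModel(Lasso(alpha=0.02), max_features=6)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ]
)
pipe.fit(X_train, y_train)

# Discrimination on the held-out cohort.
auc = roc_auc_score(y_test, pipe.predict_proba(X_test)[:, 1])

# Sensitivity, specificity, and accuracy from the confusion matrix.
tn, fp, fn, tp = confusion_matrix(y_test, pipe.predict(X_test), labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)

n_selected = int(pipe.named_steps["lasso"].get_support().sum())
print(f"features kept: {n_selected}")
print(f"AUC={auc:.2f} sens={sensitivity:.2f} spec={specificity:.2f} acc={accuracy:.2f}")
```

Because the data are simulated, the printed metrics say nothing about the study's results; the sketch only shows how the 15% cutoff, the 7:3 split, the two-stage feature selection, and the reported metrics fit together.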
