Machine Learning-Based Classification of Papillary Thyroid Carcinoma Versus Multinodular Goiter Using Preoperative Laboratory and Cytology Data

Salar GolmohammadzadehKhiaban
Mehrad Namazee
Ali Rahnama


Abstract

Thyroid nodules are frequently encountered in clinical practice, and their detection is increasing with advances in imaging modalities. While most nodules are benign, distinguishing papillary thyroid carcinoma (PTC) from benign entities such as multinodular goiter (MNG) remains a diagnostic challenge. Fine-needle aspiration (FNA) and sonography are standard tools, but their limitations highlight the need for supplementary approaches. This study evaluates the use of machine learning (ML) models to classify PTC versus MNG using routine preoperative clinical, laboratory, and cytological data, before surgery is performed and final pathology results are available.

Methods

This retrospective multicenter study included 971 patients who underwent total thyroidectomy between 2020 and 2024. The dataset incorporated demographic data, preoperative sonographic findings, hematologic and thyroid function tests, and FNA cytology results. Five supervised ML algorithms—Logistic Regression, Random Forest, XGBoost, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN)—were trained and validated. Model performance was assessed using accuracy, precision, recall, F1-score, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC-ROC).
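The evaluation metrics listed above, apart from AUC-ROC, all follow from the four cells of a binary confusion matrix. The following is a minimal sketch in Python, assuming PTC is treated as the positive class and MNG as the negative class (the abstract does not state this). The function name `binary_metrics` and the counts in the usage note are illustrative and are not taken from the study. Note that recall and sensitivity are the same quantity.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute threshold-dependent metrics from confusion-matrix counts.

    Assumption (not stated in the abstract): positive class = PTC,
    negative class = MNG.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def ratio(numerator: float, denominator: float) -> float:
        # Return NaN rather than raising when a metric is undefined.
        return numerator / denominator if denominator else math.nan

    return {
        "accuracy": (tp + tn) / total,
        "precision": ratio(tp, tp + fp),      # positive predictive value
        "recall": ratio(tp, tp + fn),         # identical to sensitivity
        "specificity": ratio(tn, tn + fp),    # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

For example, with hypothetical counts `binary_metrics(tp=40, fp=5, tn=45, fn=10)`, the accuracy is 0.85, the recall (sensitivity) is 0.80, and the specificity is 0.90.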

Results

The XGBoost model achieved the best overall performance, with an accuracy of 84.4%, precision of 85.3%, and an AUC-ROC of 0.881. Its specificity was high (0.944), while its sensitivity was more moderate (0.714). Random Forest also performed well, with a lower accuracy (81.2%) but a higher AUC-ROC (0.919). Logistic Regression, SVM, and KNN underperformed in comparison. Feature importance analysis revealed that the FNA result, nodule size, and TSH were the most influential predictors.
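Accuracy and AUC-ROC can rank two models differently, because accuracy depends on a single decision threshold while AUC-ROC depends only on how well the scores order positives above negatives. The sketch below illustrates this in Python with a pairwise (Mann-Whitney) AUC. The helper names, the 0.5 threshold, and the toy scores are assumptions made for illustration. They are not the study's data, and the sketch does not claim to explain the reported difference between XGBoost and Random Forest.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at(labels, scores, threshold=0.5):
    """Accuracy after thresholding scores (threshold value is an assumption)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = positive class, 0 = negative class.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.2, 0.4, 0.3, 0.1]     # one positive scored very low
model_b = [0.45, 0.40, 0.35, 0.30, 0.20, 0.10]  # perfect ranking, low scores

acc_a, auc_a = accuracy_at(labels, model_a), auc_roc(labels, model_a)
acc_b, auc_b = accuracy_at(labels, model_b), auc_roc(labels, model_b)
```

Here model A has the higher accuracy at the 0.5 threshold (5/6 versus 3/6), while model B has the higher AUC (1.0 versus 7/9), so the choice of "best" model depends on which metric is prioritized.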

Conclusion

Machine learning models, particularly XGBoost and Random Forest, show promise in accurately distinguishing between MNG and PTC using routine clinical data. Their integration into preoperative assessment may enhance diagnostic precision, reduce unnecessary procedures, and support personalized surgical decision-making. Further validation in diverse, multicenter cohorts is warranted to confirm generalizability and clinical utility.

Version published to 10.1101/2025.05.15.25327670 on medRxiv
May 15, 2025

