Automated L3 Skeletal Muscle Segmentation for Evaluation of Sarcopenia: Development and Independent Validation of an Ensemble-Based 2D nnU-Net Pipeline in a Complex Liver Disease Cohort

Hyeon Yu

Abstract

Purpose: To develop a fully automated 2D nnU-Net pipeline for multi-class skeletal muscle segmentation (psoas, paraspinal, and abdominal wall) at the third lumbar (L3) vertebral level, and to quantitatively evaluate its diagnostic performance and reliability compared to manual segmentation.

Materials and Methods: A 2D nnU-Net was trained on 164 axial L3 CT slices from the multi-institutional AMOS22 dataset, spanning diverse abdominal pathologies and multivendor imaging. To assess generalizability under severe anatomical distortion, independent external validation was performed in 50 consecutive patients with advanced liver disease from a single institution (January–December 2025; mean age, 63 ± 15 years; 32 women, 18 men), of whom 88% had moderate-to-severe ascites. Model stability was examined by comparing a five-fold ensemble with the best-performing single-fold model. Intra-observer reliability of the manual reference standard was evaluated in a random subset of 30 cases. Performance metrics included the Dice Similarity Coefficient (DSC), Pearson correlation coefficient (r), and Bland–Altman analysis for cross-sectional areas and mean attenuation. The inference workflow was deployed via a custom Streamlit-based graphical user interface (GUI).

Results: In this anatomically complex external validation cohort, the five-fold ensemble 2D nnU-Net achieved an overall mean DSC of 0.937 ± 0.043, with 80% of cases achieving a mean DSC ≥ 0.90. While the mean DSC was statistically comparable to the best single-fold model (0.937, p = 0.736), the ensemble strategy increased the minimum observed DSC (worst-case performance) from 0.720 to 0.822. Comparison between the ensemble model and manual segmentation yielded a Pearson correlation of r = 0.955 (p < 0.001) for total skeletal muscle area, with a mean bias of +7.17 cm². Intra-observer agreement for the manual reference standard demonstrated a correlation of r = 0.995 for total area. The automated pipeline required 3–5 seconds per case for inference and quantitative reporting, compared to 3–5 minutes for manual segmentation.

Conclusion: In patients with advanced liver disease and substantial anatomical distortion from ascites, an ensemble-based 2D nnU-Net provides quantitative accuracy and measurement agreement comparable to manual L3 skeletal muscle segmentation, while mitigating lower-bound (worst-case) errors relative to single-fold models. Integration with a dedicated GUI enables substantial time savings and supports scalable clinical body composition analysis.
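The agreement metrics named in the abstract (DSC, cross-sectional area, Pearson r, and Bland–Altman bias with limits of agreement) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the label value used for a muscle class, and the toy arrays are assumptions made for illustration, and in practice the pixel spacing would be read from the CT header.

```python
import numpy as np

# Hypothetical label value; the preprint does not state its label encoding.
PSOAS = 1

def dice(pred, ref, label):
    """Dice Similarity Coefficient for one label in two 2D label maps."""
    p = pred == label
    r = ref == label
    denom = p.sum() + r.sum()
    if denom == 0:
        return float("nan")  # label absent from both masks
    return 2.0 * np.logical_and(p, r).sum() / denom

def area_cm2(mask, label, spacing_mm):
    """Cross-sectional area: pixel count times pixel area, mm^2 to cm^2."""
    return (mask == label).sum() * spacing_mm[0] * spacing_mm[1] / 100.0

def bland_altman(auto, manual):
    """Mean bias (auto minus manual) and 95% limits of agreement."""
    diff = np.asarray(auto, dtype=float) - np.asarray(manual, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def pearson_r(x, y):
    """Pearson correlation coefficient between two measurement series."""
    return float(np.corrcoef(x, y)[0, 1])

# Toy 4x4 masks standing in for an L3 slice (illustrative data only).
ref = np.zeros((4, 4), dtype=int)
ref[0:2, 0:2] = PSOAS            # 4 reference pixels
pred = np.zeros((4, 4), dtype=int)
pred[0:2, 0:3] = PSOAS           # 6 predicted pixels, 4 of them overlapping

toy_dice = dice(pred, ref, PSOAS)
toy_area = area_cm2(ref, PSOAS, (1.0, 1.0))
toy_bias, toy_lo, toy_hi = bland_altman([10.0, 12.0, 14.0], [9.0, 12.0, 12.0])
```

With these definitions a positive bias means the automated areas are larger on average than the manual ones, which is the sign convention assumed here for the +7.17 cm² figure reported above.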

Version published to 10.20944/preprints202602.1774.v1
Feb 26, 2026

Related articles

AI-Driven Appendicular Skeletal Muscle Mass Index (ASMI) Prediction and Low Muscle Mass Detection from Routine Hip X-rays: A Novel Opportunistic Screening Tool

This article has 5 authors:
1. Ling Lee
2. Shu-Han Chuang
3. Yi-Jie Kuo
4. Lien-Chen Wu
5. Yu-Pin Chen
This article has no evaluations
Latest version Jan 25, 2026

Automated Deep-Learning Quantification of Intramuscular Fat in Lumbar Spine Muscles on Dixon MRI: Validation and Normative Reference Values from 173 Healthy Adults

This article has 5 authors:
1. Germán Balerdi
2. Johann Henckel
3. Anna Di Laura
4. Alister J. Hart
5. Martin A. Belzunce
This article has no evaluations
Latest version Feb 24, 2026

Development and Validation of a Nomogram Combining Ultrasound Parameters and Clinical Indicators for Predicting Sarcopenia in Breast Cancer Patients

This article has 7 authors:
1. Ying Wang
2. Yan Hu
3. Yue Liu
4. Mingxin Ji
5. Ruxue Sun
6. Shuang Yang
7. Jianlin Wu
This article has no evaluations
Latest version Jan 23, 2026
