AUGMENT: a framework for robust assessment of the clinical utility of segmentation algorithms

Cathal McCague
Thomas Buddenkotte
Lorena Escudero Sanchez
David Hulse
Roxana Pintican
Leonardo Rundo
AUGMENT study team
James D. Brenton
Dominique-Laurent Couturier
Ozan Öktem
Ramona Woitek
Carola-Bibiane Schönlieb
Evis Sala
Mireia Crispin Ortuzar

Abstract

Background

Evaluating AI-based segmentation models primarily relies on quantitative metrics, but it remains unclear if this approach leads to practical, clinically applicable tools.

Purpose

To create a systematic framework for evaluating the performance of segmentation models using clinically relevant criteria.

Materials and Methods

We developed the AUGMENT framework (Assessing Utility of seGMENtation Tools), based on a structured classification of the main categories of error in segmentation tasks. To evaluate the framework, we assembled a team of 20 clinicians covering a broad range of radiological expertise and analysed the challenging task of segmenting metastatic ovarian cancer using AI. We used three evaluation methods: (i) the Dice Similarity Coefficient (DSC); (ii) a visual Turing test, assessing 429 segmented disease sites on 80 CT scans from the Cancer Imaging Atlas; and (iii) the AUGMENT framework, in which 3 radiologists and the AI model created segmentations of 784 separate disease sites on 27 CT scans from a multi-institution dataset.
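For readers unfamiliar with the DSC, it is the standard overlap measure 2|A∩B| / (|A| + |B|) between two segmentations A and B. The following is a minimal sketch, not the study's implementation: the function name, the representation of a mask as a set of voxel coordinates, and the toy masks are all assumptions made for illustration.

```python
def dice_similarity(pred_voxels, ref_voxels):
    """Dice Similarity Coefficient between two segmentations.

    Each segmentation is given as a set of voxel coordinates, e.g. (z, y, x)
    tuples (a representation assumed here for simplicity).
    Returns a value in [0, 1]; multiply by 100 for a percentage scale.
    """
    pred_voxels = set(pred_voxels)
    ref_voxels = set(ref_voxels)
    total = len(pred_voxels) + len(ref_voxels)
    if total == 0:
        # Convention assumed here: two empty segmentations agree perfectly.
        return 1.0
    overlap = len(pred_voxels & ref_voxels)
    return 2.0 * overlap / total


# Toy, made-up masks on a single slice (z = 0); not data from the study.
ai_mask = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
radiologist_mask = {(0, 0, 0), (0, 1, 1), (0, 1, 2)}

dsc = dice_similarity(ai_mask, radiologist_mask)
print(f"DSC = {dsc:.3f}")
```

Because the DSC depends only on voxel overlap, two segmentations with the same score can differ in ways that matter clinically, which is the gap the error-category approach is meant to address.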

Results

The AI model had modest technical performance (DSC = 72 ± 19 for pelvic and ovarian disease, and 64 ± 24 for omental disease), and it failed the visual Turing test. However, the AUGMENT framework revealed that (i) the AI model produced segmentations of the same quality as radiologists (p = .46), and (ii) it enabled radiologists to produce human+AI collaborative segmentations of significantly higher quality (p < .001) and in significantly less time (p < .001).
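The abstract does not state how the visual Turing test was scored. One common way to analyse such a test, shown here purely as an assumption, is to ask whether raters identify the AI-generated segmentation more often than chance would allow, using a one-sided exact binomial test. The function name and the counts below are hypothetical and are not results from the study.

```python
from math import comb


def p_value_above_chance(correct, trials, chance=0.5):
    """One-sided exact binomial p-value: P(X >= correct) if raters only guess.

    A small p-value suggests raters can tell AI from human segmentations,
    i.e. the model "fails" the visual Turing test under this reading.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    return sum(
        comb(trials, k) * chance**k * (1.0 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical counts for illustration only: 8 correct identifications in 10 trials.
p = p_value_above_chance(correct=8, trials=10)
print(f"p = {p:.4f}")
```

A test of this kind measures distinguishability only; it says nothing about whether the distinguishable segmentations are of lower clinical quality.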

Conclusion

Quantitative performance metrics of segmentation algorithms can mask their clinical utility. The AUGMENT framework enables the systematic identification of clinically usable AI models and highlights the importance of assessing the interaction between AI tools and radiologists.

Summary statement

Our framework, called AUGMENT, provides an objective assessment of the clinical utility of segmentation algorithms based on well-established error categories.

Key results

Combining quantitative metrics with qualitative information on performance from domain experts whose work is impacted by an algorithm’s use is a more accurate, transparent and trustworthy way of appraising an algorithm than using quantitative metrics alone.
The AUGMENT framework captures clinical utility in terms of segmentation quality and human+AI complementarity even in algorithms with modest technical segmentation performance.
AUGMENT might have utility during the development and validation process, including in segmentation challenges, for those seeking clinical translation, and to audit model performance after integration into clinical practice.

Version published to 10.1101/2024.09.20.24313970v1 on medRxiv
Sep 23, 2024

