AUGMENT: a framework for robust assessment of the clinical utility of segmentation algorithms

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Background

Evaluating AI-based segmentation models primarily relies on quantitative metrics, but it remains unclear if this approach leads to practical, clinically applicable tools.

Purpose

To create a systematic framework for evaluating the performance of segmentation models using clinically relevant criteria.

Materials and Methods

We developed the AUGMENT framework (Assessing Utility of seGMENtation Tools), based on a structured classification of main categories of error in segmentation tasks. To evaluate the framework, we assembled a team of 20 clinicians covering a broad range of radiological expertise and analysed the challenging task of segmenting metastatic ovarian cancer using AI. We used three evaluation methods: (i) Dice Similarity Coefficient (DSC), (ii) visual Turing test, assessing 429 segmented disease-sites on 80 CT scans from the Cancer Imaging Atlas), and (iii) AUGMENT framework, where 3 radiologists and the AI-model created segmentations of 784 separate disease sites on 27 CT scans from a multi-institution dataset.

Results

The AI model had modest technical performance (DSC=72±19 for the pelvic and ovarian disease, and 64±24 for omental disease), and it failed the visual Turing test. However, the AUGMENT framework revealed that (i) the AI model produced segmentations of the same quality as radiologists ( p =.46), and (ii) it enabled radiologists to produce human+AI collaborative segmentations of significantly higher quality ( p =<.001) and in significantly less time ( p =<.001).

Conclusion

Quantitative performance metrics of segmentation algorithms can mask their clinical utility. The AUGMENT framework enables the systematic identification of clinically usable AI-models and highlights the importance of assessing the interaction between AI tools and radiologists.

Summary statement

Our framework, called AUGMENT, provides an objective assessment of the clinical utility of segmentation algorithms based on well-established error categories.

Key results

  • Combining quantitative metrics with qualitative information on performance from domain experts whose work is impacted by an algorithm’s use is a more accurate, transparent and trustworthy way of appraising an algorithm than using quantitative metrics alone.

  • The AUGMENT framework captures clinical utility in terms of segmentation quality and human+AI complementarity even in algorithms with modest technical segmentation performance.

  • AUGMENT might have utility during the development and validation process, including in segmentation challenges, for those seeking clinical translation, and to audit model performance after integration into clinical practice.

Article activity feed