Triage with AI: A Rule-out Framework Quantifying the Risks and Benefits of Screening Mammogram Automation

Abstract

Background

AI has been proposed as a triage or “rule out” device to reduce radiologist workload, but it is presently unclear how an AI triage threshold should be determined. We present a framework for determining an optimal threshold.

Materials and Methods

A total of 114,229 bilateral 2D digital screening mammograms acquired from 2006 to 2023 were retrospectively analyzed. Each mammogram was assigned an AI score using Mirai, an open-source deep-learning model. Four metrics were examined at two thresholds separating ruled-out from retained cases: 1) Caseload Reduction Rate (CRR; percentage of the caseload removed by rule-out), 2) AI Gross False Omission Rate (G-FOR; probability that a ruled-out patient has breast cancer), 3) AI Net False Omission Rate (N-FOR; probability that a ruled-out patient has a breast cancer that the radiologist would have caught in standard care, i.e., without triage), and 4) AI Adjusted Net False Omission Rate (AN-FOR[30%]; N-FOR adjusted for the hypothetical scenario in which radiologists detect an extra 30% of breast cancers among AI-retained cases). The two thresholds were severity scores of 0.20 (Youden’s J) and 0.05 (AN-FOR[30%] = 0). The former maximizes Youden’s J statistic (sensitivity + specificity − 1); the latter is the threshold at which AI triage introduces no net increase in false negatives.
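The four metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Exam` record, the `triage_metrics` function, and the rule that an exam is ruled out when its score is strictly below the threshold are all hypothetical. In particular, the abstract does not give the exact AN-FOR formula; the sketch assumes that the "extra 30%" applies to cancers among retained cases that the radiologist would have missed in standard care, and that those extra detections are subtracted from the net missed count.

```python
from dataclasses import dataclass


@dataclass
class Exam:
    ai_score: float         # AI score for the exam (e.g. produced by Mirai)
    cancer: bool            # ground truth: breast cancer present
    caught_by_reader: bool  # radiologist detected it in standard care (no triage)


def triage_metrics(exams, threshold, extra_detection=0.30):
    """Return CRR, G-FOR, N-FOR and AN-FOR as fractions (not percentages)."""
    # ASSUMPTION: exams scoring strictly below the threshold are ruled out.
    ruled_out = [e for e in exams if e.ai_score < threshold]
    retained = [e for e in exams if e.ai_score >= threshold]
    n_out = len(ruled_out)

    # CRR: share of the caseload removed by rule-out.
    crr = n_out / len(exams)
    if n_out == 0:
        return {"crr": crr, "g_for": 0.0, "n_for": 0.0, "an_for": 0.0}

    # G-FOR numerator: every cancer among ruled-out exams.
    gross_missed = sum(1 for e in ruled_out if e.cancer)
    # N-FOR numerator: ruled-out cancers the radiologist would have caught.
    net_missed = sum(1 for e in ruled_out if e.cancer and e.caught_by_reader)

    # ASSUMPTION (not stated in the abstract): the extra detections are a
    # fraction of the cancers the radiologist would have missed among
    # retained exams, and they offset the net missed count.
    reader_missed_retained = sum(
        1 for e in retained if e.cancer and not e.caught_by_reader
    )
    adjusted_missed = net_missed - extra_detection * reader_missed_retained

    return {
        "crr": crr,
        "g_for": gross_missed / n_out,
        "n_for": net_missed / n_out,
        "an_for": adjusted_missed / n_out,
    }
```

Under this reading, AN-FOR reaches zero when the extra detections among retained cases exactly offset the cancers lost to rule-out, which is how the 0.05 threshold is characterized above.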

Results

At the 0.20 threshold, G-FOR, N-FOR, and AN-FOR(30%) were 0.26%, 0.17%, and 0.14%, respectively (223, 141, and 121 missed cancer cases, respectively), with CRR = 75%. At the 0.05 threshold, G-FOR, N-FOR, and AN-FOR(30%) were 0.12%, 0.07%, and 0.00%, respectively (49, 30, and 0 missed cancer cases, respectively), with CRR = 36%.
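Each false omission rate is a missed-case count divided by the number of ruled-out exams, so the reported counts and percentages can be cross-checked against each other. The sketch below does this arithmetic; the function name `implied_rate_percent` is hypothetical, and because the reported CRR values are rounded to whole percentages, the implied rates are only approximate.

```python
TOTAL_EXAMS = 114_229  # number of screening mammograms stated in the abstract


def implied_rate_percent(missed_cases, crr, total_exams=TOTAL_EXAMS):
    """Missed cases as a percentage of ruled-out exams (crr * total_exams)."""
    ruled_out = crr * total_exams
    return 100.0 * missed_cases / ruled_out


# Reported CRR and missed-case counts (gross, net, adjusted net) per threshold.
reported = {
    0.20: (0.75, (223, 141, 121)),
    0.05: (0.36, (49, 30, 0)),
}

for threshold, (crr, counts) in reported.items():
    rates = [implied_rate_percent(c, crr) for c in counts]
    print(threshold, [f"{r:.3f}%" for r in rates])
```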

Conclusion

We demonstrate how radiology practices can weigh the trade-offs of different AI score triage thresholds. Assuming radiologists detect an extra 30% of cancers among retained cases (AN-FOR[30%]), the Youden’s J threshold (0.20) results in 121 additional missed cancers for a 75% caseload reduction, whereas the 0.05 threshold yields an estimated zero additional missed cancers at a 36% caseload reduction.
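For readers unfamiliar with it, a Youden's J threshold is the score cutoff that maximizes sensitivity + specificity − 1. The sketch below shows one simple way to find it from scores and labels. The function name `youden_threshold` and the convention that scores at or above the cutoff count as positive are assumptions for illustration, and the abstract does not describe how the authors computed their threshold.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    scores: AI scores; labels: 1 for cancer, 0 for no cancer.
    ASSUMPTION: a score >= threshold counts as a positive (retained) call.
    Requires at least one positive and one negative label.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    best_threshold, best_j = None, float("-inf")

    # Try every observed score as a candidate cutoff (quadratic, but simple).
    for t in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        true_neg = sum(1 for s, y in zip(scores, labels) if s < t and not y)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # ties keep the lowest cutoff
            best_threshold, best_j = t, j

    return best_threshold, best_j
```

Because J weights sensitivity and specificity equally, the cutoff it selects ignores the relative cost of a missed cancer versus a saved read, which is the trade-off the false omission metrics are meant to expose.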
