Automated Video-Based Analysis of Surgical Meta-competencies Using Computer Vision
Abstract
Background
Traditional surgical training relies on an apprenticeship model that is inherently subjective and vulnerable to human bias. Performance metric scales attempt to offer more objective feedback by providing a structured grading rubric, but the resulting scores are still ultimately subjective. Leveraging computer vision and artificial intelligence to assess surgical performance has the potential to transform these subjective measurements into automated, objective feedback for trainees.
Materials and Methods
This retrospective, multi-institutional study analyzed 319 laparoscopic cholecystectomy (LC) videos from IRB-approved deidentified datasets and segmented them into 2862 clips. Using an internally validated video-based assessment rubric, we annotated video clips across five metacompetency domains: tissue handling, psychomotor skills, efficiency, dissection quality, and exposure quality. Short video segments (<90 s) were rated on a 5-point scale by expert raters. We trained a deep learning model (DINOv2) to classify composite metacompetency scores as high (4–5) versus low (1–3), representing binary yes/no feedback analogous to that given in the operating room. Model performance was evaluated via area under the receiver operating characteristic curve (AUROC).
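As a minimal sketch of the labeling scheme described above (assuming the expert ratings are stored as integers on the 1–5 scale; the function name and example ratings are illustrative, not from the study), the composite high/low binarization could look like:

```python
def binarize_score(rating: int) -> int:
    """Map a 5-point metacompetency rating to a binary label:
    high (4-5) -> 1, low (1-3) -> 0."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be on the 1-5 scale")
    return 1 if rating >= 4 else 0

# Hypothetical expert ratings for a batch of clips
labels = [binarize_score(r) for r in [5, 3, 4, 1, 2]]  # -> [1, 0, 1, 0, 0]
```

Collapsing the ordinal scale to a binary target mirrors the yes/no feedback a trainee typically receives intraoperatively and simplifies the classification task for the model.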
Results
Among the 2862 LC video clips, model performance was highest for dissection quality during the exposing-gallbladder step (AUROC 91.5%, 95% confidence interval [CI] 84.5-96.5). Moderate performance was observed for efficiency (AUROC 72.6%, 95% CI 59.9-83.2) and exposure quality (AUROC 68.7%, 95% CI 55.2-81.8). Dissection and exposure quality scores during hepatocystic triangle dissection yielded AUROCs of 63.8% (95% CI 56.9-71.3) and 66.0% (95% CI 53.1-76.7), respectively.
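The AUROC reported above can be read as the probability that a randomly chosen high-skill clip receives a higher model score than a randomly chosen low-skill clip. A minimal pure-Python sketch of this pairwise formulation (the labels and scores below are toy values, not study data):

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive
    example is scored above a randomly chosen negative example
    (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: binary skill labels vs model scores
result = auroc([1, 1, 0, 0, 1], [0.9, 0.5, 0.4, 0.6, 0.8])  # -> 5/6
```

In practice a library implementation (e.g. scikit-learn's `roc_auc_score`) with bootstrap resampling would be used to obtain the confidence intervals quoted above.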
Conclusion
We demonstrate the feasibility of a purely vision-based deep learning model that grades surgical skill by metacompetency, with excellent performance on simple steps. This technique represents an advance over prior whole-video approaches that rely heavily on tool-tracking and kinematic data, and suggests that binary feedback on increasingly granular step segments may further improve model performance.