Automated Video-Based Analysis of Surgical Meta-competencies Using Computer Vision
Abstract
Background
Traditional surgical training relies on an apprenticeship model that is inherently subjective and vulnerable to human bias. Performance metric scales attempt to offer more objective feedback by providing a structured grading rubric, but the resulting scores are still ultimately subjective. Leveraging computer vision and artificial intelligence to assess surgical performance has the potential to transform these subjective measurements into automated, objective feedback for trainees.
Materials and Methods
This retrospective, multi-institutional study analyzed 319 laparoscopic cholecystectomy (LC) videos from IRB-approved deidentified datasets and segmented them into 2862 clips. Using an internally validated video-based assessment rubric, we annotated video clips across five metacompetency domains: tissue handling, psychomotor skills, efficiency, dissection quality, and exposure quality. Short video segments (<90 s) were rated on a 5-point scale by expert raters. We trained a deep learning model (DINOv2) to classify composite metacompetency scores as high (4–5) versus low (1–3), representing binary yes/no feedback analogous to that given in the operating room. Model performance was evaluated via area under the receiver operating characteristic curve (AUROC).
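As a minimal sketch of the labeling scheme described above (assuming the expert ratings are stored as integers on the 1–5 scale; the function name and example ratings are illustrative, not from the study), the composite high/low binarization could look like:

```python
def binarize_score(rating: int) -> int:
    """Map a 5-point metacompetency rating to a binary label:
    high (4-5) -> 1, low (1-3) -> 0."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be on the 1-5 scale")
    return 1 if rating >= 4 else 0

# Hypothetical expert ratings for a batch of clips
labels = [binarize_score(r) for r in [5, 3, 4, 1, 2]]  # -> [1, 0, 1, 0, 0]
```

Collapsing the ordinal scale to a binary target mirrors the yes/no feedback a trainee typically receives intraoperatively and simplifies the classification task for the model.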
Results
Among the 2862 LC video clips, model performance was highest for dissection quality during the exposing-gallbladder step (AUROC 91.5%, 95% confidence interval [CI] 84.5-96.5). Moderate performance was observed for efficiency (AUROC 72.6%, 95% CI 59.9-83.2) and exposure quality (AUROC 68.7%, 95% CI 55.2-81.8). Dissection and exposure quality scores during hepatocystic triangle dissection yielded AUROCs of 63.8% (95% CI 56.9-71.3) and 66.0% (95% CI 53.1-76.7), respectively.
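The AUROC reported above can be read as the probability that a randomly chosen high-skill clip receives a higher model score than a randomly chosen low-skill clip. A minimal pure-Python sketch of this pairwise formulation (the labels and scores below are toy values, not study data):

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive
    example is scored above a randomly chosen negative example
    (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: binary skill labels vs model scores
result = auroc([1, 1, 0, 0, 1], [0.9, 0.5, 0.4, 0.6, 0.8])  # -> 5/6
```

In practice a library implementation (e.g. scikit-learn's `roc_auc_score`) with bootstrap resampling would be used to obtain the confidence intervals quoted above.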
Conclusion
We demonstrate the feasibility of a purely vision-based deep learning model that grades surgical skill by metacompetency, with excellent performance on simple steps. This technique represents an advance over prior whole-video approaches that rely heavily on tool-tracking and kinematic data, and suggests that binary feedback on increasingly granular step segments may further improve model performance.