Extended KAFR: A kinematic-adaptive paradigm for the efficient analysis of surgical video
Abstract
Artificial Intelligence (AI) is increasingly being applied to surgical video analysis for tasks such as phase recognition, skill assessment, and workflow optimization. A crucial challenge is the length of surgical recordings, which can range from one to several hours and impose a significant computational burden. Prior work established Kinematics Adaptive Frame Recognition (KAFR) for robotic surgery, demonstrating that tracking tool motion can effectively identify frames associated with relevant surgical activity while filtering redundant content. However, laparoscopic surgery presents additional challenges, including manual camera control, which causes frequent motion artifacts, and image quality that is often inferior to that of robotic systems. This study evaluates whether KAFR generalizes to the more challenging laparoscopic setting on the Cholec80 benchmark, a widely used dataset comprising 80 laparoscopic cholecystectomy procedures annotated with seven surgical phases. Our approach consists of three main stages: (1) Tracking: a fine-tuned YOLO model detects and segments surgical tools; (2) Selection: frames are adaptively selected based on tool displacement (Adaptive 1) or velocity variation (Adaptive 2); (3) Classification: an X3D model classifies the selected frames into surgical phases. The proposed approach achieved a 91.3% F1 score while utilizing only 0.58% of the total frames (a seven-fold reduction in processed frames compared to typical sampling methods), maintaining performance comparable to state-of-the-art models such as LoViT (90.2%) and Trans-SVNet (89.7%). These results demonstrate that the kinematics-based strategy transfers effectively to the challenging laparoscopic environment.
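The displacement-based selection idea (the "Adaptive 1" strategy) can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the authors' implementation: it assumes the tracking stage yields a per-frame tool centroid, and it keeps a frame only when the centroid has moved at least a threshold distance since the last kept frame, discarding frames with little tool motion.

```python
import numpy as np

def select_frames_by_displacement(centroids, threshold):
    """Hypothetical sketch of displacement-based adaptive frame selection.

    centroids: sequence of (x, y) tool-centroid positions, one per frame
               (assumed output of an upstream tool-tracking step).
    threshold: minimum displacement (in pixels) from the last kept frame
               for a new frame to be selected.
    Returns the indices of the selected frames.
    """
    selected = [0]  # always keep the first frame as the reference
    last = np.asarray(centroids[0], dtype=float)
    for i in range(1, len(centroids)):
        current = np.asarray(centroids[i], dtype=float)
        # Keep the frame only if the tool moved enough since the last kept frame
        if np.linalg.norm(current - last) >= threshold:
            selected.append(i)
            last = current
    return selected

# Example: the tool idles, then jumps twice; only the jumps are kept.
centroids = [(0, 0), (1, 0), (2, 0), (10, 0), (10, 0), (20, 0)]
print(select_frames_by_displacement(centroids, threshold=5))  # → [0, 3, 5]
```

A velocity-variation variant ("Adaptive 2") would instead compare successive frame-to-frame displacements and select frames where the change in speed exceeds a threshold; the threshold in either case governs the trade-off between frame reduction and coverage of surgical activity.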