Purpose: Surgical video review is essential for minimally invasive surgical training, but manual annotation of surgical steps is time-consuming and limits scalability. We propose a weakly supervised pre-training framework that leverages unannotated or heterogeneously labeled surgical videos to improve automated surgical step recognition.
Methods: We evaluate three types of weak labels derived from unannotated datasets: (1) surgical phases from the same or other procedures, (2) surgical steps from different procedure types, and (3) intraoperative time progression. Using datasets from four robotic-assisted procedures (sleeve gastrectomy, hysterectomy, cholecystectomy, and radical prostatectomy), we simulate real-world annotation scarcity by varying the proportion of available step annotations (0.25, 0.5, 0.75, 1.0). We benchmark the performance of a 2D CNN model trained with and without weak-label pre-training.
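The annotation-scarcity protocol above can be sketched as a simple subsampling step. This is a minimal illustration, not the authors' code: the helper name and the use of video identifiers are assumptions, and in practice the retained videos would keep their step annotations while the rest contribute only weak labels.

```python
import random

def subsample_step_annotations(video_ids, proportion, seed=0):
    """Hypothetical helper: simulate annotation scarcity by keeping
    step annotations for only a fraction of the training videos.
    The abstract varies this proportion over 0.25, 0.5, 0.75, 1.0."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    k = max(1, round(proportion * len(video_ids)))
    return sorted(rng.sample(video_ids, k))

# e.g. at proportion 0.25, a quarter of 100 videos keep step labels
annotated = subsample_step_annotations(list(range(100)), 0.25)
```

The remaining videos would then be used only for weak-label pre-training (phases, cross-procedure steps, or time progression) before fine-tuning on the annotated subset.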
Results: Pre-training with surgical phase labels, particularly from the same procedure type (PHASE-WITHIN), consistently improved step recognition performance, with gains of up to 6.4 F1-score points over standard ImageNet-pretrained models under limited annotation conditions (annotation proportion 0.25 on SLG, sleeve gastrectomy). Cross-procedure step pre-training was beneficial for some procedures, and time-based labels provided moderate gains depending on procedure structure. Label-efficiency analysis shows that, at an annotation proportion of 0.25, the baseline model would require an additional 30-60 labeled videos to match the performance achieved by the best weak-pretraining strategy across procedures.
Conclusion: Weakly supervised pre-training offers a practical strategy to improve surgical step recognition when annotated data is scarce. This approach can support scalable feedback and assessment in surgical training workflows where comprehensive annotations are infeasible.
