Benchmarking variability in semantic segmentation in minimally invasive abdominal surgery.
Pub Date: 2026-01-06 | DOI: 10.1007/s11548-025-03562-3
L T Castro, C Barata, P Martins, F Afonso, M Pascoal, C Santiago, L Mennillo, P Mira, D Stoyanov, M Chand, S Bano, A S Soares
Purpose: Anatomical identification during abdominal surgery is subjective because the boundaries of anatomical structures are often unclear. Semantic segmentation of these structures relies on accurate boundary identification, which carries an unknown degree of uncertainty. Given this inherent subjectivity, it is important to assess annotation adequacy. This study aims to evaluate variability in anatomical structure identification and segmentation performed by surgical residents using MedSAM.
Methods: Images from the Dresden Surgical Anatomy Dataset and the Endoscapes2023 Dataset were semantically annotated by a group of surgical residents using MedSAM for the following classes: abdominal wall, colon, liver, small bowel, spleen, stomach, and gallbladder. Each class had 3 to 4 sets of annotations. Inter-annotator variability was assessed with the Dice similarity coefficient (DSC), intraclass correlation coefficient (ICC), and boundary IoU (BIoU); the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm was used to obtain a consensus mask, and Fleiss' kappa agreement was calculated between all annotations and the reference.
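For readers unfamiliar with these agreement measures, below is a minimal sketch (not the study's code) of two of them, pairwise DSC and Fleiss' kappa, computed over binary masks; the 64×64 masks and the three simulated annotators are illustrative assumptions.

```python
# Minimal sketch of two agreement metrics named above: pairwise Dice (DSC)
# and Fleiss' kappa over binary masks. Mask size and the three simulated
# annotators are illustrative assumptions, not the study's data.
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def fleiss_kappa(masks):
    """Fleiss' kappa, treating each pixel as a subject rated fg/bg by n annotators."""
    n = len(masks)
    fg = np.sum([m.astype(int).ravel() for m in masks], axis=0)  # fg votes per pixel
    counts = np.stack([fg, n - fg], axis=1)           # n_ij for categories (fg, bg)
    p_i = (np.sum(counts ** 2, axis=1) - n) / (n * (n - 1))  # per-pixel agreement
    p_j = counts.sum(axis=0) / counts.sum()           # overall category proportions
    p_e = np.sum(p_j ** 2)                            # chance agreement
    return (p_i.mean() - p_e) / (1.0 - p_e)

rng = np.random.default_rng(0)
base = rng.random((64, 64)) > 0.5                     # hypothetical "true" organ mask
annotations = [np.logical_xor(base, rng.random((64, 64)) > 0.97) for _ in range(3)]
print(dice(annotations[0], annotations[1]), fleiss_kappa(annotations))
```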
Results: The study showed strong inter-annotator agreement among surgical residents, with DSC values of 0.84-0.95 and Fleiss' kappa between 0.85 and 0.91. Surface area reliability was good to excellent (ICC = 0.62-0.91), while boundary delineation showed lower reproducibility (BIoU = 0.092-0.157). STAPLE consensus masks confirmed consistent overall shape annotations despite variability in boundary precision.
Conclusion: The study demonstrated low variability in the semantic segmentation of intraperitoneal organs in minimally invasive abdominal surgery, performed by surgical residents using MedSAM. While DSC and Fleiss' kappa values confirm strong inter-annotator agreement, the relatively low BIoU values point to challenges in boundary precision, especially for anatomically complex or variable structures. These results establish a benchmark for expanding annotation efforts to larger datasets and more detailed anatomical features.
{"title":"Benchmarking variability in semantic segmentation in minimally invasive abdominal surgery.","authors":"L T Castro, C Barata, P Martins, F Afonso, M Pascoal, C Santiago, L Mennillo, P Mira, D Stoyanov, M Chand, S Bano, A S Soares","doi":"10.1007/s11548-025-03562-3","DOIUrl":"https://doi.org/10.1007/s11548-025-03562-3","url":null,"abstract":"<p><strong>Purpose: </strong>Anatomical identification during abdominal surgery is subjective given unclear boundaries of anatomical structures. Semantic segmentation of these structures relies on an accurate identification of the boundaries which carries an unknown uncertainty. Given its inherent subjectivity, it is important to assess annotation adequacy. This study aims to evaluate variability in anatomical structure identification and segmentation using MedSAM by surgical residents.</p><p><strong>Methods: </strong>Images from the Dresden Surgical Anatomy Dataset and the Endoscapes2023 Dataset were semantically annotated by a group of surgery residents using MedSAM in the following classes: abdominal wall, colon, liver, small bowel, spleen, stomach and gallbladder. Each class had 3 to 4 sets of annotations. Inter-annotator variability was assessed through DSC, ICC, BIoU and using the Simultaneous Truth and Performance Level Estimation algorithm to obtain a consensus mask and by calculating Fleiss' kappa agreement between all annotations and reference.</p><p><strong>Results: </strong>The study showed strong inter-annotator agreement among surgical residents, with DSC values of 0.84-0.95 and Fleiss' kappa between 0.85 and 0.91. Surface area reliability was good to excellent (ICC = 0.62-0.91), while boundary delineation showed lower reproducibility (BIoU = 0.092-0.157). STAPLE consensus masks confirmed consistent overall shape annotations despite variability in boundary precision.</p><p><strong>Conclusion: </strong>The study demonstrated low variability in the semantic segmentation of intraperitoneal organs in minimally invasive abdominal surgery, performed by surgical residents using MedSAM. While DSC and Fleiss' kappa values confirm strong inter-annotator agreement, the relatively low BIoU values point to challenges in boundary precision, especially for anatomically complex or variable structures. These results establish a benchmark for expanding annotation efforts to larger datasets and more detailed anatomical features.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145913594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical shape model-based estimation of registration error in computer-assisted total knee arthroplasty.
Pub Date: 2026-01-06 | DOI: 10.1007/s11548-025-03566-z
Behnaz Gheflati, Morteza Mirzaei, Joel Zuhars, Sunil Rottoo, Hassan Rivaz
Purpose: Computer-assisted surgical navigation systems have been developed to improve the precision of total knee arthroplasty (TKA) by providing real-time guidance on implant alignment relative to patient anatomy. However, surface registration remains a key source of error that can propagate through the surgical workflow. This study investigates how patient-specific femoral bone geometry influences registration accuracy, aiming to enhance the reliability and consistency of computer-assisted orthopedic procedures.
Methods: Eighteen high-fidelity 3D-printed femur models were used to simulate intraoperative digitization. Surface points collected from the distal femur were registered to preoperative CT-derived models using a rigid iterative closest point (ICP) algorithm. Registration accuracy was quantified across six degrees of freedom. An in-house statistical shape model (SSM), built from 114 CT femurs, was employed to extract shape coefficients and correlate them with the measured registration errors. To verify robustness, additional analyses were conducted using synthetic and in silico CT-based femur datasets.
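As a concrete reference for the registration step, here is a minimal point-to-point rigid ICP sketch (nearest-neighbor matching plus a closed-form Kabsch pose solve); the synthetic point cloud, perturbation, and iteration count are illustrative assumptions, not the authors' implementation.

```python
# Minimal rigid ICP sketch: alternate nearest-neighbor correspondences with a
# closed-form (Kabsch) pose solve. Synthetic data; not the authors' pipeline.
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares R, t mapping src onto dst via SVD of the cross-covariance."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs

def icp(points, model, iters=30):
    tree = cKDTree(model)
    cur = points.copy()
    for _ in range(iters):
        _, idx = tree.query(cur)                      # closest-point correspondences
        R, t = best_rigid_transform(cur, model[idx])
        cur = cur @ R.T + t
    return cur

rng = np.random.default_rng(1)
model = rng.normal(size=(500, 3))                     # stand-in for a CT-derived surface
ang = np.deg2rad(5.0)
Rz = np.array([[np.cos(ang), -np.sin(ang), 0.0],
               [np.sin(ang),  np.cos(ang), 0.0],
               [0.0,          0.0,         1.0]])
# Simulated digitized points: a small, slightly misaligned, noisy subset.
probe = model[:60] @ Rz.T + np.array([0.2, -0.1, 0.05]) + rng.normal(scale=0.01, size=(60, 3))
aligned = icp(probe, model)
print("mean residual:", np.linalg.norm(aligned - model[:60], axis=1).mean())
```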
Results: Significant correlations (p < 0.05) were observed between specific shape coefficients and registration errors. The third and fourth principal shape modes showed the strongest associations with rotational misalignments, particularly the flexion-extension and varus-valgus components. These findings demonstrate that geometric variability in the distal femur, especially condylar morphology, plays a major role in determining the stability and accuracy of surface-based registration.
Conclusions: Registration errors in TKA are strongly influenced by patient-specific bone geometry. Shape features derived from statistical shape models can serve as reliable predictors of registration performance, providing quantitative insight into how anatomical variability impacts surgical precision and alignment accuracy in computer-assisted total knee arthroplasty.
{"title":"Statistical shape model-based estimation of registration error in computer-assisted total knee arthroplasty.","authors":"Behnaz Gheflati, Morteza Mirzaei, Joel Zuhars, Sunil Rottoo, Hassan Rivaz","doi":"10.1007/s11548-025-03566-z","DOIUrl":"https://doi.org/10.1007/s11548-025-03566-z","url":null,"abstract":"<p><strong>Purpose: </strong>Computer-assisted surgical navigation systems have been developed to improve the precision of total knee arthroplasty (TKA) by providing real-time guidance on implant alignment relative to patient anatomy. However, surface registration remains a key source of error that can propagate through the surgical workflow. This study investigates how patient-specific femoral bone geometry influences registration accuracy, aiming to enhance the reliability and consistency of computer-assisted orthopedic procedures.</p><p><strong>Methods: </strong>Eighteen high-fidelity 3D-printed femur models were used to simulate intraoperative digitization. Surface points collected from the distal femur were registered to preoperative CT-derived models using a rigid iterative closest point (ICP) algorithm. Registration accuracy was quantified across six degrees of freedom. An in-house statistical shape model (SSM), built from 114 CT femurs, was employed to extract shape coefficients and correlate them with the measured registration errors. To verify robustness, additional analyses were conducted using synthetic and in silico CT-based femur datasets.</p><p><strong>Results: </strong>Significant correlations (p-values < 0.05) were observed between specific shape coefficients and registration errors. The third and fourth principal shape modes showed the strongest associations with rotational misalignments, particularly flexion-extension and varus-valgus components. These findings demonstrate that geometric variability in the distal femur, especially condylar morphology, plays a major role in determining the stability and accuracy of surface-based registration.</p><p><strong>Conclusions: </strong>Registration errors in TKA are strongly influenced by patient-specific bone geometry. Shape features derived from statistical shape models can serve as reliable predictors of registration performance, providing quantitative insight into how anatomical variability impacts surgical precision and alignment accuracy in computer-assisted total knee arthroplasty.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145913635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Influence of high-performance image-to-image translation networks on clinical visual assessment and outcome prediction: utilizing ultrasound to MRI translation in prostate cancer.
Pub Date: 2026-01-01 | Epub Date: 2025-07-19 | DOI: 10.1007/s11548-025-03481-3 | Pages: 125-135
Mohammad R Salmanpour, Amin Mousavi, Yixi Xu, William B Weeks, Ilker Hacihaliloglu
Purpose: Image-to-image (I2I) translation networks have emerged as promising tools for generating synthetic medical images; however, their clinical reliability and ability to preserve diagnostically relevant features remain underexplored. This study evaluates the performance of state-of-the-art 2D/3D I2I networks for converting ultrasound (US) images to synthetic MRI in prostate cancer (PCa) imaging. The novelty lies in combining radiomics, expert clinical evaluation, and classification performance to comprehensively benchmark these models for potential integration into real-world diagnostic workflows.
Methods: A dataset of 794 PCa patients was analyzed using ten leading I2I networks to synthesize MRI from US input. Radiomics feature (RF) analysis was performed using Spearman correlation to assess whether high-performing networks (SSIM > 0.85) preserved quantitative imaging biomarkers. A qualitative evaluation by seven experienced physicians assessed the anatomical realism, presence of artifacts, and diagnostic interpretability of synthetic images. Additionally, classification tasks using synthetic images were conducted using two machine learning and one deep learning model to assess the practical diagnostic benefit.
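A hedged sketch of the feature-preservation test described above: per-feature Spearman correlation between radiomics values computed on real and synthetic images across patients. The rho > 0.7 cutoff and the random data are illustrative assumptions, not the paper's criterion.

```python
# Sketch of a per-feature preservation check via Spearman correlation.
# The rho > 0.7 cutoff and random data are illustrative assumptions.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_patients, n_features = 794, 186
real = rng.normal(size=(n_patients, n_features))
# Simulated synthetic-image features: some track the real values, others degrade.
noise_scale = np.where(np.arange(n_features) < 76, 0.3, 3.0)
synth = real + rng.normal(scale=noise_scale, size=(n_patients, n_features))

preserved = []
for j in range(n_features):
    rho, p = spearmanr(real[:, j], synth[:, j])
    if p < 0.05 and rho > 0.7:                        # hypothetical preservation criterion
        preserved.append(j)
print(f"{len(preserved)} / {n_features} features preserved")
```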
Results: Among all networks, 2D-Pix2Pix achieved the highest SSIM (0.855 ± 0.032). RF analysis showed that 76 out of 186 features were preserved post-translation, while the remainder were degraded or lost. Qualitative feedback revealed consistent issues with low-level feature preservation and artifact generation, particularly in lesion-rich regions. These evaluations assessed whether synthetic MRI retained clinically relevant patterns, supported expert interpretation, and improved diagnostic accuracy. Importantly, classification performance using synthetic MRI significantly exceeded that of US-based input, achieving an average accuracy and AUC of ~0.93 ± 0.05.
Conclusion: Although 2D-Pix2Pix showed the best overall performance in similarity and partial RF preservation, improvements are still required in lesion-level fidelity and artifact suppression. The combination of radiomics, qualitative, and classification analyses offered a holistic view of the current strengths and limitations of I2I models, supporting their potential in clinical applications pending further refinement and validation.
{"title":"Influence of high-performance image-to-image translation networks on clinical visual assessment and outcome prediction: utilizing ultrasound to MRI translation in prostate cancer.","authors":"Mohammad R Salmanpour, Amin Mousavi, Yixi Xu, William B Weeks, Ilker Hacihaliloglu","doi":"10.1007/s11548-025-03481-3","DOIUrl":"10.1007/s11548-025-03481-3","url":null,"abstract":"<p><strong>Purpose: </strong>Image-to-image (I2I) translation networks have emerged as promising tools for generating synthetic medical images; however, their clinical reliability and ability to preserve diagnostically relevant features remain underexplored. This study evaluates the performance of state-of-the-art 2D/3D I2I networks for converting ultrasound (US) images to synthetic MRI in prostate cancer (PCa) imaging. The novelty lies in combining radiomics, expert clinical evaluation, and classification performance to comprehensively benchmark these models for potential integration into real-world diagnostic workflows.</p><p><strong>Methods: </strong>A dataset of 794 PCa patients was analyzed using ten leading I2I networks to synthesize MRI from US input. Radiomics feature (RF) analysis was performed using Spearman correlation to assess whether high-performing networks (SSIM > 0.85) preserved quantitative imaging biomarkers. A qualitative evaluation by seven experienced physicians assessed the anatomical realism, presence of artifacts, and diagnostic interpretability of synthetic images. Additionally, classification tasks using synthetic images were conducted using two machine learning and one deep learning model to assess the practical diagnostic benefit.</p><p><strong>Results: </strong>Among all networks, 2D-Pix2Pix achieved the highest SSIM (0.855 ± 0.032). RF analysis showed that 76 out of 186 features were preserved post-translation, while the remainder were degraded or lost. Qualitative feedback revealed consistent issues with low-level feature preservation and artifact generation, particularly in lesion-rich regions. These evaluations were conducted to assess whether synthetic MRI retained clinically relevant patterns, supported expert interpretation, and improved diagnostic accuracy. Importantly, classification performance using synthetic MRI significantly exceeded that of US-based input, achieving average accuracy and AUC of ~ 0.93 ± 0.05.</p><p><strong>Conclusion: </strong>Although 2D-Pix2Pix showed the best overall performance in similarity and partial RF preservation, improvements are still required in lesion-level fidelity and artifact suppression. The combination of radiomics, qualitative, and classification analyses offered a holistic view of the current strengths and limitations of I2I models, supporting their potential in clinical applications pending further refinement and validation.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"125-135"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144668937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Temporal consistency-aware network for renal artery segmentation in X-ray angiography.
Pub Date: 2026-01-01 | Epub Date: 2025-08-02 | DOI: 10.1007/s11548-025-03486-y | Pages: 71-81
Botao Yang, Chunming Li, Simone Fezzi, Zehao Fan, Runguo Wei, Yankai Chen, Domenico Tavella, Flavio L Ribichini, Su Zhang, Faisal Sharif, Shengxian Tu
Purpose: Accurate segmentation of renal arteries from X-ray angiography videos is crucial for evaluating renal sympathetic denervation (RDN) procedures but remains challenging due to dynamic changes in contrast concentration and vessel morphology across frames. The purpose of this study is to propose TCA-Net, a deep learning model that improves segmentation consistency by leveraging local and global contextual information in angiography videos.
Methods: Our approach utilizes a novel deep learning framework that incorporates two key modules: a local temporal window vessel enhancement module and a global vessel refinement module (GVR). The local module fuses multi-scale temporal-spatial features to improve the semantic representation of vessels in the current frame, while the GVR module integrates decoupled attention strategies (video-level and object-level attention) and gating mechanisms to refine global vessel information and eliminate redundancy. To further improve segmentation consistency, a temporal perception consistency loss function is introduced during training.
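The abstract does not spell out the loss; as a rough illustration, the sketch below shows one common form of temporal consistency regularization — penalizing frame-to-frame changes in the predicted vessel probability on top of a per-frame Dice loss. Tensor shapes and the 0.1 weight are assumptions, and the paper's exact formulation may differ.

```python
# Illustrative temporal consistency regularizer (one common form; the paper's
# exact loss may differ). Tensor shapes and the 0.1 weight are assumptions.
import torch

def dice_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum(dim=(-2, -1))
    denom = pred.sum(dim=(-2, -1)) + target.sum(dim=(-2, -1))
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def temporal_consistency(pred_seq):
    """Mean absolute change in predictions between consecutive frames; (B, T, H, W)."""
    return (pred_seq[:, 1:] - pred_seq[:, :-1]).abs().mean()

B, T, H, W = 2, 8, 64, 64
pred = torch.rand(B, T, H, W, requires_grad=True)     # per-frame vessel probabilities
target = (torch.rand(B, T, H, W) > 0.5).float()
loss = dice_loss(pred, target).mean() + 0.1 * temporal_consistency(pred)
loss.backward()                                       # both terms are differentiable
print(float(loss))
```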
Results: We evaluated our model using 195 renal artery angiography sequences for development and tested it on an external dataset from 44 patients. The results demonstrate that TCA-Net achieves an F1-score of 0.8678 for segmenting renal arteries, outperforming existing state-of-the-art segmentation methods.
Conclusion: We present TCA-Net, a deep learning-based model that significantly improves segmentation consistency for renal artery angiography videos. By effectively leveraging both local and global temporal contextual information, TCA-Net outperforms current methods and provides a reliable tool for assessing RDN procedures.
{"title":"Temporal consistency-aware network for renal artery segmentation in X-ray angiography.","authors":"Botao Yang, Chunming Li, Simone Fezzi, Zehao Fan, Runguo Wei, Yankai Chen, Domenico Tavella, Flavio L Ribichini, Su Zhang, Faisal Sharif, Shengxian Tu","doi":"10.1007/s11548-025-03486-y","DOIUrl":"10.1007/s11548-025-03486-y","url":null,"abstract":"<p><strong>Purpose: </strong>Accurate segmentation of renal arteries from X-ray angiography videos is crucial for evaluating renal sympathetic denervation (RDN) procedures but remains challenging due to dynamic changes in contrast concentration and vessel morphology across frames. The purpose of this study is to propose TCA-Net, a deep learning model that improves segmentation consistency by leveraging local and global contextual information in angiography videos.</p><p><strong>Methods: </strong>Our approach utilizes a novel deep learning framework that incorporates two key modules: a local temporal window vessel enhancement module and a global vessel refinement module (GVR). The local module fuses multi-scale temporal-spatial features to improve the semantic representation of vessels in the current frame, while the GVR module integrates decoupled attention strategies (video-level and object-level attention) and gating mechanisms to refine global vessel information and eliminate redundancy. To further improve segmentation consistency, a temporal perception consistency loss function is introduced during training.</p><p><strong>Results: </strong>We evaluated our model using 195 renal artery angiography sequences for development and tested it on an external dataset from 44 patients. The results demonstrate that TCA-Net achieves an F1-score of 0.8678 for segmenting renal arteries, outperforming existing state-of-the-art segmentation methods.</p><p><strong>Conclusion: </strong>We present TCA-Net, a deep learning-based model that significantly improves segmentation consistency for renal artery angiography videos. By effectively leveraging both local and global temporal contextual information, TCA-Net outperforms current methods and provides a reliable tool for assessing RDN procedures.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"71-81"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144769280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Watch and learn: leveraging expert knowledge and language for surgical video understanding.
Pub Date: 2026-01-01 | Epub Date: 2025-07-02 | DOI: 10.1007/s11548-025-03472-4 | Pages: 185-194
David Gastager, Ghazal Ghazaei, Constantin Patsch
Purpose: Automated surgical workflow analysis is a common yet challenging task with diverse applications in surgical education, research, and clinical decision-making. Although videos are commonly collected during surgical interventions, the lack of annotated datasets hinders the development of accurate and comprehensive workflow analysis solutions. We introduce a novel approach to addressing the sparsity and heterogeneity of annotated training data, inspired by how humans learn by watching experts and understanding their explanations.
Methods: Our method leverages a video-language model trained on alignment, denoising, and generative tasks to learn short-term spatio-temporal and multimodal representations. A task-specific temporal model is then used to capture relationships across entire videos. To achieve comprehensive video-language understanding in the surgical domain, we introduce a data collection and filtering strategy to construct a large-scale pretraining dataset from educational YouTube videos. We then utilize parameter-efficient fine-tuning by projecting downstream task annotations from publicly available surgical datasets into the language domain.
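As a toy illustration of projecting annotations into the language domain, the sketch below maps categorical phase labels to sentence targets; the phase names and the template are assumptions drawn from common cholecystectomy phase sets, not the paper's prompts.

```python
# Toy label-to-language projection: categorical phase IDs become text targets
# for video-language fine-tuning. Phase names and template are assumptions.
PHASES = ["preparation", "Calot triangle dissection", "clipping and cutting",
          "gallbladder dissection", "gallbladder packaging", "cleaning and coagulation"]

def phase_to_text(phase_id: int) -> str:
    return f"In this segment, the surgeon is performing {PHASES[phase_id]}."

print(phase_to_text(2))  # -> "... the surgeon is performing clipping and cutting."
```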
Results: Extensive experiments in two surgical domains demonstrate the effectiveness of our approach, with performance improvements of up to 7% in phase segmentation tasks, 5% in zero-shot phase segmentation, and comparable capabilities to fully supervised models in few-shot settings. Harnessing our model's capabilities for long-range temporal localization and text generation, we present the first comprehensive solution for dense video captioning (DVC) of surgical videos, addressing this task despite the absence of existing DVC datasets in the surgical domain.
Conclusion: We introduce a novel approach to surgical workflow understanding that leverages video-language pretraining, large-scale video pretraining, and optimized fine-tuning. Our method improves performance over state-of-the-art techniques and enables new downstream tasks for surgical video understanding.
{"title":"Watch and learn: leveraging expert knowledge and language for surgical video understanding.","authors":"David Gastager, Ghazal Ghazaei, Constantin Patsch","doi":"10.1007/s11548-025-03472-4","DOIUrl":"10.1007/s11548-025-03472-4","url":null,"abstract":"<p><strong>Purpose: </strong>Automated surgical workflow analysis is a common yet challenging task with diverse applications in surgical education, research, and clinical decision-making. Although videos are commonly collected during surgical interventions, the lack of annotated datasets hinders the development of accurate and comprehensive workflow analysis solutions. We introduce a novel approach for addressing the sparsity and heterogeneity of annotated training data inspired by the human learning procedure of watching experts and understanding their explanations.</p><p><strong>Methods: </strong>Our method leverages a video-language model trained on alignment, denoising, and generative tasks to learn short-term spatio-temporal and multimodal representations. A task-specific temporal model is then used to capture relationships across entire videos. To achieve comprehensive video-language understanding in the surgical domain, we introduce a data collection and filtering strategy to construct a large-scale pretraining dataset from educational YouTube videos. We then utilize parameter-efficient fine-tuning by projecting downstream task annotations from publicly available surgical datasets into the language domain.</p><p><strong>Results: </strong>Extensive experiments in two surgical domains demonstrate the effectiveness of our approach, with performance improvements of up to 7% in phase segmentation tasks, 5% in zero-shot phase segmentation, and comparable capabilities to fully supervised models in few-shot settings. Harnessing our model's capabilities for long-range temporal localization and text generation, we present the first comprehensive solution for dense video captioning (DVC) of surgical videos, addressing this task despite the absence of existing DVC datasets in the surgical domain.</p><p><strong>Conclusion: </strong>We introduce a novel approach to surgical workflow understanding that leverages video-language pretraining, large-scale video pretraining, and optimized fine-tuning. Our method improves performance over state-of-the-art techniques and enables new downstream tasks for surgical video understanding.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"185-194"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144546070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time corneal image segmentation for cataract surgery based on detection framework.
Pub Date: 2026-01-01 | DOI: 10.1007/s11548-025-03506-x | Pages: 83-92
Objective: Cataract surgery is among the most frequently performed procedures worldwide. Accurate, real-time segmentation of the cornea and surgical instruments is vital for intraoperative guidance and surgical education. However, most existing deep learning-based segmentation methods depend on pixel-level annotations, which are time-consuming to produce and limit practical deployment.
Methods: We present EllipseNet, an anchor-free framework utilizing ellipse-based modeling for real-time corneal segmentation in cataract surgery. Built upon the Hourglass network for feature extraction, EllipseNet requires only simple rectangular bounding box annotations from users. It then autonomously infers the major and minor axes of the corneal ellipse, generating elliptical bounding boxes that more precisely match corneal shapes.
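To make the geometric output concrete, here is a sketch of rasterizing an ellipse (center, semi-axes, rotation) into a binary corneal mask; the parameter values are illustrative assumptions, not EllipseNet predictions.

```python
# Rasterizing an ellipse into a binary mask, as the final geometric step of
# ellipse-based corneal segmentation. Parameter values are assumptions.
import numpy as np

def ellipse_mask(h, w, cx, cy, a, b, theta):
    """Boolean mask of an ellipse with semi-axes a, b rotated by theta (radians)."""
    yy, xx = np.mgrid[:h, :w]
    x, y = xx - cx, yy - cy
    xr = x * np.cos(theta) + y * np.sin(theta)        # rotate into the ellipse frame
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (xr / a) ** 2 + (yr / b) ** 2 <= 1.0

mask = ellipse_mask(480, 640, cx=320, cy=240, a=150, b=120, theta=np.deg2rad(10))
print(mask.sum(), "pixels inside the predicted cornea")
```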
Results: EllipseNet achieves real-time performance, segmenting each image within 42 ms and attaining a Dice accuracy of 95.81%. It segments nearly three times faster than state-of-the-art models while maintaining similar accuracy.
Conclusion: EllipseNet provides rapid and accurate corneal segmentation in real time, significantly reducing annotation workload for practitioners. Its design streamlines the segmentation pipeline, lowering the barrier for clinical application. The source code is publicly available at: https://github.com/shixueyi/corneal-segmentation .
{"title":"Real-time corneal image segmentation for cataract surgery based on detection framework.","authors":"Xueyi Shi, Dexun Zhang, Shenwen Liang, Wenjing Meng, Huoling Luo, Tianqiao Zhang","doi":"10.1007/s11548-025-03506-x","DOIUrl":"10.1007/s11548-025-03506-x","url":null,"abstract":"<p><strong>Objective: </strong>Cataract surgery is among the most frequently performed procedures worldwide. Accurate, real-time segmentation of the cornea and surgical instruments is vital for intraoperative guidance and surgical education. However, most existing deep learning-based segmentation methods depend on pixel-level annotations, which are time-consuming and limit practical deployment.</p><p><strong>Methods: </strong>We present EllipseNet, an anchor-free framework utilizing ellipse-based modeling for real-time corneal segmentation in cataract surgery. Built upon the Hourglass network for feature extraction, EllipseNet requires only simple rectangular bounding box annotations from users. It then autonomously infers the major and minor axes of the corneal ellipse, generating elliptical bounding boxes that more precisely match corneal shapes.</p><p><strong>Results: </strong>EllipseNet achieves efficient real-time performance by segmenting each image within 42 ms and attaining a Dice accuracy of 95.81%. It delivers segmentation speed nearly three times faster than state-of-the-art models, while maintaining similar accuracy levels.</p><p><strong>Conclusion: </strong>EllipseNet provides rapid and accurate corneal segmentation in real time, significantly reducing annotation workload for practitioners. Its design streamlines the segmentation pipeline, lowering the barrier for clinical application. The source code is publicly available at: https://github.com/shixueyi/corneal-segmentation .</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"83-92"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145001955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The interpretable surgical temporal informer: explainable surgical time completion prediction.
Pub Date: 2026-01-01 | Epub Date: 2025-08-23 | DOI: 10.1007/s11548-025-03448-4 | Pages: 11-19
Roger D Soberanis-Mukul, Rohit Shankar, Lalithkumar Seenivasan, Jose L Porras, Masaru Ishii, Mathias Unberath
Purpose: Predicting surgical completion time helps streamline surgical workflow and OR utilization, enhancing hospital efficiency. When time prediction is based on interventional video of the surgical site, predictions may correlate with the technical proficiency of the surgeon, since skill is a useful proxy for completion time. To understand which features of surgical-site video are predictive of surgical time, we develop prototype-like visual explanations and make them applicable to video sequences.
Methods: We introduce an interpretable method for predicting surgical duration by identifying prototype-like patterns within egocentric video of the surgical site. Unlike conventional image-based prototype models that generate patch-based prototypes, our method extracts video-based explanations tied to segments of surgical videos with similar time deviation patterns. We achieve this by comparing the principal components of feature representation differences at various time points in the predictions. To effectively capture long-range dependencies in the prediction task, we employ an informer as the primary predictive model.
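A rough sketch of the explanation mechanism described above: take differences of feature representations at successive time points, extract their principal components, and flag the segments that load most strongly on the leading component as candidate prototype-like patterns. The synthetic features and the top-5 selection rule are assumptions.

```python
# PCA over feature-representation differences to surface prototype-like video
# segments. Synthetic features; the top-5 selection rule is an assumption.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
feats = rng.normal(size=(200, 128)).cumsum(axis=0)    # per-timestep features of one video
diffs = np.diff(feats, axis=0)                        # representation change per step

proj = PCA(n_components=2).fit_transform(diffs)[:, 0]
candidates = np.argsort(-np.abs(proj))[:5]            # strongest time-deviation patterns
print("candidate prototype timesteps:", candidates)
```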
Results: The model was applied to a dataset of 42 point-of-view craniotomy videos collected under an approved IRB protocol. On average, our interpretable model outperforms the baseline models at predicting surgical completion time.
Conclusion: Our approach not only contributes to the interpretability of surgical time predictions but also takes full advantage of the detailed information provided by surgical video data.
{"title":"The interpretable surgical temporal informer: explainable surgical time completion prediction.","authors":"Roger D Soberanis-Mukul, Rohit Shankar, Lalithkumar Seenivasan, Jose L Porras, Masaru Ishii, Mathias Unberath","doi":"10.1007/s11548-025-03448-4","DOIUrl":"10.1007/s11548-025-03448-4","url":null,"abstract":"<p><strong>Purpose: </strong>Predicting surgical time completion helps streamline surgical workflow and OR utilization, enhancing hospital efficacy. When time prediction is based on interventional video of the surgical site, time predictions may correlate with technical proficiency of the surgeon because skill is a useful proxy of completion time. To understand features that are predictive of surgical time in surgical site video, we develop prototype-like visual explanations, making them applicable to video sequences.</p><p><strong>Methods: </strong>We introduce an interpretable method for predicting surgical duration by identifying prototype-like patterns within egocentric video of the surgical site. Unlike conventional image-based prototype models that generate patch-based prototypes, our method extracts video-based explanations tied to segments of surgical videos with similar time deviation patterns. We achieve this by comparing the principal components of feature representation differences at various time points in the predictions. To effectively capture long-range dependencies in the prediction task, we employ an informer as the primary predictive model.</p><p><strong>Results: </strong>This model is applied to a dataset of 42 point-of-view craniotomy videos, collected under an approved IRB protocol. On average, our interpretable model performs better than the baseline models in surgical time completion.</p><p><strong>Conclusion: </strong>Our approach not only contributes to the interpretability of surgical time predictions but also takes full advantage of the detailed information provided by surgical video data.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"11-19"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144977807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liver mask-guided SAM-enhanced dual-decoder network for landmark segmentation in AR-guided surgery.
Pub Date: 2026-01-01 | DOI: 10.1007/s11548-025-03516-9 | Pages: 115-124
Purpose: In augmented reality (AR)-guided laparoscopic liver surgery, accurate segmentation of liver landmarks is crucial for precise 3D-2D registration. However, existing methods struggle with complex structures, limited data, and class imbalance. In this study, we propose a novel approach to improve landmark segmentation performance by leveraging liver mask prediction.
Methods: We propose a dual-decoder model enhanced by a pre-trained segment anything model (SAM) encoder, where one decoder segments the liver and the other focuses on liver landmarks. The SAM encoder provides robust features for liver mask prediction, improving generalizability. A liver-guided consistency constraint establishes fine-grained spatial consistency between liver regions and landmarks, enhancing segmentation accuracy through detailed spatial modeling.
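The abstract leaves the constraint's exact form open; the sketch below shows one plausible version — penalizing landmark probability mass that falls outside the predicted liver region so the two decoders reinforce each other. The shapes, naming, and formulation are assumptions, not the paper's method.

```python
# One plausible liver-guided consistency term (the paper's exact formulation
# may differ): landmark probability outside the liver mask is penalized.
import torch

def liver_guided_consistency(landmark_prob, liver_prob):
    """Both tensors (B, H, W): probabilities from the landmark and liver decoders."""
    return (landmark_prob * (1.0 - liver_prob)).mean()  # landmark mass off the liver

B, H, W = 2, 128, 128
landmark = torch.rand(B, H, W, requires_grad=True)
liver = torch.rand(B, H, W)
loss = liver_guided_consistency(landmark, liver)
loss.backward()
print(float(loss))
```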
Results: The proposed method achieved state-of-the-art performance in liver landmark segmentation on two public laparoscopic datasets. By addressing feature entanglement, the dual-decoder framework with SAM and consistency constraints significantly improved segmentation in complex surgical scenarios.
Conclusion: The SAM-enhanced dual-decoder network, incorporating liver-guided consistency constraints, offers a promising solution for 2D landmark segmentation in AR-guided laparoscopic surgery. By mutually reinforcing liver mask and landmark segmentation, the method achieves improved accuracy and robustness for intraoperative applications.
{"title":"Liver mask-guided SAM-enhanced dual-decoder network for landmark segmentation in AR-guided surgery.","authors":"Xukun Zhang, Sharib Ali, Yanlan Kang, Jingyi Zhu, Minghao Han, Le Wang, Xiaoying Wang, Lihua Zhang","doi":"10.1007/s11548-025-03516-9","DOIUrl":"10.1007/s11548-025-03516-9","url":null,"abstract":"<p><strong>Purpose: </strong>In augmented reality (AR)-guided laparoscopic liver surgery, accurate segmentation of liver landmarks is crucial for precise 3D-2D registration. However, existing methods struggle with complex structures, limited data, and class imbalance. In this study, we propose a novel approach to improve landmark segmentation performance by leveraging liver mask prediction.</p><p><strong>Methods: </strong>We propose a dual-decoder model enhanced by a pre-trained segment anything model (SAM) encoder, where one decoder segments the liver and the other focuses on liver landmarks. The SAM encoder provides robust features for liver mask prediction, improving generalizability. A liver-guided consistency constraint establishes fine-grained spatial consistency between liver regions and landmarks, enhancing segmentation accuracy through detailed spatial modeling.</p><p><strong>Results: </strong>The proposed method achieved state-of-the-art performance in liver landmark segmentation on two public laparoscopic datasets. By addressing feature entanglement, the dual-decoder framework with SAM and consistency constraints significantly improved segmentation in complex surgical scenarios.</p><p><strong>Conclusion: </strong>The SAM-enhanced dual-decoder network, incorporating liver-guided consistency constraints, offers a promising solution for 2D landmark segmentation in AR-guided laparoscopic surgery. By mutually reinforcing liver mask and landmark segmentation, the method achieves improved accuracy and robustness for intraoperative applications.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"115-124"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145126548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MR-safe robotic needle driver for real-time MRI-guided minimally invasive procedures: a feasibility study.
Pub Date: 2026-01-01 | Epub Date: 2025-11-08 | DOI: 10.1007/s11548-025-03545-4 | Pages: 39-47
Atharva Paralikar, Gang Li, Chima Oluigbo, Pavel Yarmolenko, Kevin Cleary, Reza Monfaredi
Purpose: This article reports on the development and feasibility testing of an MR-safe robotic needle driver. The needle driver is pneumatically actuated and designed for automatic insertion and extraction of needles along a straight trajectory within the MRI scanner.
Method: All parts use plastic resins and composite materials to ensure MR-safe operation. A needle is clamped in the needle carriage by a pneumatically operated clamp, which is designed to be easily attached to and detached from the needle driver. Clamps with different opening sizes accommodate needles from 18 to 22 gauge. To mimic the manual procedure of needle insertion, a pneumatically operated rack-and-pinion mechanism simultaneously translates and rotates the needle carriage along a helical slot. Signal-to-noise ratio (SNR) and 2-D geometric distortion were measured to evaluate MRI compatibility, and targeting accuracy was measured with an electromagnetic tracker. We also used a force sensor to evaluate the maximum force that could be generated at the needle tip under different clamping pressures.
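For reference, the sketch below computes the SNR percentage-change metric used in such compatibility tests (ROI mean signal over background-noise standard deviation); the ROI placement and synthetic images are illustrative assumptions.

```python
# SNR percentage change between a baseline scan and a scan with the device
# present. ROI coordinates and synthetic images are assumptions.
import numpy as np

def snr(image, signal_roi, noise_roi):
    """Mean signal in an ROI divided by the std of a background-noise ROI."""
    (r0, r1, c0, c1), (n0, n1, m0, m1) = signal_roi, noise_roi
    return image[r0:r1, c0:c1].mean() / image[n0:n1, m0:m1].std()

rng = np.random.default_rng(4)
baseline = rng.normal(100.0, 2.0, size=(256, 256))
with_robot = rng.normal(99.0, 2.1, size=(256, 256))   # slightly noisier with the device
sig, bg = (100, 150, 100, 150), (0, 30, 0, 30)
change = 100.0 * (snr(baseline, sig, bg) - snr(with_robot, sig, bg)) / snr(baseline, sig, bg)
print(f"SNR change: {change:.1f}%")
```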
Results: The maximum percentage change in SNR across needle driver configurations was 6.6%, and the maximum geometric distortion was 0.24%. The needle driver's mean positioning accuracy for 105 targets at 50 mm depth was 2.38 ± 1.00 mm in a composite tissue phantom. The angulation error for the straight trajectory was 0.51°, and the mean linear trajectory deviation was statistically negligible. The measured force at the needle tip was 1.17 N, 1.6 N, and 2.12 N at 30, 40, and 50 psi, respectively.
Conclusion: This preliminary study showed that the prototype of our robotic needle driver works as intended for the insertion and extraction of the needle. The driver is MR-safe and serves as a suitable platform for MRI-guided interventions.
{"title":"MR-safe robotic needle driver for real-time MRI-guided minimally invasive procedures: a feasibility study.","authors":"Atharva Paralikar, Gang Li, Chima Oluigbo, Pavel Yarmolenko, Kevin Cleary, Reza Monfaredi","doi":"10.1007/s11548-025-03545-4","DOIUrl":"10.1007/s11548-025-03545-4","url":null,"abstract":"<p><strong>Purpose: </strong>This article reports on the development and feasibility testing of an MR-safe robotic needle driver. The needle driver is pneumatically actuated and designed for automatic insertion and extraction of needles along a straight trajectory within the MRI scanner.</p><p><strong>Method: </strong>All parts use plastic resins and composite materials to ensure MR-safe operation. A needle could be clamped in the needle carriage using a pneumatically operated clamp. The clamp is designed to be easily attached and detached from the needle driver. Clamps with different opening sizes could accommodate a range of needles from 18 to 22 gauge. To mimic the manual procedure of needle insertion, a pneumatically operated rack-and-pinion mechanism simultaneously translates and rotates the needle carriage along a helical slot. Signal-to-noise ratio (SNR) and 2-D geometric distortion were measured to evaluate the MRI compatibility. Targeting was measured with an electromagnetic tracker. We also evaluated the maximum force that could be generated at the tip of the needle with different clamping pressures using a force sensor.</p><p><strong>Results: </strong>We recorded the maximum percentage change in SNR for multiple configurations of needle drivers as 6.6% and the maximum geometric distortion at 0.24%. The needle driver's mean positioning accuracy for 105 targets at 50 mm depth was 2.38 ± 1.00 mm in a composite tissue phantom. The angulation error for the straight trajectory was 0.51°, and the mean linear trajectory deviation was statistically negligible. The measured force at the needle tip was 1.17N, 1.6N, and 2.12N at 30, 40, and 50 psi, respectively.</p><p><strong>Conclusion: </strong>This preliminary study showed that the prototype of our robotic needle driver works as intended for the insertion and extraction of the needle. The driver is MR-safe and serves as a suitable platform for MRI-guided interventions.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"39-47"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145472500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interactive AI annotation of medical images in a virtual reality environment.
Pub Date: 2026-01-01 | Epub Date: 2025-08-18 | DOI: 10.1007/s11548-025-03497-9 | Pages: 49-58
Lotta Orsmaa, Mikko Saukkoriipi, Jari Kangas, Nastaran Rasouli, Jorma Järnstedt, Helena Mehtonen, Jaakko Sahlsten, Joel Jaskari, Kimmo Kaski, Roope Raisamo
Purpose: Artificial intelligence (AI) achieves high-quality annotations of radiological images, yet often lacks the robustness required in clinical practice. Interactive annotation starts with an AI-generated delineation and allows radiologists to refine it with feedback, potentially improving precision and reliability. These techniques have been explored in two-dimensional desktop environments, but they have not been validated by radiologists or integrated with immersive visualization technologies. We used a Virtual Reality (VR) system to determine (1) whether annotation quality improves when radiologists can edit the AI annotation and (2) whether the extra work done by editing is worthwhile.
Methods: We evaluated the clinical feasibility of an interactive VR approach to annotate mandibular and mental foramina on segmented 3D mandibular models. Three experienced dentomaxillofacial radiologists reviewed AI-generated annotations and, when needed, refined them at the voxel level in 3D space through click-based interactions until clinical standards were met.
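As a minimal illustration of click-based voxel refinement, the sketch below toggles voxels within a small spherical brush around a clicked 3D point; the brush radius and volume size are assumptions, not details of the study's VR tool.

```python
# Click-based voxel refinement: a click sets voxels within a spherical brush.
# Brush radius and volume size are illustrative assumptions.
import numpy as np

def apply_click(mask, center, radius=3, value=True):
    zz, yy, xx = np.ogrid[:mask.shape[0], :mask.shape[1], :mask.shape[2]]
    dist2 = (zz - center[0]) ** 2 + (yy - center[1]) ** 2 + (xx - center[2]) ** 2
    mask[dist2 <= radius ** 2] = value                # add (or erase) a small sphere
    return mask

mask = np.zeros((64, 64, 64), dtype=bool)             # AI-generated annotation to refine
mask = apply_click(mask, center=(32, 32, 32))          # radiologist adds foreground voxels
print(mask.sum(), "voxels set by the brush")
```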
Results: Our results indicate that integrating expert feedback within an immersive VR environment enhances annotation accuracy, improves clinical usability, and offers valuable insights for developing medical image analysis systems incorporating radiologist input.
Conclusion: This study is the first to compare the quality of original and interactively refined AI annotations using radiologists' judgments as the measure. More research is needed to generalize these findings.
{"title":"Interactive AI annotation of medical images in a virtual reality environment.","authors":"Lotta Orsmaa, Mikko Saukkoriipi, Jari Kangas, Nastaran Rasouli, Jorma Järnstedt, Helena Mehtonen, Jaakko Sahlsten, Joel Jaskari, Kimmo Kaski, Roope Raisamo","doi":"10.1007/s11548-025-03497-9","DOIUrl":"10.1007/s11548-025-03497-9","url":null,"abstract":"<p><strong>Purpose: </strong>Artificial intelligence (AI) achieves high-quality annotations of radiological images, yet often lacks the robustness required in clinical practice. Interactive annotation starts with an AI-generated delineation, allowing radiologists to refine it with feedback, potentially improving precision and reliability. These techniques have been explored in two-dimensional desktop environments, but are not validated by radiologists or integrated with immersive visualization technologies. We used a Virtual Reality (VR) system to determine whether (1) the annotation quality improves when radiologists can edit the AI annotation and (2) whether the extra work done by editing is worthwhile.</p><p><strong>Methods: </strong>We evaluated the clinical feasibility of an interactive VR approach to annotate mandibular and mental foramina on segmented 3D mandibular models. Three experienced dentomaxillofacial radiologists reviewed AI-generated annotations and, when needed, refined them at the voxel level in 3D space through click-based interactions until clinical standards were met.</p><p><strong>Results: </strong>Our results indicate that integrating expert feedback within an immersive VR environment enhances annotation accuracy, improves clinical usability, and offers valuable insights for developing medical image analysis systems incorporating radiologist input.</p><p><strong>Conclusion: </strong>This study is the first to compare the quality of original and interactive AI annotation and to use radiologists' opinions as the measure. More research is needed for generalization.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"49-58"},"PeriodicalIF":2.3,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12929304/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144876623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}