
International Journal of Computer Assisted Radiology and Surgery: latest articles

Toward precision surgical education: quantitative evaluation of surgical performance using instantaneous screw axes.
IF 2.3 · CAS Zone 3 (Medicine) · Q3 ENGINEERING, BIOMEDICAL · Pub Date: 2026-01-26 · DOI: 10.1007/s11548-025-03565-0
Heath Boyea, James Korndorffer, Ann Majewicz Fey

Purpose: In surgical skill development, a trainee's goal is to move through the learning curve and achieve expert performance. Goal-directed training, with expert performance as the goal, can facilitate skill development; however, there are currently few methods available to encode expert-like simulation performance into learning strategies that can be practiced independently.

Methods: We propose a novel method of surgical simulation skill analysis that segments and evaluates kinematic data with instantaneous screw axes (ISA) theory and K-means clustering. In ISA theory, single degree-of-freedom (DOF) tasks can be represented as displacements about a single screw axis; however, we propose extending this method to more complex tasks by defining them as clusters of similar ISAs in the surgical environment, decomposing each task into a sequence of 1-DOF movements. In this paper, we present an ISA algorithm and apply it to surgeon manipulator poses across fourteen suturing and knot-tying gestures obtained from the JIGSAWS surgical dataset. We also apply this method to entire simulated suturing demonstrations recorded over a 6-month training period in the BGU-SKILLS dataset. We implemented K-means clustering to segment these movements into sub-gestures. We hypothesize that individuals with greater levels of expertise should exhibit more concise actions with minimal extraneous movement; therefore, fewer clusters should be required to decompose their simulation performance.
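
As a rough illustration of the pipeline described above (not the authors' implementation), the sketch below extracts an instantaneous screw axis from consecutive 4x4 manipulator poses and clusters the resulting axes with K-means. The input file name and the cluster count are hypothetical.

```python
import numpy as np
from scipy.spatial.transform import Rotation
from sklearn.cluster import KMeans

def screw_axis(T_prev, T_next):
    """Unit direction and a point on the axis of the relative rigid motion."""
    T_rel = T_next @ np.linalg.inv(T_prev)
    R, t = T_rel[:3, :3], T_rel[:3, 3]
    rotvec = Rotation.from_matrix(R).as_rotvec()
    theta = np.linalg.norm(rotvec)
    if theta < 1e-8:                              # (near-)pure translation
        norm = np.linalg.norm(t)
        u = t / norm if norm > 0 else np.array([0.0, 0.0, 1.0])
        return u, np.zeros(3)
    u = rotvec / theta                            # unit screw-axis direction
    t_perp = t - np.dot(t, u) * u                 # translation normal to the axis
    c = np.linalg.pinv(np.eye(3) - R) @ t_perp    # a point on the axis
    return u, c

# poses: (N, 4, 4) homogeneous transforms of one manipulator (hypothetical file)
poses = np.load("gesture_poses.npy")
features = np.array([np.concatenate(screw_axis(a, b))
                     for a, b in zip(poses[:-1], poses[1:])])

# Fewer clusters needed to explain a gesture ~ more concise motion
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(features)
print("sub-gesture sizes:", np.bincount(km.labels_))
```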

Results: Our ISA algorithm was applied to 1136 gestures from ten surgeons across three skill levels, and to 324 unsegmented demonstrations collected from 18 surgical residents over a 6-month training period. We performed a Kruskal-Wallis analysis with a Dunn-Sidak post-hoc test on the number of ISA clusters required to decompose each gesture. We found that highly task-constrained gestures required significantly fewer clusters for the expert and/or intermediate groups than for novices, although only on suturing tasks.
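
For readers wanting to reproduce this style of analysis, a minimal sketch follows using scipy and the scikit-posthocs package; the group arrays are made-up placeholders, not the paper's measurements.

```python
import numpy as np
import pandas as pd
from scipy.stats import kruskal
import scikit_posthocs as sp

# Number of ISA clusters needed per gesture, by skill group (hypothetical data)
novice = np.array([9, 8, 10, 7, 9])
intermediate = np.array([6, 7, 5, 6, 6])
expert = np.array([4, 5, 4, 3, 5])

H, p = kruskal(novice, intermediate, expert)
print(f"Kruskal-Wallis H = {H:.2f}, p = {p:.4f}")

df = pd.DataFrame({
    "clusters": np.concatenate([novice, intermediate, expert]),
    "group": ["novice"] * 5 + ["intermediate"] * 5 + ["expert"] * 5,
})
# Dunn's pairwise post-hoc comparisons with Sidak-adjusted p-values
print(sp.posthoc_dunn(df, val_col="clusters", group_col="group",
                      p_adjust="sidak"))
```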

Conclusion: Our results suggest that this method can be used to identify task-constrained gestures within independently performed suturing simulations and to classify them into higher- and lower-skill sets. The analysis can also provide geometric feedback comparing performed gestures against expert gestures, delivering automated, personalized performance analysis for surgical trainees and supporting individualized training.

Citations: 0
ST-Swin TransNet: a spatiotemporal swin transformer-based network for self-supervised depth estimation in stereoscopic surgical videos.
IF 2.3 · CAS Zone 3 (Medicine) · Q3 ENGINEERING, BIOMEDICAL · Pub Date: 2026-01-24 · DOI: 10.1007/s11548-026-03569-4
Derong Yu, Wenyuan Sun, Junchen Wang, Guoyan Zheng

Purpose: Depth estimation from stereoscopic laparoscopic videos is of vital importance in computer-assisted intervention due to its potential for downstream tasks in laparoscopic surgical navigation. Previous works mostly focus on depth estimation from static frames, while temporal information in stereoscopic laparoscopic videos is largely ignored.

Methods: A spatiotemporal swin (ST-Swin) transformer-based network, referred to as ST-Swin TransNet, is proposed for depth estimation in stereoscopic surgical videos. Built upon a symmetric encoder-decoder architecture consisting of 12 ST-Swin blocks, ST-Swin TransNet extracts spatiotemporal features for efficient and accurate depth estimation, where the ST-Swin blocks are designed to capture spatiotemporal information from stereo video sequences via a self-attention mechanism. Given binocular laparoscopic videos, ST-Swin TransNet exploits hierarchical spatiotemporal features to predict disparity maps.
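
The self-supervision behind stereo depth networks of this kind typically warps one view into the other with the predicted disparity and penalizes photometric error. Below is a generic sketch of that signal, assuming rectified stereo pairs; it is not ST-Swin TransNet's actual loss, and all tensors are dummy stand-ins.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disparity):
    """right: (B,3,H,W); disparity: (B,1,H,W) in pixels, left-image frame."""
    B, _, H, W = right.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    xs = xs.float().to(right.device).expand(B, H, W)
    ys = ys.float().to(right.device).expand(B, H, W)
    # a pixel (x, y) in the left view samples (x - d, y) in the right view
    x_src = xs - disparity.squeeze(1)
    grid = torch.stack([2 * x_src / (W - 1) - 1,   # normalize to [-1, 1]
                        2 * ys / (H - 1) - 1], dim=-1)
    return F.grid_sample(right, grid, align_corners=True)

left = torch.rand(2, 3, 64, 128)             # dummy stereo pair
right = torch.rand(2, 3, 64, 128)
disparity = torch.rand(2, 1, 64, 128) * 10   # stand-in for the network output

photometric_loss = (left - warp_right_to_left(right, disparity)).abs().mean()
print(float(photometric_loss))
```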

Results: Comprehensive experiments are conducted on two typical yet challenging public datasets to evaluate the performance of the proposed method. We additionally demonstrate the feasibility of applying ST-Swin TransNet to video see-through augmented reality (VST-AR) navigation in laparoscopic surgery. Our method achieved a mean absolute depth error (mADE) of 3.33 mm in depth estimation and a mean absolute distance (mAD) of 1.07 mm in VST-AR navigation.

Conclusion: A spatiotemporal swin transformer-based network for self-supervised depth estimation in binocular laparoscopic surgical videos was developed. Results from the comprehensive experiments demonstrate the superior performance of the proposed method over the state-of-the-art methods.

Citations: 0
Enhancing open-surgery gesture recognition using 3D pose estimation.
IF 2.3 · CAS Zone 3 (Medicine) · Q3 ENGINEERING, BIOMEDICAL · Pub Date: 2026-01-14 · DOI: 10.1007/s11548-025-03564-1
Ori Meiraz, Shlomi Laufer, Robert Spector, Itay Or, Gil Bolotin, Tom Friedman

Purpose: Surgical gestures are fundamental components of surgical procedures, encompassing actions such as cutting, suturing, and knot-tying. Gesture recognition plays a pivotal role in the automatic analysis of surgical data. Although recent advancements have improved surgical gesture recognition, much of the existing research relies on simulations or minimally invasive surgery data, failing to capture the complexities of open surgery. In this study, we introduce and employ a new open-surgery dataset focused on closing incisions after saphenous vein harvesting.

Methods: Our goal is to improve gesture recognition accuracy by incorporating tool pose estimation and 3D hand pose predictions of surgeons. We employ MS-TCN++ and LTContext for gesture recognition, and further enhance performance through an ensemble of models using different modalities: video, tool pose, and hand pose data.

Results: The results reveal that an ensemble model combining all three modalities provides a substantial improvement over video-only approaches, leading to statistically significant gains across multiple evaluation metrics. We further demonstrate that the model can rely solely on hand and tool poses, completely discarding the video input, while still achieving comparable performance. Additionally, we show that an ensemble model using only hand and tool poses produces results that are either statistically significantly better than using video alone, or not statistically significantly different.

Conclusion: This study highlights the effectiveness of integrating multimodal data for surgical gesture recognition. By combining video, hand pose, and tool pose information, our approach achieves higher accuracy and robustness compared to video-only methods. Moreover, the comparable performance of pose-only models suggests a promising, privacy-preserving alternative for surgical data analysis.
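
A minimal sketch of the late-fusion idea from the Methods above: average per-frame class probabilities from modality-specific models. The random tensors stand in for MS-TCN++/LTContext outputs; the frame and class counts are illustrative.

```python
import torch

T, n_classes = 500, 8                       # frames, gesture classes (illustrative)
logits = {
    "video": torch.randn(T, n_classes),     # stand-ins for per-modality models
    "tool_pose": torch.randn(T, n_classes),
    "hand_pose": torch.randn(T, n_classes),
}

# Equal-weight averaging of softmax probabilities; in practice the weights
# could be tuned on a validation set.
probs = torch.stack([m.softmax(dim=-1) for m in logits.values()]).mean(dim=0)
pred = probs.argmax(dim=-1)                 # per-frame gesture labels
print(pred[:10])
```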

Citations: 0
Environmental and economic costs behind LLMs.
IF 2.3 · CAS Zone 3 (Medicine) · Q3 ENGINEERING, BIOMEDICAL · Pub Date: 2026-01-12 · DOI: 10.1007/s11548-026-03568-5
Pilar López-Úbeda, Teodoro Martín-Noguerol, Antonio Luna

Purpose: To discuss the economic and environmental implications of implementing large language models (LLMs) in radiology, highlighting both their transformative potential and the challenges they pose for equitable and sustainable adoption.

Methods: Current trends in AI investment, infrastructure requirements, operational costs, and environmental impact associated with LLMs are analyzed, highlighting the specific challenges of integrating LLMs into radiological workflows, including data privacy, regulatory compliance, and cost barriers for healthcare institutions. The analysis also considers the costs of model validation, maintenance, and updates, as well as investments in system integration, staff training, and cybersecurity for clinical implementation.

Results: LLMs have revolutionized natural language processing and offer promising applications in radiology, such as improved diagnostic support and workflow optimization. However, their deployment involves substantial financial and environmental costs. Training and operating these models require high-performance computing infrastructure, significant energy consumption, and large volumes of annotated data. Water usage and CO₂ emissions from data centers further raise sustainability concerns, while ongoing operational costs add to the financial burden. Subscription fees and per-query pricing may restrict access for smaller institutions, widening existing inequalities.

Conclusion: While LLMs offer significant benefits for radiology, their high economic and environmental costs present challenges to widespread and equitable adoption. Responsible use, sustainable practices, and policy frameworks are essential to ensure that AI-driven innovations do not exacerbate existing disparities in healthcare access and quality.

Citations: 0
Benchmarking variability in semantic segmentation in minimally invasive abdominal surgery.
IF 2.3 · CAS Zone 3 (Medicine) · Q3 ENGINEERING, BIOMEDICAL · Pub Date: 2026-01-06 · DOI: 10.1007/s11548-025-03562-3
L T Castro, C Barata, P Martins, F Afonso, M Pascoal, C Santiago, L Mennillo, P Mira, D Stoyanov, M Chand, S Bano, A S Soares

Purpose: Anatomical identification during abdominal surgery is subjective, given the unclear boundaries of anatomical structures. Semantic segmentation of these structures relies on accurate identification of boundaries, which carries an unknown uncertainty. Given its inherent subjectivity, it is important to assess annotation adequacy. This study aims to evaluate variability in anatomical structure identification and segmentation performed by surgical residents using MedSAM.

Methods: Images from the Dresden Surgical Anatomy Dataset and the Endoscapes2023 Dataset were semantically annotated by a group of surgical residents using MedSAM for the following classes: abdominal wall, colon, liver, small bowel, spleen, stomach, and gallbladder. Each class had 3 to 4 sets of annotations. Inter-annotator variability was assessed with DSC, ICC, and BIoU; a consensus mask was obtained with the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm; and Fleiss' kappa agreement was calculated between all annotations and the reference.
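
Two of these agreement measures are straightforward to sketch. The snippet below computes a pairwise Dice coefficient and Fleiss' kappa over per-pixel labels, using random masks as stand-ins for real annotations; statsmodels is assumed to be available.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

def dice(a, b):
    """Dice similarity coefficient of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

rng = np.random.default_rng(0)
masks = rng.random((4, 64, 64)) > 0.5      # 4 annotators, one organ class (dummy)
print("DSC(annotator 0 vs 1):", dice(masks[0], masks[1]))

# Fleiss' kappa: rows = pixels ("subjects"), columns = annotators
ratings = masks.reshape(4, -1).T.astype(int)
table, _ = aggregate_raters(ratings)
print("Fleiss kappa:", fleiss_kappa(table, method="fleiss"))
```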

Results: The study showed strong inter-annotator agreement among surgical residents, with DSC values of 0.84-0.95 and Fleiss' kappa between 0.85 and 0.91. Surface area reliability was good to excellent (ICC = 0.62-0.91), while boundary delineation showed lower reproducibility (BIoU = 0.092-0.157). STAPLE consensus masks confirmed consistent overall shape annotations despite variability in boundary precision.

Conclusion: The study demonstrated low variability in the semantic segmentation of intraperitoneal organs in minimally invasive abdominal surgery, performed by surgical residents using MedSAM. While DSC and Fleiss' kappa values confirm strong inter-annotator agreement, the relatively low BIoU values point to challenges in boundary precision, especially for anatomically complex or variable structures. These results establish a benchmark for expanding annotation efforts to larger datasets and more detailed anatomical features.

Citations: 0
Statistical shape model-based estimation of registration error in computer-assisted total knee arthroplasty.
IF 2.3 · CAS Zone 3 (Medicine) · Q3 ENGINEERING, BIOMEDICAL · Pub Date: 2026-01-06 · DOI: 10.1007/s11548-025-03566-z
Behnaz Gheflati, Morteza Mirzaei, Joel Zuhars, Sunil Rottoo, Hassan Rivaz

Purpose: Computer-assisted surgical navigation systems have been developed to improve the precision of total knee arthroplasty (TKA) by providing real-time guidance on implant alignment relative to patient anatomy. However, surface registration remains a key source of error that can propagate through the surgical workflow. This study investigates how patient-specific femoral bone geometry influences registration accuracy, aiming to enhance the reliability and consistency of computer-assisted orthopedic procedures.

Methods: Eighteen high-fidelity 3D-printed femur models were used to simulate intraoperative digitization. Surface points collected from the distal femur were registered to preoperative CT-derived models using a rigid iterative closest point (ICP) algorithm. Registration accuracy was quantified across six degrees of freedom. An in-house statistical shape model (SSM), built from 114 CT femurs, was employed to extract shape coefficients and correlate them with the measured registration errors. To verify robustness, additional analyses were conducted using synthetic and in silico CT-based femur datasets.
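
A minimal point-to-point rigid ICP sketch of the kind referenced above, with a Kabsch/SVD update; this is a generic illustration under a toy example, not the study's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, n_iter=30):
    """Align source (N,3) to target (M,3); returns accumulated R (3,3), t (3,)."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    src = source.copy()
    for _ in range(n_iter):
        _, idx = tree.query(src)                 # closest-point correspondences
        matched = target[idx]
        mu_s, mu_t = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_t)    # cross-covariance matrix
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
        R_step = Vt.T @ D @ U.T                  # proper rotation (det = +1)
        t_step = mu_t - R_step @ mu_s
        src = src @ R_step.T + t_step
        R, t = R_step @ R, R_step @ t + t_step   # accumulate the transform
    return R, t

# Toy check: recover a known small rotation + translation of a point cloud
rng = np.random.default_rng(1)
target = rng.random((500, 3))
theta = 0.1
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
source = target @ R_true.T + np.array([0.05, -0.02, 0.03])
R_est, t_est = icp(source, target)
print("rotation recovered:", np.allclose(R_est @ R_true, np.eye(3), atol=1e-2))
```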

Results: Significant correlations (p-values < 0.05) were observed between specific shape coefficients and registration errors. The third and fourth principal shape modes showed the strongest associations with rotational misalignments, particularly flexion-extension and varus-valgus components. These findings demonstrate that geometric variability in the distal femur, especially condylar morphology, plays a major role in determining the stability and accuracy of surface-based registration.

Conclusions: Registration errors in TKA are strongly influenced by patient-specific bone geometry. Shape features derived from statistical shape models can serve as reliable predictors of registration performance, providing quantitative insight into how anatomical variability impacts surgical precision and alignment accuracy in computer-assisted total knee arthroplasty.

Citations: 0
Personalized scan path planning for robotic ultrasound in head and neck lesions.
IF 2.3 · CAS Zone 3 (Medicine) · Q3 ENGINEERING, BIOMEDICAL · Pub Date: 2025-12-27 · DOI: 10.1007/s11548-025-03563-2
Ryosuke Tsumura, Toshifumi Tomioka, Yoshihiko Koseki, Kiyoshi Yoshinaka

Purpose: Scan path planning is an essential technology for fully automated robotic ultrasound (US). During US screening of head and neck lesions, it is necessary to efficiently observe the entire neck, including the thyroid, parotid and submandibular glands, and cervical lymph nodes. This study developed a personalized scan path planning method for robotic US in head and neck lesions.

Methods: We propose a personalized scan path planning method that estimates scan positions by applying non-rigid registration between the patient's neck image and the most similar neck image from previous patients. To select the most similar neck image in the database, a composite similarity index was developed that combines multiple similarity metrics (neck area, centroid position, SSIM, Dice coefficient, and contours of the neck area). Fifty neck images were used to determine the optimal combination experimentally: non-rigid registration was performed from each of the 50 images to the other 49, yielding a total of 2,450 registration pairs, and the correlation between each similarity metric and the registration error was analyzed.
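
A hedged sketch of such a composite similarity index is shown below, mixing SSIM, a Dice overlap of the neck masks, and a centroid-distance term. The metric subset, weights, and distance scaling are illustrative assumptions, not the paper's tuned combination.

```python
import numpy as np
from skimage.metrics import structural_similarity

def centroid(mask):
    ys, xs = np.nonzero(mask)
    return np.array([ys.mean(), xs.mean()])

def similarity(img_a, mask_a, img_b, mask_b, w=(0.4, 0.4, 0.2)):
    """Weighted score in the spirit of a multi-metric similarity index."""
    ssim = structural_similarity(img_a, img_b, data_range=1.0)
    inter = np.logical_and(mask_a, mask_b).sum()
    dice = 2.0 * inter / (mask_a.sum() + mask_b.sum())
    # map centroid distance to a (0, 1] score; the scale 20 px is arbitrary
    dist = np.linalg.norm(centroid(mask_a) - centroid(mask_b))
    return w[0] * ssim + w[1] * dice + w[2] * np.exp(-dist / 20.0)

rng = np.random.default_rng(2)
img_a = rng.random((128, 128))             # dummy neck images
img_b = rng.random((128, 128))
mask = np.zeros((128, 128), bool)
mask[40:90, 30:100] = True                 # dummy neck-area mask
print(similarity(img_a, mask, img_b, mask))
```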

Results: The best-performing similarity index achieved a correlation coefficient of 0.730 between the similarity score and the registration error, showing that the proposed index can select similar images that yield accurate scan path registration. Using the proposed scan path planning, a feasibility study was performed on six healthy subjects. The results showed a success rate of 94.4% for obtaining the specific target images required for head and neck US examinations.

Conclusion: The proposed scan path planning allows the scan to adapt to individual differences in neck anatomy, which can contribute to fully automated robotic US in head and neck lesions.

Citations: 0
DyEndoVO: scene dynamics-aware pose estimation of endoscope in minimally invasive surgery.
IF 2.3 · CAS Zone 3 (Medicine) · Q3 ENGINEERING, BIOMEDICAL · Pub Date: 2025-12-22 · DOI: 10.1007/s11548-025-03549-0
Jinjing Xu, Reuben Docea, Micha Pfeiffer, Martin Wagner, Marius Distler, Stefanie Speidel

Purpose: Estimating the 6 degrees of freedom (DoF) pose of an endoscope is crucial for various applications in minimally invasive computer-assisted surgery. Image-based approaches are some of the most practical solutions for pose estimation in surgical environments, due to a limited workspace and sensor constraints. However, these methods often struggle or fail in dynamic scenes, such as those involving tissue deformation, surgical tool movement, and tool-tissue interaction.

Methods: We propose DyEndoVO, an end-to-end visual odometry method for dynamic endoscopic scenes. Our method consists of a transformer-based motion detection network and a weighted pose-optimization module. The motion detection network infers scene dynamics and guides the pose estimation. Furthermore, we introduce a semi-synthetic dataset featuring tissue and tool movement categories. It serves as training data, improving pose estimation accuracy, and also includes motion masks that enable fine-grained inspection and evaluation.
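
One common way to realize such weighted pose optimization is to down-weight photometric residuals wherever a motion-detection network flags dynamic content, so the pose estimate is driven by the static scene. The sketch below illustrates that idea generically; it is not DyEndoVO's code, and all tensors are stand-ins.

```python
import torch

def weighted_photometric_loss(frame, rendered, static_prob):
    """frame/rendered: (B,3,H,W); static_prob: (B,1,H,W) in [0,1],
    where 1 = static tissue and 0 = moving tool or deforming tissue."""
    residual = (frame - rendered).abs().mean(dim=1, keepdim=True)
    # normalize by the total static weight so the loss scale stays comparable
    return (static_prob * residual).sum() / static_prob.sum().clamp(min=1e-6)

frame = torch.rand(1, 3, 64, 80)         # observed frame (dummy)
rendered = torch.rand(1, 3, 64, 80)      # frame re-rendered under a pose guess
static_prob = torch.rand(1, 1, 64, 80)   # stand-in for the motion network output
print(float(weighted_photometric_loss(frame, rendered, static_prob)))
```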

Results: DyEndoVO significantly outperforms state-of-the-art methods in pose estimation for dynamic surgical scenes. Despite being trained solely on a synthetic dataset, our method generalizes well to real-world data without fine-tuning. Further analysis attributes this success to the effective detection of scene dynamics and the adaptation in the learned weight toward pose estimation; moreover, the semi-synthetic dataset also plays a key role in bridging the sim-to-real gap.

Conclusions: In this work, we aim to improve the accuracy and robustness of pose estimation in challenging dynamic surgical scenes, by effectively handling scene dynamics. Our method, combined with the proposed synthetic dataset, demonstrates improved performance in pose estimation and generalizes well to real-world data, showing its potential in advancing related works such as SLAM and 3D reconstruction in complex surgical environments.

Citations: 0
Using deep vision-language models improves multi-task performance in assistance applications for endoscopic ENT surgery.
IF 2.3 · CAS Zone 3 (Medicine) · Q3 ENGINEERING, BIOMEDICAL · Pub Date: 2025-12-22 · DOI: 10.1007/s11548-025-03512-z
Richard Bieck, Martin Sorge, Katharina Heuermann, Viktor Kunz, Markus Pirlich, Thomas Neumuth

Purpose: Deep learning models for endoscopic assistance applications predominantly focus on image-based tasks, such as tool detection, anatomical classification, and workflow segmentation. However, these approaches often neglect the integration of natural language, limiting their assistance capabilities. This work adopts a proven architecture for vision-language models (VLM) to perform multi-task learning for image classification, text prediction, and surgical report generation, specifically for endoscopic ENT surgeries.

Methods: We adopt a VLM architecture that uses encoders biased toward the endoscopy domain for image and text embedding and combines them via cross-attention. The model was trained on a newly created multi-task dataset derived from 30 annotated endoscopic procedures, comprising 130,000 multi-label images, anatomical descriptions, and synchronized surgical reports. Two variants of the model, a lightweight 61M-parameter model and a 176M-parameter model, were evaluated against an existing baseline from previous mono-task studies as well as against the EndoViT and SurgicalGPT models as external references. Ablation studies investigate how removing image embeddings, text embeddings, or cross-attention affects task performance. Performance was measured for landmark classification, structured text prediction, and report generation using precision, recall, F1-score, BLEU-2, ROUGE-L, and cosine similarity metrics.
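
A minimal sketch of the cross-attention fusion named above, with text tokens attending over image tokens. The dimensions and single-block depth are illustrative, and the domain-biased encoders are not reproduced here.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_tokens):
        # text queries attend over image keys/values
        fused, _ = self.attn(text_tokens, image_tokens, image_tokens)
        return self.norm(text_tokens + fused)   # residual connection

text = torch.rand(2, 20, 256)    # (batch, text tokens, embedding dim)
image = torch.rand(2, 196, 256)  # (batch, image patch tokens, embedding dim)
print(CrossAttentionFusion()(text, image).shape)  # torch.Size([2, 20, 256])
```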

Results: The VLM base model improves the baseline F1 score for image classification by up to 12% and natural-language text generation scores by up to 14% across the image classification and report generation tasks. Structured-text prediction, however, showed minimal gains, indicating limitations in learning structured sentences from combined image-text embeddings. EndoViT and SurgicalGPT slightly trail our domain-specific VLM. The image-only and text-only ablations confirm that the vision component benefits language tasks, whereas text has limited impact on landmark detection.

Conclusion: We developed a vision-language model that integrates image and text data for endoscopic ENT assistance tasks and can replace three isolated models, delivering multi-task assistance while outperforming prior and general-purpose baselines. Remaining challenges include handling imbalanced class distributions and the limited gains on templated structured text.

Citations: 0
Quantifying the anatomical variability of the proximal femur.
IF 2.3 · CAS Zone 3 (Medicine) · Q3 ENGINEERING, BIOMEDICAL · Pub Date: 2025-12-19 · DOI: 10.1007/s11548-025-03560-5
Angelika Ramesh, Johann Henckel, Alister Hart, Anna Di Laura

Purpose: Achieving a prosthetic femoral version (PFV) within the target range of 10-20° is crucial for optimal biomechanics in total hip arthroplasty (THA). Predicting the PFV preoperatively is challenging due to the limited understanding of the relationship between native femoral version (NFV) and the morphology of the intramedullary canal. This study aims to quantify the 3D morphological variability and identify the most variable anatomical features of the proximal femur pre- and post-operatively.

Methods: Pre- and post-operative CT scans from 62 patients (31 males, 31 females) who underwent THA and received a single stem design (straight, triple-tapered) were analysed. Four femoral models were generated per patient: (1) the native proximal femur, (2) the native femur after neck osteotomy, (3) the internal femoral canal after neck osteotomy, and (4) the reconstructed femur. Statistical Shape Models (SSMs) were developed separately by sex, and principal component analysis (PCA) was used to identify dominant modes of anatomical variation.
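
A minimal sketch of the SSM/PCA step, assuming point correspondences have already been established across femurs so that each shape flattens into one coordinate vector; the shapes below are random stand-ins, not anatomical data.

```python
import numpy as np
from sklearn.decomposition import PCA

n_shapes, n_points = 62, 1000
rng = np.random.default_rng(3)
# each row: one subject's corresponded surface points, flattened to 3N coords
shapes = rng.normal(size=(n_shapes, n_points * 3))

pca = PCA(n_components=10)
coeffs = pca.fit_transform(shapes)          # per-subject shape coefficients (PC scores)
print("variance explained by first 3 PCs:",
      pca.explained_variance_ratio_[:3].sum())
# coeffs columns are the mode scores one would then compare across sexes
# or correlate with measured quantities such as femoral version
```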

Results: The first three principal components (PCs) accounted for over 60% of shape variability across all models. PFV showed only a weak correlation with NFV, as variability existed between the SSM of the internal femoral canal and that of the native proximal femur. Sex-specific differences in the measured NFV and PFV were found, with females exhibiting a greater range and a more anteverted femur/femoral stem. The female canal model showed intramedullary version variability; however, this variability was not present in the first three PCs of the corresponding male model.

Conclusions: This study demonstrates that PFV cannot be reliably predicted from NFV alone. These findings underscore the need for advanced, 3D preoperative planning tools to better predict stem version and accommodate patient-specific anatomy. Additionally, the increased variability observed in females may warrant sex-specific consideration in implant design choice and surgical technique.

Citations: 0