
Latest publications from the International Journal of Computer Assisted Radiology and Surgery

Enhancing open-surgery gesture recognition using 3D pose estimation.
IF 2.3, CAS Tier 3 (Medicine), Q3 ENGINEERING, BIOMEDICAL. Pub Date: 2026-01-14. DOI: 10.1007/s11548-025-03564-1
Ori Meiraz, Shlomi Laufer, Robert Spector, Itay Or, Gil Bolotin, Tom Friedman

Purpose: Surgical gestures are fundamental components of surgical procedures, encompassing actions such as cutting, suturing, and knot-tying. Gesture recognition plays a pivotal role in the automatic analysis of surgical data. Although recent advances have improved surgical gesture recognition, much of the existing research relies on simulations or minimally invasive surgery data and fails to capture the complexities of open surgery. In this study, we introduce and employ a new open-surgery dataset focused on closing incisions after saphenous vein harvesting.

Methods: Our goal is to improve gesture recognition accuracy by incorporating tool pose estimation and 3D hand pose predictions of surgeons. We employ MS-TCN++ and LTContext for gesture recognition, and further enhance performance through an ensemble of models using different modalities: video, tool pose, and hand pose data.

Results: The results reveal that an ensemble model combining all three modalities provides a substantial improvement over video-only approaches, yielding statistically significant gains across multiple evaluation metrics. We further demonstrate that the model can rely solely on hand and tool poses, discarding the video input entirely, while still achieving comparable performance. Additionally, an ensemble model using only hand and tool poses produces results that are either statistically significantly better than video alone or not statistically significantly different.

Conclusion: This study highlights the effectiveness of integrating multimodal data for surgical gesture recognition. By combining video, hand pose, and tool pose information, our approach achieves higher accuracy and robustness than video-only methods. Moreover, the comparable performance of pose-only models suggests a promising, privacy-preserving alternative for surgical data analysis.
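The multimodal ensemble can be illustrated with a minimal sketch. The fusion rule below (weighted averaging of per-frame class probabilities from modality-specific models) is an assumption for illustration only, not necessarily the combination scheme used in the paper:

```python
import numpy as np

def ensemble_gesture_probs(prob_maps, weights=None):
    """Fuse per-frame class probabilities from several modality-specific
    models (e.g. video, tool pose, hand pose) into one prediction.

    prob_maps: list of arrays, each shaped (n_frames, n_classes).
    weights:   optional per-modality weights; defaults to uniform.
    """
    stacked = np.stack(prob_maps)            # (n_models, n_frames, n_classes)
    if weights is None:
        weights = np.ones(len(prob_maps)) / len(prob_maps)
    weights = np.asarray(weights)[:, None, None]
    fused = (stacked * weights).sum(axis=0)  # weighted mean over models
    return fused.argmax(axis=1)              # per-frame gesture labels

# Toy example: three modalities, two frames, three gesture classes.
video = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
tool  = np.array([[0.5, 0.4, 0.1], [0.1, 0.7, 0.2]])
hand  = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]])
labels = ensemble_gesture_probs([video, tool, hand])  # -> array([0, 1])
```

Dropping `video` from the list gives the pose-only variant the abstract describes, with no other change to the fusion code.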

Citations: 0
Environmental and economic costs behind LLMs.
IF 2.3, CAS Tier 3 (Medicine), Q3 ENGINEERING, BIOMEDICAL. Pub Date: 2026-01-12. DOI: 10.1007/s11548-026-03568-5
Pilar López-Úbeda, Teodoro Martín-Noguerol, Antonio Luna

Purpose: To discuss the economic and environmental implications of implementing large language models (LLMs) in radiology, highlighting both their transformative potential and the challenges they pose for equitable and sustainable adoption.

Methods: Current trends in AI investment, infrastructure requirements, operational costs, and environmental impact associated with LLMs are analyzed, highlighting the specific challenges of integrating LLMs into radiological workflows, including data privacy, regulatory compliance, and cost barriers for healthcare institutions. The analysis also considers the costs of model validation, maintenance, and updates, as well as investments in system integration, staff training, and cybersecurity for clinical implementation.

Results: LLMs have revolutionized natural language processing and offer promising applications in radiology, such as improved diagnostic support and workflow optimization. However, their deployment involves substantial financial and environmental costs. Training and operating these models require high-performance computing infrastructure, significant energy consumption, and large volumes of annotated data. Water usage and CO₂ emissions from data centers further raise sustainability concerns, while ongoing operational costs add to the financial burden. Subscription fees and per-query pricing may restrict access for smaller institutions, widening existing inequalities.

Conclusion: While LLMs offer significant benefits for radiology, their high economic and environmental costs present challenges to widespread and equitable adoption. Responsible use, sustainable practices, and policy frameworks are essential to ensure that AI-driven innovations do not exacerbate existing disparities in healthcare access and quality.

Citations: 0
Benchmarking variability in semantic segmentation in minimally invasive abdominal surgery.
IF 2.3, CAS Tier 3 (Medicine), Q3 ENGINEERING, BIOMEDICAL. Pub Date: 2026-01-06. DOI: 10.1007/s11548-025-03562-3
L T Castro, C Barata, P Martins, F Afonso, M Pascoal, C Santiago, L Mennillo, P Mira, D Stoyanov, M Chand, S Bano, A S Soares

Purpose: Anatomical identification during abdominal surgery is subjective, given the unclear boundaries of anatomical structures. Semantic segmentation of these structures relies on accurate identification of boundaries, which carries unknown uncertainty. Given this inherent subjectivity, it is important to assess annotation adequacy. This study aims to evaluate variability in anatomical structure identification and segmentation performed by surgical residents using MedSAM.

Methods: Images from the Dresden Surgical Anatomy Dataset and the Endoscapes2023 Dataset were semantically annotated by a group of surgery residents using MedSAM in the following classes: abdominal wall, colon, liver, small bowel, spleen, stomach, and gallbladder. Each class had 3 to 4 sets of annotations. Inter-annotator variability was assessed through DSC, ICC, and BIoU; the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm was used to obtain a consensus mask, and Fleiss' kappa agreement was calculated between all annotations and the reference.

Results: The study showed strong inter-annotator agreement among surgical residents, with DSC values of 0.84-0.95 and Fleiss' kappa between 0.85 and 0.91. Surface area reliability was good to excellent (ICC = 0.62-0.91), while boundary delineation showed lower reproducibility (BIoU = 0.092-0.157). STAPLE consensus masks confirmed consistent overall shape annotations despite variability in boundary precision.

Conclusion: The study demonstrated low variability in the semantic segmentation of intraperitoneal organs in minimally invasive abdominal surgery, performed by surgical residents using MedSAM. While DSC and Fleiss' kappa values confirm strong inter-annotator agreement, the relatively low BIoU values point to challenges in boundary precision, especially for anatomically complex or variable structures. These results establish a benchmark for expanding annotation efforts to larger datasets and more detailed anatomical features.
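The headline agreement metric, DSC, is simple to compute from binary masks; a minimal sketch, with a pairwise mean across hypothetical annotators as an overall agreement summary (toy masks, not the study's data):

```python
import numpy as np
from itertools import combinations

def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient (DSC) between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    # Convention: two empty masks agree perfectly.
    return 2.0 * np.logical_and(a, b).sum() / total if total else 1.0

def mean_pairwise_dsc(masks):
    """Mean DSC over all annotator pairs: a simple inter-annotator
    agreement summary for one anatomical structure."""
    pairs = list(combinations(masks, 2))
    return sum(dice_coefficient(a, b) for a, b in pairs) / len(pairs)

# Toy 2x2 masks from three hypothetical annotators.
m1 = [[1, 1], [0, 0]]
m2 = [[1, 1], [0, 0]]
m3 = [[1, 0], [0, 0]]
agreement = mean_pairwise_dsc([m1, m2, m3])  # (1 + 2/3 + 2/3) / 3
```

BIoU follows the same pattern but intersects only narrow bands around each mask's boundary, which is why it is far more sensitive to delineation differences than region-level DSC.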

Citations: 0
Statistical shape model-based estimation of registration error in computer-assisted total knee arthroplasty.
IF 2.3, CAS Tier 3 (Medicine), Q3 ENGINEERING, BIOMEDICAL. Pub Date: 2026-01-06. DOI: 10.1007/s11548-025-03566-z
Behnaz Gheflati, Morteza Mirzaei, Joel Zuhars, Sunil Rottoo, Hassan Rivaz

Purpose: Computer-assisted surgical navigation systems have been developed to improve the precision of total knee arthroplasty (TKA) by providing real-time guidance on implant alignment relative to patient anatomy. However, surface registration remains a key source of error that can propagate through the surgical workflow. This study investigates how patient-specific femoral bone geometry influences registration accuracy, aiming to enhance the reliability and consistency of computer-assisted orthopedic procedures.

Methods: Eighteen high-fidelity 3D-printed femur models were used to simulate intraoperative digitization. Surface points collected from the distal femur were registered to preoperative CT-derived models using a rigid iterative closest point (ICP) algorithm. Registration accuracy was quantified across six degrees of freedom. An in-house statistical shape model (SSM), built from 114 CT femurs, was employed to extract shape coefficients and correlate them with the measured registration errors. To verify robustness, additional analyses were conducted using synthetic and in silico CT-based femur datasets.

Results: Significant correlations (p-values < 0.05) were observed between specific shape coefficients and registration errors. The third and fourth principal shape modes showed the strongest associations with rotational misalignments, particularly flexion-extension and varus-valgus components. These findings demonstrate that geometric variability in the distal femur, especially condylar morphology, plays a major role in determining the stability and accuracy of surface-based registration.

Conclusions: Registration errors in TKA are strongly influenced by patient-specific bone geometry. Shape features derived from statistical shape models can serve as reliable predictors of registration performance, providing quantitative insight into how anatomical variability impacts surgical precision and alignment accuracy in computer-assisted total knee arthroplasty.
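The rigid alignment at the core of each ICP iteration can be sketched with the Kabsch algorithm (given matched point pairs, find the rotation and translation minimizing the squared residual via SVD). The point sets below are toy data, not the study's femur models:

```python
import numpy as np

def kabsch_rigid_align(src, dst):
    """Find rotation R and translation t minimizing ||R @ src_i + t - dst_i||
    over matched point pairs (rows of src, dst) -- the inner step that ICP
    repeats after re-matching correspondences."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                       # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Recover a known 30-degree rotation about z plus a translation.
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
src = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1.0]])
dst = src @ R_true.T + t_true
R_est, t_est = kabsch_rigid_align(src, dst)
```

The study's six-degree-of-freedom error decomposition corresponds to comparing the estimated (R, t) against ground truth in exactly this parameterization: three rotational and three translational components.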

Citations: 0
Personalized scan path planning for robotic ultrasound in head and neck lesions.
IF 2.3, CAS Tier 3 (Medicine), Q3 ENGINEERING, BIOMEDICAL. Pub Date: 2025-12-27. DOI: 10.1007/s11548-025-03563-2
Ryosuke Tsumura, Toshifumi Tomioka, Yoshihiko Koseki, Kiyoshi Yoshinaka

Purpose: Scan path planning is an essential technology for fully automated robotic ultrasound (US). During US screening of head and neck lesions, the entire neck, including the thyroid, parotid and submandibular glands, and cervical lymph nodes, must be observed efficiently. This study developed a personalized scan path planning method for robotic US in head and neck lesions.

Methods: We propose a personalized scan path planning method that estimates the scan position by applying non-rigid registration between the patient's neck image and the most similar neck image from previous patients. To select the most similar neck image in the database, a dedicated similarity index combining multiple similarity metrics (neck area, centroid position, SSIM, Dice coefficient, and neck-area contours) was developed. Fifty neck images were used to determine the optimal combination experimentally. Non-rigid registration from each of the 50 images to the other 49 images (2,450 registration pairs in total) was conducted, and the correlations between each similarity metric and the registration error were analyzed.

Results: The best-performing similarity index achieved a correlation coefficient of 0.730 between the similarity score and the registration error, showing that the proposed index selects similar images that yield high scan path registration accuracy. Using the proposed scan path planning, a feasibility study was performed on six healthy subjects. The results showed a success rate of 94.4% for obtaining the specific target images required for head and neck US examinations.

Conclusion: The proposed scan path planning allows the scan to adapt to individual differences in neck anatomy, contributing toward fully automated robotic US in head and neck lesions.
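Selecting the nearest database case with a composite similarity index can be sketched in a few lines. The uniform weights and toy scores below are illustrative assumptions; the paper determines the actual metric combination experimentally:

```python
def combined_similarity(metrics, weights):
    """Weighted sum of per-metric similarity scores (each assumed already
    normalized to [0, 1], higher = more similar to the query image)."""
    return sum(w * m for w, m in zip(weights, metrics))

def select_most_similar(candidates, weights):
    """Pick the database case with the highest combined score against the
    query. candidates maps case id -> [metric_1, ..., metric_n]."""
    return max(candidates,
               key=lambda cid: combined_similarity(candidates[cid], weights))

# Toy scores for three prior patients on
# (neck area, centroid position, SSIM, Dice coefficient, contour).
scores = {
    "case_a": [0.9, 0.8, 0.7, 0.85, 0.6],
    "case_b": [0.5, 0.9, 0.8, 0.60, 0.7],
    "case_c": [0.7, 0.7, 0.9, 0.90, 0.8],
}
weights = [0.2, 0.2, 0.2, 0.2, 0.2]  # uniform here; tuned experimentally in the paper
best = select_most_similar(scores, weights)  # -> "case_c"
```

The selected case's known scan path is then warped onto the new patient by the non-rigid registration, which is why the index is validated against registration error rather than raw image similarity.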

Citations: 0
DyEndoVO: scene dynamics-aware pose estimation of endoscope in minimally invasive surgery.
IF 2.3, CAS Tier 3 (Medicine), Q3 ENGINEERING, BIOMEDICAL. Pub Date: 2025-12-22. DOI: 10.1007/s11548-025-03549-0
Jinjing Xu, Reuben Docea, Micha Pfeiffer, Martin Wagner, Marius Distler, Stefanie Speidel

Purpose: Estimating the 6 degrees of freedom (DoF) pose of an endoscope is crucial for various applications in minimally invasive computer-assisted surgery. Image-based approaches are some of the most practical solutions for pose estimation in surgical environments, due to a limited workspace and sensor constraints. However, these methods often struggle or fail in dynamic scenes, such as those involving tissue deformation, surgical tool movement, and tool-tissue interaction.

Methods: We propose DyEndoVO, an end-to-end visual odometry method in dynamic endoscopic scenes. Our method consists of a transformer-based motion detection network and a weighted pose-optimization module. The motion detection network infers scene dynamics and guides the pose estimation. Furthermore, we introduce a semi-synthetic dataset featuring tissue and tool movement categories. It serves as training data, improving pose estimation accuracy, and also includes motion masks to enable a fine-grained inspection and evaluation.

Results: DyEndoVO significantly outperforms state-of-the-art methods in pose estimation for dynamic surgical scenes. Despite being trained solely on a synthetic dataset, our method generalizes well to real-world data without fine-tuning. Further analysis attributes this success to the effective detection of scene dynamics and the adaptation in the learned weight toward pose estimation; moreover, the semi-synthetic dataset also plays a key role in bridging the sim-to-real gap.

Conclusions: In this work, we aim to improve the accuracy and robustness of pose estimation in challenging dynamic surgical scenes, by effectively handling scene dynamics. Our method, combined with the proposed synthetic dataset, demonstrates improved performance in pose estimation and generalizes well to real-world data, showing its potential in advancing related works such as SLAM and 3D reconstruction in complex surgical environments.
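A common way to make a pose solver dynamics-aware is to down-weight residuals from regions a motion detector flags as moving. The weighted least-squares update below is a generic sketch of that idea under stated assumptions, not the paper's exact learned optimization:

```python
import numpy as np

def weighted_pose_update(J, r, w):
    """Solve a weighted least-squares pose update:
    minimize sum_i w_i * (J[i] @ dx - r[i])**2, giving the normal equations
    dx = (J^T W J)^{-1} J^T W r. Observations flagged as dynamic by the
    motion detector get small w_i and barely influence the pose."""
    W = np.diag(w)
    return np.linalg.solve(J.T @ W @ J, J.T @ W @ r)

# Toy 2-parameter pose, three observations; ground-truth update dx = [2, -1].
J = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
dx_true = np.array([2.0, -1.0])
r = J @ dx_true                 # residuals consistent with the true update
w = np.array([1.0, 0.1, 0.5])   # e.g. second observation lies on moving tissue
dx = weighted_pose_update(J, r, w)
```

With consistent observations the weights do not change the solution; their effect appears exactly when dynamic regions (deforming tissue, moving tools) inject inconsistent residuals that uniform weighting would let corrupt the camera pose.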

{"title":"DyEndoVO: scene dynamics-aware pose estimation of endoscope in minimally invasive surgery.","authors":"Jinjing Xu, Reuben Docea, Micha Pfeiffer, Martin Wagner, Marius Distler, Stefanie Speidel","doi":"10.1007/s11548-025-03549-0","DOIUrl":"https://doi.org/10.1007/s11548-025-03549-0","url":null,"abstract":"<p><strong>Purpose: </strong>Estimating the 6 degrees of freedom (DoF) pose of an endoscope is crucial for various applications in minimally invasive computer-assisted surgery. Image-based approaches are some of the most practical solutions for pose estimation in surgical environments, due to a limited workspace and sensor constraints. However, these methods often struggle or fail in dynamic scenes, such as those involving tissue deformation, surgical tool movement, and tool-tissue interaction.</p><p><strong>Methods: </strong>We propose DyEndoVO, an end-to-end visual odometry method in dynamic endoscopic scenes. Our method consists of a transformer-based motion detection network and a weighted pose-optimization module. The motion detection network infers scene dynamics and guides the pose estimation. Furthermore, we introduce a semi-synthetic dataset featuring tissue and tool movement categories. It serves as training data, improving pose estimation accuracy, and also includes motion masks to enable a fine-grained inspection and evaluation.</p><p><strong>Results: </strong>DyEndoVO significantly outperforms state-of-the-art methods in pose estimation for dynamic surgical scenes. Despite being trained solely on a synthetic dataset, our method generalizes well to real-world data without fine-tuning. 
Further analysis attributes this success to the effective detection of scene dynamics and the adaptation in the learned weight toward pose estimation; moreover, the semi-synthetic dataset also plays a key role in bridging the sim-to-real gap.</p><p><strong>Conclusions: </strong>In this work, we aim to improve the accuracy and robustness of pose estimation in challenging dynamic surgical scenes, by effectively handling scene dynamics. Our method, combined with the proposed synthetic dataset, demonstrates improved performance in pose estimation and generalizes well to real-world data, showing its potential in advancing related works such as SLAM and 3D reconstruction in complex surgical environments.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145806248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
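The abstract does not detail the weighted pose-optimization module. As a rough illustration of how per-point dynamics weights can enter pose estimation, the sketch below solves a weighted rigid alignment (weighted Kabsch) in which points flagged as dynamic by a motion network receive low weight; the function name and the use of 3D-3D correspondences are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def weighted_rigid_transform(src, dst, w):
    """Weighted Kabsch: find R, t minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) corresponding points; w: (N,) nonnegative weights.
    Down-weighting points flagged as dynamic keeps moving tissue and tools
    from corrupting the camera-pose estimate.
    """
    w = w / w.sum()
    src_c = (w[:, None] * src).sum(axis=0)      # weighted centroids
    dst_c = (w[:, None] * dst).sum(axis=0)
    # weighted cross-covariance H = sum_i w_i * (src_i - src_c)(dst_i - dst_c)^T
    H = (src - src_c).T @ (w[:, None] * (dst - dst_c))
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

With uniform weights this reduces to the classic Kabsch solution; giving a corrupted (moving) correspondence near-zero weight leaves the recovered pose essentially unchanged.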
Citations: 0
Using deep vision-language models improves multi-task performance in assistance applications for endoscopic ENT surgery. 使用深度视觉语言模型提高了内窥镜耳鼻喉手术辅助应用中的多任务性能。
IF 2.3 3区 医学 Q3 ENGINEERING, BIOMEDICAL Pub Date : 2025-12-22 DOI: 10.1007/s11548-025-03512-z
Richard Bieck, Martin Sorge, Katharina Heuermann, Viktor Kunz, Markus Pirlich, Thomas Neumuth

Purpose: Deep learning models for endoscopic assistance applications predominantly focus on image-based tasks, such as tool detection, anatomical classification, and workflow segmentation. However, these approaches often neglect the integration of natural language, limiting their assistance capabilities. This work adopts a proven architecture for vision-language models (VLM) to perform multi-task learning for image classification, text prediction, and surgical report generation, specifically for endoscopic ENT surgeries.

Methods: We adopted a VLM architecture that uses encoders biased toward the endoscopy domain for image and text embedding and combines them via cross-attention. The model was trained on a newly created multi-task dataset derived from 30 annotated endoscopic procedures, comprising 130,000 multi-label images, anatomical descriptions, and synchronized surgical reports. Two variants of the model, a lightweight 61M-parameter model and a 176M-parameter model, were evaluated against an existing baseline from previous mono-task studies as well as against the EndoViT and SurgicalGPT models as external references. Ablation studies investigate the influence of removing image embeddings, text embeddings, or cross-attention on task performance. Performance was measured for landmark classification, structured text prediction, and report generation using precision, recall, F1-score, BLEU-2, ROUGE-L, and cosine similarity metrics.

Results: The VLM base model improves the baseline F1 score for image classification by up to 12% and natural-language text generation by up to 14% across the image classification and report generation tasks. The text generation of structured language tasks, however, showed minimal gains, indicating limitations in structured-sentence learning from combined image-text embeddings. EndoViT and SurgicalGPT slightly trail our domain-specific VLM. The image-only and text-only ablations confirm that the vision component benefits language tasks, whereas text has limited impact on landmark detection.

Conclusion: We developed a vision-language model capable of integrating image and text data for endoscopic ENT assistance tasks, that is able to replace three isolated models, delivering multi-task assistance while outperforming prior and general‑purpose baselines. Remaining challenges include the handling of imbalanced class distributions and limited gains on templated structured text.
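The cross-attention fusion of image and text embeddings can be sketched as a single-head attention layer in which text tokens query image tokens. This minimal NumPy version shows the mechanism only; the names (`cross_attention`, `Wq`/`Wk`/`Wv`) and the single-head form are illustrative, not the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_tokens, image_tokens, Wq, Wk, Wv):
    """Text queries attend over image keys/values (single head).

    text_tokens: (T, d), image_tokens: (I, d); Wq/Wk/Wv: (d, d_k) projections.
    Returns one fused vector per text token, mixing in visual context.
    """
    Q = text_tokens @ Wq
    K = image_tokens @ Wk
    V = image_tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (T, I) attention logits
    A = softmax(scores, axis=-1)              # each text token's weights over image patches
    return A @ V
```

Each output row is a convex combination of the image value vectors, which is why, in the degenerate case where all image tokens are identical, every text token simply receives that token's value.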

Citations: 0
Quantifying the anatomical variability of the proximal femur. 量化股骨近端解剖变异性。
IF 2.3 3区 医学 Q3 ENGINEERING, BIOMEDICAL Pub Date : 2025-12-19 DOI: 10.1007/s11548-025-03560-5
Angelika Ramesh, Johann Henckel, Alister Hart, Anna Di Laura

Purpose: Achieving a prosthetic femoral version (PFV) within the target range of 10-20° is crucial for optimal biomechanics in total hip arthroplasty (THA). Predicting the PFV preoperatively is challenging due to the limited understanding of the relationship between native femoral version (NFV) and the morphology of the intramedullary canal. This study aims to quantify the 3D morphological variability and identify the most variable anatomical features of the proximal femur pre- and post-operatively.

Methods: Pre- and post-operative CT scans from 62 patients (31 males, 31 females) who underwent THA and received a single stem design (straight, triple-tapered) were analysed. Four femoral models were generated per patient: 1. Native proximal femur, 2. Native femur after neck osteotomy, 3. Internal femoral canal after neck osteotomy, and 4. Reconstructed femur. Statistical Shape Models (SSMs) were developed separately by sex, and principal component analysis (PCA) was used to identify dominant modes of anatomical variation.

Results: The first three principal components (PCs) accounted for over 60% of shape variability across all models. PFV showed weak correlation with NFV as variability existed between the SSM of the internal femoral canal and SSM of the native proximal femur. Sex-specific differences in the measured NFV and PFV were found, with females exhibiting a greater range and a more anteverted femur/femoral stem. The female canal model showed intramedullary version variability; however, this variability was not present in the first three PCs in the corresponding male model.

Conclusions: This study demonstrates that PFV cannot be reliably predicted from NFV alone. These findings underscore the need for advanced, 3D preoperative planning tools to better predict stem version and accommodate patient-specific anatomy. Additionally, the increased variability observed in females may warrant sex-specific consideration in implant design choice and surgical technique.
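The SSM/PCA step can be illustrated with plain NumPy: stack aligned shape vectors, subtract the mean shape, and read the modes of variation and their explained-variance ratios off an SVD. This is a generic sketch of how "the first three PCs account for over 60% of shape variability" would be computed, not the authors' pipeline.

```python
import numpy as np

def shape_pca(shapes):
    """PCA on aligned shapes, given as a (n_shapes, n_points * 3) matrix.

    Returns the mean shape, the principal modes (rows of Vt), and the
    explained-variance ratio of each mode. Each mode is one 'mode of
    anatomical variation' in statistical-shape-model terms.
    """
    mean_shape = shapes.mean(axis=0)
    X = shapes - mean_shape                      # center the shape vectors
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    var = s**2 / (len(shapes) - 1)               # variance captured by each mode
    return mean_shape, Vt, var / var.sum()
```

Summing the first k entries of the returned ratios gives the cumulative variability captured by the first k modes.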

Citations: 0
Multimodal framework for swallow detection in video-fluoroscopic swallow studies using manometric pressure distributions from dysphagic patients. 使用吞咽困难患者的压力分布的视频透视吞咽研究中吞咽检测的多模态框架。
IF 2.3 3区 医学 Q3 ENGINEERING, BIOMEDICAL Pub Date : 2025-12-15 DOI: 10.1007/s11548-025-03556-1
Manuel Maria Loureiro da Rocha, Lisette van der Molen, Marise Neijman, Marteen J A van Alphen, Michiel M W M van den Brekel, Françoise J Siepel

Purpose: Oropharyngeal dysphagia affects up to half of head and neck cancer (HNC) patients. Multi-swallow video-fluoroscopic swallow studies (VFSS) combined with high-resolution impedance manometry (HRIM) offer a comprehensive assessment of swallowing function. However, their use in HNC populations is limited by high clinical workload and complexity of data collection and analysis with existing software.

Methods: To address the data collection challenge, we propose a framework for automatic swallow detection in simultaneous VFSS-HRIM examinations. The framework identifies candidate swallow intervals in continuous VFSS videos using an optimized double-sweep optical flow algorithm. Each candidate interval is then classified using a pressure-based swallow template derived from three annotated samples, leveraging features such as normalized peak-to-peak amplitude, mean, and standard deviation from upper esophageal sphincter sensors.

Results: The methodology was evaluated on 97 swallows from twelve post-treatment head and neck cancer patients. The detection pipeline achieved 95% Recall and 92% F1-score. Importantly, the number of required HRIM annotations was reduced by 63%, substantially decreasing clinician workload while maintaining high accuracy.

Conclusion: This framework overcomes limitations of current software for simultaneous VFSS-HRIM collection by enabling high-accuracy, low-input swallow detection in HNC patients. Validated on a heterogeneous patient cohort, it initiates the groundwork for scalable, objective, and multimodal swallowing assessment.
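As a sketch of the template-based classification step, the features named in the abstract (normalized peak-to-peak amplitude, mean, and standard deviation per upper-esophageal-sphincter sensor) can be compared against a swallow template by Euclidean distance. The normalization choice, the distance metric, and the thresholding are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def ues_features(window):
    """Feature vector for one candidate interval.

    window: (n_samples, n_sensors) pressures. Peak-to-peak amplitude is
    normalized by the window's maximum absolute pressure so the template
    transfers across patients with different baseline pressures.
    """
    scale = np.abs(window).max() + 1e-8
    return np.concatenate([np.ptp(window, axis=0) / scale,   # normalized peak-to-peak
                           window.mean(axis=0),              # mean per sensor
                           window.std(axis=0)])              # std per sensor

def is_swallow(window, template, threshold):
    """Accept the candidate if its feature vector is close to the swallow template."""
    return np.linalg.norm(ues_features(window) - template) < threshold
```

A swallow-shaped pressure burst lands much closer to a swallow-derived template than a flat baseline does, so a single distance threshold separates the two.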

Citations: 0
Colormap augmentation: a novel method for cross-modality domain generalization. 色图增强:一种跨模态域泛化的新方法。
IF 2.3 3区 医学 Q3 ENGINEERING, BIOMEDICAL Pub Date : 2025-12-15 DOI: 10.1007/s11548-025-03559-y
Falko Heitzer, Duc Duy Pham, Wojciech Kowalczyk, Marcus Jäger, Josef Pauli

Purpose: Domain generalization plays a crucial role in analyzing medical images from diverse clinics, scanner vendors, and imaging modalities. Existing methods often require substantial computational resources to train a highly generalized segmentation network, presenting challenges in terms of both availability and cost. The goal of this work is to evaluate a novel, yet simple and effective method for enhancing the generalization of deep learning models in segmentation across varying modalities.

Methods: Eight augmentation methods are applied individually to a source-domain dataset in order to generalize deep learning models. The resulting models are then tested on completely unseen target-domain datasets from a different imaging modality and compared against a lower-bound baseline model. By leveraging standard augmentation techniques, extensive intensity augmentations, and carefully chosen color transformations, we aim to address the domain shift problem, particularly in the cross-modality setting.

Results: Our novel CmapAug method, when combined with standard augmentation techniques, resulted in a substantial improvement in the Dice Score, outperforming the baseline. While the baseline struggled to segment the liver structure in some test cases, our selective combination of augmentation methods achieved Dice scores as high as 83.2%.

Conclusion: Our results highlight the general effectiveness of the tested augmentation methods in addressing domain generalization and mitigating the domain shift problem caused by differences in imaging modalities between the source and target domains. The proposed augmentation strategy offers a simple yet powerful solution to this challenge, with significant potential in clinical scenarios where annotated data from the target domain are limited or unavailable.
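The abstract does not specify how CmapAug maps intensities to colors; as a minimal stand-in, a linear two-color mapping applied to the normalized grayscale image already produces the kind of modality-like appearance shift a colormap augmentation relies on. The function name and two-color restriction are assumptions for illustration.

```python
import numpy as np

def colormap_augment(image, color_lo, color_hi):
    """Map a grayscale image through a linear two-color colormap.

    image: 2-D float array; color_lo / color_hi: RGB triples in [0, 1].
    Each intensity is normalized to [0, 1] and placed on the line between
    the two colors, turning a grayscale scan into a pseudo-color image.
    Returns an (H, W, 3) uint8 image.
    """
    lo, hi = float(image.min()), float(image.max())
    span = hi - lo
    t = (image - lo) / span if span > 0 else np.zeros_like(image, dtype=float)
    rgb = (1.0 - t)[..., None] * np.asarray(color_lo, dtype=float) \
        + t[..., None] * np.asarray(color_hi, dtype=float)
    return np.round(rgb * 255.0).astype(np.uint8)
```

Cycling through several color pairs (or full colormaps) during training exposes the network to intensity-to-appearance mappings it would otherwise only encounter in another modality.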

Citations: 0