
Latest publications in Medical Image Analysis

Rethinking fairness in medical imaging: Maximizing group-specific performance with application to skin disease diagnosis
IF 11.8 | Medicine, CAS Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-16 | DOI: 10.1016/j.media.2026.103950
Gelei Xu , Yuying Duan , Jun Xia , Ching-Hao Chiu , Michael Lemmon , Wei Jin , Yiyu Shi
Recent efforts in medical image computing have focused on improving fairness by balancing it with accuracy within a single, unified model. However, this often creates a trade-off: gains for underrepresented groups can come at the expense of reduced accuracy for groups that were previously well served. In high-stakes clinical contexts, even minor drops in accuracy can lead to serious consequences, making such trade-offs highly contentious. Rather than accepting this compromise, we reframe the fairness objective in this paper as maximizing diagnostic accuracy for each patient group by leveraging additional computational resources to train group-specific models. To achieve this goal, we introduce SPARE, a novel data reweighting algorithm designed to optimize performance for a given group. SPARE evaluates the value of each training sample using two key factors: utility, which reflects the sample's contribution to refining the model's decision boundary, and group similarity, which captures its relevance to the target group. By assigning greater weight to samples that score highly on both metrics, SPARE rebalances the training process, in particular by leveraging the value of out-of-group data, to improve group-specific accuracy while avoiding the traditional fairness-accuracy trade-off. Experiments on two skin disease datasets demonstrate that SPARE significantly improves group-specific performance while maintaining comparable fairness metrics, highlighting its promise as a more practical fairness paradigm for improving clinical reliability.
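To make the reweighting idea concrete, here is a minimal sketch of how per-sample utility and group-similarity scores could be combined into training weights. The combination formula and the `alpha` mixing coefficient are illustrative assumptions, not the paper's published algorithm.

```python
# Illustrative sketch only: assumes utility and group-similarity scores
# are already computed per training sample; alpha is a hypothetical knob.
import numpy as np

def spare_style_weights(utility, group_similarity, alpha=0.5, eps=1e-8):
    """Combine per-sample utility and group-similarity scores into weights."""
    u = (utility - utility.min()) / (utility.max() - utility.min() + eps)
    s = (group_similarity - group_similarity.min()) / (
        group_similarity.max() - group_similarity.min() + eps)
    w = alpha * u + (1.0 - alpha) * s   # samples high on both metrics get the most weight
    return w / (w.sum() + eps)          # normalize to a distribution over the training set

# An out-of-group sample with high similarity (third entry) still earns weight.
weights = spare_style_weights(np.array([0.9, 0.2, 0.7]), np.array([0.1, 0.8, 0.9]))
print(weights.round(3))
```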
Citations: 0
Quasi-multimodal-based pathophysiological feature learning for retinal disease diagnosis
IF 11.8 | Medicine, CAS Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-15 | DOI: 10.1016/j.media.2025.103886
Lu Zhang , Huizhen Yu , Zuowei Wang , Fu Gui , Yatu Guo , Wei Zhang , Mengyu Jia
Retinal diseases spanning a broad spectrum can be effectively identified and diagnosed using complementary signals from multimodal data. However, multimodal diagnosis in ophthalmic practice is typically challenged by data heterogeneity, potential invasiveness, and registration complexity. We therefore propose a unified framework that integrates multimodal data synthesis and fusion for retinal disease classification and grading. Specifically, the synthesized multimodal data incorporate fundus fluorescein angiography (FFA), multispectral imaging (MSI), and saliency maps that emphasize latent lesions as well as optic disc/cup regions. Parallel models are independently trained to learn modality-specific representations that capture cross-pathophysiological signatures. These features are then adaptively calibrated within and across modalities to perform information pruning and flexible integration according to downstream tasks. The proposed learning system is thoroughly interpreted through visualizations in both image and feature spaces. Extensive experiments on two public datasets demonstrated the superiority of our approach over state-of-the-art ones in the tasks of multi-label classification (F1-score: 0.683, AUC: 0.953) and diabetic retinopathy grading (Accuracy: 0.842, Kappa: 0.861). This work not only enhances the accuracy and efficiency of retinal disease screening but also offers a scalable framework for data augmentation across various medical imaging modalities.
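The cross-modality fusion step can be illustrated with a small cross-attention module in PyTorch; the token shapes and residual design here are assumptions for the sketch, not the paper's exact architecture.

```python
# Minimal sketch: fundus-image tokens query tokens from a synthesized
# modality (e.g., FFA or MSI). All shapes are illustrative.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, fundus_tokens, aux_tokens):
        # fundus tokens attend to the auxiliary modality's tokens
        fused, _ = self.attn(fundus_tokens, aux_tokens, aux_tokens)
        return self.norm(fundus_tokens + fused)   # residual connection

fusion = CrossModalFusion()
out = fusion(torch.randn(2, 196, 256), torch.randn(2, 196, 256))
print(out.shape)  # torch.Size([2, 196, 256])
```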
Citations: 0
Predicting diabetic macular edema treatment responses using OCT: Dataset and methods of APTOS competition
IF 11.8 | Medicine, CAS Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.media.2026.103942
Weiyi Zhang , Peranut Chotcomwongse , Yinwen Li , Pusheng Xu , Ruijie Yao , Lianhao Zhou , Yuxuan Zhou , Hui Feng , Qiping Zhou , Xinyue Wang , Shoujin Huang , Zihao Jin , Florence H T Chung , Shujun Wang , Yalin Zheng , Mingguang He , Danli Shi , Paisan Ruamviboonsuk
Diabetic macular edema (DME) significantly contributes to visual impairment in diabetic patients. Treatment responses to intravitreal therapies vary, highlighting the need for patient stratification to predict therapeutic benefit and enable personalized strategies. To our knowledge, this study is the first to explore pre-treatment stratification for predicting DME treatment responses. To advance this research, we organized the 2nd Asia-Pacific Tele-Ophthalmology Society (APTOS) Big Data Competition in 2021. The competition focused on improving predictive accuracy for anti-VEGF therapy responses using ophthalmic OCT images. We provided a dataset containing tens of thousands of OCT images from 2000 patients, with labels across four sub-tasks. This paper details the competition's structure, dataset, leading methods, and evaluation metrics. The competition attracted strong participation from the scientific community, with 170 teams initially registering and 41 reaching the final round. The top-performing team achieved an AUC of 80.06%, highlighting the potential of AI in personalized DME treatment and clinical decision-making.
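For reference, the headline AUC is the standard area under the ROC curve; a minimal sketch with scikit-learn and made-up responder labels and scores (not competition data):

```python
# Hedged sketch of the ranking metric: illustrative labels and scores only.
from sklearn.metrics import roc_auc_score

y_true = [0, 1, 1, 0, 1]              # hypothetical ground-truth responder labels
y_score = [0.4, 0.8, 0.3, 0.2, 0.7]   # hypothetical predicted response probabilities
print(f"AUC = {roc_auc_score(y_true, y_score):.4f}")  # fraction of correctly ordered pairs
```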
Citations: 0
Depth-induced prompt learning for laparoscopic liver landmark detection
IF 11.8 | Medicine, CAS Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.media.2026.103940
Ruize Cui , Weixin Si , Zhixi Li , Kai Wang , Jialun Pei , Pheng-Ann Heng , Jing Qin
Laparoscopic liver surgery presents a highly intricate intraoperative environment with significant liver deformation, posing challenges for surgeons in locating critical liver structures. Anatomical liver landmarks can greatly assist surgeons with spatial perception in laparoscopic scenarios and facilitate preoperative-to-intraoperative registration. To advance research in liver landmark detection, we develop a new dataset called L3D-2K, comprising 2000 keyframes with expert landmark annotations from surgical videos of 47 patients. Accordingly, we propose a baseline, D2GPLand+, which effectively leverages the depth modality to boost landmark detection performance. Concretely, we introduce a Depth-aware Prompt Embedding (DPE) scheme, which dynamically extracts class-related global geometric cues under the guidance of self-supervised prompts from the SAM encoder. Further, a Cross-dimension Unified Mamba (CUMamba) block is designed to comprehensively incorporate RGB and depth features with a concurrent spatial and channel scanning mechanism. In addition, we introduce an Anatomical Feature Augmentation (AFA) module that captures anatomical cues and emphasizes key structures by optimizing feature granularity. For benchmarking purposes, we evaluate our method and 17 mainstream detection models on the L3D, L3D-2K, and P2ILF datasets. Experimental results demonstrate that D2GPLand+ obtains superior performance on all three datasets. Our approach provides surgeons with guiding clues that facilitate surgical operations and decision-making in complex laparoscopic surgery. Our code and dataset are available at https://github.com/cuiruize/D2GPLand-Plus.
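As a rough illustration of RGB-depth feature fusion (the paper's CUMamba block uses state-space scanning and is considerably more involved), here is a hedged sketch of a learned per-pixel gate between the two modalities:

```python
# Illustrative sketch: a sigmoid gate decides, per pixel, how much of the
# RGB vs. depth feature to keep. Not the paper's CUMamba implementation.
import torch
import torch.nn as nn

class GatedRGBDFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        g = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        return g * rgb_feat + (1 - g) * depth_feat  # per-pixel modality weighting

fuse = GatedRGBDFusion()
print(fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)).shape)
```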
Citations: 0
Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge
IF 11.8 | Medicine, CAS Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-13 | DOI: 10.1016/j.media.2026.103945
Tobias Rueckert , David Rauber , Raphaela Maerkl , Leonard Klausmann , Suemeyye R. Yildiran , Max Gutbrod , Danilo Weber Nunes , Alvaro Fernandez Moreno , Imanol Luengo , Danail Stoyanov , Nicolas Toussaint , Enki Cho , Hyeon Bae Kim , Oh Sung Choo , Ka Young Kim , Seong Tae Kim , Gonçalo Arantes , Kehan Song , Jianjun Zhu , Junchen Xiong , Christoph Palm
Reliable recognition and localization of surgical instruments in endoscopic video recordings are foundational for a wide range of applications in computer- and robot-assisted minimally invasive surgery (RAMIS), including surgical training, skill assessment, and autonomous assistance. However, robust performance under real-world conditions remains a significant challenge. Incorporating surgical context – such as the current procedural phase – has emerged as a promising strategy to improve robustness and interpretability.
To address these challenges, we organized the Surgical Procedure Phase, Keypoint, and Instrument Recognition (PhaKIR) sub-challenge as part of the Endoscopic Vision (EndoVis) challenge at MICCAI 2024. We introduced a novel, multi-center dataset comprising thirteen full-length laparoscopic cholecystectomy videos collected from three distinct medical institutions, with unified annotations for three interrelated tasks: surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation. Unlike existing datasets, ours enables joint investigation of instrument localization and procedural context within the same data while supporting the integration of temporal information across entire procedures.
We report results and findings in accordance with the BIAS guidelines for biomedical image analysis challenges. The PhaKIR sub-challenge advances the field by providing a unique benchmark for developing temporally aware, context-driven methods in RAMIS and offers a high-quality resource to support future research in surgical scene understanding.
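As an illustration of how keypoint-estimation tasks of this kind are commonly scored, here is a hypothetical PCK-style check (fraction of predicted keypoints within a pixel threshold of ground truth); the PhaKIR organizers' official metrics may differ.

```python
# Hedged sketch of a PCK-style keypoint accuracy metric; the threshold and
# sample coordinates are illustrative, not challenge data.
import numpy as np

def pck(pred, gt, threshold_px=10.0):
    """Fraction of predicted keypoints within threshold_px of ground truth."""
    dists = np.linalg.norm(pred - gt, axis=-1)
    return float((dists <= threshold_px).mean())

pred = np.array([[120.0, 80.0], [200.0, 150.0]])
gt = np.array([[125.0, 78.0], [260.0, 149.0]])
print(pck(pred, gt))  # 0.5 -> only the first keypoint is within 10 px
```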
Citations: 0
Robust non-rigid image-to-patient registration for contactless dynamic thoracic tumor localization using recursive deformable diffusion models
IF 11.8 | Medicine, CAS Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-12 | DOI: 10.1016/j.media.2026.103948
Dongyuan Li , Yixin Shan , Yuxuan Mao , Puxun Tu , Haochen Shi , Shenghao Huang , Weiyan Sun , Chang Chen , Xiaojun Chen
Deformable image-to-patient registration is essential for surgical navigation and medical imaging, yet real-time computation of spatial transformations across modalities remains a major clinical challenge: the process is often time-consuming and error-prone, and it can increase trauma or radiation exposure. While state-of-the-art methods achieve impressive speed and accuracy on paired medical images, they face notable limitations in cross-modal thoracic applications, where physiological motions such as respiration complicate tumor localization. To address this, we propose a robust, contactless, non-rigid registration framework for dynamic thoracic tumor localization. A highly efficient Recursive Deformable Diffusion Model (RDDM) is trained to reconstruct comprehensive 4DCT sequences from only end-inhalation and end-exhalation scans, capturing respiratory dynamics reflective of the intraoperative state. For real-time patient alignment, we introduce a contactless non-rigid registration algorithm based on GICP, leveraging patient skin-surface point clouds captured by stereo RGB-D imaging. By incorporating normal-vector and expansion-contraction constraints, the method enhances robustness and avoids local minima. The proposed framework was validated on publicly available datasets and in volunteer trials. Quantitative evaluations demonstrated the RDDM's anatomical fidelity across respiratory phases, achieving a PSNR of 34.01 ± 2.78 dB. Moreover, we have preliminarily developed a 4DCT-based registration and surgical navigation module to support tumor localization and high-precision tracking. Experimental results indicate that the proposed framework preliminarily meets clinical requirements and demonstrates potential for integration into downstream surgical systems.
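The reported 34.01 dB figure is a peak signal-to-noise ratio; a minimal sketch of the standard PSNR computation between a reconstructed volume and its reference, using stand-in data rather than actual 4DCT phases:

```python
# Standard PSNR between two volumes; the random arrays are placeholders.
import numpy as np

def psnr(reference, reconstruction, data_range=1.0):
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.random.rand(64, 64, 64).astype(np.float32)                    # stand-in CT volume
rec = ref + np.random.normal(0, 0.01, ref.shape).astype(np.float32)    # noisy reconstruction
print(f"PSNR = {psnr(ref, rec):.2f} dB")
```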
Citations: 0
C2HFusion: Clinical context-driven hierarchical fusion of multimodal data for personalized and quantitative prognostic assessment in pancreatic cancer
IF 11.8 | Medicine, CAS Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-11 | DOI: 10.1016/j.media.2026.103937
Bolun Zeng , Yaolin Xu , Peng Wang , Tianyu Lu , Zongyu Xie , Mengsu Zeng , Jianjun Zhou , Liang Liu , Haitao Sun , Xiaojun Chen
Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive malignancy. Accurate prognostic modeling enables reliable risk stratification to identify patients most likely to benefit from adjuvant therapy, thereby facilitating individualized clinical management and potentially improving patient outcomes. Although recent deep learning approaches have shown promise in this area, their effectiveness is often constrained by fusion strategies that fail to fully capture the hierarchical and complementary information across heterogeneous clinical modalities. To address these limitations, we propose C2HFusion, a novel fusion framework inspired by clinical decision-making for personalized prognostic risk assessment. C2HFusion is unique in that it integrates multimodal data across multiple representational levels and structural forms. At the imaging level, it extracts and aggregates tumor-level features from multi-sequence MRI using cross-attention, effectively capturing complementary imaging patterns. At the patient level, it encodes structured data (e.g., laboratory results, demographics) and unstructured data (e.g., radiology reports) as contextual priors, which are then fused with imaging representations through a novel feature modulation mechanism. To further enhance this cross-level integration, a scalable Mixture-of-Clinical-Experts (MoCE) module dynamically routes different modalities through specialized branches and adaptively optimizes feature fusion for more robust multimodal modeling. Validation on multi-center real-world datasets covering 681 PDAC patients shows that C2HFusion consistently outperforms state-of-the-art methods in overall survival prediction, achieving over a 5% improvement in C-index. These results highlight its potential to improve prognostic accuracy and support more informed, personalized clinical decision-making.
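The C-index cited above is Harrell's concordance index; a minimal dependency-free sketch with illustrative survival times, risk scores, and censoring flags (not study data):

```python
# Hedged sketch of Harrell's C-index: the fraction of comparable patient
# pairs whose predicted risks are ordered consistently with survival times.
import numpy as np

def c_index(times, risks, events):
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable if patient i had an observed event before j's time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # higher risk died earlier: concordant
                elif risks[i] == risks[j]:
                    concordant += 0.5   # ties count half
    return concordant / comparable

times = np.array([12, 5, 30, 18, 7])      # months of follow-up (illustrative)
risks = np.array([0.8, 0.9, 0.1, 0.4, 0.7])  # model risk scores (illustrative)
events = np.array([1, 1, 0, 1, 1])        # 1 = death observed, 0 = censored
print(round(c_index(times, risks, events), 3))  # 0.9
```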
Citations: 0
Facial appearance prediction for orthognathic surgery with diffusion models
IF 11.8 | Medicine, CAS Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-11 | DOI: 10.1016/j.media.2026.103934
Jungwook Lee , Xuanang Xu , Daeseung Kim , Tianshu Kuang , Hannah H. Deng , Xinrui Song , Yasmine Soubra , Michael A.K. Liebschner , Jaime Gateno , Pingkun Yan
Orthognathic surgery corrects craniomaxillofacial deformities by repositioning skeletal structures to improve facial aesthetics and function. Conventional orthognathic surgical planning is largely bone-driven: bone repositioning is first defined and soft-tissue outcomes are then predicted. However, this approach is limited by its reliance on surgeon-defined bone plans and its inability to directly optimize for patient-specific aesthetic outcomes. To address these limitations, the soft-tissue-driven paradigm seeks to first predict a patient-specific optimal facial appearance and subsequently derive the skeletal changes required to achieve it. In this work, we introduce FAPOS (Facial Appearance Prediction for Orthognathic Surgery), a novel transformer-based latent diffusion framework that directly predicts a normal-looking 3D facial outcome from pre-operative scans, enabling soft-tissue-driven planning. FAPOS utilizes a dense 282-landmark representation and is trained on a combined dataset of 44,602 public 3D faces, overcoming the limitations of data scarcity and lack of correspondence. Our three-phase training pipeline combines geometric encoding, latent diffusion modeling, and patient-specific conditioning. Quantitative and qualitative results show that FAPOS outperforms prior methods with improved facial symmetry and identity preservation. These results mark an important step toward enabling soft-tissue-driven surgical planning, with FAPOS providing an optimal facial target that serves as the basis for estimating the skeletal adjustments in subsequent stages.
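To ground the latent-diffusion component, here is a minimal epsilon-prediction training step over a flattened 282-landmark code. The denoiser, noise schedule, and shapes are assumptions for the sketch, not FAPOS's implementation:

```python
# Hedged sketch of one diffusion training step: noise a latent landmark
# code, then regress the injected noise. All hyperparameters are illustrative.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

# toy denoiser over 282 3-D landmarks plus a timestep scalar
denoiser = nn.Sequential(nn.Linear(282 * 3 + 1, 512), nn.SiLU(), nn.Linear(512, 282 * 3))

z0 = torch.randn(8, 282 * 3)                   # stand-in latent landmark codes
t = torch.randint(0, T, (8,))
noise = torch.randn_like(z0)
a = alphas_bar[t].unsqueeze(1)
zt = a.sqrt() * z0 + (1 - a).sqrt() * noise    # forward diffusion q(z_t | z_0)

pred = denoiser(torch.cat([zt, t.float().unsqueeze(1) / T], dim=1))
loss = nn.functional.mse_loss(pred, noise)     # standard epsilon-prediction loss
print(loss.item())
```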
Citations: 0
UTMorph: A hybrid CNN-transformer network for weakly-supervised multimodal image registration in biopsy puncture
IF 11.8 | Medicine, CAS Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-10 | DOI: 10.1016/j.media.2026.103938
Xudong Guo , Peiyu Chen , Haifeng Wang , Zhichao Yan , Qinfen Jiang , Rongjiang Wang , Ji Bin
Accurate registration of preoperative magnetic resonance imaging (MRI) and intraoperative ultrasound (US) images is essential to enhance the precision of biopsy punctures and targeted ablation procedures using robotic systems. To improve the speed and accuracy of registration while accounting for soft-tissue deformation during puncture, we propose UTMorph, a hybrid framework that combines a convolutional neural network (CNN) with a Transformer, built on the U-Net architecture. The model is designed to enable efficient, deformable multimodal image registration. We introduce a novel attention mechanism that focuses on the structural features of images, ensuring precise deformation estimation while reducing computational complexity. In addition, we propose a hybrid edge loss function that complements shape and boundary information, thereby improving registration accuracy. Experiments were conducted on data from 704 patients, including private datasets from Shanghai East Hospital, public datasets from The Cancer Imaging Archive, and the µ-ProReg Challenge. The performance of UTMorph was compared with that of six commonly used registration methods and loss functions. UTMorph achieved superior performance across multiple evaluation metrics (Dice similarity coefficient: 0.890, 95th-percentile Hausdorff distance: 2.679 mm, mean surface distance: 0.284 mm, Jacobian determinant: 0.040) and ensures accurate registration with minimal memory usage, even under significant modality differences. These findings validate the effectiveness of the UTMorph model with the hybrid edge loss function for MR-US deformable medical image registration. The code is available at https://github.com/Prps7/UTMorph.
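A hedged sketch of a Dice term combined with a simple Sobel-edge term, the general shape a "hybrid edge loss" might take; the paper's exact formulation may differ:

```python
# Illustrative sketch: Dice overlap plus an L1 penalty on edge-map mismatch.
import torch
import torch.nn.functional as F

def dice_score(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def edge_map(mask):
    # Sobel-like gradient magnitude as a cheap boundary extractor
    kx = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]])
    gx = F.conv2d(mask, kx, padding=1)
    gy = F.conv2d(mask, kx.transpose(2, 3), padding=1)
    return (gx ** 2 + gy ** 2).sqrt()

pred = torch.rand(1, 1, 64, 64)                         # soft segmentation output
target = (torch.rand(1, 1, 64, 64) > 0.5).float()       # binary ground truth
loss = (1 - dice_score(pred, target)) + F.l1_loss(edge_map(pred), edge_map(target))
print(loss.item())
```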
Citations: 0
MegaSeg: Towards scalable semantic segmentation for megapixel images
IF 11.8 | Medicine, CAS Tier 1 | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-10 | DOI: 10.1016/j.media.2026.103933
Solomon Kefas Kaura , Jialun Wu , Zeyu Gao , Chen Li
Megapixel image segmentation is essential for high-resolution histopathology image analysis but is currently constrained by GPU memory limitations, necessitating patch-based processing and downsampling that compromise global and local context. This paper introduces MegaSeg, an end-to-end framework for semantic segmentation of megapixel images, leveraging streaming convolutional networks within a U-shaped architecture and a divide-and-conquer strategy. MegaSeg enables efficient semantic segmentation of 8192×8192-pixel images (67 MP) without sacrificing detail or structural context while significantly reducing memory usage. Furthermore, we propose the Attentive Dense Refinement Module (ADRM) in the MegaSeg decoder path to retain and refine local details while capturing the contextual information present in high-resolution images. Experiments on public histopathology datasets demonstrate superior performance, preserving both global structure and local details. On CAMELYON16, MegaSeg improves the Free Response Operating Characteristic (FROC) score from 0.78 to 0.89 when the input size is scaled from 4 MP to 67 MP, highlighting its effectiveness for large-scale medical image segmentation.
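The divide-and-conquer idea can be shown in miniature with overlapped tiled inference: tile size, overlap, and simple averaging are illustrative choices here, and MegaSeg's streaming implementation is more sophisticated.

```python
# Hedged sketch: run a model over overlapping tiles of a large image and
# average the overlapping predictions. The Identity "model" is a stand-in.
import torch

def tiled_segment(model, image, tile=1024, overlap=128):
    _, _, H, W = image.shape
    out = torch.zeros_like(image)
    weight = torch.zeros_like(image)
    step = tile - overlap
    for y in range(0, max(H - overlap, 1), step):
        for x in range(0, max(W - overlap, 1), step):
            y2, x2 = min(y + tile, H), min(x + tile, W)
            out[:, :, y:y2, x:x2] += model(image[:, :, y:y2, x:x2])
            weight[:, :, y:y2, x:x2] += 1.0
    return out / weight.clamp(min=1.0)   # average where tiles overlap

identity = torch.nn.Identity()
seg = tiled_segment(identity, torch.rand(1, 1, 2048, 2048))
print(seg.shape)  # torch.Size([1, 1, 2048, 2048])
```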
Citations: 0