
Latest publications in Medical image analysis

Fundus image quality assessment in retinopathy of prematurity via multi-label graph evidential network
IF 11.8 · Medicine (CAS Tier 1) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-01 · Epub Date: 2026-01-24 · DOI: 10.1016/j.media.2026.103959
Donghan Wu , Wenyue Shen , Lu Yuan , Heng Li , Huaying Hao , Juan Ye , Yitian Zhao
Retinopathy of Prematurity (ROP) is a leading cause of childhood blindness worldwide. In clinical practice, fundus imaging serves as a primary diagnostic tool for ROP, making the accurate quality assessment of these images critically important. However, existing automated methods for evaluating ROP fundus images face significant challenges. First, there is a high degree of visual similarity between lesions and factors that influence quality. Second, there is a paucity of trustworthy outputs and interpretable, clinician-friendly designs, which limits their reliability and effectiveness. In this work, we propose an ROP image quality assessment framework, termed Q-ROP. This framework leverages fine-grained multi-label annotations based on key image factors such as artifacts, illumination, spatial positioning, and structural clarity. Additionally, the integration of a label graph network with evidential learning theory enables the model to explicitly capture the relationships between quality grades and influencing factors, thereby improving both robustness and accuracy. This approach facilitates interpretable analysis by directing the model’s focus toward relevant image features and reducing interference from lesion-like artifacts. Furthermore, the incorporation of evidential learning theory serves to quantify the uncertainty inherent in quality ratings, thereby ensuring the trustworthiness of the assessments. Trained and tested on a dataset of 6677 ROP images across three quality levels (i.e., acceptable, potentially acceptable, and unacceptable), Q-ROP achieved state-of-the-art performance with 95.82% accuracy. Its effectiveness was further validated in a downstream ROP staging task, where it significantly improved the performance of typical classification models. These results demonstrate Q-ROP’s strong potential as a reliable and robust tool for clinical decision support.
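For readers unfamiliar with the evidential component, the uncertainty-quantification idea can be illustrated with a minimal Dirichlet-based evidential head in the style of Sensoy et al. (2018). This is a generic sketch, not Q-ROP's actual architecture; the three-grade example and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def evidential_head(logits: torch.Tensor):
    """Map classifier logits to Dirichlet parameters and an uncertainty score.
    Standard evidential deep learning recipe; Q-ROP's exact formulation may differ."""
    evidence = F.softplus(logits)           # non-negative evidence per class
    alpha = evidence + 1.0                  # Dirichlet concentration parameters
    strength = alpha.sum(dim=-1, keepdim=True)
    prob = alpha / strength                 # expected class probabilities
    k = logits.shape[-1]
    uncertainty = k / strength.squeeze(-1)  # in (0, 1]; 1 = total ignorance
    return prob, uncertainty

# Example: 3 quality grades (acceptable / potentially acceptable / unacceptable)
logits = torch.randn(4, 3)                  # a batch of 4 hypothetical images
prob, u = evidential_head(logits)
print(prob.sum(-1), u)                      # rows sum to 1; high u flags unreliable ratings
```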
{"title":"Fundus image quality assessment in retinopathy of prematurity via multi-label graph evidential network","authors":"Donghan Wu ,&nbsp;Wenyue Shen ,&nbsp;Lu Yuan ,&nbsp;Heng Li ,&nbsp;Huaying Hao ,&nbsp;Juan Ye ,&nbsp;Yitian Zhao","doi":"10.1016/j.media.2026.103959","DOIUrl":"10.1016/j.media.2026.103959","url":null,"abstract":"<div><div>Retinopathy of Prematurity (ROP) is a leading cause of childhood blindness worldwide. In clinical practice, fundus imaging serves as a primary diagnostic tool for ROP, making the accurate quality assessment of these images critically important. However, existing automated methods for evaluating ROP fundus images face significant challenges. First, there is a high degree of visual similarity between lesions and factors that influence quality. Second, there is a paucity of trustworthy outputs and interpretable or clinical-friendly designs, which limit their reliability and effectiveness. In this work, we propose a ROP image quality assessment framework, termed Q-ROP. This framework leverages fine-grained multi-label annotations based on key image factors such as artifacts, illumination, spatial positioning, and structural clarity. Additionally, the integration of a label graph network with evidential learning theory enables the model to explicitly capture the relationships between quality grades and influencing factors, thereby improving both robustness and accuracy. This approach facilitates interpretable analysis by directing the model’s focus toward relevant image features and reducing interference from lesion-like artifacts. Furthermore, the incorporation of evidential learning theory serves to quantify the uncertainty inherent in quality ratings, thereby ensuring the trustworthiness of the assessments. Trained and tested on a dataset of 6677 ROP images across three quality levels (i.e. acceptable, potentially acceptable, and unacceptable), Q-ROP achieved state-of-the-art performance with a 95.82% accuracy. Its effectiveness was further validated in a downstream ROP staging task, where it significantly improved the performance of typical classification models. These results demonstrate Q-ROP’s strong potential as a reliable and robust tool for clinical decision support.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103959"},"PeriodicalIF":11.8,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146048255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generative data-engine foundation model for universal few-shot 2D vascular image segmentation
IF 11.8 · Medicine (CAS Tier 1) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-01 · Epub Date: 2026-02-12 · DOI: 10.1016/j.media.2026.103996
Rongjun Ge , Xin Li , Yuxing Liu , Chengliang Liu , Pinzheng Zhang , Jiong Zhang , Jian Yang , Jean-Louis Dillenseger , Chunfeng Yang , Yuting He , Yang Chen
The segmentation of 2D vascular structures via deep learning holds significant clinical value but is hindered by the scarcity of annotated data, severely limiting its widespread application. Developing a universal few-shot vascular segmentation model is highly desirable, yet remains challenging due to the need for extensive training and the inherent complexities of vascular imaging. In this work, we propose UniVG (Generative Data-engine Foundation Model for Universal Few-shot 2D Vascular Image Segmentation), a novel approach that learns the compositionality of vascular images and constructs a generative foundation model for robust vascular segmentation. UniVG enables the synthesis and learning of diverse and realistic vascular images through two key innovations: 1) Compositional learning for flexible and diverse vascular synthesis: It decomposes and recombines vascular structures with varying morphological features and diverse foreground-background configurations to generate richly diverse synthetic image-label pairs. 2) Few-shot generative adaptation for transferable segmentation: It fine-tunes pre-trained models with minimal annotated data to bridge the gap between synthetic and real vascular domains, synthesizing authentic and diverse vessel images for downstream few-shot vascular segmentation learning. To support our approach, we develop UniVG-58K, a large dataset comprising 58,689 vascular images across five imaging modalities, facilitating robust large-scale generative pre-training. Extensive experiments on 11 vessel segmentation tasks across 5 modalities (with only 5 labeled images per task) demonstrate that UniVG achieves performance comparable to fully supervised models, significantly reducing data collection and annotation costs. All code and datasets will be made publicly available at https://github.com/XinAloha/UniVG.
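The compositional recombination idea can be sketched in a few lines: pasting a vessel foreground onto a background yields a synthetic image-label pair. This toy NumPy version is a stand-in only; the function name `compose_pair`, the flip augmentation, and the intensity and noise levels are all illustrative assumptions, and UniVG's actual pipeline adds rich morphological variation and generative modeling.

```python
import numpy as np

def compose_pair(vessel_mask: np.ndarray, background: np.ndarray,
                 intensity: float = 0.6, rng=None):
    """Recombine a binary vessel mask with a background image into a
    synthetic image-label pair (toy stand-in for compositional learning)."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:                      # crude morphological augmentation
        vessel_mask = vessel_mask[:, ::-1]
    image = background.astype(np.float32).copy()
    image[vessel_mask > 0] *= intensity         # imprint vessel pixels
    noise = rng.normal(0.0, 0.02, image.shape)  # mild sensor-like noise
    return np.clip(image + noise, 0.0, 1.0), (vessel_mask > 0).astype(np.uint8)

mask = np.zeros((64, 64), np.uint8); mask[30:34, :] = 1   # a crude "vessel"
bg = np.full((64, 64), 0.8, np.float32)
img, lbl = compose_pair(mask, bg)               # one synthetic training pair
```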
{"title":"Generative data-engine foundation model for universal few-shot 2D vascular image segmentation","authors":"Rongjun Ge ,&nbsp;Xin Li ,&nbsp;Yuxing Liu ,&nbsp;Chengliang Liu ,&nbsp;Pinzheng Zhang ,&nbsp;Jiong Zhang ,&nbsp;Jian Yang ,&nbsp;Jean-Louis Dillenseger ,&nbsp;Chunfeng Yang ,&nbsp;Yuting He ,&nbsp;Yang Chen","doi":"10.1016/j.media.2026.103996","DOIUrl":"10.1016/j.media.2026.103996","url":null,"abstract":"<div><div>The segmentation of 2D vascular structures via deep learning holds significant clinical value but is hindered by the scarcity of annotated data, severely limiting its widespread application. Developing a universal few-shot vascular segmentation model is highly desirable, yet remains challenging due to the need for extensive training and the inherent complexities of vascular imaging. In this work, we propose <strong>UniVG</strong> (Generative Data-engine Foundation Model for Universal Few-shot 2D Vascular Image Segmentation), a novel approach that learns the compositionality of vascular images and constructing a generative foundation model for robust vascular segmentation. UniVG enables the synthesis and learning of diverse and realistic vascular images through two key innovations: <em>1) Compositional learning</em> for flexible and diverse vascular synthesis: It decomposes and recombines vascular structures with varying morphological features and diverse foreground-background configurations to generate richly diverse synthetic image-label pairs. <em>2) Few-shot generative adaptation</em> for transferable segmentation: It fine-tunes pre-trained models with minimal annotated data to bridge the gap between synthetic and real vascular domains, synthesizing authentic and diverse vessel images for downstream few-shot vascular segmentation learning. To support our approach, we develop UniVG-58K, a large dataset comprising 58,689 vascular images across five imaging modalities, facilitating robust large-scale generative pre-training. Extensive experiments on 11 vessel segmentation tasks cross 5 modalties (only with 5 labeled images on each task) demonstrate that UniVG achieves performance comparable to fully supervised models, significantly reducing data collection and annotation costs. All code and datasets will be made publicly available at <span><span>https://github.com/XinAloha/UniVG</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103996"},"PeriodicalIF":11.8,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146209648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
BCIRT: Backscattering-corrected implicit representation tomography
IF 11.8 · Medicine (CAS Tier 1) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-01 · Epub Date: 2026-02-19 · DOI: 10.1016/j.media.2026.104000
Chuanhao Zhang , Yangxi Li , Jianping Song , Yingwei Fan , Guochen Ning , Yu Shen , Canhong Xiang , Fang Chen , Hongen Liao
Optical coherence tomography (OCT) A-scan backscattering signals provide depth-resolved textural information about internal structures. However, conventional OCT imaging is limited by refraction-induced distortion and speckle noise, hindering fine detail resolution. While multi-angle imaging systems alleviate these issues through incoherent compounding of backscattering signals, in vivo applications face challenges: limited angular coverage during surface scanning degrades backscatter intensity compounding quality, and the absence of angular information introduces artifacts in multi-view position-intensity alignment. Furthermore, excessive smoothing during speckle suppression obscures fine textures. Consequently, reconstructing ultra-fine structures from limited-angle, sparse-view measurements remains a critical challenge. To address this, we present Backscattering-Corrected Implicit Representation Tomography (BCIRT), a framework for reconstructing multi-angle low-coherence signals. We also develop a dedicated limited-angle imaging system for intraoperative BCIRT deployment. BCIRT formulates cross-view backscattering signals as a continuous function of spatial position, utilizing implicit neural representation (INR) for fitting. A physics-informed iterative mechanism inversely models ray propagation to determine corrected ray paths, enhancing the neural representation’s robustness against distortions. Leveraging these corrected paths, we introduce a dual dynamic line mixer and a contrastive-guided discriminative deblurring module to achieve high-resolution microstructure reconstruction with reduced speckle noise. Extensive experiments on biological samples and surgical resected samples demonstrate that our method achieves state-of-the-art performance, highlighting its potential for clinical applications and biomedical research.
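As a rough illustration of the implicit-neural-representation backbone, the sketch below fits a tiny coordinate MLP with Fourier features to (position, intensity) samples. BCIRT's actual network, ray-path correction, and training objective are not reproduced; `TinyINR` and its layer sizes are assumptions for demonstration.

```python
import torch
import torch.nn as nn

class TinyINR(nn.Module):
    """Minimal implicit neural representation: intensity = f(x, y).
    A sketch of the INR idea BCIRT builds on, not the paper's architecture."""
    def __init__(self, n_freq: int = 8, hidden: int = 128):
        super().__init__()
        self.freqs = 2.0 ** torch.arange(n_freq)           # Fourier feature bands
        self.net = nn.Sequential(
            nn.Linear(4 * n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xy: torch.Tensor) -> torch.Tensor:   # xy in [-1, 1]^2
        ang = xy[..., None] * self.freqs                   # (N, 2, n_freq)
        feat = torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(1)
        return self.net(feat)

# Fit the INR to (position, intensity) samples, e.g. along corrected ray paths
model = TinyINR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
xy = torch.rand(1024, 2) * 2 - 1                           # random query positions
target = torch.sin(3 * xy[:, :1])                          # toy "tissue" signal
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(xy), target)
    loss.backward(); opt.step()
```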
{"title":"BCIRT: Backscattering-corrected implicit representation tomography","authors":"Chuanhao Zhang ,&nbsp;Yangxi Li ,&nbsp;Jianping Song ,&nbsp;Yingwei Fan ,&nbsp;Guochen Ning ,&nbsp;Yu Shen ,&nbsp;Canhong Xiang ,&nbsp;Fang Chen ,&nbsp;Hongen Liao","doi":"10.1016/j.media.2026.104000","DOIUrl":"10.1016/j.media.2026.104000","url":null,"abstract":"<div><div>Optical coherence tomography (OCT) A-scan backscattering signals provide depth-resolved textural information about internal structures. However, conventional OCT imaging is limited by refraction-induced distortion and speckle noise, hindering fine detail resolution. While multi-angle imaging systems alleviate these issues through incoherent compounding of backscattering signals, in vivo applications face challenges: limited angular coverage during surface scanning degrades backscatter intensity compounding quality, and the absence of angular information introduces artifacts in multi-view position-intensity alignment. Furthermore, excessive smoothing during speckle suppression obscures fine textures. Consequently, reconstructing ultra-fine structures from limited-angle, sparse-view measurements remains a critical challenge. To address this, we present Backscattering-Corrected Implicit Representation Tomography (BCIRT), a framework for reconstructing multi-angle low-coherence signals. We also develop a dedicated limited-angle imaging system for intraoperative BCIRT deployment. BCIRT formulates cross-view backscattering signals as a continuous function of spatial position, utilizing implicit neural representation (INR) for fitting. A physics-informed iterative mechanism inversely models ray propagation to determine corrected ray paths, enhancing the neural representation’s robustness against distortions. Leveraging these corrected paths, we introduce a dual dynamic line mixer and a contrastive-guided discriminative deblurring module to achieve high-resolution microstructure reconstruction with reduced speckle noise. Extensive experiments on biological samples and surgical resected samples demonstrate that our method achieves state-of-the-art performance, highlighting its potential for clinical applications and biomedical research.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 104000"},"PeriodicalIF":11.8,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146777316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing feature fusion of U-like networks with dynamic skip connections
IF 11.8 · Medicine (CAS Tier 1) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-01 · Epub Date: 2026-02-26 · DOI: 10.1016/j.media.2026.104010
Yue Cao, Quansong He, Kaishen Wang, Jianlong Xiong, Zhang Yi, Tao He
U-like networks have become fundamental frameworks in medical image segmentation through skip connections that bridge high-level semantics and low-level spatial details. Despite their success, conventional skip connections exhibit two key limitations: inter-feature constraints and intra-feature constraints. The inter-feature constraint refers to the static nature of feature fusion in traditional skip connections, where information is transmitted along fixed pathways regardless of feature content. The intra-feature constraint arises from the insufficient modeling of multi-scale feature interactions, thereby hindering the effective aggregation of global contextual information. To overcome these limitations, we propose a novel Dynamic Skip Connection (DSC) block that fundamentally enhances cross-layer connectivity through adaptive mechanisms. The DSC block integrates two complementary components: (1) Test-Time Training (TTT) module: This module addresses the inter-feature constraint by enabling dynamic adaptation of hidden representations during inference, facilitating content-aware feature refinement. (2) Dynamic Multi-Scale Kernel (DMSK) module: To mitigate the intra-feature constraint, this module adaptively selects kernel sizes based on global contextual cues, enhancing the network’s capacity for multi-scale feature integration. The DSC block is architecture-agnostic and can be seamlessly incorporated into existing U-like network structures. Extensive experiments demonstrate the plug-and-play effectiveness of the proposed DSC block across CNN-based, Transformer-based, hybrid CNN-Transformer, and Mamba-based U-like networks. The code is available at https://github.com/BlackJack-Cao/U-like-Networks-with-DSC.
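The kernel-selection idea behind the DMSK module resembles selective-kernel designs: parallel convolutions of different sizes blended by weights derived from global context. The sketch below assumes an SKNet-style gate; the class name, branch sizes, and gating network are illustrative, not the authors' exact design.

```python
import torch
import torch.nn as nn

class DynamicMultiScaleKernel(nn.Module):
    """Context-driven kernel selection in the spirit of DMSK: global context
    produces softmax weights over parallel convs of different kernel sizes."""
    def __init__(self, channels: int, sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in sizes)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, len(sizes)), nn.Softmax(dim=-1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gate(x)                                          # (B, n_branches)
        outs = torch.stack([b(x) for b in self.branches], dim=1)  # (B, n, C, H, W)
        return (w[:, :, None, None, None] * outs).sum(dim=1)      # weighted blend

y = DynamicMultiScaleKernel(16)(torch.randn(2, 16, 32, 32))       # shape preserved
```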
{"title":"Enhancing feature fusion of U-like networks with dynamic skip connections","authors":"Yue Cao,&nbsp;Quansong He,&nbsp;Kaishen Wang,&nbsp;Jianlong Xiong,&nbsp;Zhang Yi,&nbsp;Tao He","doi":"10.1016/j.media.2026.104010","DOIUrl":"10.1016/j.media.2026.104010","url":null,"abstract":"<div><div>U-like networks have become fundamental frameworks in medical image segmentation through skip connections that bridge high-level semantics and low-level spatial details. Despite their success, conventional skip connections exhibit two key limitations: inter-feature constraints and intra-feature constraints. The inter-feature constraint refers to the static nature of feature fusion in traditional skip connections, where information is transmitted along fixed pathways regardless of feature content. The intra-feature constraint arises from the insufficient modeling of multi-scale feature interactions, thereby hindering the effective aggregation of global contextual information. To overcome these limitations, we propose a novel Dynamic Skip Connection (DSC) block that fundamentally enhances cross-layer connectivity through adaptive mechanisms. The DSC block integrates two complementary components: (1) Test-Time Training (TTT) module: This module addresses the inter-feature constraint by enabling dynamic adaptation of hidden representations during inference, facilitating content-aware feature refinement. (2) Dynamic Multi-Scale Kernel (DMSK) module: To mitigate the intra-feature constraint, this module adaptively selects kernel sizes based on global contextual cues, enhancing the network’s capacity for multi-scale feature integration. The DSC block is architecture-agnostic and can be seamlessly incorporated into existing U-like network structures. Extensive experiments demonstrate the plug-and-play effectiveness of the proposed DSC block across CNN-based, Transformer-based, hybrid CNN-Transformer, and Mamba-based U-like networks. The code is available at <span><span>https://github.com/BlackJack-Cao/U-like-Networks-with-DSC</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 104010"},"PeriodicalIF":11.8,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147334541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
IUGC: A benchmark of landmark detection in end-to-end intrapartum ultrasound biometry
IF 11.8 · Medicine (CAS Tier 1) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-01 · Epub Date: 2026-01-23 · DOI: 10.1016/j.media.2026.103960
Jieyun Bai , Yitong Tang , Xiao Liu , Jiale Hu , Yunda Li , Xufan Chen , Yufeng Wang , Chen Ma , Yunshu Li , Bowen Guo , Jing Jiao , Yi Huang , Kun Wang , Lifei Li , Yuzhang Ma , Xiaoxin Han , Haochen Shao , Zi Yang , Qingchen Liu , Yuchen Hu , Shuo Li
Accurate intrapartum biometry plays a crucial role in monitoring labor progression and preventing complications. However, its clinical application is limited by challenges such as the difficulty in identifying anatomical landmarks and the variability introduced by operator dependency. To overcome these challenges, the Intrapartum Ultrasound Grand Challenge (IUGC) 2025, in collaboration with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), was organized to accelerate the development of automatic measurement techniques for intrapartum ultrasound analysis. The challenge featured a large-scale, multi-center dataset comprising over 32,000 images from 24 hospitals and research institutes. These images were annotated with key anatomical landmarks of the pubic symphysis (PS) and fetal head (FH), along with the corresponding biometric parameter, the angle of progression (AoP). Ten participating teams proposed a variety of end-to-end and semi-supervised frameworks, incorporating advanced strategies such as foundation model distillation, pseudo-label refinement, anatomical segmentation guidance, and ensemble learning. A comprehensive evaluation revealed that the winning team achieved superior accuracy, with a Mean Radial Error (MRE) of 6.53 ± 4.38 pixels for the right PS landmark, 8.60 ± 5.06 pixels for the left PS landmark, 19.90 ± 17.55 pixels for the FH tangent landmark, and an absolute AoP difference of 3.81 ± 3.12°. This top-performing method demonstrated accuracy comparable to expert sonographers, emphasizing the clinical potential of automated intrapartum ultrasound analysis. However, challenges remain, such as the trade-off between accuracy and computational efficiency, the lack of segmentation labels and video data, and the need for extensive multi-center clinical validation. IUGC 2025 thus sets the first benchmark for landmark-based intrapartum biometry estimation and provides an open platform for developing and evaluating real-time, intelligent ultrasound analysis solutions for labor management.
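The headline metrics are straightforward to reproduce given landmark coordinates. Below is a sketch of Mean Radial Error and one plausible angle-of-progression construction; the challenge's exact AoP geometry is not specified in the abstract, so `angle_of_progression` should be read as an assumption.

```python
import numpy as np

def mean_radial_error(pred: np.ndarray, gt: np.ndarray):
    """Mean Radial Error between predicted and ground-truth landmarks:
    mean (± std) Euclidean distance in pixels. Arrays are (N, 2) in (x, y)."""
    d = np.linalg.norm(pred - gt, axis=1)
    return d.mean(), d.std()

def angle_of_progression(ps_upper, ps_lower, fh_tangent):
    """One plausible AoP construction: the angle at the lower PS landmark
    between the symphysis axis and the line to the FH tangent point.
    The challenge's exact geometric definition may differ; treat as a sketch."""
    ps_upper, ps_lower, fh_tangent = map(np.asarray, (ps_upper, ps_lower, fh_tangent))
    u = ps_upper - ps_lower                  # pubic symphysis direction
    v = fh_tangent - ps_lower                # ray toward the fetal-head tangent
    cosang = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

mre, sd = mean_radial_error(np.array([[101., 52.]]), np.array([[98., 50.]]))
```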
{"title":"IUGC: A benchmark of landmark detection in end-to-end intrapartum ultrasound biometry","authors":"Jieyun Bai ,&nbsp;Yitong Tang ,&nbsp;Xiao Liu ,&nbsp;Jiale Hu ,&nbsp;Yunda Li ,&nbsp;Xufan Chen ,&nbsp;Yufeng Wang ,&nbsp;Chen Ma ,&nbsp;Yunshu Li ,&nbsp;Bowen Guo ,&nbsp;Jing Jiao ,&nbsp;Yi Huang ,&nbsp;Kun Wang ,&nbsp;Lifei Li ,&nbsp;Yuzhang Ma ,&nbsp;Xiaoxin Han ,&nbsp;Haochen Shao ,&nbsp;Zi Yang ,&nbsp;Qingchen Liu ,&nbsp;Yuchen Hu ,&nbsp;Shuo Li","doi":"10.1016/j.media.2026.103960","DOIUrl":"10.1016/j.media.2026.103960","url":null,"abstract":"<div><div>Accurate intrapartum biometry plays a crucial role in monitoring labor progression and preventing complications. However, its clinical application is limited by challenges such as the difficulty in identifying anatomical landmarks and the variability introduced by operator dependency. To overcome these challenges, the Intrapartum Ultrasound Grand Challenge (IUGC) 2025, in collaboration with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), was organized to accelerate the development of automatic measurement techniques for intrapartum ultrasound analysis. The challenge featured a large-scale, multi-center dataset comprising over 32,000 images from 24 hospitals and research institutes. These images were annotated with key anatomical landmarks of the pubic symphysis (PS) and fetal head (FH), along with the corresponding biometric parameter-the angle of progression (AoP). Ten participating teams proposed a variety of end-to-end and semi-supervised frameworks, incorporating advanced strategies such as foundation model distillation, pseudo-label refinement, anatomical segmentation guidance, and ensemble learning. A comprehensive evaluation revealed that the winning team achieved superior accuracy, with a Mean Radial Error (MRE) of 6.53 ± 4.38 pixels for the right PS landmark, 8.60 ± 5.06 pixels for the left PS landmark, 19.90 ± 17.55 pixels for the FH tangent landmark, and an absolute AoP difference of 3.81 ± 3.12° This top-performing method demonstrated accuracy comparable to expert sonographers, emphasizing the clinical potential of automated intrapartum ultrasound analysis. However, challenges remain, such as the trade-off between accuracy and computational efficiency, the lack of segmentation labels and video data, and the need for extensive multi-center clinical validation. IUGC 2025 thus sets the first benchmark for landmark-based intrapartum biometry estimation and provides an open platform for developing and evaluating real-time, intelligent ultrasound analysis solutions for labor management.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103960"},"PeriodicalIF":11.8,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146033892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Robust non-rigid image-to-patient registration for contactless dynamic thoracic tumor localization using recursive deformable diffusion models
IF 11.8 · Medicine (CAS Tier 1) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-01 · Epub Date: 2026-01-12 · DOI: 10.1016/j.media.2026.103948
Dongyuan Li , Yixin Shan , Yuxuan Mao , Puxun Tu , Haochen Shi , Shenghao Huang , Weiyan Sun , Chang Chen , Xiaojun Chen
Deformable image-to-patient registration is essential for surgical navigation and medical imaging, yet real-time computation of spatial transformations across modalities remains a major clinical challenge: it is often time-consuming, error-prone, and can increase trauma or radiation exposure. While state-of-the-art methods achieve impressive speed and accuracy on paired medical images, they face notable limitations in cross-modal thoracic applications, where physiological motions such as respiration complicate tumor localization. To address this, we propose a robust, contactless, non-rigid registration framework for dynamic thoracic tumor localization. A highly efficient Recursive Deformable Diffusion Model (RDDM) is trained to reconstruct comprehensive 4DCT sequences from only end-inhalation and end-exhalation scans, capturing respiratory dynamics reflective of the intraoperative state. For real-time patient alignment, we introduce a contactless non-rigid registration algorithm based on GICP, leveraging patient skin surface point clouds captured by stereo RGB-D imaging. By incorporating normal vector and expansion-contraction constraints, the method enhances robustness and avoids local minima. The proposed framework was validated on publicly available datasets and volunteer trials. Quantitative evaluations demonstrated the RDDM’s anatomical fidelity across respiratory phases, achieving a PSNR of 34.01 ± 2.78 dB. Moreover, we have preliminarily developed a 4DCT-based registration and surgical navigation module to support tumor localization and high-precision tracking. Experimental results indicate that the proposed framework preliminarily meets clinical requirements and demonstrates potential for integration into downstream surgical systems.
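The normal-vector term in a GICP-style alignment can be illustrated with the classic point-to-plane residual. The sketch below shows only that geometric core (the function name and toy correspondences are assumptions) and omits the paper's expansion-contraction constraint and correspondence search.

```python
import numpy as np

def point_to_plane_residuals(src: np.ndarray, dst: np.ndarray,
                             dst_normals: np.ndarray,
                             R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Point-to-plane residuals r_i = n_i . (R p_i + t - q_i), the kind of
    normal-vector term a GICP-style objective minimizes over (R, t)."""
    moved = src @ R.T + t                          # rigidly transform source points
    return np.einsum('ij,ij->i', dst_normals, moved - dst)

src = np.random.rand(100, 3); dst = src + 0.01     # toy matched point pairs
n = np.tile([0.0, 0.0, 1.0], (100, 1))             # flat-surface normals
r = point_to_plane_residuals(src, dst, n, np.eye(3), np.zeros(3))
```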
{"title":"Robust non-rigid image-to-patient registration for contactless dynamic thoracic tumor localization using recursive deformable diffusion models","authors":"Dongyuan Li ,&nbsp;Yixin Shan ,&nbsp;Yuxuan Mao ,&nbsp;Puxun Tu ,&nbsp;Haochen Shi ,&nbsp;Shenghao Huang ,&nbsp;Weiyan Sun ,&nbsp;Chang Chen ,&nbsp;Xiaojun Chen","doi":"10.1016/j.media.2026.103948","DOIUrl":"10.1016/j.media.2026.103948","url":null,"abstract":"<div><div>Deformable image-to-patient registration is essential for surgical navigation and medical imaging, yet real-time computation of spatial transformations across modalities remains a major clinical challenge-often being time-consuming, error-prone, and potentially increasing trauma or radiation exposure. While state-of-the-art methods achieve impressive speed and accuracy on paired medical images, they face notable limitations in cross-modal thoracic applications, where physiological motions such as respiration complicate tumor localization. To address this, we propose a robust, contactless, non-rigid registration framework for dynamic thoracic tumor localization. A highly efficient Recursive Deformable Diffusion Model (RDDM) is trained to reconstruct comprehensive 4DCT sequences from only end-inhalation and end-exhalation scans, capturing respiratory dynamics reflective of the intraoperative state. For real-time patient alignment, we introduce a contactless non-rigid registration algorithm based on GICP, leveraging patient skin surface point clouds captured by stereo RGB-D imaging. By incorporating normal vector and expansion-contraction constraints, the method enhances robustness and avoids local minima. The proposed framework was validated on publicly available datasets and volunteer trials. Quantitative evaluations demonstrated the RDDM’s anatomical fidelity across respiratory phases, achieving an PSNR of 34.01 ± 2.78 dB. Moreover, we have preliminarily developed a 4DCT-based registration and surgical navigation module to support tumor localization and high-precision tracking. Experimental results indicate that the proposed framework preliminarily meets clinical requirements and demonstrates potential for integration into downstream surgical systems.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103948"},"PeriodicalIF":11.8,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145956932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generating synthetic MRI scans for improving Alzheimer’s disease diagnosis
IF 11.8 · Medicine (CAS Tier 1) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-01 · Epub Date: 2026-01-23 · DOI: 10.1016/j.media.2026.103947
Rosanna Turrisi, Giuseppe Patané
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder and the leading cause of dementia. Magnetic Resonance Imaging (MRI) combined with Machine Learning (ML) enables early diagnosis, but ML models often underperform when trained on small, heterogeneous medical datasets. Transfer Learning (TL) helps mitigate this limitation, yet models pre-trained on 2D natural images still fall short of those trained directly on related 3D MRI data. To address this gap, we introduce an intermediate strategy based on synthetic data generation. Specifically, we propose a conditional Denoising Diffusion Probabilistic Model (DDPM) to synthesise 2D projections (axial, coronal, sagittal) of brain MRI scans across three clinical groups: Cognitively Normal (CN), Mild Cognitive Impairment (MCI), and AD. A total of 9000 synthetic images are used for pre-training 2D models, which are subsequently extended to 3D via axial, coronal, and sagittal convolutions and fine-tuned on real-world small datasets. Our method achieves 91.3% accuracy in binary (CN vs. AD) and 74.5% in three-class (CN/MCI/AD) classification on the 3T ADNI dataset, outperforming both models trained from scratch and those pre-trained on ImageNet. Our 2D ADnet achieved state-of-the-art performance on OASIS-2 (59.3% accuracy, 57.6% F1), surpassing all competitor models and confirming the robustness of synthetic data pre-training. These results establish synthetic diffusion-based pre-training as a promising bridge between natural-image TL and medical MRI data.
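The conditional DDPM training objective is standard and can be sketched compactly: sample a timestep, apply the closed-form forward noising, and regress the injected noise. The denoiser `model(x_t, t, y)` is a placeholder for a label-conditioned network; the schedule and architecture below are assumptions, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(model, x0: torch.Tensor, labels: torch.Tensor,
                       betas: torch.Tensor) -> torch.Tensor:
    """One conditional-DDPM training step on a batch of 2D slices x0
    (B, C, H, W), conditioned on clinical group labels."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)         # cumulative alpha-bar_t
    t = torch.randint(0, betas.shape[0], (x0.shape[0],))  # random timestep per sample
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps          # closed-form q(x_t | x_0)
    return F.mse_loss(model(x_t, t, labels), eps)         # simple noise-regression loss

# Usage sketch with a 3-class condition (CN / MCI / AD):
# betas = torch.linspace(1e-4, 0.02, 1000)
# loss = ddpm_training_step(net, x_batch, y_batch, betas)
```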
{"title":"Generating synthetic MRI scans for improving Alzheimer’s disease diagnosis","authors":"Rosanna Turrisi,&nbsp;Giuseppe Patané","doi":"10.1016/j.media.2026.103947","DOIUrl":"10.1016/j.media.2026.103947","url":null,"abstract":"<div><div>Alzheimer’s disease (AD) is a progressive neurodegenerative disorder and the leading cause of dementia. Magnetic Resonance Imaging (MRI) combined with Machine Learning (ML) enables early diagnosis, but ML models often underperform when trained on small, heterogeneous medical datasets. Transfer Learning (TL) helps mitigate this limitation, yet models pre-trained on 2D natural images still fall short of those trained directly on related 3D MRI data. To address this gap, we introduce an intermediate strategy based on synthetic data generation. Specifically, we propose a conditional Denoising Diffusion Probabilistic Model (DDPM) to synthesise 2D projections (axial, coronal, sagittal) of brain MRI scans across three clinical groups: Cognitively Normal (CN), Mild Cognitive Impairment (MCI), and AD. A total of 9000 synthetic images are used for pre-training 2D models, which are subsequently extended to 3D via axial, coronal, and sagittal convolutions and fine-tuned on real-world small datasets. Our method achieves 91.3% accuracy in binary (CN vs. AD) and 74.5% in three-class (CN/MCI/AD) classification on the 3T ADNI dataset, outperforming both models trained from scratch and those pre-trained on ImageNet. Our 2D ADnet achieved state-of-the-art performance on OASIS-2 (59.3% accuracy, 57.6% F1), surpassing all competitor models and confirming the robustness of synthetic data pre-training. These results show synthetic diffusion-based pre-training as a promising bridge between natural image TL and medical MRI data.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103947"},"PeriodicalIF":11.8,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146032814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Diversity-driven MG-MAE: Multi-granularity representation learning for non-salient object segmentation
IF 11.8 · Medicine (CAS Tier 1) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-01 · Epub Date: 2026-02-06 · DOI: 10.1016/j.media.2026.103971
Chengjin Yu , Bin Zhang , Chenchu Xu , Dongsheng Ruan , Rui Wang , Huafeng Liu , Xiaohu Li , Shuo Li
Masked Autoencoders (MAEs) have grown increasingly prominent as a powerful self-supervised learning paradigm. They are capable of effectively leveraging inherent image prior information and are gaining traction in the field of medical image analysis. However, their application to feature representations of non-salient objects, such as microvasculature, accessory organs, and early-stage tumors, is fundamentally limited by the dimensional collapse problem, which diminishes the feature diversity critical for discriminating non-salient structures. To address this, we propose a Multi-Granularity Masked Autoencoder (MG-MAE) framework for feature diversity learning: (1) We extend the conventional MAE into a multi-granularity framework in which a global branch reconstructs global pixels and a local branch recovers Histogram of Oriented Gradients (HOG) features, enabling hierarchical representation of both coarse-grained and fine-grained patterns; (2) Critically, in the local branch, a diversity-enhanced loss function incorporates a Nuclear Norm Maximization (NNM) constraint to explicitly mitigate feature-space collapse through orthogonal embedding regularization; and (3) A Dynamic Weight Adjustment (DWA) strategy dynamically prioritizes hard-to-reconstruct regions via entropy-driven gradient modulation. Comprehensive evaluations across five clinical benchmarks (CCTA139, BTCV, LiTS, ACDC, and MSD Pancreas Tumour) demonstrate that MG-MAE achieves statistically significant improvements in Dice Similarity Coefficient (DSC) scores for non-salient object segmentation, outperforming state-of-the-art methods. The code is available at https://github.com/zhangbbin/mgmae.
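The NNM constraint admits a compact PyTorch sketch: maximizing the nuclear norm (sum of singular values) of the batch feature matrix raises its effective rank, directly opposing dimensional collapse. The normalization and batch scaling below are illustrative choices, not the paper's exact loss.

```python
import torch

def nuclear_norm_maximization_loss(features: torch.Tensor) -> torch.Tensor:
    """Diversity term in the spirit of the NNM constraint. The sign is
    negative so that minimizing this loss maximizes the nuclear norm."""
    f = torch.nn.functional.normalize(features, dim=-1)  # rows on the unit sphere
    return -torch.linalg.svdvals(f).sum() / f.shape[0]   # -nuclear norm / batch size

feats = torch.randn(32, 128, requires_grad=True)          # (batch, embedding dim)
loss = nuclear_norm_maximization_loss(feats)
loss.backward()                                           # gradients push toward higher rank
```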
{"title":"Diversity-driven MG-MAE: Multi-granularity representation learning for non-salient object segmentation","authors":"Chengjin Yu ,&nbsp;Bin Zhang ,&nbsp;Chenchu Xu ,&nbsp;Dongsheng Ruan ,&nbsp;Rui Wang ,&nbsp;Huafeng Liu ,&nbsp;Xiaohu Li ,&nbsp;Shuo Li","doi":"10.1016/j.media.2026.103971","DOIUrl":"10.1016/j.media.2026.103971","url":null,"abstract":"<div><div>Masked Autoencoders (MAEs) have grown increasingly prominent as a powerful self-supervised learning paradigm. They are capable of effectively leveraging inherent image prior information and are gaining traction in the field of medical image analysis. However, their application to feature representations of the non-salient objects, such as microvasculature, accessory organs, and early-stage tumors–is fundamentally limited by dimensional collapse problem, which diminishes feature diversity critical for non-salient structure discrimination. To address this, we propose a Multi-Granularity Masked Autoencoder (MG-MAE) framework for feature diversity learning: (1) We extend the conventional MAE into a multi-granularity framework, a global branch reconstructs global pixels, with a local branch recovering Histogram of Oriented Gradients (HOG) features, enabling hierarchical representation of both coarse-grained and fine-grained patterns; (2) Critically, in the local branch, a diversity-enhanced loss function incorporating Nuclear Norm Maximization (NNM) constraint to explicitly mitigate feature space collapse through orthogonal embedding regularization; and (3) A Dynamic Weight Adjustment (DWA) strategy that dynamically prioritizes hard-to-reconstruct regions via entropy-driven gradient modulation. Comprehensive evaluations across five clinical benchmarks–CCTA139, BTCV, LiTS, ACDC, and MSD Pancreas Tumour datasets–demonstrate that MG-MAE achieves statistically significant improvements in Dice Similarity Coefficient (DSC) scores for non-salient object segmentation, outperforming state-of-the-art methods. The code is available at <span><span>https://github.com/zhangbbin/mgmae</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103971"},"PeriodicalIF":11.8,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146134049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DSFNet: Dual-source and spatiotemporal-feature fusion network for bedside diagnosis of lung injuries with electrical impedance tomography
IF 11.8 · Medicine (CAS Tier 1) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-01 · Epub Date: 2026-02-17 · DOI: 10.1016/j.media.2026.104003
Zhiwei Li , Yang Wu , Kai Liu , Yingqi Zhang , Bai Chen , Hao Wang , Jiafeng Yao
Electrical Impedance Tomography (EIT) is a promising tool for non-invasive and real-time lung monitoring, but the data heterogeneity and low spatial resolution limit its ability to diagnose lung injuries. To address these challenges, we propose DSFNet, a dual-source and spatiotemporal-feature fusion network that integrates EIT spatiotemporal boundary voltages and ventilation images to classify four lung conditions, including healthy (HE), pneumothorax (PN), pleural effusion (PE), and pneumonia (PM). The temporal dynamics modeling (TDM) module and multi-head self-attention (MHSA) module are designed to improve the temporal feature extraction and representation of DSFNet. We construct a novel EIT simulation dataset describing pathological respiratory patterns and introduce a hybrid data learning strategy that combines simulation data (SD) and experimental data (ED) to address the small sample problem and improve the accuracy of model classification. The DSFNet trained with the SD + 25% ED pattern achieved an accuracy of 97.78% and 96.55% on the dynamic phantom dataset and the clinical human dataset, respectively, demonstrating its excellent performance and robustness. The SHAP analysis further revealed the feature contributions of the input data. This study provides an effective approach for bedside lung injury diagnosis based on multi-source EIT data.
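The temporal-attention ingredient can be sketched with a stock multi-head self-attention layer over a sequence of boundary-voltage frames, letting each time step attend to the whole breath cycle. Dimensions below are illustrative (208 measurements per frame is typical of a 16-electrode adjacent-drive protocol), and the projection layer is an assumption rather than DSFNet's actual embedding.

```python
import torch
import torch.nn as nn

frames, n_meas, d_model = 64, 208, 128
proj = nn.Linear(n_meas, d_model)                  # embed each voltage frame
mhsa = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

voltages = torch.randn(2, frames, n_meas)          # (batch, time, measurements)
x = proj(voltages)
attended, weights = mhsa(x, x, x)                  # temporal self-attention
print(attended.shape, weights.shape)               # (2, 64, 128), (2, 64, 64)
```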
{"title":"DSFNet: Dual-source and spatiotemporal-feature fusion network for bedside diagnosis of lung injuries with electrical impedance tomography","authors":"Zhiwei Li ,&nbsp;Yang Wu ,&nbsp;Kai Liu ,&nbsp;Yingqi Zhang ,&nbsp;Bai Chen ,&nbsp;Hao Wang ,&nbsp;Jiafeng Yao","doi":"10.1016/j.media.2026.104003","DOIUrl":"10.1016/j.media.2026.104003","url":null,"abstract":"<div><div>Electrical Impedance Tomography (EIT) is a promising tool for non-invasive and real-time lung monitoring, but the data heterogeneity and low spatial resolution limit its ability to diagnose lung injuries. To address these challenges, we propose DSFNet, a dual-source and spatiotemporal-feature fusion network that integrates EIT spatiotemporal boundary voltages and ventilation images to classify four lung conditions, including healthy (HE), pneumothorax (PN), pleural effusion (PE), and pneumonia (PM). The temporal dynamics modeling (TDM) module and multi-head self-attention (MHSA) module are designed to improve the temporal feature extraction and representation of DSFNet. We construct a novel EIT simulation dataset describing pathological respiratory patterns and introduce a hybrid data learning strategy that combines simulation data (SD) and experimental data (ED) to address the small sample problem and improve the accuracy of model classification. The DSFNet trained with the SD + 25 % ED pattern achieved an accuracy of 97.78 % and 96.55 % on the dynamic phantom dataset and the clinical human dataset, respectively, demonstrating its excellent performance and robustness. The SHAP analysis further revealed the feature contributions of the input data. This study provides an effective approach for bedside lung injury diagnosis based on multi-source EIT data.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 104003"},"PeriodicalIF":11.8,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146777317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multimodal sparse fusion transformer network with spatio-temporal decoupling for breast tumor classification
IF 11.8 · Medicine (CAS Tier 1) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-01 · Epub Date: 2026-01-28 · DOI: 10.1016/j.media.2026.103966
Jiahao Xu , Shuxin Zhuang , Yi He , Haolin Wang , Zhemin Zhuang , Huancheng Zeng
Accurate analysis of tumor morphology, vascularity, and tissue stiffness under multimodal ultrasound imaging plays a critical role in the diagnosis of breast cancer. However, manual interpretation across multiple modalities is time-consuming and heavily dependent on the radiologist’s expertise. Computer-aided classification offers an efficient alternative, yet remains challenging due to significant modality heterogeneity, inconsistent image quality, and redundant information across modalities. To address these issues, we propose a novel Multimodal Sparse Fusion Transformer Network (MSFT-Net). First, a Spatio-Temporal Decoupling Attention architecture (STDA) is introduced to disentangle and extract dynamic and static features from different modalities along spatial and temporal dimensions, capturing modality-specific motion and morphological characteristics independently. Second, the Mixed-Scale Convolution Module (MSCM) obtains tumor features at multiple scales, enhancing geometric detail representation and improving receptive field coverage. Third, the Sparse Cross-Attention Module (SCAM) adaptively retains the most effective query-key interactions between modalities, thereby facilitating the aggregation of high-quality features for robust multimodal information fusion. MSFT-Net is trained and tested on a curated dataset comprising multimodal breast tumor videos collected from 458 patients, including ultrasound (US), superb microvascular imaging (SMI), and strain elastography (SE), and its generalizability is further validated on the public BraTS'21 MRI dataset. Extensive experiments demonstrate that MSFT-Net achieves superior performance in multimodal breast tumor classification compared to state-of-the-art methods, providing fast and reliable support for radiologists in diagnostic tasks.
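Sparse cross-attention of the kind SCAM performs can be approximated by keeping only the top-k query-key scores before the softmax. This generic sketch (the `keep` budget, scaling, and token counts are assumptions) illustrates retaining only the strongest cross-modal interactions; the module's exact scoring and routing may differ.

```python
import torch

def sparse_cross_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                           keep: int = 8) -> torch.Tensor:
    """Top-k sparse cross-attention: each query keeps only its `keep`
    strongest key interactions; all other weights become exactly zero."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, Lq, Lk)
    topv, topi = scores.topk(keep, dim=-1)
    mask = torch.full_like(scores, float('-inf')).scatter(-1, topi, topv)
    return torch.softmax(mask, dim=-1) @ v                  # softmax over kept scores only

q = torch.randn(2, 32, 64)        # e.g. US tokens querying SMI/SE tokens
k = v = torch.randn(2, 48, 64)
out = sparse_cross_attention(q, k, v)                       # (2, 32, 64)
```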
{"title":"Multimodal sparse fusion transformer network with spatio-temporal decoupling for breast tumor classification","authors":"Jiahao Xu ,&nbsp;Shuxin Zhuang ,&nbsp;Yi He ,&nbsp;Haolin Wang ,&nbsp;Zhemin Zhuang ,&nbsp;Huancheng Zeng","doi":"10.1016/j.media.2026.103966","DOIUrl":"10.1016/j.media.2026.103966","url":null,"abstract":"<div><div>Accurate analysis of tumor morphology, vascularity, and tissue stiffness under multimodal ultrasound imaging plays a critical role in the diagnosis of breast cancer. However, manual interpretation across multiple modalities is time-consuming and heavily dependent on the radiologist’s expertise. Computer-aided classification offers an efficient alternative, yet remains challenging due to significant modality heterogeneity, inconsistent image quality, and redundant information across modalities. To address these issues, we propose a novel Multimodal Sparse Fusion Transformer Network (MSFT-Net). First, a Spatio-Temporal Decoupling Attention architecture (STDA) is introduced to disentangle and extract dynamic and static features from different modalities along spatial and temporal dimensions, capturing modality-specific motion and morphological characteristics independently. Second, the Mixed-Scale Convolution Module (MSCM) obtains tumor features at multiple scales, enhancing geometric detail representation and improving receptive field coverage. Third, the Sparse Cross-Attention Module (SCAM) adaptively retains the most effective query-key interactions between modalities, thereby facilitating the aggregation of high-quality features for robust multimodal information fusion. MSFT-Net is trained and tested on a curated dataset comprising multimodal breast tumor videos collected from 458 patients, including ultrasound (US), superb microvascular imaging (SMI), and strain elastography (SE), and its generalizability is further validated on the public BraTS'21 MRI dataset. Extensive experiments demonstrate that MSFT-Net achieves superior performance in multimodal breast tumor classification compared to state-of-the-art methods, providing fast and reliable support for radiologists in diagnostic tasks.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"110 ","pages":"Article 103966"},"PeriodicalIF":11.8,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146072192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0