
Latest articles in Medical image analysis

DTG: Dual transformers-based generative adversarial networks for retinal 2D/3D OCT image classification
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-18 | DOI: 10.1016/j.media.2025.103915
Badr Ait Hammou, Renaud Duval, Marie-Carole Boucher, Farida Cheriet
The automated identification of retinal disorders is one of the most popular real-world computer vision applications related to ophthalmology. It has several advantages and can help ophthalmologists identify diseases more accurately. Technically, it represents a retinal data classification problem. With the recent advances in Artificial Intelligence (AI) technologies, Transformer-based architectures have become powerful models commonly used for solving a wide range of tasks such as image classification. In general, even though Transformers have demonstrated excellent performance compared to existing cutting-edge models, they are data-hungry architectures and still need to perform better in automated medical diagnosis applications.
In this paper, we propose a deep learning architecture named Dual Transformers-based Generative Adversarial Networks (DTG). It is designed for Optical Coherence Tomography (OCT) data classification. It adopts the Vision Transformer and Multiscale Vision Transformer to encode retinal 2D OCT images (i.e., B-scans) and 3D OCT images (i.e., OCT sequence of B-scans). Then, it employs a proposed Generative Adversarial Networks (GAN) architecture to infer high-quality semantic data representations. Next, it increases the training data by taking advantage of our proposed patient instance-based data augmentation technique. Finally, a weighted classifier analyzes the data and performs the retinal disease classification task. Extensive experiments are carried out on two real-world OCT datasets. The experimental results prove that our proposed approach DTG surpasses several competitors in terms of classification accuracy, precision, recall, f1-score, quadratic weighted kappa, AUC-PR, and AUC-ROC. In particular, it performs better than popular Convolutional Neural Networks and Transformers used for 2D image and 3D OCT image classification. Furthermore, it can improve the performance of several existing works for retinal data classification.
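To make the pipeline above concrete, here is a minimal sketch of a dual-encoder OCT classifier in the spirit of DTG: a small ViT-style encoder for individual B-scans, a volume encoder that treats an OCT scan as a sequence of B-scan embeddings, and a class-weighted classification head. This is not the authors' implementation; the GAN-based representation refinement and the patient instance-based augmentation are omitted, and all module sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PatchEncoder2D(nn.Module):
    """Tiny ViT-style encoder for a single B-scan (1-channel 2D image)."""
    def __init__(self, img_size=224, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.proj = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)  # patch embedding
        n_patches = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):                        # x: (B, 1, H, W)
        tokens = self.proj(x).flatten(2).transpose(1, 2) + self.pos
        return self.encoder(tokens).mean(dim=1)  # (B, dim) pooled embedding

class VolumeEncoder3D(nn.Module):
    """Encodes an OCT volume as a sequence of B-scan embeddings (stand-in for a multiscale ViT)."""
    def __init__(self, dim=256, depth=2, heads=8):
        super().__init__()
        self.slice_encoder = PatchEncoder2D(dim=dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, depth)

    def forward(self, vol):                      # vol: (B, S, 1, H, W)
        B, S = vol.shape[:2]
        slices = self.slice_encoder(vol.flatten(0, 1)).view(B, S, -1)
        return self.temporal(slices).mean(dim=1)

class DualClassifier(nn.Module):
    def __init__(self, dim=256, n_classes=4):
        super().__init__()
        self.enc2d, self.enc3d = PatchEncoder2D(dim=dim), VolumeEncoder3D(dim=dim)
        self.head = nn.Linear(2 * dim, n_classes)

    def forward(self, bscan, volume):
        return self.head(torch.cat([self.enc2d(bscan), self.enc3d(volume)], dim=-1))

model = DualClassifier()
logits = model(torch.randn(2, 1, 224, 224), torch.randn(2, 8, 1, 224, 224))
# "Weighted classifier" here means class imbalance is handled with a weighted cross-entropy.
loss = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 2.0, 4.0]))(logits, torch.tensor([0, 1]))
```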
Citations: 0
Template-guided reconstruction of pulmonary segments with neural implicit functions
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-16 | DOI: 10.1016/j.media.2025.103916
Kangxian Xie, Yufei Zhu, Kaiming Kuang, Li Zhang, Hongwei Bran Li, Mingchen Gao, Jiancheng Yang
High-quality 3D reconstruction of pulmonary segments plays a crucial role in segmentectomy and surgical planning for the treatment of lung cancer. Due to the resolution requirement of the target reconstruction, conventional deep learning-based methods often suffer from computational resource constraints or limited granularity. Conversely, implicit modeling is favored due to its computational efficiency and continuous representation at any resolution. We propose a neural implicit function-based method to learn a 3D surface to achieve anatomy-aware, precise pulmonary segment reconstruction, represented as a shape by deforming a learnable template. Additionally, we introduce two clinically relevant evaluation metrics to comprehensively assess the quality of the reconstruction. Furthermore, to address the lack of publicly available shape datasets for benchmarking reconstruction algorithms, we developed a shape dataset named Lung3D, which includes the 3D models of 800 labeled pulmonary segments and their corresponding airways, arteries, veins, and intersegmental veins. We demonstrate that the proposed approach outperforms existing methods, providing a new perspective for pulmonary segment reconstruction. Code and data will be available at https://github.com/HINTLab/ImPulSe.
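As a rough illustration of the template-deformation idea, the sketch below (not the paper's ImPulSe code) uses one coordinate MLP to predict a displacement that warps query points into a learnable implicit template, which then returns per-point segment logits. The latent-code conditioning, network sizes, and loss are assumptions.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, width=128, depth=4):
    layers, d = [], in_dim
    for _ in range(depth - 1):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    return nn.Sequential(*layers, nn.Linear(d, out_dim))

class TemplateImplicit(nn.Module):
    def __init__(self, latent_dim=64, n_segments=18):
        super().__init__()
        self.deform = mlp(3 + latent_dim, 3)      # coords (+ per-case code) -> displacement
        self.template = mlp(3, n_segments + 1)    # learnable template: segment logits per point

    def forward(self, coords, z):                 # coords: (B, N, 3), z: (B, latent_dim)
        z = z.unsqueeze(1).expand(-1, coords.shape[1], -1)
        disp = self.deform(torch.cat([coords, z], dim=-1))
        warped = coords + disp                    # deform query points toward template space
        return self.template(warped), disp        # per-point segment logits + displacement

model = TemplateImplicit()
coords = torch.rand(2, 4096, 3) * 2 - 1           # query points in [-1, 1]^3, any resolution
logits, disp = model(coords, torch.randn(2, 64))
# Training would combine a per-point segmentation loss with a smoothness penalty on `disp`.
loss = nn.functional.cross_entropy(logits.flatten(0, 1), torch.randint(0, 19, (2 * 4096,)))
```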
Citations: 0
Complex wavelet-based Transformer for neurodevelopmental disorder diagnosis via direct modeling of real and imaginary components
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-14 | DOI: 10.1016/j.media.2025.103914
Ah-Yeong Jeong, Da-Woon Heo, Heung-Il Suk
Resting-state functional magnetic resonance imaging (rs-fMRI) measures intrinsic neural activity, and analyzing its frequency-domain characteristics provides insights into brain dynamics. Owing to these properties, rs-fMRI is widely used to investigate brain disorders such as autism spectrum disorder (ASD) and attention deficit hyperactivity disorder (ADHD). Conventional frequency-domain analyses typically rely on the Fourier transform, which lacks flexibility in capturing non-stationary neural signals due to its fixed resolution. Furthermore, these methods primarily utilize only real-valued features, such as the magnitude or phase, derived from complex-valued spectral representations. Consequently, direct modeling of the real and imaginary components, particularly within fMRI analyses, remains largely unexplored, overlooking the distinct and complementary spectral information encoded in these components. To address these limitations, we propose a novel Transformer-based framework that explicitly models the real and imaginary components of continuous wavelet transform (CWT) coefficients from rs-fMRI signals. Our architecture integrates spectral, temporal, and spatial attention modules, employing self- and cross-attention mechanisms to jointly capture intra- and inter-component relationships. Applied to the Autism Brain Imaging Data Exchange (ABIDE)-I and ADHD-200 datasets, our approach achieved state-of-the-art classification performance compared to existing baselines. Comprehensive ablation studies demonstrated the advantages of directly utilizing real and imaginary components over conventional frequency-domain features and validate each module’s contribution. Moreover, attention-based analyses revealed frequency- and region-specific patterns consistent with known neurobiological alterations in ASD and ADHD. These findings highlight that preserving and jointly leveraging the real and imaginary components of CWT-based representations not only enhances diagnostic performance but also provides interpretable insights into neurodevelopmental disorders.
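The core preprocessing step, keeping the real and imaginary CWT components as separate channels rather than collapsing them into magnitude and phase, can be sketched as follows. The wavelet name, scale range, and repetition time are assumptions, and the paper's spectral/temporal/spatial attention modules are not reproduced.

```python
import numpy as np
import pywt

tr = 2.0                                    # assumed repetition time (seconds)
signal = np.random.randn(200)               # one ROI time series (200 volumes) as a stand-in

scales = np.arange(2, 64)                   # illustrative scale range
coeffs, freqs = pywt.cwt(signal, scales, 'cmor1.5-1.0', sampling_period=tr)

# coeffs is complex-valued, shape (n_scales, n_timepoints).
real_part = coeffs.real                     # first input channel
imag_part = coeffs.imag                     # second input channel
spectro = np.stack([real_part, imag_part])  # (2, n_scales, T) array fed to the model

print(spectro.shape, freqs.min(), freqs.max())
```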
Citations: 0
CLIP-Guided Generative network for pathology nuclei image augmentation
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-14 | DOI: 10.1016/j.media.2025.103908
Yanan Zhang, Qingyang Liu, Qian Chen, Xiangzhi Bai
Nuclei segmentation and classification play a crucial role in the quantitative analysis of computational pathology (CPath). However, the challenge of creating a large volume of labeled pathology nuclei images due to annotation costs has significantly limited the performance of deep learning-based nuclei segmentation methods. Generative data augmentation offers a promising solution by substantially expanding the available training data without additional annotations. In medical image analysis, Generative Adversarial Networks (GANs) were effective for data augmentation, enhancing model performance by generating realistic synthetic data. However, these approaches lack scalability for multi-class data, as nuclei masks cannot provide sufficient information for diverse image generation. Recently, visual-language foundation models, pretrained on large-scale image-caption pairs, have demonstrated robust performance in pathological diagnostic tasks. In this study, we propose a CLIP-guided generative data augmentation method for nuclei segmentation and classification, leveraging the pretrained pathological CLIP text and image encoders in both the generator and discriminator. Specifically, we first create text descriptions by processing paired histopathology images and nuclei masks, which include information such as organ tissue type, cell count, and nuclei types. These paired text descriptions and nuclei masks are then fed into our multi-modal conditional image generator to guide the synthesis of realistic histopathology images. To ensure the quality of synthesized images, we utilize a high-resolution image discriminator and a CLIP image encoder-based discriminator, focusing on both local and global features of histopathology images. The synthetic histopathology images, paired with corresponding nuclei masks, are integrated into the real dataset to train the nuclei segmentation and classification model. Our experiments, conducted on diverse publicly available pathology nuclei datasets, including both qualitative and quantitative analysis, demonstrate the effectiveness of our proposed method. The experimental results of the nuclei segmentation and classification task underscore the advantages of our data augmentation approach. The code is available at https://github.com/zhangyn1415/CGPN-GAN.
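A minimal sketch of CLIP-guided conditional generation is shown below: a frozen text embedding (a placeholder module standing in for the pretrained pathology CLIP text encoder) modulates a small mask-to-image generator. The discriminators, prompt construction, and all module sizes are assumptions rather than the authors' design.

```python
import torch
import torch.nn as nn

class FrozenTextEncoderStub(nn.Module):
    """Placeholder for a pretrained CLIP text encoder, kept frozen during GAN training."""
    def __init__(self, vocab=10000, dim=512):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab, dim)   # mean-pooled token embedding
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, token_ids):                  # (B, L) integer tokens
        return self.embed(token_ids)               # (B, dim) pooled text embedding

class MaskToImageGenerator(nn.Module):
    """Generates an H&E-like image from a nuclei mask, modulated by the text embedding."""
    def __init__(self, text_dim=512, base=32):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, base)
        self.net = nn.Sequential(
            nn.Conv2d(1 + base, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, base, 3, padding=1), nn.ReLU(),
            nn.Conv2d(base, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, mask, text_emb):             # mask: (B, 1, H, W)
        cond = self.text_proj(text_emb)[:, :, None, None].expand(-1, -1, *mask.shape[2:])
        return self.net(torch.cat([mask, cond], dim=1))

text_enc, gen = FrozenTextEncoderStub(), MaskToImageGenerator()
tokens = torch.randint(0, 10000, (2, 32))          # e.g. "colon tissue, 41 nuclei, mostly epithelial"
fake = gen(torch.rand(2, 1, 128, 128), text_enc(tokens))
print(fake.shape)                                  # (2, 3, 128, 128) synthetic patches
```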
Citations: 0
BIASNet: A bidirectional feature alignment and semantics-guided network for weakly-supervised medical image registration
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-12 | DOI: 10.1016/j.media.2025.103913
Housheng Xie, Xiaoru Gao, Guoyan Zheng
Medical image registration, which establishes spatial correspondences between different medical images, serves as a fundamental process in numerous clinical applications and diagnostic workflows. Despite significant advancement in unsupervised deep learning-based registration methods, these approaches consistently yield suboptimal results compared to their weakly-supervised counterparts. Recent advancements in universal segmentation models have made it easier to obtain anatomical labels from medical images. However, existing registration methods have not fully leveraged the rich anatomical and structural prior information provided by segmentation labels. To address this limitation, we propose a BIdirectional feature Alignment and Semantics-guided Network, referred to as BIASNet, for weakly-supervised image registration. Specifically, starting from multi-scale features extracted from the pre-trained VoCo, fine-tuned using Low-Rank Adaptation (LoRA), we propose a dual-attribute learning scheme, incorporating a novel BIdirectional Alignment and Fusion (BIAF) module for extracting both semantics-wise and intensity-wise features. These two types of features are subsequently fed into a semantics-guided progressive registration framework for accurate deformation field estimation. We further propose an anatomical region deformation consistency learning to regularize the target anatomical regions deformation. Comprehensive experiments conducted on three typical yet challenging datasets demonstrate that our method achieves consistently better results than other state-of-the-art deformable registration approaches. The source code is publicly available at https://github.com/xiehousheng/BIASNet.
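The bidirectional-alignment idea can be illustrated with a small cross-attention block in which fixed-image and moving-image features attend to each other before fusion. This is a sketch under assumed feature shapes, not the paper's BIAF module; the LoRA-tuned VoCo encoder and the progressive registration stages are omitted.

```python
import torch
import torch.nn as nn

class BidirectionalFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.f2m = nn.MultiheadAttention(dim, heads, batch_first=True)  # fixed attends to moving
        self.m2f = nn.MultiheadAttention(dim, heads, batch_first=True)  # moving attends to fixed
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, feat_fixed, feat_moving):     # both: (B, N, dim) token sequences
        f, _ = self.f2m(feat_fixed, feat_moving, feat_moving)
        m, _ = self.m2f(feat_moving, feat_fixed, feat_fixed)
        return self.fuse(torch.cat([feat_fixed + f, feat_moving + m], dim=-1))

block = BidirectionalFusion()
fused = block(torch.randn(1, 512, 128), torch.randn(1, 512, 128))
print(fused.shape)   # (1, 512, 128): joint features passed on to a deformation-field head
```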
Citations: 0
Reason like a radiologist: Chain-of-thought and reinforcement learning for verifiable report generation
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-11 | DOI: 10.1016/j.media.2025.103910
Peiyuan Jing, Kinhei Lee, Zhenxuan Zhang, Huichi Zhou, Zhengqing Yuan, Zhifan Gao, Lei Zhu, Giorgos Papanastasiou, Yingying Fang, Guang Yang
Radiology report generation is critical for efficiency, but current models often lack the structured reasoning of experts and the ability to explicitly ground findings in anatomical evidence, which limits clinical trust and explainability. This paper introduces BoxMed-RL, a unified training framework to generate spatially verifiable and explainable chest X-ray reports. BoxMed-RL advances chest X-ray report generation through two integrated phases: (1) Pretraining Phase. BoxMed-RL learns radiologist-like reasoning through medical concept learning and enforces spatial grounding with reinforcement learning. (2) Downstream Adapter Phase. Pretrained weights are frozen while a lightweight adapter ensures fluency and clinical credibility. Experiments on two widely used public benchmarks (MIMIC-CXR and IU X-Ray) demonstrate that BoxMed-RL achieves an average 7 % improvement in both METEOR and ROUGE-L metrics compared to state-of-the-art methods. An average 5 % improvement in large language model-based metrics further underscores BoxMed-RL’s robustness in generating high-quality reports. Related code and training templates are publicly available at https://github.com/ayanglab/BoxMed-RL.
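A verifiable reward of the kind used to reinforce spatial grounding can be sketched as a mix of a report-overlap score and a bounding-box IoU. The weighting and the crude token-F1 text metric below are assumptions standing in for BoxMed-RL's actual reward design; the chain-of-thought templates and policy-optimization details are not shown.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def token_f1(pred, ref):
    """Crude token-overlap F1 standing in for METEOR/ROUGE-style report scoring."""
    p, r = set(pred.lower().split()), set(ref.lower().split())
    if not p or not r:
        return 0.0
    prec, rec = len(p & r) / len(p), len(p & r) / len(r)
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def grounded_reward(pred_report, ref_report, pred_box, ref_box, w_text=0.5, w_box=0.5):
    return w_text * token_f1(pred_report, ref_report) + w_box * box_iou(pred_box, ref_box)

r = grounded_reward("right lower lobe opacity noted", "opacity in the right lower lobe",
                    pred_box=(120, 200, 260, 330), ref_box=(130, 210, 250, 340))
print(round(r, 3))   # reward is higher when both the wording and the localization agree
```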
Citations: 0
AtlasMorph: Learning conditional deformable templates for brain MRI
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-11 | DOI: 10.1016/j.media.2025.103893
Marianne Rakic, Andrew Hoopes, Mazdak S. Abulnaga, Mert R. Sabuncu, John V. Guttag, Adrian V. Dalca, for the Alzheimer’s Disease Neuroimaging Initiative
Deformable templates, or atlases, are images that represent a prototypical anatomy for a population, and are often enhanced with probabilistic anatomical label maps. They are commonly used in medical image analysis for population studies and computational anatomy tasks such as registration and segmentation. Because developing a template is a computationally expensive process, relatively few templates are available. As a result, analysis is often conducted with sub-optimal templates that are not truly representative of the study population, especially when there are large variations within this population.
We propose a machine learning framework that uses convolutional registration neural networks to efficiently learn a function that outputs templates conditioned on subject-specific attributes, such as age and sex. We also leverage segmentations, when available, to produce anatomical segmentation maps for the resulting templates. The learned network can also be used to register subject images to the templates. We demonstrate our method on a compilation of 3D brain MRI datasets, and show that it can learn high-quality templates that are representative of populations. We find that annotated conditional templates enable better registration than their unlabeled unconditional counterparts, and outperform other templates construction methods.
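The conditional-template idea can be sketched as a small decoder that maps subject attributes (here age and sex) to a residual added to a shared learnable template. Image size, decoder depth, and attribute normalization are assumptions, and the registration network that is trained jointly with the template in AtlasMorph is not included.

```python
import torch
import torch.nn as nn

class ConditionalTemplate(nn.Module):
    def __init__(self, shape=(1, 96, 96, 96), attr_dim=2, ch=8):
        super().__init__()
        self.mean = nn.Parameter(torch.zeros(1, *shape))   # shared learnable template
        self.fc = nn.Linear(attr_dim, ch * 12 * 12 * 12)
        self.decode = nn.Sequential(                        # 12^3 -> 96^3 attribute-specific residual
            nn.Upsample(scale_factor=2), nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv3d(ch, 1, 3, padding=1),
        )
        self.ch = ch

    def forward(self, attrs):                               # attrs: (B, 2), e.g. [age, sex]
        x = self.fc(attrs).view(-1, self.ch, 12, 12, 12)
        return self.mean + self.decode(x)                   # (B, 1, 96, 96, 96) conditional template

atlas = ConditionalTemplate()
age, sex = 72.0, 1.0
template = atlas(torch.tensor([[age / 100.0, sex]]))        # normalized attributes
print(template.shape)
# Training would register each subject to its conditioned template and backpropagate the
# similarity and deformation-smoothness losses into both the decoder and `self.mean`.
```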
Citations: 0
CIA-net: Cross-modality interaction and aggregation network for ovarian tumor segmentation from multi-modal MRI
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-11 | DOI: 10.1016/j.media.2025.103907
Yifan Gao, Yong’ai Li, Xin Gao
Magnetic resonance imaging (MRI) is an essential examination for ovarian cancer, in which ovarian tumor segmentation is crucial for personalized diagnosis and treatment planning. However, ovarian tumors often present with mixed cystic and solid regions in imaging, posing additional difficulties for automatic segmentation. In clinical practice, radiologists use T2-weighted imaging as the main modality to delineate tumor boundaries. In comparison, multi-modal MRI provides complementary information across modalities that can improve tumor segmentation. Therefore, it is important to fuse salient features from other modalities to the main modality. In this paper, we propose a cross-modality interaction and aggregation network (CIA-Net), a hybrid convolutional and Transformer architecture, for automatic ovarian tumor segmentation from multi-modal MRI. CIA-Net divides multi-modal MRI into one main (T2) and three minor modalities (T1, ADC, DWI), each with independent encoders. The novel cross-modality collaboration block selectively aggregates complementary features from minor modalities into the main modality through a progressive context injection module. Additionally, we introduce the progressive neighborhood integrated module to filter intra- and inter-modality noise and redundancies by refining adjacent slices of each modality. We evaluate our proposed method on a diverse, multi-center ovarian tumor dataset comprising 739 patients, and further validate its generalization and robustness on two public benchmarks for brain and cardiac segmentation. Comparative experiments with other cutting-edge techniques demonstrate the effectiveness of CIA-Net, highlighting its potential to be applied in clinical scenarios.
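The main/minor-modality fusion can be illustrated with a block in which T2 features query each minor modality (T1, ADC, DWI) through cross-attention, and a learned gate controls how much of each modality is injected into the main stream. Feature shapes and the gating rule are assumptions; CIA-Net's encoders, progressive context injection, and slice refinement are omitted.

```python
import torch
import torch.nn as nn

class MinorToMainAggregation(nn.Module):
    def __init__(self, dim=96, heads=4, n_minor=3):
        super().__init__()
        self.attn = nn.ModuleList(nn.MultiheadAttention(dim, heads, batch_first=True)
                                  for _ in range(n_minor))
        self.gate = nn.ModuleList(nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
                                  for _ in range(n_minor))

    def forward(self, main, minors):                  # main: (B, N, dim); minors: list of (B, N, dim)
        out = main
        for attn, gate, feat in zip(self.attn, self.gate, minors):
            ctx, _ = attn(main, feat, feat)           # main modality queries the minor modality
            g = gate(torch.cat([main, ctx], dim=-1))  # per-token, per-channel gate in [0, 1]
            out = out + g * ctx                       # inject only the useful complement
        return out

agg = MinorToMainAggregation()
t2 = torch.randn(2, 256, 96)
fused = agg(t2, [torch.randn(2, 256, 96) for _ in range(3)])
print(fused.shape)   # (2, 256, 96): enriched T2 features for the segmentation decoder
```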
Citations: 0
MIRAGE: Medical image-text pre-training for robustness against noisy environments
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-10 | DOI: 10.1016/j.media.2025.103912
Pujin Cheng, Yijin Huang, Li Lin, Junyan Lyu, Kenneth Kin-Yip Wong, Xiaoying Tang
Contrastive vision-language pre-training models have achieved significant success on large-scale general multi-modality datasets. However, in the medical domain, the high costs of data collection and expert annotation are likely to result in small-sized and noisy datasets, which can severely limit model performance due to overfitting unreliable data and misrepresenting patterns. To address this challenge, we present MIRAGE, a novel framework designed to handle mismatched false positives and semantically related false negatives during medical image-text pre-training. Cross-entropy-based optimization proves inadequate for noisy contrastive settings, as it tends to fail in distinguishing noisy samples and ends up fitting them, leading to suboptimal representations. To overcome this limitation, we introduce an optimal transport-based contrastive loss that effectively identifies noisy samples leveraging the nearest cross-modality neighbor prior, thereby reducing noisy samples’ adverse impact. Additionally, we propose an adaptive gradient balancing strategy that mitigates the influence of gradients from noisy samples. Extensive experiments demonstrate that MIRAGE achieves superior performance across six tasks and 14 datasets, largely outperforming representative state-of-the-art methods. Furthermore, comprehensive analyses on synthetic noisy data are performed, clearly demonstrating the contribution of each component in MIRAGE.
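An optimal-transport-flavoured contrastive loss can be sketched by softening the usual identity targets with a Sinkhorn-normalized similarity matrix, which shifts weight away from likely-mismatched pairs. The temperature, iteration count, and target mixing below are assumptions, not MIRAGE's exact formulation, and the adaptive gradient-balancing strategy is not shown.

```python
import torch
import torch.nn.functional as F

def sinkhorn_plan(sim, eps=0.05, n_iter=5):
    """Entropic-OT style normalization of a similarity matrix into a soft matching plan."""
    K = torch.exp(sim / eps)                      # (B, B) positive kernel
    for _ in range(n_iter):
        K = K / K.sum(dim=1, keepdim=True)        # normalize rows
        K = K / K.sum(dim=0, keepdim=True)        # normalize columns
    return K / K.sum(dim=1, keepdim=True)         # rows used as soft matching targets

def ot_contrastive_loss(img_emb, txt_emb, tau=0.07, mix=0.5):
    img_emb, txt_emb = F.normalize(img_emb, dim=-1), F.normalize(txt_emb, dim=-1)
    sim = img_emb @ txt_emb.t()                   # (B, B) cosine similarities
    with torch.no_grad():
        plan = sinkhorn_plan(sim)
        hard = torch.eye(sim.size(0), device=sim.device)
        targets = mix * hard + (1 - mix) * plan   # soften the diagonal with the OT plan
    log_p = F.log_softmax(sim / tau, dim=1)
    return -(targets * log_p).sum(dim=1).mean()   # cross-entropy against soft targets

loss = ot_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(float(loss))
```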
Citations: 0
Segmentation of the right ventricular myocardial infarction in multi-centre cardiac magnetic resonance images
IF 11.8 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-09 | DOI: 10.1016/j.media.2025.103911
Chao Xu, Dongaolei An, Chaolu Feng, Zijian Bian, Lian-Ming Wu
Right ventricular myocardial infarction (RVMI) is associated with higher in-hospital morbidity and mortality. Cardiac magnetic resonance (CMR) imaging provides crucial pathological information for diagnosis and/or treatment of RVMI. Segmentation of RVMI in CMR images is significant but challenging. This is because, to the best of our knowledge, there is no publicly available dataset in this field. Furthermore, the severe class imbalance problem (RVMI mostly occupies less than 0.2 % of each image) and the extreme intensity overlap between RVMI and the background bring challenges to the design of segmentation models. Therefore, we release a benchmark CMR dataset consisting of short-axis MR images of 213 subjects from 3 centres, acquired with Philips, GE, and Siemens equipment. A multi-stage sequential deep learning model, RVMISegNet, is proposed to segment RVMI and its related organs at different scales to tackle the class imbalance and intensity overlap problems. In the first stage, transfer learning is employed to localize the right ventricle region. In the second stage, the centroid of the right ventricle guides the extraction of a region of interest, where pseudo-labels are generated to assist a coarse segmentation of myocardial infarction. In the third stage, morphological post-processing is applied, and fine segmentation is performed. Both the coarse and fine segmentation stages use a modified UNet++ backbone, which integrates texture and semantic extraction modules. Extensive experiments validate the state-of-the-art performance of our model and the effectiveness of its constituent modules. The dataset and source code are available at https://github.com/DFLAG-NEU/RVMISegNet.
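The centroid-guided ROI step in the second stage can be sketched as a simple crop around the centroid of the stage-1 right-ventricle mask. The patch size and fallback behaviour are assumptions rather than the paper's exact settings.

```python
import numpy as np

def crop_roi_around_mask(image, rv_mask, size=(96, 96)):
    """Crop a size[0] x size[1] patch centred on the right-ventricle mask centroid."""
    ys, xs = np.nonzero(rv_mask)
    if len(ys) == 0:                                      # empty mask: fall back to the image centre
        cy, cx = image.shape[0] // 2, image.shape[1] // 2
    else:
        cy, cx = int(ys.mean()), int(xs.mean())
    h, w = size
    y0 = np.clip(cy - h // 2, 0, max(image.shape[0] - h, 0))
    x0 = np.clip(cx - w // 2, 0, max(image.shape[1] - w, 0))
    return image[y0:y0 + h, x0:x0 + w], (y0, x0)          # patch + offset to map results back

img = np.random.rand(256, 216)
mask = np.zeros_like(img); mask[100:140, 60:110] = 1      # pretend stage-1 RV prediction
roi, offset = crop_roi_around_mask(img, mask)
print(roi.shape, offset)                                   # (96, 96) crop and its top-left corner
```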
Citations: 0