
Latest articles from IEEE transactions on medical imaging

An Unsupervised Learning Approach for Reconstructing 3T-Like Images From 0.3T MRI Without Paired Training Data
Pub Date : 2025-08-11 DOI: 10.1109/TMI.2025.3597401
Huaishui Yang;Shaojun Liu;Yilong Liu;Lingyan Zhang;Shoujin Huang;Jiayu Zheng;Jingzhe Liu;Hua Guo;Ed X. Wu;Mengye Lyu
Magnetic resonance imaging (MRI) is powerful in medical diagnostics, yet high-field MRI, despite offering superior image quality, incurs significant costs for procurement, installation, maintenance, and operation, restricting its availability and accessibility, especially in low- and middle-income countries. Addressing this, our study proposes an unsupervised learning algorithm based on cycle-consistent generative adversarial networks. This framework transforms 0.3T low-field MRI into higher-quality 3T-like images, bypassing the need for paired low/high-field training data. The proposed architecture integrates two novel modules to enhance reconstruction quality: (1) an attention block that dynamically balances high-field-like features with the original low-field input, and (2) an edge block that refines boundary details, providing more accurate structural reconstruction. The proposed generative model is trained on large-scale, unpaired, public datasets, and further validated on paired low/high-field acquisitions of three major clinical MRI sequences: T1-weighted, T2-weighted, and fluid-attenuated inversion recovery (FLAIR) imaging. It demonstrates notable improvements in tissue contrast and signal-to-noise ratio while preserving anatomical fidelity. This approach utilizes rich information from publicly available MRI resources, providing a data-efficient unsupervised alternative that complements supervised methods to enhance the utility of low-field MRI.
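As a rough illustration of the unpaired training setup this paper builds on, the sketch below implements a plain cycle-consistency loss between a 0.3T-like and a 3T-like domain. The TinyGenerator modules, shapes, and loss weighting are placeholder assumptions; the paper's attention and edge blocks are not reproduced.

```python
# Minimal cycle-consistency sketch for unpaired low-field -> high-field translation.
# Generators are toy stand-ins, not the architecture proposed in the paper.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Placeholder generator: a couple of conv layers mapping one domain to the other."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

G_low2high = TinyGenerator()   # 0.3T -> 3T-like
G_high2low = TinyGenerator()   # 3T -> 0.3T-like
l1 = nn.L1Loss()

def cycle_consistency_loss(x_low, x_high):
    """x_low and x_high are unpaired batches; no pixel-wise correspondence is assumed."""
    rec_low = G_high2low(G_low2high(x_low))    # low -> high -> low
    rec_high = G_low2high(G_high2low(x_high))  # high -> low -> high
    return l1(rec_low, x_low) + l1(rec_high, x_high)

x_low, x_high = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
print(cycle_consistency_loss(x_low, x_high).item())
```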
{"title":"An Unsupervised Learning Approach for Reconstructing 3T-Like Images From 0.3T MRI Without Paired Training Data","authors":"Huaishui Yang;Shaojun Liu;Yilong Liu;Lingyan Zhang;Shoujin Huang;Jiayu Zheng;Jingzhe Liu;Hua Guo;Ed X. Wu;Mengye Lyu","doi":"10.1109/TMI.2025.3597401","DOIUrl":"10.1109/TMI.2025.3597401","url":null,"abstract":"Magnetic resonance imaging (MRI) is powerful in medical diagnostics, yet high-field MRI, despite offering superior image quality, incurs significant costs for procurement, installation, maintenance, and operation, restricting its availability and accessibility, especially in low- and middle-income countries. Addressing this, our study proposes an unsupervised learning algorithm based on cycle-consistent generative adversarial networks. This framework transforms 0.3T low-field MRI into higher-quality 3T-like images, bypassing the need for paired low/high-field training data. The proposed architecture integrates two novel modules to enhance reconstruction quality: (1) an attention block that dynamically balances high-field-like features with the original low-field input, and (2) an edge block that refines boundary details, providing more accurate structural reconstruction. The proposed generative model is trained on large-scale, unpaired, public datasets, and further validated on paired low/high-field acquisitions of three major clinical MRI sequences: T1-weighted, T2-weighted, and fluid-attenuated inversion recovery (FLAIR) imaging. It demonstrates notable improvements in tissue contrast and signal-to-noise ratio while preserving anatomical fidelity. This approach utilizes rich information from publicly available MRI resources, providing a data-efficient unsupervised alternative that complements supervised methods to enhance the utility of low-field MRI.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5358-5371"},"PeriodicalIF":0.0,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144819720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
EPDiff: Erasure Perception Diffusion Model for Unsupervised Anomaly Detection in Preoperative Multimodal Images
Pub Date : 2025-08-11 DOI: 10.1109/TMI.2025.3597545
Jiazheng Wang;Min Liu;Wenting Shen;Renjie Ding;Yaonan Wang;Erik Meijering
Unsupervised anomaly detection (UAD) methods typically detect anomalies by learning and reconstructing the normative distribution. However, since anomalies constantly invade and affect their surroundings, sub-healthy areas at the junction present structural deformations that can easily be misidentified as anomalies, posing difficulties for UAD methods that solely learn the normative distribution. Multimodal images can help address these challenges, as they provide complementary information about anomalies. Therefore, this paper proposes a novel method for UAD in preoperative multimodal images, called the Erasure Perception Diffusion model (EPDiff). First, the Local Erasure Progressive Training (LEPT) framework is designed to better rebuild sub-healthy structures around anomalies through the diffusion model with a two-phase process. Initially, healthy images are used to capture deviation features labeled as potential anomalies. Then, these anomalies are locally erased in multimodal images to progressively learn sub-healthy structures, obtaining a more detailed reconstruction around anomalies. Second, the Global Structural Perception (GSP) module is developed in the diffusion model to realize global structural representation and correlation within images and between modalities through interactions of high-level semantic information. In addition, a training-free module, named the Multimodal Attention Fusion (MAF) module, is presented for weighted fusion of anomaly maps between different modalities and for obtaining binary anomaly outputs. Experimental results show that EPDiff improves the AUPRC and mDice scores by 2% and 3.9% on BraTS2021, and by 5.2% and 4.5% on Shifts, over the state-of-the-art methods, which proves the applicability of EPDiff to diverse anomaly diagnosis. The code is available at https://github.com/wjiazheng/EPDiff
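The MAF module is described as a training-free weighted fusion of per-modality anomaly maps with a binary output. A minimal sketch of that idea follows; the weighting rule (confidence-based) and the threshold are assumptions for illustration only.

```python
# Training-free weighted fusion of per-modality anomaly maps (toy version of the MAF idea).
import numpy as np

def fuse_anomaly_maps(maps, threshold=0.5):
    """maps: list of HxW anomaly maps in [0, 1], one per modality (e.g. T1, T2, FLAIR)."""
    maps = [np.asarray(m, dtype=np.float32) for m in maps]
    # Assumed weighting: modalities with more peaked (confident) maps get larger weights.
    weights = np.array([m.max() - m.mean() for m in maps])
    weights = weights / (weights.sum() + 1e-8)
    fused = sum(w * m for w, m in zip(weights, maps))
    return fused, (fused > threshold).astype(np.uint8)  # continuous map + binary anomaly output

t1_map, flair_map = np.random.rand(128, 128), np.random.rand(128, 128)
fused, binary = fuse_anomaly_maps([t1_map, flair_map])
print(fused.shape, int(binary.sum()))
```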
{"title":"EPDiff: Erasure Perception Diffusion Model for Unsupervised Anomaly Detection in Preoperative Multimodal Images","authors":"Jiazheng Wang;Min Liu;Wenting Shen;Renjie Ding;Yaonan Wang;Erik Meijering","doi":"10.1109/TMI.2025.3597545","DOIUrl":"10.1109/TMI.2025.3597545","url":null,"abstract":"Unsupervised anomaly detection (UAD) methods typically detect anomalies by learning and reconstructing the normative distribution. However, since anomalies constantly invade and affect their surroundings, sub-healthy areas in the junction present structural deformations that could be easily misidentified as anomalies, posing difficulties for UAD methods that solely learn the normative distribution. The use of multimodal images can facilitate to address the above challenges, as they can provide complementary information of anomalies. Therefore, this paper propose a novel method for UAD in preoperative multimodal images, called Erasure Perception Diffusion model (EPDiff). First, the Local Erasure Progressive Training (LEPT) framework is designed to better rebuild sub-healthy structures around anomalies through the diffusion model with a two-phase process. Initially, healthy images are used to capture deviation features labeled as potential anomalies. Then, these anomalies are locally erased in multimodal images to progressively learn sub-healthy structures, obtaining a more detailed reconstruction around anomalies. Second, the Global Structural Perception (GSP) module is developed in the diffusion model to realize global structural representation and correlation within images and between modalities through interactions of high-level semantic information. In addition, a training-free module, named Multimodal Attention Fusion (MAF) module, is presented for weighted fusion of anomaly maps between different modalities and obtaining binary anomaly outputs. Experimental results show that EPDiff improves the AUPRC and mDice scores by 2% and 3.9% on BraTS2021, and by 5.2% and 4.5% on Shifts over the state-of-the-art methods, which proves the applicability of EPDiff in diverse anomaly diagnosis. The code is available at <uri>https://github.com/wjiazheng/EPDiff</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 1","pages":"379-390"},"PeriodicalIF":0.0,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144819772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automatic Choroid Segmentation and Thickness Measurement Based on Mixed Attention-Guided Multiscale Feature Fusion Network
Pub Date : 2025-08-08 DOI: 10.1109/TMI.2025.3597026
Xiaoyu Zhu;Shiyin Li;HongLiang Bi;Lina Guan;Haiyang Liu;Zhaolin Lu
Choroidal thickness variations serve as critical biomarkers for numerous ophthalmic diseases. Accurate segmentation and quantification of the choroid in optical coherence tomography (OCT) images is essential for clinical diagnosis and disease progression monitoring. Because public OCT datasets cover only a small number of disease types involving changes in choroidal thickness and lack publicly available labels, we constructed the Xuzhou Municipal Hospital (XZMH)-Choroid dataset. This dataset contains annotated OCT images of normal eyes and eight choroid-related diseases. However, segmentation of the choroid in OCT images remains a formidable challenge due to the confounding factors of blurred boundaries, non-uniform texture, and lesions. To overcome these challenges, we propose a mixed attention-guided multiscale feature fusion network (MAMFF-Net). This network integrates a Mixed Attention Encoder (MAE) for enhanced fine-grained feature extraction, a deformable multiscale feature fusion path (DMFFP) for adaptive feature integration across lesion deformations, and a multiscale pyramid layer aggregation (MPLA) module for improved contextual representation learning. Comparative experiments show that MAMFF-Net achieves better segmentation performance than other deep learning methods (mDice: 97.44, mIoU: 95.11, mAcc: 97.71). Based on the choroidal segmentation produced by MAMFF-Net, an algorithm for automated choroidal thickness measurement was developed, and the automated measurements approached the level of senior specialists.
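To make the downstream measurement step concrete, here is a small sketch of column-wise choroidal thickness computed from a binary segmentation mask of an OCT B-scan. The axial pixel spacing value is a placeholder; real scanners report their own spacing.

```python
# Column-wise choroidal thickness from a binary choroid mask (rows = depth, cols = A-scans).
import numpy as np

def choroid_thickness_um(mask, axial_spacing_um=3.9):
    """Returns per-A-scan thickness in micrometres; NaN where no choroid is segmented.
    axial_spacing_um is an assumed placeholder value."""
    mask = np.asarray(mask, dtype=bool)
    counts = mask.sum(axis=0).astype(np.float32)   # choroid pixels per A-scan column
    thickness = counts * axial_spacing_um
    thickness[counts == 0] = np.nan
    return thickness

toy_mask = np.zeros((256, 512), dtype=np.uint8)
toy_mask[120:180, :] = 1                           # a 60-pixel-thick band
print(np.nanmean(choroid_thickness_um(toy_mask)))  # 60 * 3.9 = 234.0 µm
```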
{"title":"Automatic Choroid Segmentation and Thickness Measurement Based on Mixed Attention-Guided Multiscale Feature Fusion Network","authors":"Xiaoyu Zhu;Shiyin Li;HongLiang Bi;Lina Guan;Haiyang Liu;Zhaolin Lu","doi":"10.1109/TMI.2025.3597026","DOIUrl":"10.1109/TMI.2025.3597026","url":null,"abstract":"Choroidal thickness variations serve as critical biomarkers for numerous ophthalmic diseases. Accurate segmentation and quantification of the choroid in optical coherence tomography (OCT) images is essential for clinical diagnosis and disease progression monitoring. Due to the small number of disease types in the public OCT dataset involving changes in choroidal thickness and the lack of a publicly available labeled dataset, we constructed the Xuzhou Municipal Hospital (XZMH)-Choroid dataset. This dataset contains annotated OCT images of normal and eight choroid-related diseases. However, segmentation of the choroid in OCT images remains a formidable challenge due to the confounding factors of blurred boundaries, non-uniform texture, and lesions. To overcome these challenges, we proposed a mixed attention-guided multiscale feature fusion network (MAMFF-Net). This network integrates a Mixed Attention Encoder (MAE) for enhanced fine-grained feature extraction, a deformable multiscale feature fusion path (DMFFP) for adaptive feature integration across lesion deformations, and a multiscale pyramid layer aggregation (MPLA) module for improved contextual representation learning. Through comparative experiments with other deep learning methods, we found that the MAMFF-Net model has better segmentation performance than other deep learning methods (mDice: 97.44, mIoU: 95.11, mAcc: 97.71). Based on the choroidal segmentation implemented in MAMFF-Net, an algorithm for automated choroidal thickness measurement was developed, and the automated measurement results approached the level of senior specialists.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 1","pages":"350-363"},"PeriodicalIF":0.0,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144802501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Unsupervised Brain Lesion Segmentation Using Posterior Distributions Learned by Subspace-Based Generative Model
Pub Date : 2025-08-08 DOI: 10.1109/TMI.2025.3597080
Huixiang Zhuang;Yue Guan;Yi Ding;Chang Xu;Zijun Cheng;Yuhao Ma;Ruihao Liu;Ziyu Meng;Li Cao;Yao Li;Zhi-Pei Liang
Unsupervised brain lesion segmentation, which focuses on learning normative distributions from images of healthy subjects, is less dependent on lesion-labeled data and thus exhibits better generalization capabilities. A fundamental challenge in learning normative distributions of images lies in the high dimensionality that arises when image pixels are treated as correlated random variables to capture spatial dependence. In this study, we proposed a subspace-based deep generative model to learn the posterior normal distributions. Specifically, we used probabilistic subspace models to capture spatial-intensity distributions and spatial-structure distributions of brain images from healthy subjects. These models captured prior spatial-intensity and spatial-structure variations effectively by treating the subspace coefficients as random variables, with the basis functions being the eigen-images and eigen-density functions learned from the training data. These prior distributions were then converted to posterior distributions, including both the posterior normal and posterior lesion distributions for a given image, using the subspace-based generative model and subspace-assisted Bayesian analysis, respectively. Finally, an unsupervised fusion classifier was used to combine the posterior and likelihood features for lesion segmentation. The proposed method has been evaluated on simulated and real lesion data, including tumor, multiple sclerosis, and stroke, demonstrating superior segmentation accuracy and robustness over the state-of-the-art methods. Our proposed method holds promise for enhancing unsupervised brain lesion delineation in clinical applications.
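A heavily simplified sketch of the subspace idea appears below: eigen-images are learned from healthy scans with a plain PCA/SVD, a test image is reconstructed within that subspace, and the residual serves as lesion evidence. The probabilistic modeling, posterior distributions, and fusion classifier of the paper are not reproduced.

```python
# Toy subspace sketch: PCA eigen-images from healthy data, residual as lesion evidence.
import numpy as np

rng = np.random.default_rng(0)
healthy = rng.normal(size=(100, 32 * 32))       # 100 flattened "healthy" images (synthetic)
mean = healthy.mean(axis=0)
_, _, Vt = np.linalg.svd(healthy - mean, full_matrices=False)
basis = Vt[:10]                                 # top-10 eigen-images span the normal subspace

def subspace_residual(image_flat):
    """Project onto the healthy subspace and return the per-pixel residual magnitude."""
    centered = image_flat - mean
    coeffs = basis @ centered                   # subspace coefficients
    recon = mean + basis.T @ coeffs
    return np.abs(image_flat - recon)

test = rng.normal(size=32 * 32)
test[:50] += 5.0                                # inject a synthetic "lesion"
res = subspace_residual(test)
print(res[:50].mean(), res[50:].mean())         # lesion pixels show larger residuals
```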
{"title":"Unsupervised Brain Lesion Segmentation Using Posterior Distributions Learned by Subspace-Based Generative Model","authors":"Huixiang Zhuang;Yue Guan;Yi Ding;Chang Xu;Zijun Cheng;Yuhao Ma;Ruihao Liu;Ziyu Meng;Li Cao;Yao Li;Zhi-Pei Liang","doi":"10.1109/TMI.2025.3597080","DOIUrl":"10.1109/TMI.2025.3597080","url":null,"abstract":"Unsupervised brain lesion segmentation, focusing on learning normative distributions from images of healthy subjects, are less dependent on lesion-labeled data, thus exhibiting better generalization capabilities. A fundamental challenge in learning normative distributions of images lies in the high dimensionality if image pixels are treated as correlated random variables to capture spatial dependence. In this study, we proposed a subspace-based deep generative model to learn the posterior normal distributions. Specifically, we used probabilistic subspace models to capture spatial-intensity distributions and spatial-structure distributions of brain images from healthy subjects. These models captured prior spatial-intensity and spatial-structure variations effectively by treating the subspace coefficients as random variables with basis functions being the eigen-images and eigen-density functions learned from the training data. These prior distributions were then converted to posterior distributions, including both the posterior normal and posterior lesion distributions for a given image using the subspace-based generative model and subspace-assisted Bayesian analysis, respectively. Finally, an unsupervised fusion classifier was used to combine the posterior and likelihood features for lesion segmentation. The proposed method has been evaluated on simulated and real lesion data, including tumor, multiple sclerosis, and stroke, demonstrating superior segmentation accuracy and robustness over the state-of-the-art methods. Our proposed method holds promise for enhancing unsupervised brain lesion delineation in clinical applications.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 1","pages":"364-378"},"PeriodicalIF":0.0,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144802499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Anisotropic Cross-View Texture Transfer With Multi-Reference Non-Local Attention for CT Slice Interpolation
Pub Date : 2025-08-08 DOI: 10.1109/TMI.2025.3596957
Kwang-Hyun Uhm;Hyunjun Cho;Sung-Hoo Hong;Seung-Won Jung
Computed tomography (CT) is one of the most widely used non-invasive imaging modalities for medical diagnosis. In clinical practice, CT images are usually acquired with large slice thicknesses due to the high cost of memory storage and operation time, resulting in an anisotropic CT volume with much lower inter-slice resolution than in-plane resolution. Since such inconsistent resolution may lead to difficulties in disease diagnosis, deep learning-based volumetric super-resolution methods have been developed to improve inter-slice resolution. Most existing methods conduct single-image super-resolution on the through-plane or synthesize intermediate slices from adjacent slices; however, the anisotropic characteristic of 3D CT volume has not been well explored. In this paper, we propose a novel cross-view texture transfer approach for CT slice interpolation by fully utilizing the anisotropic nature of 3D CT volume. Specifically, we design a unique framework that takes high-resolution in-plane texture details as a reference and transfers them to low-resolution through-plane images. To this end, we introduce a multi-reference non-local attention module that extracts meaningful features for reconstructing through-plane high-frequency details from multiple in-plane images. Through extensive experiments, we demonstrate that our method performs significantly better in CT slice interpolation than existing competing methods on public CT datasets including a real-paired benchmark, verifying the effectiveness of the proposed framework. The source code of this work is available at https://github.com/khuhm/ACVTT
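The core mechanism, in-plane texture features attended to by through-plane queries, can be sketched as a single-head non-local attention over multiple reference feature sets. Dimensions and the single-head formulation below are simplifications, not the paper's module.

```python
# Single-head multi-reference non-local attention (simplified sketch).
import torch
import torch.nn.functional as F

def multi_reference_attention(query_feat, ref_feats):
    """query_feat: (N, C) features of through-plane positions.
    ref_feats: list of (M_i, C) feature sets from in-plane reference images."""
    keys = torch.cat(ref_feats, dim=0)                       # (sum M_i, C)
    scale = query_feat.shape[1] ** 0.5
    attn = F.softmax(query_feat @ keys.t() / scale, dim=-1)  # attention over all references
    return attn @ keys                                       # (N, C) transferred texture features

q = torch.randn(64, 32)                                      # 64 through-plane positions
refs = [torch.randn(100, 32), torch.randn(80, 32)]           # two in-plane reference feature sets
print(multi_reference_attention(q, refs).shape)              # torch.Size([64, 32])
```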
{"title":"An Anisotropic Cross-View Texture Transfer With Multi-Reference Non-Local Attention for CT Slice Interpolation","authors":"Kwang-Hyun Uhm;Hyunjun Cho;Sung-Hoo Hong;Seung-Won Jung","doi":"10.1109/TMI.2025.3596957","DOIUrl":"10.1109/TMI.2025.3596957","url":null,"abstract":"Computed tomography (CT) is one of the most widely used non-invasive imaging modalities for medical diagnosis. In clinical practice, CT images are usually acquired with large slice thicknesses due to the high cost of memory storage and operation time, resulting in an anisotropic CT volume with much lower inter-slice resolution than in-plane resolution. Since such inconsistent resolution may lead to difficulties in disease diagnosis, deep learning-based volumetric super-resolution methods have been developed to improve inter-slice resolution. Most existing methods conduct single-image super-resolution on the through-plane or synthesize intermediate slices from adjacent slices; however, the anisotropic characteristic of 3D CT volume has not been well explored. In this paper, we propose a novel cross-view texture transfer approach for CT slice interpolation by fully utilizing the anisotropic nature of 3D CT volume. Specifically, we design a unique framework that takes high-resolution in-plane texture details as a reference and transfers them to low-resolution through-plane images. To this end, we introduce a multi-reference non-local attention module that extracts meaningful features for reconstructing through-plane high-frequency details from multiple in-plane images. Through extensive experiments, we demonstrate that our method performs significantly better in CT slice interpolation than existing competing methods on public CT datasets including a real-paired benchmark, verifying the effectiveness of the proposed framework. The source code of this work is available at <uri>https://github.com/khuhm/ACVTT</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 1","pages":"336-349"},"PeriodicalIF":0.0,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144802503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
JustRAIGS: Justified Referral in AI Glaucoma Screening Challenge
Pub Date : 2025-08-07 DOI: 10.1109/TMI.2025.3596874
Yeganeh Madadi;Hina Raja;Koenraad A. Vermeer;Hans G. Lemij;Xiaoqin Huang;Eunjin Kim;Seunghoon Lee;Gitaek Kwon;Hyunwoo Kim;Jaeyoung Kim;Adrian Galdran;Miguel A. González Ballester;Dan Presil;Kristhian Aguilar;Victor Cavalcante;Celso Carvalho;Waldir Sabino;Mateus Oliveira;Hui Lin;Charilaos Apostolidis;Aggelos K. Katsaggelos;Tomasz Kubrak;Á. Casado-García;J. Heras;M. Ortega;L. Ramos;Philippe Zhang;Yihao Li;Jing Zhang;Weili Jiang;Pierre-Henri Conze;Mathieu Lamard;Gwenole Quellec;Mostafa El Habib Daho;Madukuri Shaurya;Anumeha Varma;Monika Agrawal;Siamak Yousefi
Glaucoma is a major contributor to permanent vision loss. Early diagnosis is crucial for preventing vision loss due to glaucoma, making glaucoma screening essential. A more affordable approach to glaucoma screening can be achieved by applying artificial intelligence to evaluate color fundus photographs (CFPs). We present the Justified Referral in AI Glaucoma Screening (JustRAIGS) challenge to further develop AI algorithms for glaucoma screening and to assess their efficacy. To support this challenge, we have generated a distinctive large dataset containing more than 110,000 meticulously labeled CFPs obtained from approximately 60,000 patients and 500 distinct screening centers in the USA. Our objective is to assess the practicality of creating advanced and dependable AI systems that take a CFP as input and produce the probability of referable glaucoma, as well as outputs justifying the glaucoma decision, by integrating both binary and multi-label classification tasks. This paper presents the evaluation of solutions provided by nine teams, recognizing the team with the highest level of performance. The best sensitivity achieved at a specificity level of 95% was 85%, and the best average Hamming loss was 0.13. Additionally, we test the top three participants' algorithms on an external dataset to validate the performance and generalization of these models. The outcomes of this research can offer valuable insights into the development of intelligent systems for detecting glaucoma. Ultimately, the findings can aid in the early detection and treatment of glaucoma patients, hence decreasing preventable vision impairment and blindness caused by glaucoma.
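The two headline metrics, sensitivity at a fixed 95% specificity for the binary referral task and Hamming loss for the multi-label justification task, can be computed as sketched below with scikit-learn; the toy labels are illustrative only.

```python
# Sensitivity at 95% specificity (binary task) and Hamming loss (multi-label task).
import numpy as np
from sklearn.metrics import roc_curve, hamming_loss

def sensitivity_at_specificity(y_true, y_score, target_specificity=0.95):
    fpr, tpr, _ = roc_curve(y_true, y_score)
    ok = (1 - fpr) >= target_specificity        # operating points meeting the specificity level
    return tpr[ok].max() if ok.any() else 0.0

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.8, 0.7, 0.2, 0.9, 0.3, 0.6])
print(sensitivity_at_specificity(y_true, y_score))

labels_true = np.array([[1, 0, 1], [0, 1, 0]])   # toy 3-label justification ground truth
labels_pred = np.array([[1, 0, 0], [0, 1, 0]])
print(hamming_loss(labels_true, labels_pred))     # fraction of mislabeled label slots (1/6)
```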
{"title":"JustRAIGS: Justified Referral in AI Glaucoma Screening Challenge","authors":"Yeganeh Madadi;Hina Raja;Koenraad A. Vermeer;Hans G. Lemij;Xiaoqin Huang;Eunjin Kim;Seunghoon Lee;Gitaek Kwon;Hyunwoo Kim;Jaeyoung Kim;Adrian Galdran;Miguel A. González Ballester;Dan Presil;Kristhian Aguilar;Victor Cavalcante;Celso Carvalho;Waldir Sabino;Mateus Oliveira;Hui Lin;Charilaos Apostolidis;Aggelos K. Katsaggelos;Tomasz Kubrak;Á. Casado-García;J. Heras;M. Ortega;L. Ramos;Philippe Zhang;Yihao Li;Jing Zhang;Weili Jiang;Pierre-Henri Conze;Mathieu Lamard;Gwenole Quellec;Mostafa El Habib Daho;Madukuri Shaurya;Anumeha Varma;Monika Agrawal;Siamak Yousefi","doi":"10.1109/TMI.2025.3596874","DOIUrl":"10.1109/TMI.2025.3596874","url":null,"abstract":"A major contributor to permanent vision loss is glaucoma. Early diagnosis is crucial for preventing vision loss due to glaucoma, making glaucoma screening essential. A more affordable method of glaucoma screening can be achieved by applying artificial intelligence to evaluate color fundus photographs (CFPs). We present the Justified Referral in AI Glaucoma Screening (JustRAIGS) challenge to further develop these AI algorithms for glaucoma screening and to assess their efficacy. To support this challenge, we have generated a distinctive big dataset containing more than 110,000 meticulously labeled CFPs obtained from approximately 60,000 patients and 500 distinct screening centers in the USA. Our objective is to assess the practicality of creating advanced and dependable AI systems that can take a CFP as input and produce the probability of referable glaucoma, as well as outputs for glaucoma justification by integrating both binary and multi-label classification tasks. This paper presents the evaluation of solutions provided by nine teams, recognizing the team with the highest level of performance. The highest achieved score of sensitivity at a specificity level of 95% was 85%, and the highest achieved score of Hamming losses average was 0.13. Additionally, we test the top three participants’ algorithms on an external dataset to validate the performance and generalization of these models. The outcomes of this research can offer valuable insights into the development of intelligent systems for detecting glaucoma. Ultimately, findings can aid in the early detection and treatment of glaucoma patients, hence decreasing preventable vision impairment and blindness caused by glaucoma.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"45 1","pages":"320-335"},"PeriodicalIF":0.0,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11119643","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144796864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
VLM-CPL: Consensus Pseudo-Labels From Vision-Language Models for Annotation-Free Pathological Image Classification
Pub Date : 2025-08-04 DOI: 10.1109/TMI.2025.3595111
Lanfeng Zhong;Zongyao Huang;Yang Liu;Wenjun Liao;Shichuan Zhang;Guotai Wang;Shaoting Zhang
Classification of pathological images is the basis for automatic cancer diagnosis. Although deep learning methods have achieved remarkable performance, they heavily rely on labeled data, demanding extensive human annotation efforts. In this study, we present a novel human-annotation-free method that leverages pre-trained Vision-Language Models (VLMs). Without human annotation, pseudo-labels of the training set are obtained by utilizing the zero-shot inference capabilities of the VLM; these may contain a lot of noise due to the domain gap between the pre-training and target datasets. To address this issue, we introduce VLM-CPL, a novel approach that combines two noisy-label filtering techniques with a semi-supervised learning strategy. Specifically, we first obtain prompt-based pseudo-labels with uncertainty estimation by zero-shot inference with the VLM using multiple augmented views of an input. Then, by leveraging the feature representation ability of the VLM, we obtain feature-based pseudo-labels via sample clustering in the feature space. Prompt-feature consensus is introduced to select reliable samples based on the agreement between the two types of pseudo-labels. We further propose High-confidence Cross Supervision to learn from samples with reliable pseudo-labels and the remaining unlabeled samples. Additionally, we present an innovative open-set prompting strategy that filters irrelevant patches from whole slides to enhance the quality of selected patches. Experimental results on five public pathological image datasets for patch-level and slide-level classification showed that our method substantially outperformed zero-shot classification by VLMs and was superior to existing noisy-label learning methods. The code is publicly available at https://github.com/HiLab-git/VLM-CPL
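The prompt-feature consensus step can be summarized as keeping only samples whose zero-shot (prompt-based) pseudo-label agrees with the clustering-derived (feature-based) pseudo-label. The sketch below uses plain agreement as the criterion and omits the paper's uncertainty estimation.

```python
# Prompt-feature consensus: keep samples where the two pseudo-label sources agree.
import numpy as np

def consensus_select(prompt_labels, feature_labels):
    """Both arrays hold one pseudo-label per training patch; returns indices of reliable
    samples and their agreed labels for semi-supervised training."""
    prompt_labels = np.asarray(prompt_labels)
    feature_labels = np.asarray(feature_labels)
    keep = np.flatnonzero(prompt_labels == feature_labels)
    return keep, prompt_labels[keep]

prompt = [0, 1, 1, 2, 0, 2]     # zero-shot VLM predictions (toy)
feature = [0, 1, 0, 2, 1, 2]    # cluster-derived labels (toy)
idx, labels = consensus_select(prompt, feature)
print(idx, labels)               # disagreeing samples remain unlabeled
```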
{"title":"VLM-CPL: Consensus Pseudo-Labels From Vision-Language Models for Annotation-Free Pathological Image Classification","authors":"Lanfeng Zhong;Zongyao Huang;Yang Liu;Wenjun Liao;Shichuan Zhang;Guotai Wang;Shaoting Zhang","doi":"10.1109/TMI.2025.3595111","DOIUrl":"10.1109/TMI.2025.3595111","url":null,"abstract":"Classification of pathological images is the basis for automatic cancer diagnosis. Despite that deep learning methods have achieved remarkable performance, they heavily rely on labeled data, demanding extensive human annotation efforts. In this study, we present a novel human annotation-free method by leveraging pre-trained Vision-Language Models (VLMs). Without human annotation, pseudo-labels of the training set are obtained by utilizing the zero-shot inference capabilities of VLM, which may contain a lot of noise due to the domain gap between the pre-training and target datasets. To address this issue, we introduce VLM-CPL, a novel approach that contains two noisy label filtering techniques with a semi-supervised learning strategy. Specifically, we first obtain prompt-based pseudo-labels with uncertainty estimation by zero-shot inference with the VLM using multiple augmented views of an input. Then, by leveraging the feature representation ability of VLM, we obtain feature-based pseudo-labels via sample clustering in the feature space. Prompt-feature consensus is introduced to select reliable samples based on the consensus between the two types of pseudo-labels. We further propose High-confidence Cross Supervision by to learn from samples with reliable pseudo-labels and the remaining unlabeled samples. Additionally, we present an innovative open-set prompting strategy that filters irrelevant patches from whole slides to enhance the quality of selected patches. Experimental results on five public pathological image datasets for patch-level and slide-level classification showed that our method substantially outperformed zero-shot classification by VLMs, and was superior to existing noisy label learning methods. The code is publicly available at <uri>https://github.com/HiLab-git/VLM-CPL</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 10","pages":"4023-4036"},"PeriodicalIF":0.0,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144778204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ToothMaker: Realistic Panoramic Dental Radiograph Generation via Disentangled Control
Pub Date : 2025-07-28 DOI: 10.1109/TMI.2025.3588466
Weihao Yu;Xiaoqing Guo;Wuyang Li;Xinyu Liu;Hui Chen;Yixuan Yuan
Generating high-fidelity dental radiographs is essential for training diagnostic models. Despite the development of numerous methods for other medical data, generative approaches in dental radiology remain unexplored. Due to the intricate tooth structures and specialized terminology, these methods often yield ambiguous tooth regions and incorrect dental concepts when applied to dentistry. In this paper, we make the first attempt to investigate diffusion-based dental X-ray image generation and propose ToothMaker, a novel framework specifically designed for the dental domain. Firstly, to synthesize X-ray images that possess accurate tooth structures and realistic radiological styles simultaneously, we design a control-disentangled fine-tuning (CDFT) strategy. Specifically, we present two separate controllers to handle style and layout control respectively, and introduce a gradient-based decoupling method that optimizes each using its corresponding disentangled gradients. Secondly, to enhance the model's understanding of dental terminology, we propose a prior-disentangled guidance module (PDGM), enabling precise synthesis of dental concepts. It utilizes a large language model to decompose dental terminology into a series of meta-knowledge elements and performs interactions and refinements through a hypergraph neural network. These elements are then fed into the network to guide the generation of dental concepts. Extensive experiments demonstrate the high fidelity and diversity of the images synthesized by our approach. By incorporating the generated data, we achieve substantial performance improvements on downstream segmentation and visual question answering tasks, indicating that our method can greatly reduce the reliance on manually annotated data. Code will be publicly available at https://github.com/CUHK-AIM-Group/ToothMaker
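As a loose illustration of optimizing two controllers with their own gradients, the sketch below updates a style parameter group and a layout parameter group only from their respective losses. The controllers, losses, and training loop are toy stand-ins and do not reflect the paper's diffusion-model fine-tuning.

```python
# Two controllers, each stepped only by the gradient of its own objective (toy illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F

style_ctrl = nn.Linear(8, 8)      # placeholder "style" controller
layout_ctrl = nn.Linear(8, 8)     # placeholder "layout" controller
opt_style = torch.optim.Adam(style_ctrl.parameters(), lr=1e-3)
opt_layout = torch.optim.Adam(layout_ctrl.parameters(), lr=1e-3)

x = torch.randn(4, 8)
style_target, layout_target = torch.randn(4, 8), torch.randn(4, 8)

style_loss = F.mse_loss(style_ctrl(x), style_target)
layout_loss = F.mse_loss(layout_ctrl(x), layout_target)

# Each controller only ever sees the gradient of its corresponding loss.
opt_style.zero_grad(); style_loss.backward(); opt_style.step()
opt_layout.zero_grad(); layout_loss.backward(); opt_layout.step()
print(style_loss.item(), layout_loss.item())
```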
{"title":"ToothMaker: Realistic Panoramic Dental Radiograph Generation via Disentangled Control","authors":"Weihao Yu;Xiaoqing Guo;Wuyang Li;Xinyu Liu;Hui Chen;Yixuan Yuan","doi":"10.1109/TMI.2025.3588466","DOIUrl":"10.1109/TMI.2025.3588466","url":null,"abstract":"Generating high-fidelity dental radiographs is essential for training diagnostic models. Despite the development of numerous methods for other medical data, generative approaches in dental radiology remain unexplored. Due to the intricate tooth structures and specialized terminology, these methods often yield ambiguous tooth regions and incorrect dental concepts when applied to dentistry. In this paper, we take the first attempt to investigate diffusion-based teeth X-ray image generation and propose ToothMaker, a novel framework specifically designed for the dental domain. Firstly, to synthesize X-ray images that possess accurate tooth structures and realistic radiological styles simultaneously, we design control-disentangled fine-tuning (CDFT) strategy. Specifically, we present two separate controllers to handle style and layout control respectively, and introduce a gradient-based decoupling method that optimizes each using their corresponding disentangled gradients. Secondly, to enhance model’s understanding of dental terminology, we propose prior-disentangled guidance module (PDGM), enabling precise synthesis of dental concepts. It utilizes large language model to decompose dental terminology into a series of meta-knowledge elements and performs interactions and refinements through hypergraph neural network. These elements are then fed into the network to guide the generation of dental concepts. Extensive experiments demonstrate the high fidelity and diversity of the images synthesized by our approach. By incorporating the generated data, we achieve substantial performance improvements on downstream segmentation and visual question answering tasks, indicating that our method can greatly reduce the reliance on manually annotated data. Code will be public available at <uri>https://github.com/CUHK-AIM-Group/ToothMaker</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5233-5244"},"PeriodicalIF":0.0,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144720145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Topology Optimization in Medical Image Segmentation With Fast χ Euler Characteristic
Pub Date : 2025-07-28 DOI: 10.1109/TMI.2025.3589495
Liu Li;Qiang Ma;Cheng Ouyang;Johannes C. Paetzold;Daniel Rueckert;Bernhard Kainz
Deep learning-based medical image segmentation techniques have shown promising results when evaluated with conventional metrics such as the Dice score or Intersection-over-Union. However, these fully automatic methods often fail to meet clinically acceptable accuracy, especially when topological constraints should be observed, e.g., continuous boundaries or closed surfaces. In medical image segmentation, the correctness of a segmentation in terms of the required topological genus is sometimes even more important than pixel-wise accuracy. Existing topology-aware approaches commonly estimate and constrain the topological structure via the concept of persistent homology (PH). However, these methods are difficult to apply to high-dimensional data due to their polynomial computational complexity. To overcome this problem, we propose a novel and fast approach for topology-aware segmentation based on the Euler characteristic (χ). First, we propose a fast formulation for χ computation in both 2D and 3D. The scalar χ error between the prediction and the ground truth serves as the topological evaluation metric. Then we estimate the spatial topology correctness of any segmentation network via a so-called topological violation map, i.e., a detailed map that highlights regions with χ errors. Finally, the segmentation results from the arbitrary network are refined based on the topological violation maps by a topology-aware correction network. Our experiments are conducted on both 2D and 3D datasets and show that our method can significantly improve topological correctness while preserving pixel-wise segmentation accuracy.
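For a 2D binary mask, the Euler characteristic can be read as χ = (number of connected components) - (number of holes); a readable (not fast) reference computation is sketched below. Connectivity conventions and the paper's fast formulation differ from this toy version.

```python
# Reference (slow) Euler characteristic of a 2D binary mask: components minus holes.
import numpy as np
from scipy import ndimage

def euler_characteristic_2d(mask):
    mask = np.asarray(mask, dtype=bool)
    _, n_components = ndimage.label(mask)
    # Holes = background components that do not touch the image border.
    padded_bg = np.pad(~mask, 1, constant_values=True)  # pad so outer background is one blob
    _, n_bg = ndimage.label(padded_bg)
    return n_components - (n_bg - 1)

ring = np.zeros((32, 32), dtype=np.uint8)
ring[8:24, 8:24] = 1
ring[12:20, 12:20] = 0                  # one component with one hole -> chi = 0
print(euler_characteristic_2d(ring))    # 0

solid = np.zeros((32, 32), dtype=np.uint8)
solid[8:24, 8:24] = 1                   # one component, no hole -> chi = 1
print(euler_characteristic_2d(solid))   # 1
```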
{"title":"Topology Optimization in Medical Image Segmentation With Fast χ Euler Characteristic","authors":"Liu Li;Qiang Ma;Cheng Ouyang;Johannes C. Paetzold;Daniel Rueckert;Bernhard Kainz","doi":"10.1109/TMI.2025.3589495","DOIUrl":"10.1109/TMI.2025.3589495","url":null,"abstract":"Deep learning-based medical image segmentation techniques have shown promising results when evaluated based on conventional metrics such as the Dice score or Intersection-over-Union. However, these fully automatic methods often fail to meet clinically acceptable accuracy, especially when topological constraints should be observed, e.g., continuous boundaries or closed surfaces. In medical image segmentation, the correctness of a segmentation in terms of the required topological genus sometimes is even more important than the pixel-wise accuracy. Existing topology-aware approaches commonly estimate and constrain the topological structure via the concept of persistent homology (PH). However, these methods are difficult to implement for high dimensional data due to their polynomial computational complexity. To overcome this problem, we propose a novel and fast approach for topology-aware segmentation based on the Euler Characteristic (<inline-formula> <tex-math>$chi $ </tex-math></inline-formula>). First, we propose a fast formulation for <inline-formula> <tex-math>$chi $ </tex-math></inline-formula> computation in both 2D and 3D. The scalar <inline-formula> <tex-math>$chi $ </tex-math></inline-formula> error between the prediction and ground-truth serves as the topological evaluation metric. Then we estimate the spatial topology correctness of any segmentation network via a so-called topological violation map, i.e., a detailed map that highlights regions with <inline-formula> <tex-math>$chi $ </tex-math></inline-formula> errors. Finally, the segmentation results from the arbitrary network are refined based on the topological violation maps by a topology-aware correction network. Our experiments are conducted on both 2D and 3D datasets and show that our method can significantly improve topological correctness while preserving pixel-wise segmentation accuracy.","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5221-5232"},"PeriodicalIF":0.0,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144720144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DDaTR: Dynamic Difference-Aware Temporal Residual Network for Longitudinal Radiology Report Generation
Pub Date : 2025-07-22 DOI: 10.1109/TMI.2025.3591364
Shanshan Song;Hui Tang;Honglong Yang;Xiaomeng Li
Radiology Report Generation (RRG) automates the creation of radiology reports from medical imaging, enhancing the efficiency of the reporting process. Longitudinal Radiology Report Generation (LRRG) extends RRG by incorporating the ability to compare current and prior exams, facilitating the tracking of temporal changes in clinical findings. Existing LRRG approaches only extract features from prior and current images using a pre-trained visual encoder, which are then concatenated to generate the final report. However, these methods struggle to effectively capture both spatial and temporal correlations during the feature extraction process. Consequently, the extracted features inadequately capture differences across exams and thus underrepresent the expected progressions, leading to sub-optimal performance in LRRG. To address this, we develop a novel dynamic difference-aware temporal residual network (DDaTR). In DDaTR, we introduce two modules at each stage of the visual encoder to capture multi-level spatial correlations. The Dynamic Feature Alignment Module (DFAM) is designed to align prior features across modalities to preserve the integrity of prior clinical information. Prompted by the enriched prior features, the dynamic difference-aware module (DDAM) captures favorable difference information by identifying relationships across exams. Furthermore, our DDaTR employs a dynamic residual network to unidirectionally transmit longitudinal information, effectively modeling temporal correlations. Extensive experiments demonstrate superior performance over existing methods on three benchmarks, proving its efficacy in both RRG and LRRG tasks. Our code is published at https://github.com/xmed-lab/DDaTR
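A minimal sketch of a difference-aware residual fusion step, in the spirit of aligning prior-exam features and injecting the change signal back into the current-exam features, is given below. The class name, linear projections, and shapes are hypothetical placeholders, not the DFAM/DDAM modules themselves.

```python
# Toy difference-aware temporal residual fusion of prior- and current-exam features.
import torch
import torch.nn as nn

class TemporalResidualFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.align = nn.Linear(channels, channels)      # stand-in for prior-feature alignment
        self.diff_proj = nn.Linear(channels, channels)  # stand-in for difference-aware weighting

    def forward(self, current, prior):
        aligned_prior = self.align(prior)
        difference = current - aligned_prior            # where the two exams disagree
        return current + self.diff_proj(difference)     # residual path carries the change signal

fuse = TemporalResidualFusion()
current = torch.randn(2, 196, 64)   # toy patch-token features of the current exam
prior = torch.randn(2, 196, 64)     # toy patch-token features of the prior exam
print(fuse(current, prior).shape)   # torch.Size([2, 196, 64])
```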
{"title":"DDaTR: Dynamic Difference-Aware Temporal Residual Network for Longitudinal Radiology Report Generation","authors":"Shanshan Song;Hui Tang;Honglong Yang;Xiaomeng Li","doi":"10.1109/TMI.2025.3591364","DOIUrl":"10.1109/TMI.2025.3591364","url":null,"abstract":"Radiology Report Generation (RRG) automates the creation of radiology reports from medical imaging, enhancing the efficiency of the reporting process. Longitudinal Radiology Report Generation (LRRG) extends RRG by incorporating the ability to compare current and prior exams, facilitating the tracking of temporal changes in clinical findings. Existing LRRG approaches only extract features from prior and current images using a visual pre-trained encoder, which are then concatenated to generate the final report. However, these methods struggle to effectively capture both spatial and temporal correlations during the feature extraction process. Consequently, the extracted features inadequately capture the information of difference across exams and thus underrepresent the expected progressions, leading to sub-optimal performance in LRRG. To address this, we develop a novel dynamic difference-aware temporal residual network (DDaTR). In DDaTR, we introduce two modules at each stage of the visual encoder to capture multi-level spatial correlations. The Dynamic Feature Alignment Module (DFAM) is designed to align prior features across modalities for the integrity of prior clinical information. Prompted by the enriched prior features, the dynamic difference-aware module (DDAM) captures favorable difference information by identifying relationships across exams. Furthermore, our DDaTR employs the dynamic residual network to unidirectionally transmit longitudinal information, effectively modeling temporal correlations. Extensive experiments demonstrated superior performance over existing methods on three benchmarks, proving its efficacy in both RRG and LRRG tasks. Our code is published at <uri>https://github.com/xmed-lab/DDaTR</uri>","PeriodicalId":94033,"journal":{"name":"IEEE transactions on medical imaging","volume":"44 12","pages":"5345-5357"},"PeriodicalIF":0.0,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144685063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0