
Latest Articles in IEEE Transactions on Medical Imaging

S&D Messenger: Exchanging Semantic and Domain Knowledge for Generic Semi-Supervised Medical Image Segmentation
Pub Date : 2025-06-03 DOI: 10.1109/TMI.2025.3576163
Qixiang Zhang;Haonan Wang;Xiaomeng Li
Semi-supervised medical image segmentation (SSMIS) has emerged as a promising solution to the time-consuming manual labeling required in the medical field. However, in practical scenarios there are often domain variations within the datasets, leading to derivative settings such as semi-supervised medical domain generalization (Semi-MDG) and unsupervised medical domain adaptation (UMDA). In this paper, we aim to develop a generic framework that masters all three tasks. We notice a critical challenge shared across the three scenarios: the explicit semantic knowledge needed for segmentation performance and the rich domain knowledge needed for generalizability exist exclusively in the labeled set and the unlabeled set, respectively. This discrepancy hinders existing methods from effectively comprehending both types of knowledge under semi-supervised settings. To tackle this challenge, we develop a Semantic & Domain Knowledge Messenger (S&D Messenger), which facilitates direct knowledge delivery between the labeled and unlabeled sets and thus allows the model to comprehend both in each individual learning flow. Equipped with our S&D Messenger, a naive pseudo-labeling method achieves substantial improvements over state-of-the-art task-specific methods on ten benchmark datasets for SSMIS (+7.5%), UMDA (+5.6%), and Semi-MDG (+1.14%).
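For context, a minimal sketch of the naive pseudo-labeling baseline that a module like the S&D Messenger would plug into, assuming a standard segmentation network and a fixed confidence threshold (the messenger itself is not reproduced here):

```python
import torch
import torch.nn.functional as F

def pseudo_label_step(model, labeled_img, labels, unlabeled_img, conf_thresh=0.95):
    """One training step of naive pseudo-labeling for segmentation.

    labeled_img:   (B, C, H, W) images with ground-truth masks `labels` of shape (B, H, W)
    unlabeled_img: (B, C, H, W) images without annotations
    """
    # Supervised term on the labeled set.
    logits_l = model(labeled_img)                       # (B, K, H, W)
    sup_loss = F.cross_entropy(logits_l, labels)

    # Pseudo-labels on the unlabeled set: keep only confident pixels.
    with torch.no_grad():
        probs_u = torch.softmax(model(unlabeled_img), dim=1)
        conf, pseudo = probs_u.max(dim=1)               # (B, H, W)
        mask = (conf >= conf_thresh).float()

    logits_u = model(unlabeled_img)
    unsup_loss = (F.cross_entropy(logits_u, pseudo, reduction="none") * mask).mean()

    return sup_loss + unsup_loss
```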
Citations: 0
Disease-Grading Networks With Asymmetric Gaussian Distribution for Medical Imaging
Pub Date : 2025-06-02 DOI: 10.1109/TMI.2025.3575402
Wenqiang Tang;Zhouwang Yang
Deep learning-based disease grading technologies facilitate timely medical intervention due to their high efficiency and accuracy. Recent advancements have enhanced grading performance by incorporating the ordinal relationships of disease labels. However, existing methods often assume the same probability distribution for disease labels across instances within the same category, overlooking variations in label distributions. Additionally, the hyperparameters of these distributions are typically determined empirically, which may not accurately reflect the true distribution. To address these limitations, we propose a disease grading network utilizing a sample-aware asymmetric Gaussian label distribution, termed DGN-AGLD. This approach includes a variance predictor designed to learn and predict the parameters that control the asymmetry of the Gaussian distribution, enabling distinct label distributions within the same category. This module can be seamlessly integrated into standard deep learning networks. Experimental results on four disease datasets validate the effectiveness and superiority of the proposed method, particularly on the IDRiD dataset, where it achieves a diabetic retinopathy grading accuracy of 77.67%. Furthermore, our method extends to joint disease grading tasks, yielding superior results and demonstrating significant generalization capabilities. Visual analysis indicates that our method more accurately captures the trend of disease progression by leveraging the asymmetry in label distribution. Our code is publicly available at https://github.com/ahtwq/AGNet
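As an illustration of the label-construction idea, a hedged sketch of a sample-aware asymmetric Gaussian distribution over ordinal grades; the per-sample sigma_left/sigma_right values would come from a variance predictor, and the exact normalization used in the paper is an assumption:

```python
import numpy as np

def asymmetric_gaussian_labels(y, sigma_left, sigma_right, num_classes):
    """Soft label distribution over ordinal grades 0..num_classes-1.

    y:            ground-truth grade of one sample
    sigma_left:   spread towards lower grades (assumed to be predicted per sample)
    sigma_right:  spread towards higher grades (assumed to be predicted per sample)
    """
    grades = np.arange(num_classes, dtype=np.float64)
    sigma = np.where(grades < y, sigma_left, sigma_right)   # asymmetric widths
    dist = np.exp(-0.5 * ((grades - y) / sigma) ** 2)
    return dist / dist.sum()                                 # normalize to a distribution

# Example: grade 2 of 5, tighter spread towards lower grades, looser towards higher ones.
print(asymmetric_gaussian_labels(2, 0.5, 1.5, 5))
```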
Citations: 0
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models
Pub Date : 2025-06-02 DOI: 10.1109/TMI.2025.3575853
Chenyu Lian;Hong-Yu Zhou;Dongyun Liang;Jing Qin;Liansheng Wang
Medical vision-language alignment through cross-modal contrastive learning shows promising performance in image-text matching tasks, such as retrieval and zero-shot classification. However, conventional cross-modal contrastive learning (CLIP-based) methods suffer from suboptimal visual representation capabilities, which also limits their effectiveness in vision-language alignment. In contrast, although models pretrained via multimodal masked modeling struggle with direct cross-modal matching, they excel in visual representation. To address this contradiction, we propose ALTA (ALign Through Adapting), an efficient medical vision-language alignment method that uses only about 8% of the trainable parameters and less than 1/5 of the computational cost required for masked record modeling. ALTA achieves superior performance in vision-language matching tasks such as retrieval and zero-shot classification by adapting the pretrained vision model from masked record modeling. Additionally, we integrate temporal-multiview radiograph inputs to enhance the information consistency between radiographs and their corresponding descriptions in reports, further improving the vision-language alignment. Experimental evaluations show that ALTA outperforms the best-performing counterpart by over 4 absolute percentage points in text-to-image accuracy and approximately 6 absolute percentage points in image-to-text retrieval accuracy. The adaptation of vision-language models during efficient alignment also promotes better vision and language understanding. Code is publicly available at https://github.com/DopamineLcy/ALTA.
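For reference, a minimal sketch of the symmetric image-text contrastive (InfoNCE) objective that underlies this kind of vision-language alignment; ALTA's parameter-efficient adaptation of the masked-pretrained vision encoder is not reproduced here:

```python
import torch
import torch.nn.functional as F

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/report embeddings.

    image_emb, text_emb: (B, D) embeddings of B matched image-report pairs.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    # Matched pairs lie on the diagonal; contrast in both retrieval directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```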
Citations: 0
Mamba-Sea: A Mamba-Based Framework With Global-to-Local Sequence Augmentation for Generalizable Medical Image Segmentation
Pub Date : 2025-04-30 DOI: 10.1109/TMI.2025.3564765
Zihan Cheng;Jintao Guo;Jian Zhang;Lei Qi;Luping Zhou;Yinghuan Shi;Yang Gao
To segment medical images under distribution shifts, domain generalization (DG) has emerged as a promising setting for training models on source domains so that they generalize to unseen target domains. Existing DG methods are mainly based on CNN or ViT architectures. Recently, advanced state space models, represented by Mamba, have shown promising results in various supervised medical image segmentation tasks. Mamba's success is primarily due to its ability to capture long-range dependencies while keeping linear complexity in the input sequence length, making it a promising alternative to CNNs and ViTs. Inspired by this success, in this paper we explore the potential of the Mamba architecture to address distribution shifts in DG for medical image segmentation. Specifically, we propose a novel Mamba-based framework, Mamba-Sea, incorporating global-to-local sequence augmentation to improve the model's generalizability under domain shift. Mamba-Sea introduces a global augmentation mechanism designed to simulate potential variations in appearance across different sites, aiming to suppress the model's learning of domain-specific information. At the local level, we propose a sequence-wise augmentation along input sequences, which perturbs the style of tokens within random contiguous sub-sequences by modeling and resampling style statistics associated with domain shifts. To the best of our knowledge, Mamba-Sea is the first work to explore the generalization of Mamba for medical image segmentation, providing an advanced and promising Mamba-based architecture with strong robustness to domain shifts. Remarkably, our proposed method is the first to surpass a Dice coefficient of 90% on the Prostate dataset, exceeding the previous SOTA of 88.61%. The code is available at https://github.com/orange-czh/Mamba-Sea.
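A hedged sketch of the local, sequence-wise style augmentation idea, assuming MixStyle-like mixing of the mean/std statistics of a random contiguous token sub-sequence; the paper's exact modeling and resampling scheme may differ:

```python
import torch

def perturb_subsequence_style(tokens, alpha=0.3, eps=1e-6):
    """Resample the style (mean/std) of a random contiguous token sub-sequence.

    tokens: (B, L, D) token sequence fed to the state-space blocks, with L >= 2.
    """
    B, L, D = tokens.shape
    start = torch.randint(0, L - 1, (1,)).item()
    end = torch.randint(start + 1, L + 1, (1,)).item()
    seg = tokens[:, start:end, :]

    mu = seg.mean(dim=1, keepdim=True)
    sigma = seg.std(dim=1, keepdim=True) + eps

    # Mix each sample's statistics with those of another (shuffled) sample in the batch.
    perm = torch.randperm(B)
    lam = torch.distributions.Beta(alpha, alpha).sample((B, 1, 1))
    mu_new = lam * mu + (1 - lam) * mu[perm]
    sigma_new = lam * sigma + (1 - lam) * sigma[perm]

    out = tokens.clone()
    out[:, start:end, :] = (seg - mu) / sigma * sigma_new + mu_new
    return out
```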
Citations: 0
Deep Rib Fracture Instance Segmentation and Classification From CT on the RibFrac Challenge
Pub Date : 2025-04-30 DOI: 10.1109/TMI.2025.3565514
Jiancheng Yang;Rui Shi;Liang Jin;Xiaoyang Huang;Kaiming Kuang;Donglai Wei;Shixuan Gu;Jianying Liu;Pengfei Liu;Zhizhong Chai;Yongjie Xiao;Hao Chen;Liming Xu;Bang Du;Xiangyi Yan;Hao Tang;Adam Alessio;Gregory Holste;Jiapeng Zhang;Xiaoming Wang;Jianye He;Lixuan Che;Hanspeter Pfister;Ming Li;Bingbing Ni
Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts in this area, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmark dataset of over 5,000 rib fractures from 660 CT scans, with voxel-level instance mask annotations and diagnosis labels for four clinical categories (buckle, nondisplaced, displaced, or segmental). The challenge includes two tracks: a detection (instance segmentation) track evaluated by an FROC-style metric and a classification track evaluated by an F1-style metric. During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary. The analysis revealed that several top rib fracture detection solutions achieved performance comparable to, or even better than, human experts. Nevertheless, current rib fracture classification solutions are hardly clinically applicable, which may be an interesting direction for future work. As an active benchmark and research resource, the data and online evaluation of the RibFrac Challenge are available at the challenge website (https://ribfrac.grand-challenge.org/). In addition, we further analyzed the impact of two post-challenge advancements, large-scale pretraining and rib segmentation, based on our internal baseline for rib fracture detection. These findings lay a foundation for future research and development in AI-assisted rib fracture diagnosis.
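For orientation only, a minimal sketch of an F1-style evaluation macro-averaged over the four clinical categories; the official challenge metric may be defined differently:

```python
import numpy as np

def macro_f1(y_true, y_pred,
             classes=("buckle", "nondisplaced", "displaced", "segmental")):
    """Macro-averaged F1 over the four clinical categories (illustrative only).

    y_true, y_pred: numpy arrays of per-fracture category labels (strings).
    """
    f1s = []
    for c in classes:
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return float(np.mean(f1s))
```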
Citations: 0
Structure Causal Models and LLMs Integration in Medical Visual Question Answering
Pub Date : 2025-04-29 DOI: 10.1109/TMI.2025.3564320
Zibo Xu;Qiang Li;Weizhi Nie;Weijie Wang;Anan Liu
Medical Visual Question Answering (MedVQA) aims to answer medical questions according to medical images. However, the complexity of medical data leads to confounders that are difficult to observe, so bias between images and questions is inevitable. Such cross-modal bias makes it challenging to infer medically meaningful answers. In this work, we propose a causal inference framework for the MedVQA task that effectively eliminates the relative confounding effect between the image and the question to ensure the precision of the question-answering (QA) session. We are the first to introduce a novel causal graph structure that represents the interaction between visual and textual elements, explicitly capturing how different questions influence visual features. During optimization, we apply mutual information to discover spurious correlations and propose a multi-variable resampling front-door adjustment method to eliminate the relative confounding effect, aligning features based on their true causal relevance to the question-answering task. In addition, we introduce a prompt strategy that combines multiple prompt forms to improve the model's ability to understand complex medical data and answer accurately. Extensive experiments on three MedVQA datasets demonstrate that 1) our method significantly improves the accuracy of MedVQA, and 2) our method achieves true causal correlations in the face of complex medical data.
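The multi-variable resampling described above approximates the standard front-door adjustment from causal inference, written here with a generic treatment X, mediator M, and answer A (the mapping to the paper's variables is an assumption):

```latex
% Front-door adjustment: with mediator M between treatment X and outcome A,
% the interventional distribution is identified from observational terms.
\[
  P\bigl(A \mid \mathrm{do}(X = x)\bigr)
  \;=\; \sum_{m} P(m \mid x) \sum_{x'} P\bigl(A \mid m, x'\bigr)\, P(x')
\]
```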
Citations: 0
MSCPT: Few-Shot Whole Slide Image Classification With Multi-Scale and Context-Focused Prompt Tuning
Pub Date : 2025-04-29 DOI: 10.1109/TMI.2025.3564976
Minghao Han;Linhao Qu;Dingkang Yang;Xukun Zhang;Xiaoying Wang;Lihua Zhang
Multiple instance learning (MIL) has become a standard paradigm for the weakly supervised classification of whole slide images (WSIs). However, this paradigm relies on a large number of labeled WSIs for training. The lack of training data and the presence of rare diseases pose significant challenges for these methods. Prompt tuning combined with pre-trained Vision-Language Models (VLMs) is an effective solution to the Few-shot Weakly Supervised WSI Classification (FSWC) task. Nevertheless, applying prompt tuning methods designed for natural images to WSIs presents three significant challenges: 1) these methods fail to fully leverage the prior knowledge from the VLM's text modality; 2) they overlook the essential multi-scale and contextual information in WSIs, leading to suboptimal results; and 3) they lack exploration of instance aggregation methods. To address these problems, we propose a Multi-Scale and Context-focused Prompt Tuning (MSCPT) method for the FSWC task. Specifically, MSCPT employs a frozen large language model to generate pathological visual-language prior knowledge at multiple scales, guiding hierarchical prompt tuning. Additionally, we design a graph prompt tuning module to learn essential contextual information within each WSI, and finally we introduce a non-parametric cross-guided instance aggregation module to derive WSI-level features. Extensive experiments, visualizations, and interpretability analyses were conducted on five datasets and three downstream tasks using three VLMs, demonstrating the strong performance of our MSCPT. All code has been made publicly accessible at https://github.com/Hanminghao/MSCPT.
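As a hedged illustration of non-parametric instance aggregation in this setting, the sketch below pools patch features into a WSI-level feature using similarity to a class-prompt embedding from the frozen VLM; MSCPT's cross-guided design itself is not reproduced:

```python
import torch
import torch.nn.functional as F

def similarity_weighted_aggregation(patch_feats, prompt_emb, temperature=0.1):
    """Non-parametric aggregation of patch (instance) features into one WSI-level feature.

    patch_feats: (N, D) features of N patches from one whole slide image
    prompt_emb:  (D,)   text embedding of a class prompt from the frozen VLM
    """
    patch_feats = F.normalize(patch_feats, dim=-1)
    prompt_emb = F.normalize(prompt_emb, dim=-1)
    # Patches more similar to the prompt contribute more to the slide-level feature.
    weights = torch.softmax(patch_feats @ prompt_emb / temperature, dim=0)   # (N,)
    return (weights.unsqueeze(-1) * patch_feats).sum(dim=0)                  # (D,)
```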
Citations: 0
Co-Pseudo Labeling and Active Selection for Fundus Single-Positive Multi-Label Learning
Pub Date : 2025-04-28 DOI: 10.1109/TMI.2025.3565000
Tingxin Hu;Weihang Zhang;Jia Guo;Huiqi Li
Due to the difficulty of collecting multi-label annotations for retinal diseases, fundus images are usually annotated with only one label, even though they actually have multiple labels. Given that deep learning requires accurate training data, incomplete disease information may lead to unsatisfactory classifiers and even misdiagnosis. To cope with these challenges, we propose a co-pseudo labeling and active selection method for Fundus Single-Positive multi-label learning, named FSP. FSP trains two networks simultaneously to generate pseudo labels through curriculum co-pseudo labeling and active sample selection. The curriculum co-pseudo labeling adjusts the thresholds according to the model's learning status for each class. The active sample selection then maintains confident positive predictions with more precise pseudo labels based on loss modeling. A detailed experimental evaluation is conducted on seven retinal datasets. Comparison experiments show the effectiveness of FSP and its superiority over previous methods. Downstream experiments further validate the proposed method.
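A minimal sketch of class-wise pseudo-labeling with curriculum thresholds, assuming sigmoid multi-label outputs and a simple confidence-driven threshold update; FSP's loss-modeling-based active selection is not shown:

```python
import torch

def classwise_pseudo_labels(probs, thresholds):
    """Assign pseudo-labels per class using class-specific thresholds.

    probs:      (B, K) sigmoid outputs for K retinal disease classes
    thresholds: (K,)   per-class thresholds
    Returns a (B, K) tensor with 1 (positive), 0 (negative) and -1 (ignored).
    """
    pseudo = torch.full_like(probs, -1.0)
    pseudo[probs >= thresholds] = 1.0
    pseudo[probs <= (1.0 - thresholds)] = 0.0
    return pseudo

def update_thresholds(thresholds, class_confidence, base=0.5, top=0.95, momentum=0.9):
    """Curriculum-style update (an assumption): classes the model already learns
    well, measured by class_confidence in [0, 1], receive stricter thresholds."""
    target = base + (top - base) * class_confidence          # (K,)
    return momentum * thresholds + (1.0 - momentum) * target
```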
Citations: 0
Amplitude-Modulated Singular Value Decomposition for Ultrafast Ultrasound Imaging of Gas Vesicles
Pub Date : 2025-04-28 DOI: 10.1109/TMI.2025.3565023
Ge Zhang;Mathis Vert;Mohamed Nouhoum;Esteban Rivera;Nabil Haidour;Anatole Jimenez;Thomas Deffieux;Simon Barral;Pascal Hersen;Sophie Pezet;Claire Rabut;Mikhail G. Shapiro;Mickael Tanter
Ultrasound imaging holds significant promise for the observation of molecular and cellular phenomena through the utilization of acoustic contrast agents and acoustic reporter genes. Optimizing imaging methodologies for enhanced detection represents an imperative advancement in this field. The most advanced techniques, relying on amplitude modulation schemes such as cross amplitude modulation (xAM) and ultrafast amplitude modulation (uAM) combined with Hadamard-encoded multiplane wave transmissions, have shown efficacy in capturing the acoustic signals of gas vesicles (GVs). Nonetheless, the uAM sequence requires odd- or even-element transmissions, leading to an imprecise amplitude modulation emitting scheme, and its complex multiplane wave transmission scheme inherently yields overlong pulse durations. The xAM sequence is limited in terms of field of view and imaging depth. To overcome these limitations, we introduce an innovative ultrafast imaging sequence called amplitude-modulated singular value decomposition (SVD) processing. Our method demonstrates a contrast imaging sensitivity comparable to the current gold-standard xAM and uAM, while requiring 4.8 times fewer pulse transmissions. With a similar number of transmit pulses, amplitude-modulated SVD outperforms xAM and uAM, improving the signal-to-background ratio by $+4.78 \pm 0.35$ dB and $+8.29 \pm 3.52$ dB, respectively. Furthermore, the method exhibits superior robustness across a wide range of acoustic pressures and enables high-contrast imaging in ex vivo and in vivo settings. Finally, amplitude-modulated SVD is envisioned to be applicable to the detection of slow-moving microbubbles in ultrasound localization microscopy (ULM).
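For background, a sketch of the classic SVD spatiotemporal (clutter) filter that such a sequence builds on, assuming beamformed IQ data arranged as a Casorati matrix; the amplitude-modulated combination proposed in the paper is not reproduced:

```python
import numpy as np

def svd_clutter_filter(iq_data, cutoff):
    """Classic SVD spatiotemporal filtering of an ultrafast acquisition.

    iq_data: complex array of shape (nz, nx, nt) -- beamformed IQ frames
    cutoff:  number of leading singular components to discard (tissue/clutter)
    """
    nz, nx, nt = iq_data.shape
    casorati = iq_data.reshape(nz * nx, nt)                  # space x time matrix
    U, s, Vh = np.linalg.svd(casorati, full_matrices=False)
    s_filtered = s.copy()
    s_filtered[:cutoff] = 0.0                                # remove low-rank clutter
    filtered = (U * s_filtered) @ Vh                         # U diag(s) Vh
    return filtered.reshape(nz, nx, nt)
```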
Citations: 0
Anomaly Detection in Medical Images Using Encoder-Attention-2Decoders Reconstruction
Pub Date : 2025-04-28 DOI: 10.1109/TMI.2025.3563482
Peng Tang;Xiaoxiao Yan;Xiaobin Hu;Kai Wu;Tobias Lasser;Kuangyu Shi
Anomaly detection (AD) in medical applications is a promising field, offering a cost-effective alternative to labor-intensive abnormal data collection and labeling. However, the success of feature reconstruction-based methods in AD is often hindered by two critical factors: the domain gap of pre-trained encoders and the under-exploration of decoder potential. The EA2D method we propose overcomes these challenges, paving the way for more effective AD in medical imaging. In this paper, we present encoder-attention-2decoder (EA2D), a novel method tailored for medical AD. First, EA2D is optimized through two tasks: a primary feature reconstruction task between the encoder and decoder, which detects anomalies based on reconstruction errors, and an auxiliary transformation-consistency contrastive learning task that explicitly optimizes the encoder to reduce the domain gap between natural images and medical images. Furthermore, EA2D fully exploits the decoder's capabilities to improve AD performance. We introduce a self-attention skip connection to augment the reconstruction quality of normal cases, thereby magnifying the distinction between normal and abnormal samples. Additionally, we propose using dual decoders to reconstruct dual views of an image, leveraging diverse perspectives while mitigating the over-reconstruction issue of anomalies in AD. Extensive experiments across four medical image modalities demonstrate the superiority of our EA2D in various medical scenarios. Our code will be released at https://github.com/TumCCC/E2AD.
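In the spirit of feature-reconstruction anomaly detection, a hedged sketch of deriving a per-pixel anomaly map from encoder/decoder feature discrepancies; EA2D's self-attention skip connection and dual decoders are not reproduced:

```python
import torch
import torch.nn.functional as F

def anomaly_map(encoder_feats, decoder_feats):
    """Per-pixel anomaly score from feature-reconstruction error.

    encoder_feats / decoder_feats: lists of (B, C_i, H_i, W_i) feature maps at
    matching scales; the decoder reconstructs the encoder's features, so large
    cosine distance marks likely anomalies.
    """
    target_size = encoder_feats[0].shape[-2:]
    maps = []
    for e, d in zip(encoder_feats, decoder_feats):
        cos = F.cosine_similarity(e, d, dim=1)               # (B, H_i, W_i)
        score = 1.0 - cos                                     # distance, larger = more anomalous
        maps.append(F.interpolate(score.unsqueeze(1), size=target_size,
                                  mode="bilinear", align_corners=False))
    return torch.stack(maps, dim=0).mean(dim=0)               # (B, 1, H, W)
```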
Citations: 0