
Latest publications in Medical image analysis

Editorial for Special Issue on Foundation Models for Medical Image Analysis.
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-01 | Epub Date: 2024-11-06 | DOI: 10.1016/j.media.2024.103389
Xiaosong Wang, Dequan Wang, Xiaoxiao Li, Jens Rittscher, Dimitris Metaxas, Shaoting Zhang
{"title":"Editorial for Special Issue on Foundation Models for Medical Image Analysis.","authors":"Xiaosong Wang, Dequan Wang, Xiaoxiao Li, Jens Rittscher, Dimitris Metaxas, Shaoting Zhang","doi":"10.1016/j.media.2024.103389","DOIUrl":"10.1016/j.media.2024.103389","url":null,"abstract":"","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":" ","pages":"103389"},"PeriodicalIF":10.7,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142739884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Few-shot medical image segmentation with high-fidelity prototypes.
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-01 | Epub Date: 2024-11-30 | DOI: 10.1016/j.media.2024.103412
Song Tang, Shaxu Yan, Xiaozhi Qi, Jianxin Gao, Mao Ye, Jianwei Zhang, Xiatian Zhu

Few-shot Semantic Segmentation (FSS) aims to adapt a pretrained model to new classes with as few as a single labeled training sample per class. Although prototype-based approaches have achieved substantial success, existing models are limited to imaging scenarios with clearly distinct objects and relatively simple backgrounds, e.g., natural images. This makes such models suboptimal for medical imaging, where neither condition holds. To address this problem, we propose a novel Detail Self-refined Prototype Network (DSPNet) to construct high-fidelity prototypes that represent the object foreground and the background more comprehensively. Specifically, to construct global semantics while preserving the captured detail semantics, we learn the foreground prototypes by modeling the multimodal structures with clustering and then fusing each in a channel-wise manner. Considering that the background often has no apparent semantic relation in the spatial dimensions, we integrate channel-specific structural information under sparse channel-aware regulation. Extensive experiments on three challenging medical image benchmarks show the superiority of DSPNet over previous state-of-the-art methods. The code and data are available at https://github.com/tntek/DSPNet.
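The clustering-and-fusion idea behind such prototype methods can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' DSPNet implementation: it builds several foreground prototypes via a naive k-means over masked support features, then scores the query feature map by cosine similarity; the number of prototypes and the temperature are assumed hyperparameters.

```python
import torch
import torch.nn.functional as F

def multi_prototypes(feats, mask, k=3, iters=10):
    """Cluster masked foreground features into k prototypes via naive k-means.
    feats: (C, H, W) support feature map; mask: (H, W) binary foreground mask.
    Hypothetical helper for illustration only."""
    fg = feats.permute(1, 2, 0)[mask.bool()]           # (N, C) foreground vectors
    centers = fg[torch.randperm(fg.size(0))[:k]]       # random initialization
    for _ in range(iters):
        assign = torch.cdist(fg, centers).argmin(dim=1)
        for j in range(k):
            sel = fg[assign == j]
            if sel.numel():                            # skip empty clusters
                centers[j] = sel.mean(dim=0)
    return centers                                     # (k, C)

def cosine_segmentation(query_feats, prototypes, tau=20.0):
    """Score each query location against every prototype; keep the best match."""
    q = F.normalize(query_feats, dim=0)                # (C, H, W)
    p = F.normalize(prototypes, dim=1)                 # (k, C)
    sim = torch.einsum("kc,chw->khw", p, q) * tau      # scaled cosine logits
    return sim.max(dim=0).values                       # (H, W) foreground score
```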

{"title":"Few-shot medical image segmentation with high-fidelity prototypes.","authors":"Song Tang, Shaxu Yan, Xiaozhi Qi, Jianxin Gao, Mao Ye, Jianwei Zhang, Xiatian Zhu","doi":"10.1016/j.media.2024.103412","DOIUrl":"10.1016/j.media.2024.103412","url":null,"abstract":"<p><p>Few-shot Semantic Segmentation (FSS) aims to adapt a pretrained model to new classes with as few as a single labeled training sample per class. Despite the prototype based approaches have achieved substantial success, existing models are limited to the imaging scenarios with considerably distinct objects and not highly complex background, e.g., natural images. This makes such models suboptimal for medical imaging with both conditions invalid. To address this problem, we propose a novel DetailSelf-refinedPrototypeNetwork (DSPNet) to construct high-fidelity prototypes representing the object foreground and the background more comprehensively. Specifically, to construct global semantics while maintaining the captured detail semantics, we learn the foreground prototypes by modeling the multimodal structures with clustering and then fusing each in a channel-wise manner. Considering that the background often has no apparent semantic relation in the spatial dimensions, we integrate channel-specific structural information under sparse channel-aware regulation. Extensive experiments on three challenging medical image benchmarks show the superiority of DSPNet over previous state-of-the-art methods. The code and data are available at https://github.com/tntek/DSPNet.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"100 ","pages":"103412"},"PeriodicalIF":10.7,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142780633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Corrigendum to "Detection and analysis of cerebral aneurysms based on X-ray rotational angiography - the CADA 2020 challenge" [Medical Image Analysis, April 2022, Volume 77, 102333].
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-01 | Epub Date: 2024-10-10 | DOI: 10.1016/j.media.2024.103363
Matthias Ivantsits, Leonid Goubergrits, Jan-Martin Kuhnigk, Markus Huellebrand, Jan Bruening, Tabea Kossen, Boris Pfahringer, Jens Schaller, Andreas Spuler, Titus Kuehne, Yizhuan Jia, Xuesong Li, Suprosanna Shit, Bjoern Menze, Ziyu Su, Jun Ma, Ziwei Nie, Kartik Jain, Yanfei Liu, Yi Lin, Anja Hennemuth
{"title":"Corrigendum to \"Detection and analysis of cerebral aneurysms based on X-ray rotational angiography - the CADA 2020 challenge\" [Medical Image Analysis, April 2022, Volume 77, 102333].","authors":"Matthias Ivantsits, Leonid Goubergrits, Jan-Martin Kuhnigk, Markus Huellebrand, Jan Bruening, Tabea Kossen, Boris Pfahringer, Jens Schaller, Andreas Spuler, Titus Kuehne, Yizhuan Jia, Xuesong Li, Suprosanna Shit, Bjoern Menze, Ziyu Su, Jun Ma, Ziwei Nie, Kartik Jain, Yanfei Liu, Yi Lin, Anja Hennemuth","doi":"10.1016/j.media.2024.103363","DOIUrl":"10.1016/j.media.2024.103363","url":null,"abstract":"","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":" ","pages":"103363"},"PeriodicalIF":10.7,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142400702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The Developing Human Connectome Project: A fast deep learning-based pipeline for neonatal cortical surface reconstruction.
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-01 | Epub Date: 2024-11-26 | DOI: 10.1016/j.media.2024.103394
Qiang Ma, Kaili Liang, Liu Li, Saga Masui, Yourong Guo, Chiara Nosarti, Emma C Robinson, Bernhard Kainz, Daniel Rueckert

The Developing Human Connectome Project (dHCP) aims to explore developmental patterns of the human brain during the perinatal period. An automated processing pipeline has been developed to extract high-quality cortical surfaces from structural brain magnetic resonance (MR) images for the dHCP neonatal dataset. However, the current implementation of the pipeline requires more than 6.5 h to process a single MRI scan, making it expensive for large-scale neuroimaging studies. In this paper, we propose a fast deep learning (DL) based pipeline for dHCP neonatal cortical surface reconstruction, incorporating DL-based brain extraction, cortical surface reconstruction and spherical projection, as well as GPU-accelerated cortical surface inflation and cortical feature estimation. We introduce a multiscale deformation network to learn diffeomorphic cortical surface reconstruction end-to-end from T2-weighted brain MRI. A fast unsupervised spherical mapping approach is integrated to minimize metric distortions between cortical surfaces and projected spheres. The entire workflow of our DL-based dHCP pipeline completes within only 24 s on a modern GPU, which is nearly 1000 times faster than the original dHCP pipeline. The qualitative assessment demonstrates that for 82.5% of the test samples, the cortical surfaces reconstructed by our DL-based pipeline achieve superior (54.2%) or equal (28.3%) surface quality compared to the original dHCP pipeline.
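The deformation step can be illustrated with a toy sketch: integrate mesh vertices through a predicted stationary velocity field with a few Euler steps, a standard recipe for approximately diffeomorphic surface deformation. The coordinate convention, flow layout, and step count below are assumptions for illustration, not the paper's multiscale network.

```python
import torch
import torch.nn.functional as F

def deform_mesh(verts, flow, steps=10):
    """Integrate vertex trajectories through a velocity field (Euler steps).
    verts: (V, 3) vertex coords normalized to [-1, 1], following grid_sample's
    (x, y, z) convention; flow: (1, 3, D, H, W) velocity field predicted by
    some network (a hypothetical stand-in)."""
    v = verts.clone()
    for _ in range(steps):
        grid = v.view(1, 1, 1, -1, 3)                        # grid_sample layout
        vel = F.grid_sample(flow, grid, align_corners=True)  # (1, 3, 1, 1, V)
        v = v + vel.view(3, -1).t() / steps                  # small Euler update
    return v
```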

{"title":"The Developing Human Connectome Project: A fast deep learning-based pipeline for neonatal cortical surface reconstruction.","authors":"Qiang Ma, Kaili Liang, Liu Li, Saga Masui, Yourong Guo, Chiara Nosarti, Emma C Robinson, Bernhard Kainz, Daniel Rueckert","doi":"10.1016/j.media.2024.103394","DOIUrl":"10.1016/j.media.2024.103394","url":null,"abstract":"<p><p>The Developing Human Connectome Project (dHCP) aims to explore developmental patterns of the human brain during the perinatal period. An automated processing pipeline has been developed to extract high-quality cortical surfaces from structural brain magnetic resonance (MR) images for the dHCP neonatal dataset. However, the current implementation of the pipeline requires more than 6.5 h to process a single MRI scan, making it expensive for large-scale neuroimaging studies. In this paper, we propose a fast deep learning (DL) based pipeline for dHCP neonatal cortical surface reconstruction, incorporating DL-based brain extraction, cortical surface reconstruction and spherical projection, as well as GPU-accelerated cortical surface inflation and cortical feature estimation. We introduce a multiscale deformation network to learn diffeomorphic cortical surface reconstruction end-to-end from T2-weighted brain MRI. A fast unsupervised spherical mapping approach is integrated to minimize metric distortions between cortical surfaces and projected spheres. The entire workflow of our DL-based dHCP pipeline completes within only 24 s on a modern GPU, which is nearly 1000 times faster than the original dHCP pipeline. The qualitative assessment demonstrates that for 82.5% of the test samples, the cortical surfaces reconstructed by our DL-based pipeline achieve superior (54.2%) or equal (28.3%) surface quality compared to the original dHCP pipeline.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"100 ","pages":"103394"},"PeriodicalIF":10.7,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142780635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Benefit from public unlabeled data: A Frangi filter-based pretraining network for 3D cerebrovascular segmentation.
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-17 | DOI: 10.1016/j.media.2024.103442
Gen Shi, Hao Lu, Hui Hui, Jie Tian

Precise cerebrovascular segmentation in Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) data is crucial for computer-aided clinical diagnosis. The sparse distribution of cerebrovascular structures within TOF-MRA images often results in high costs for manual data labeling. Leveraging unlabeled TOF-MRA data can significantly enhance model performance. In this study, we have constructed the largest preprocessed unlabeled TOF-MRA dataset to date, comprising 1510 subjects. Additionally, we provide manually annotated segmentation masks for 113 subjects based on existing external image datasets to facilitate evaluation. We propose a simple yet effective pretraining strategy utilizing the Frangi filter, known for its capability to enhance vessel-like structures, to optimize the use of the unlabeled data for 3D cerebrovascular segmentation. This involves a Frangi filter-based preprocessing workflow tailored for large-scale unlabeled datasets and a multi-task pretraining strategy to efficiently utilize the preprocessed data. This approach ensures maximal extraction of useful knowledge from the unlabeled data. The efficacy of the pretrained model is assessed across four cerebrovascular segmentation datasets, where it demonstrates superior performance, improving the clDice metric by approximately 2%-3% compared to the latest semi- and self-supervised methods. Additionally, ablation studies validate the generalizability and effectiveness of our pretraining method across various backbone structures. The code and data are open-sourced at: https://github.com/shigen-StoneRoot/FFPN.
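The pretext-label idea admits a compact sketch: run a multiscale Frangi filter over an unlabeled volume and threshold the vesselness response into a pseudo-mask against which a segmentation network can be pretrained. The scales and threshold below are assumptions for illustration, not the paper's settings.

```python
import numpy as np
from skimage.filters import frangi

def frangi_pseudo_labels(volume, sigmas=(1, 2, 3), thresh=0.5):
    """Turn an unlabeled TOF-MRA volume into a vesselness pseudo-mask.
    skimage's frangi accepts 3D arrays; black_ridges=False targets bright
    vessels on a dark background, as in TOF-MRA."""
    v = frangi(volume.astype(np.float32), sigmas=sigmas, black_ridges=False)
    v = (v - v.min()) / (v.max() - v.min() + 1e-8)     # normalize to [0, 1]
    return (v > thresh).astype(np.uint8)               # binary pseudo-label
```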

{"title":"Benefit from public unlabeled data: A Frangi filter-based pretraining network for 3D cerebrovascular segmentation.","authors":"Gen Shi, Hao Lu, Hui Hui, Jie Tian","doi":"10.1016/j.media.2024.103442","DOIUrl":"https://doi.org/10.1016/j.media.2024.103442","url":null,"abstract":"<p><p>Precise cerebrovascular segmentation in Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) data is crucial for computer-aided clinical diagnosis. The sparse distribution of cerebrovascular structures within TOF-MRA images often results in high costs for manual data labeling. Leveraging unlabeled TOF-MRA data can significantly enhance model performance. In this study, we have constructed the largest preprocessed unlabeled TOF-MRA dataset to date, comprising 1510 subjects. Additionally, we provide manually annotated segmentation masks for 113 subjects based on existing external image datasets to facilitate evaluation. We propose a simple yet effective pretraining strategy utilizing the Frangi filter, known for its capability to enhance vessel-like structures, to optimize the use of the unlabeled data for 3D cerebrovascular segmentation. This involves a Frangi filter-based preprocessing workflow tailored for large-scale unlabeled datasets and a multi-task pretraining strategy to efficiently utilize the preprocessed data. This approach ensures maximal extraction of useful knowledge from the unlabeled data. The efficacy of the pretrained model is assessed across four cerebrovascular segmentation datasets, where it demonstrates superior performance, improving the clDice metric by approximately 2%-3% compared to the latest semi- and self-supervised methods. Additionally, ablation studies validate the generalizability and effectiveness of our pretraining method across various backbone structures. The code and data have been open source at: https://github.com/shigen-StoneRoot/FFPN.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"101 ","pages":"103442"},"PeriodicalIF":10.7,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143008076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Identifying multilayer network hub by graph representation learning.
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-16 | DOI: 10.1016/j.media.2025.103463
Defu Yang, Minjeong Kim, Yu Zhang, Guorong Wu

Recent advances in neuroimaging technology allow us to understand how the human brain is wired in vivo and how functional activity is synchronized across multiple regions. Growing evidence shows that the complexity of functional connectivity goes far beyond the widely used mono-layer network. Indeed, the hierarchical processing of information among distinct brain regions and across multiple channels requires a more advanced multilayer model to understand the brain-wide synchronization that underlies functional brain networks. However, principled approaches for characterizing network organization in the context of multilayer topologies remain largely unexplored. In this work, we present a novel multivariate hub identification method that takes both the intra- and inter-layer network topologies into account. Specifically, we put the spotlight on multilayer graph embeddings that allow us to separate connector hubs (which connect across network modules) from their peripheral nodes. The removal of these hub nodes breaks down the entire multilayer brain network into a set of disconnected communities. We have evaluated our novel multilayer hub identification method on task-based and resting-state functional images. Complementing ongoing findings based on mono-layer brain networks, our multilayer network analysis provides a new understanding of brain network topology that links functional connectivities with brain states and disease progression.
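For intuition about what separates connector hubs from peripheral nodes, the classic participation coefficient — high when a node's edges spread across modules — can serve as a simple stand-in; the sketch below averages it over layers as a crude multilayer hub score. This is a standard graph measure used for illustration, not the paper's embedding-based criterion.

```python
import numpy as np

def participation_coefficient(adj, modules):
    """Guimera-Amaral participation coefficient; high values mark nodes
    whose connections spread across modules (connector hubs).
    adj: (N, N) weighted adjacency; modules: (N,) integer module labels."""
    k = adj.sum(axis=1)                                # node strength
    pc = np.ones_like(k, dtype=float)
    for m in np.unique(modules):
        k_m = adj[:, modules == m].sum(axis=1)         # strength into module m
        pc -= (k_m / np.maximum(k, 1e-12)) ** 2
    return pc

def multilayer_hub_score(layers, modules):
    """Crude multilayer score: average participation over network layers."""
    return np.mean([participation_coefficient(a, modules) for a in layers],
                   axis=0)
```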

{"title":"Identifying multilayer network hub by graph representation learning.","authors":"Defu Yang, Minjeong Kim, Yu Zhang, Guorong Wu","doi":"10.1016/j.media.2025.103463","DOIUrl":"https://doi.org/10.1016/j.media.2025.103463","url":null,"abstract":"<p><p>The recent advances in neuroimaging technology allow us to understand how the human brain is wired in vivo and how functional activity is synchronized across multiple regions. Growing evidence shows that the complexity of the functional connectivity is far beyond the widely used mono-layer network. Indeed, the hierarchical processing information among distinct brain regions and across multiple channels requires using a more advanced multilayer model to understand the synchronization across the brain that underlies functional brain networks. However, the principled approach for characterizing network organization in the context of multilayer topologies is largely unexplored. In this work, we present a novel multi-variate hub identification method that takes both the intra- and inter-layer network topologies into account. Specifically, we put the spotlight on the multilayer graph embeddings that allow us to separate connector hubs (connecting across network modules) with their peripheral nodes. The removal of these hub nodes breaks down the entire multilayer brain network into a set of disconnected communities. We have evaluated our novel multilayer hub identification method in task-based and resting-state functional images. Complimenting ongoing findings using mono-layer brain networks, our multilayer network analysis provides a new understanding of brain network topology that links functional connectivities with brain states and disease progression.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"101 ","pages":"103463"},"PeriodicalIF":10.7,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143023985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
UnICLAM: Contrastive representation learning with adversarial masking for unified and interpretable Medical Vision Question Answering.
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-15 | DOI: 10.1016/j.media.2025.103464
Chenlu Zhan, Peng Peng, Hongwei Wang, Gaoang Wang, Yu Lin, Tao Chen, Hongsen Wang

Medical Visual Question Answering aims to assist doctors in decision-making when answering clinical questions regarding radiology images. Nevertheless, current models learn cross-modal representations with vision and text encoders residing in two separate spaces, which inevitably leads to indirect semantic alignment. In this paper, we propose UnICLAM, a Unified and Interpretable Medical-VQA model based on Contrastive Representation Learning with Adversarial Masking. To learn an aligned image-text representation, we first establish a unified dual-stream pre-training structure with a gradually soft-parameter sharing strategy for alignment. Specifically, the proposed strategy learns a constraint that keeps the vision and text encoders close in the same space; the constraint is gradually loosened as the number of layers increases, narrowing the distance between the two modalities. To grasp a unified semantic cross-modal representation, we extend adversarial masking data augmentation to the contrastive representation learning of vision and text in a unified manner. While encoder training minimizes the distance between the original and masked samples, the adversarial masking module is trained adversarially to maximize that distance. We further explore the unified adversarial masking augmentation method, which improves potential ante-hoc interpretability with remarkable performance and efficiency. Experimental results on the VQA-RAD and SLAKE benchmarks demonstrate that UnICLAM outperforms 11 existing state-of-the-art Medical-VQA methods. More importantly, we additionally discuss the performance of UnICLAM in diagnosing heart failure, verifying that UnICLAM exhibits superior few-shot adaptation performance in practical disease diagnosis. The codes and models will be released upon the acceptance of the paper.
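The gradually soft-parameter sharing strategy can be sketched as a depth-weighted penalty that pulls paired vision and text encoder layers together, with the weight shrinking for deeper layers so the constraint loosens with depth. The exponential decay schedule below is an assumption for illustration, not the UnICLAM implementation.

```python
import torch

def soft_sharing_penalty(vision_layers, text_layers, base=1.0, decay=0.5):
    """Penalize weight divergence between paired encoder layers; the penalty
    weight shrinks with depth, so deeper layers are constrained more loosely.
    A hypothetical sketch under an assumed exponential decay schedule."""
    loss = 0.0
    for depth, (v, t) in enumerate(zip(vision_layers, text_layers)):
        w = base * decay ** depth                      # looser at deeper layers
        for pv, pt in zip(v.parameters(), t.parameters()):
            loss = loss + w * (pv - pt).pow(2).mean()
    return loss
```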

{"title":"UnICLAM: Contrastive representation learning with adversarial masking for unified and interpretable Medical Vision Question Answering.","authors":"Chenlu Zhan, Peng Peng, Hongwei Wang, Gaoang Wang, Yu Lin, Tao Chen, Hongsen Wang","doi":"10.1016/j.media.2025.103464","DOIUrl":"https://doi.org/10.1016/j.media.2025.103464","url":null,"abstract":"<p><p>Medical Visual Question Answering aims to assist doctors in decision-making when answering clinical questions regarding radiology images. Nevertheless, current models learn cross-modal representations through residing vision and text encoders in dual separate spaces, which inevitably leads to indirect semantic alignment. In this paper, we propose UnICLAM, a Unified and Interpretable Medical-VQA model through Contrastive Representation Learning with Adversarial Masking. To achieve the learning of an aligned image-text representation, we first establish a unified dual-stream pre-training structure with the gradually soft-parameter sharing strategy for alignment. Specifically, the proposed strategy learns a constraint for the vision and text encoders to be close in the same space, which is gradually loosened as the number of layers increases, so as to narrow the distance between the two different modalities. For grasping the unified semantic cross-modal representation, we extend the adversarial masking data augmentation to the contrastive representation learning of vision and text in a unified manner. While the encoder training minimizes the distance between the original and masking samples, the adversarial masking module keeps adversarial learning to conversely maximize the distance. We also intuitively take a further exploration of the unified adversarial masking augmentation method, which improves the potential ante-hoc interpretability with remarkable performance and efficiency. Experimental results on VQA-RAD and SLAKE benchmarks demonstrate that UnICLAM outperforms existing 11 state-of-the-art Medical-VQA methods. More importantly, we make an additional discussion about the performance of UnICLAM in diagnosing heart failure, verifying that UnICLAM exhibits superior few-shot adaption performance in practical disease diagnosis. The codes and models will be released upon the acceptance of the paper.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"101 ","pages":"103464"},"PeriodicalIF":10.7,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143029158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SIRE: Scale-invariant, rotation-equivariant estimation of artery orientations using graph neural networks.
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-15 | DOI: 10.1016/j.media.2025.103467
Dieuwertje Alblas, Julian Suk, Christoph Brune, Kak Khee Yeung, Jelmer M Wolterink

The orientation of a blood vessel as visualized in 3D medical images is an important descriptor of its geometry that can be used for centerline extraction and subsequent segmentation, labeling, and visualization. Blood vessels appear at multiple scales and levels of tortuosity, and determining the exact orientation of a vessel is a challenging problem. Recent works have used 3D convolutional neural networks (CNNs) for this purpose, but CNNs are sensitive to variations in vessel size and orientation. We present SIRE: a scale-invariant rotation-equivariant estimator for local vessel orientation. SIRE is modular and generalizes strongly due to its symmetry preservations. SIRE consists of a gauge equivariant mesh CNN (GEM-CNN) that operates in parallel on multiple nested spherical meshes of different sizes. The features on each mesh are a projection of image intensities within the corresponding sphere. These features are intrinsic to the sphere and, in combination with the gauge equivariant properties of GEM-CNN, lead to SO(3) rotation equivariance. Approximate scale invariance is achieved by weight sharing and the use of a symmetric maximum aggregation function to combine predictions at multiple scales. Hence, SIRE can be trained with arbitrarily oriented vessels of varying radii to generalize to vessels with a wide range of calibres and tortuosity. We demonstrate the efficacy of SIRE using three datasets containing vessels of varying scales: the vascular model repository (VMR), the ASOCA coronary artery set, and an in-house set of abdominal aortic aneurysms (AAAs). We embed SIRE in a centerline tracker that accurately tracks large-calibre AAAs, regardless of the data SIRE is trained with. Moreover, a tracker can use SIRE to track small-calibre tortuous coronary arteries, even when trained only with large-calibre, non-tortuous AAAs. Additional experiments verify the rotation-equivariant and scale-invariant properties of SIRE. In conclusion, by incorporating SO(3) and scale symmetries, SIRE can determine the orientations of vessels outside the training domain, offering a robust and data-efficient solution for the geometric analysis of blood vessels in 3D medical images.
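The scale-handling recipe — shared weights across nested spheres plus a symmetric maximum over scales — can be sketched as follows; the sampling routine and the shared head are hypothetical stand-ins for the paper's gauge equivariant mesh CNN.

```python
import torch
import torch.nn.functional as F

def sphere_features(volume, center, directions, radii):
    """Sample image intensities on nested spheres around a point.
    volume: (1, 1, D, H, W) with coords normalized to [-1, 1]; center: (3,);
    directions: (V, 3) unit vectors to sphere vertices; radii: floats."""
    feats = []
    for r in radii:
        pts = center + r * directions                  # (V, 3) sphere vertices
        grid = pts.view(1, 1, 1, -1, 3)
        f = F.grid_sample(volume, grid, align_corners=True)
        feats.append(f.view(-1))                       # (V,) intensities
    return torch.stack(feats)                          # (n_scales, V)

def scale_invariant_logits(feats, head):
    """Apply one shared head per scale, then take a symmetric element-wise
    maximum; the max makes the output approximately calibre-invariant."""
    return torch.stack([head(f) for f in feats]).max(dim=0).values
```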

{"title":"SIRE: Scale-invariant, rotation-equivariant estimation of artery orientations using graph neural networks.","authors":"Dieuwertje Alblas, Julian Suk, Christoph Brune, Kak Khee Yeung, Jelmer M Wolterink","doi":"10.1016/j.media.2025.103467","DOIUrl":"https://doi.org/10.1016/j.media.2025.103467","url":null,"abstract":"<p><p>The orientation of a blood vessel as visualized in 3D medical images is an important descriptor of its geometry that can be used for centerline extraction and subsequent segmentation, labeling, and visualization. Blood vessels appear at multiple scales and levels of tortuosity, and determining the exact orientation of a vessel is a challenging problem. Recent works have used 3D convolutional neural networks (CNNs) for this purpose, but CNNs are sensitive to variations in vessel size and orientation. We present SIRE: a scale-invariant rotation-equivariant estimator for local vessel orientation. SIRE is modular and has strongly generalizing properties due to symmetry preservations. SIRE consists of a gauge equivariant mesh CNN (GEM-CNN) that operates in parallel on multiple nested spherical meshes with different sizes. The features on each mesh are a projection of image intensities within the corresponding sphere. These features are intrinsic to the sphere and, in combination with the gauge equivariant properties of GEM-CNN, lead to SO(3) rotation equivariance. Approximate scale invariance is achieved by weight sharing and use of a symmetric maximum aggregation function to combine predictions at multiple scales. Hence, SIRE can be trained with arbitrarily oriented vessels with varying radii to generalize to vessels with a wide range of calibres and tortuosity. We demonstrate the efficacy of SIRE using three datasets containing vessels of varying scales; the vascular model repository (VMR), the ASOCA coronary artery set, and an in-house set of abdominal aortic aneurysms (AAAs). We embed SIRE in a centerline tracker which accurately tracks large calibre AAAs, regardless of the data SIRE is trained with. Moreover, a tracker can use SIRE to track small-calibre tortuous coronary arteries, even when trained only with large-calibre, non-tortuous AAAs. Additional experiments are performed to verify the rotational equivariant and scale invariant properties of SIRE. In conclusion, by incorporating SO(3) and scale symmetries, SIRE can be used to determine orientations of vessels outside of the training domain, offering a robust and data-efficient solution to geometric analysis of blood vessels in 3D medical images.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"101 ","pages":"103467"},"PeriodicalIF":10.7,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
When multiple instance learning meets foundation models: Advancing histological whole slide image analysis.
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-14 | DOI: 10.1016/j.media.2025.103456
Hongming Xu, Mingkang Wang, Duanbo Shi, Huamin Qin, Yunpeng Zhang, Zaiyi Liu, Anant Madabhushi, Peng Gao, Fengyu Cong, Cheng Lu

Deep multiple instance learning (MIL) pipelines are the mainstream weakly supervised learning methodologies for whole slide image (WSI) classification. However, it remains unclear how these widely used approaches compare to each other, given the recent proliferation of foundation models (FMs) for patch-level embedding and the diversity of slide-level aggregations. This paper implemented and systematically compared six FMs and six recent MIL methods by organizing different feature extractions and aggregations across seven clinically relevant end-to-end prediction tasks, using WSIs from 4044 patients with four different cancer types. We tested state-of-the-art (SOTA) FMs in computational pathology, including CTransPath, PathoDuet, PLIP, CONCH, and UNI, as patch-level feature extractors. Feature aggregators such as attention-based pooling, transformers, and dynamic graphs were thoroughly tested. Our experiments on cancer grading, biomarker status prediction, and microsatellite instability (MSI) prediction suggest that (1) FMs like UNI, trained on more diverse histological images, outperform generic models trained on smaller datasets in patch embedding, significantly enhancing downstream MIL classification accuracy and model training convergence speed; (2) instance feature fine-tuning, known as online feature re-embedding, to capture both fine-grained details and spatial interactions can often further improve WSI classification performance; (3) FMs advance MIL models by enabling promising grading classifications, biomarker status, and MSI predictions without requiring pixel- or patch-level annotations. These findings encourage the development of advanced, domain-specific FMs aimed at more universally applicable diagnostic tasks, aligning with the evolving needs of clinical AI in pathology.
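Of the aggregators compared, attention-based pooling is the most common; the sketch below shows a gated-attention MIL head in the style of ABMIL over patch embeddings from any FM. The embedding and hidden dimensions are assumptions, not the study's configuration.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Gated-attention MIL pooling over patch embeddings (ABMIL-style).
    dim=1024 is typical of FM patch embeddings but is an assumption here."""
    def __init__(self, dim=1024, hidden=256, n_classes=2):
        super().__init__()
        self.att_v = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh())
        self.att_u = nn.Sequential(nn.Linear(dim, hidden), nn.Sigmoid())
        self.att_w = nn.Linear(hidden, 1)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, patch_feats):                    # (n_patches, dim)
        a = self.att_w(self.att_v(patch_feats) * self.att_u(patch_feats))
        a = torch.softmax(a, dim=0)                    # attention per patch
        slide = (a * patch_feats).sum(dim=0)           # weighted bag embedding
        return self.head(slide), a                     # slide logits + weights
```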

{"title":"When multiple instance learning meets foundation models: Advancing histological whole slide image analysis.","authors":"Hongming Xu, Mingkang Wang, Duanbo Shi, Huamin Qin, Yunpeng Zhang, Zaiyi Liu, Anant Madabhushi, Peng Gao, Fengyu Cong, Cheng Lu","doi":"10.1016/j.media.2025.103456","DOIUrl":"https://doi.org/10.1016/j.media.2025.103456","url":null,"abstract":"<p><p>Deep multiple instance learning (MIL) pipelines are the mainstream weakly supervised learning methodologies for whole slide image (WSI) classification. However, it remains unclear how these widely used approaches compare to each other, given the recent proliferation of foundation models (FMs) for patch-level embedding and the diversity of slide-level aggregations. This paper implemented and systematically compared six FMs and six recent MIL methods by organizing different feature extractions and aggregations across seven clinically relevant end-to-end prediction tasks using WSIs from 4044 patients with four different cancer types. We tested state-of-the-art (SOTA) FMs in computational pathology, including CTransPath, PathoDuet, PLIP, CONCH, and UNI, as patch-level feature extractors. Feature aggregators, such as attention-based pooling, transformers, and dynamic graphs were thoroughly tested. Our experiments on cancer grading, biomarker status prediction, and microsatellite instability (MSI) prediction suggest that (1) FMs like UNI, trained with more diverse histological images, outperform generic models with smaller training datasets in patch embeddings, significantly enhancing downstream MIL classification accuracy and model training convergence speed, (2) instance feature fine-tuning, known as online feature re-embedding, to capture both fine-grained details and spatial interactions can often further improve WSI classification performance, (3) FMs advance MIL models by enabling promising grading classifications, biomarker status, and MSI predictions without requiring pixel- or patch-level annotations. These findings encourage the development of advanced, domain-specific FMs, aimed at more universally applicable diagnostic tasks, aligning with the evolving needs of clinical AI in pathology.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"101 ","pages":"103456"},"PeriodicalIF":10.7,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dynamic spectrum-driven hierarchical learning network for polyp segmentation.
IF 10.7 | CAS Tier 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-14 | DOI: 10.1016/j.media.2024.103449
Haolin Wang, Kai-Ni Wang, Jie Hua, Yi Tang, Yang Chen, Guang-Quan Zhou, Shuo Li

Accurate automatic polyp segmentation in colonoscopy is crucial for the prompt prevention of colorectal cancer. However, the heterogeneous nature of polyps and differences in lighting and visibility conditions present significant challenges to achieving reliable and consistent segmentation across cases. Therefore, this study proposes a novel dynamic spectrum-driven hierarchical learning model (DSHNet), the first to specifically leverage image frequency-domain information to explore region-level salience differences among and within polyps for precise segmentation. A novel spectral decoupler separates low-frequency and high-frequency components, leveraging their distinct characteristics to guide the model toward learning valuable frequency features without bias through automatic masking. Low-frequency-driven region-level saliency modeling then generates dynamic convolution kernels from individual frequency-aware features; together with supervision from the label hierarchy, these kernels regulate region-level saliency modeling, enabling simultaneous adaptation to polyp heterogeneity and illumination variation. Meanwhile, a high-frequency attention module preserves detailed information at the skip connections, complementing the focus on spatial features at various stages. Experimental results demonstrate that the proposed method outperforms other state-of-the-art polyp segmentation techniques, achieving robust and superior results on five diverse datasets. Codes are available at https://github.com/gardnerzhou/DSHNet.
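The low/high frequency split at the heart of the spectral decoupler can be illustrated with a fixed Fourier-domain mask; the circular cutoff below is an assumed hyperparameter, whereas the paper's decoupler learns the separation through automatic masking.

```python
import torch

def spectral_decouple(img, radius=0.1):
    """Split an image batch into low- and high-frequency components using a
    circular low-pass mask in the Fourier domain. img: (B, C, H, W) float."""
    H, W = img.shape[-2:]
    spec = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.linspace(-0.5, 0.5, H),
                            torch.linspace(-0.5, 0.5, W), indexing="ij")
    mask = ((xx ** 2 + yy ** 2).sqrt() <= radius).to(img.dtype)
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1))).real
    return low, img - low                              # (low-freq, high-freq)
```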

{"title":"Dynamic spectrum-driven hierarchical learning network for polyp segmentation.","authors":"Haolin Wang, Kai-Ni Wang, Jie Hua, Yi Tang, Yang Chen, Guang-Quan Zhou, Shuo Li","doi":"10.1016/j.media.2024.103449","DOIUrl":"https://doi.org/10.1016/j.media.2024.103449","url":null,"abstract":"<p><p>Accurate automatic polyp segmentation in colonoscopy is crucial for the prompt prevention of colorectal cancer. However, the heterogeneous nature of polyps and differences in lighting and visibility conditions present significant challenges in achieving reliable and consistent segmentation across different cases. Therefore, this study proposes a novel dynamic spectrum-driven hierarchical learning model (DSHNet), the first to specifically leverage image frequency domain information to explore region-level salience differences among and within polyps for precise segmentation. A novel spectral decoupler is advanced to separate low-frequency and high-frequency components, leveraging their distinct characteristics to guide the model in learning valuable frequency features without bias through automatic masking. The low-frequency driven region-level saliency modeling then generates dynamic convolution kernels with individual frequency-aware features, which regulate region-level saliency modeling together with the supervision of the hierarchy of labels, thus enabling adaptation to polyp heterogeneous and illumination variation simultaneously. Meanwhile, the high-frequency attention module is designed to preserve the detailed information at the skip connections, which complements the focus on spatial features at various stages. Experimental results demonstrate that the proposed method outperforms other state-of-the-art polyp segmentation techniques, achieving robust and superior results on five diverse datasets. Codes are available at https://github.com/gardnerzhou/DSHNet.</p>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"101 ","pages":"103449"},"PeriodicalIF":10.7,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143029157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0