
Latest publications in IEEE Transactions on Medical Imaging

Blind CT Image Quality Assessment Using DDPM-derived Content and Transformer-based Evaluator.
Pub Date : 2024-06-24 DOI: 10.1109/TMI.2024.3418652
Yongyi Shi, Wenjun Xia, Ge Wang, Xuanqin Mou

Lowering radiation dose per view and utilizing sparse views per scan are two common CT scan modes, albeit often leading to distorted images characterized by noise and streak artifacts. Blind image quality assessment (BIQA) strives to evaluate perceptual quality in alignment with what radiologists perceive, which plays an important role in advancing low-dose CT reconstruction techniques. An intriguing direction involves developing BIQA methods that mimic the operational characteristic of the human visual system (HVS). The internal generative mechanism (IGM) theory reveals that the HVS actively deduces primary content to enhance comprehension. In this study, we introduce an innovative BIQA metric that emulates the active inference process of IGM. Initially, an active inference module, implemented as a denoising diffusion probabilistic model (DDPM), is constructed to anticipate the primary content. Then, the dissimilarity map is derived by assessing the interrelation between the distorted image and its primary content. Subsequently, the distorted image and dissimilarity map are combined into a multi-channel image, which is inputted into a transformer-based image quality evaluator. By leveraging the DDPM-derived primary content, our approach achieves competitive performance on a low-dose CT dataset.
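
For illustration only, a minimal PyTorch sketch of the fusion step described in the abstract: the distorted slice and an assumed DDPM-derived primary content are compared to form a dissimilarity map, stacked into a multi-channel input, and scored by a small transformer-style evaluator. The DDPM itself is not implemented, and the toy evaluator, tensor sizes, and absolute-difference dissimilarity are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class TinyQualityEvaluator(nn.Module):
    """Patch embedding followed by a transformer encoder that regresses a quality score."""
    def __init__(self, in_ch=2, patch=16, dim=128, depth=2, heads=4):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 1)

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2)    # (B, n_patches, dim)
        return self.head(self.encoder(tokens).mean(dim=1))   # (B, 1) quality score

distorted = torch.rand(1, 1, 256, 256)        # placeholder low-dose CT slice
primary_content = torch.rand(1, 1, 256, 256)  # placeholder for DDPM-restored content
dissimilarity = (distorted - primary_content).abs()   # simple per-pixel dissimilarity map
multi_channel = torch.cat([distorted, dissimilarity], dim=1)
print(TinyQualityEvaluator()(multi_channel).shape)    # torch.Size([1, 1])
```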

Citations: 0
MCPL: Multi-modal Collaborative Prompt Learning for Medical Vision-Language Model.
Pub Date : 2024-06-24 DOI: 10.1109/TMI.2024.3418408
Pengyu Wang, Huaqi Zhang, Yixuan Yuan

Multi-modal prompt learning is a high-performance and cost-effective learning paradigm, which learns text as well as image prompts to tune pre-trained vision-language (V-L) models like CLIP for adapting multiple downstream tasks. However, recent methods typically treat text and image prompts as independent components without considering the dependency between prompts. Moreover, extending multi-modal prompt learning into the medical field poses challenges due to a significant gap between general- and medical-domain data. To this end, we propose a Multi-modal Collaborative Prompt Learning (MCPL) pipeline to tune a frozen V-L model for aligning medical text-image representations, thereby achieving medical downstream tasks. We first construct the anatomy-pathology (AP) prompt for multi-modal prompting jointly with text and image prompts. The AP prompt introduces instance-level anatomy and pathology information, thereby making a V-L model better comprehend medical reports and images. Next, we propose graph-guided prompt collaboration module (GPCM), which explicitly establishes multi-way couplings between the AP, text, and image prompts, enabling collaborative multi-modal prompt producing and updating for more effective prompting. Finally, we develop a novel prompt configuration scheme, which attaches the AP prompt to the query and key, and the text/image prompt to the value in self-attention layers for improving the interpretability of multi-modal prompts. Extensive experiments on numerous medical classification and object detection datasets show that the proposed pipeline achieves excellent effectiveness and generalization. Compared with state-of-the-art prompt learning methods, MCPL provides a more reliable multi-modal prompt paradigm for reducing tuning costs of V-L models on medical downstream tasks. Our code: https://github.com/CUHK-AIM-Group/MCPL.
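
As a rough illustration of the prompt-configuration idea (different prompts attached to the keys versus the values inside self-attention), the sketch below appends learnable AP-style tokens to the keys and an equally sized text/image prompt to the values. This is a simplified reading under assumed dimensions, not the authors' MCPL code, which also couples prompts through the graph-guided collaboration module.

```python
import torch
import torch.nn as nn

class PromptedSelfAttention(nn.Module):
    """Self-attention with AP-style prompt tokens appended to the keys and an
    equally long text/image prompt appended to the values (simplified reading)."""
    def __init__(self, dim=64, n_prompt=4):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.ap_prompt = nn.Parameter(torch.randn(1, n_prompt, dim))  # key-side prompt
        self.ti_prompt = nn.Parameter(torch.randn(1, n_prompt, dim))  # value-side prompt
        self.scale = dim ** -0.5

    def forward(self, x):                      # x: (B, N, dim) patch/word tokens
        b = x.size(0)
        q = self.to_q(x)
        k = torch.cat([self.to_k(x), self.ap_prompt.expand(b, -1, -1)], dim=1)
        v = torch.cat([self.to_v(x), self.ti_prompt.expand(b, -1, -1)], dim=1)
        attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=-1)
        return attn @ v                        # (B, N, dim)

tokens = torch.randn(2, 16, 64)                # placeholder fused tokens
print(PromptedSelfAttention()(tokens).shape)   # torch.Size([2, 16, 64])
```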

Citations: 0
Time-reversion Fast-sampling Score-based Model for Limited-angle CT Reconstruction.
Pub Date : 2024-06-24 DOI: 10.1109/TMI.2024.3418838
Yanyang Wang, Zirong Li, Weiwen Wu

The score-based generative model (SGM) has received significant attention in the field of medical imaging, particularly in the context of limited-angle computed tomography (LACT). Traditional SGM approaches achieved robust reconstruction performance by incorporating a substantial number of sampling steps during the inference phase. However, these established SGM-based methods require large computational cost to reconstruct one case. The main challenge lies in achieving high-quality images with rapid sampling while preserving sharp edges and small features. In this study, we propose an innovative rapid-sampling strategy for SGM, which we have aptly named the time-reversion fast-sampling (TIFA) score-based model for LACT reconstruction. The entire sampling procedure adheres steadfastly to the principles of robust optimization theory and is firmly grounded in a comprehensive mathematical model. TIFA's rapid-sampling mechanism comprises several essential components, including jump sampling, time-reversion with re-sampling, and compressed sampling. In the initial jump sampling stage, multiple sampling steps are bypassed to expedite the attainment of preliminary results. Subsequently, during the time-reversion process, the initial results undergo controlled corruption by introducing small-scale noise. The re-sampling process then diligently refines the initially corrupted results. Finally, compressed sampling fine-tunes the refinement outcomes by imposing regularization term. Quantitative and qualitative assessments conducted on numerical simulations, real physical phantom, and clinical cardiac datasets, unequivocally demonstrate that TIFA method (using 200 steps) outperforms other state-of-the-art methods (using 2000 steps) from available [0°, 90°] and [0°, 60°]. Furthermore, experimental results underscore that our TIFA method continues to reconstruct high-quality images even with 10 steps. Our code at https://github.com/tianzhijiaoziA/TIFADiffusion.
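
The sampling components named in the abstract can be pictured with the generic skeleton below: a strided ("jump") reverse pass, a small re-noising ("time-reversion") perturbation, and a short refinement pass. The denoiser is a dummy stand-in and the schedule is arbitrary; the compressed-sampling regularization and any data-consistency steps are omitted, so this is only a sketch of the procedure, not the TIFA implementation.

```python
import torch

def dummy_denoiser(x, t):
    """Placeholder for a trained score/denoising network (t is ignored here)."""
    return 0.95 * x

def fast_sampling(shape=(1, 1, 64, 64), n_steps=1000, jump=100, revert_noise=0.05):
    x = torch.randn(shape)
    # Jump sampling: visit only every `jump`-th timestep instead of all n_steps.
    for t in reversed(range(0, n_steps, jump)):
        x = dummy_denoiser(x, t)
    # Time-reversion: perturb the preliminary result with small-scale noise ...
    x = x + revert_noise * torch.randn_like(x)
    # ... then re-sample, i.e. refine over an early portion of the schedule.
    for t in reversed(range(0, n_steps // 4, jump)):
        x = dummy_denoiser(x, t)
    return x

print(fast_sampling().shape)  # torch.Size([1, 1, 64, 64])
```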

Citations: 0
MedSyn: Text-guided Anatomy-aware Synthesis of High-Fidelity 3D CT Images.
Pub Date : 2024-06-20 DOI: 10.1109/TMI.2024.3415032
Yanwu Xu, Li Sun, Wei Peng, Shuyue Jia, Katelyn Morrison, Adam Perer, Afrooz Zandifar, Shyam Visweswaran, Motahhare Eslami, Kayhan Batmanghelich

This paper introduces an innovative methodology for producing high-quality 3D lung CT images guided by textual information. While diffusion-based generative models are increasingly used in medical imaging, current state-of-the-art approaches are limited to low-resolution outputs and underutilize radiology reports' abundant information. The radiology reports can enhance the generation process by providing additional guidance and offering fine-grained control over the synthesis of images. Nevertheless, expanding text-guided generation to high-resolution 3D images poses significant memory and anatomical detail-preserving challenges. Addressing the memory issue, we introduce a hierarchical scheme that uses a modified UNet architecture. We start by synthesizing low-resolution images conditioned on the text, serving as a foundation for subsequent generators for complete volumetric data. To ensure the anatomical plausibility of the generated samples, we provide further guidance by generating vascular, airway, and lobular segmentation masks in conjunction with the CT images. The model demonstrates the capability to use textual input and segmentation tasks to generate synthesized images. Algorithmic comparative assessments and blind evaluations conducted by 10 board-certified radiologists indicate that our approach exhibits superior performance compared to the most advanced models based on GAN and diffusion techniques, especially in accurately retaining crucial anatomical features such as fissure lines and airways. This innovation introduces novel possibilities. This study focuses on two main objectives: (1) the development of a method for creating images based on textual prompts and anatomical components, and (2) the capability to generate new images conditioning on anatomical elements. The advancements in image generation can be applied to enhance numerous downstream tasks.
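
A minimal sketch of the coarse-to-fine generation idea: a text embedding conditions a low-resolution volume generator whose output a second stage upsamples. Both modules below are deliberately trivial stand-ins (a linear layer and a trilinear upsampler), not the paper's modified UNet diffusion pipeline, and the segmentation-mask guidance is omitted.

```python
import torch
import torch.nn as nn

class CoarseGenerator(nn.Module):
    """Maps a text embedding to a low-resolution volume (stand-in for stage one)."""
    def __init__(self, text_dim=256, vol=32):
        super().__init__()
        self.vol = vol
        self.fc = nn.Linear(text_dim, vol ** 3)

    def forward(self, text_emb):               # (B, text_dim)
        return self.fc(text_emb).view(-1, 1, self.vol, self.vol, self.vol)

class RefineGenerator(nn.Module):
    """Doubles the resolution of the coarse volume (stand-in for stage two)."""
    def __init__(self):
        super().__init__()
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
            nn.Conv3d(1, 1, kernel_size=3, padding=1),
        )

    def forward(self, coarse):                  # (B, 1, D, H, W)
        return self.up(coarse)                  # (B, 1, 2D, 2H, 2W)

text_emb = torch.randn(1, 256)                  # placeholder report-encoder output
coarse = CoarseGenerator()(text_emb)            # (1, 1, 32, 32, 32)
print(RefineGenerator()(coarse).shape)          # torch.Size([1, 1, 64, 64, 64])
```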

Citations: 0
PolarFormer: A Transformer-based Method for Multi-lesion Segmentation in Intravascular OCT.
Pub Date : 2024-06-20 DOI: 10.1109/TMI.2024.3417007
Zhili Huang, Jingyi Sun, Yifan Shao, Zixuan Wang, Su Wang, Qiyong Li, Jinsong Li, Qian Yu

Several deep learning-based methods have been proposed to extract vulnerable plaques of a single class from intravascular optical coherence tomography (OCT) images. However, further research is limited by the lack of publicly available large-scale intravascular OCT datasets with multi-class vulnerable plaque annotations. Additionally, multi-class vulnerable plaque segmentation is extremely challenging due to the irregular distribution of plaques, their unique geometric shapes, and fuzzy boundaries. Existing methods have not adequately addressed the geometric features and spatial prior information of vulnerable plaques. To address these issues, we collected a dataset containing 70 pullback data and developed a multi-class vulnerable plaque segmentation model, called PolarFormer, that incorporates the prior knowledge of vulnerable plaques in spatial distribution. The key module of our proposed model is Polar Attention, which models the spatial relationship of vulnerable plaques in the radial direction. Extensive experiments conducted on the new dataset demonstrate that our proposed method outperforms other baseline methods. Code and data can be accessed via this link: https://github.com/sunjingyi0415/IVOCT-segementaion.
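
One plausible reading of "modeling the spatial relationship of vulnerable plaques in the radial direction" is attention restricted to the radial axis of a polar-coordinate IVOCT feature map, sketched below. The tensor layout, dimensions, and the use of a stock multi-head attention layer are assumptions; the authors' Polar Attention module may differ.

```python
import torch
import torch.nn as nn

class RadialAttention(nn.Module):
    """Each angular column (A-line) attends only across its own radial positions."""
    def __init__(self, channels=32, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                       # x: (B, C, R, A), radius x angle
        b, c, r, a = x.shape
        seq = x.permute(0, 3, 2, 1).reshape(b * a, r, c)  # one sequence per A-line
        out, _ = self.attn(seq, seq, seq)                 # attention along the radius
        return out.reshape(b, a, r, c).permute(0, 3, 2, 1)

feat = torch.randn(2, 32, 64, 360)          # polar feature map: 64 radial bins, 360 angles
print(RadialAttention()(feat).shape)        # torch.Size([2, 32, 64, 360])
```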

Citations: 0
CNN-O-ELMNet: Optimized Lightweight and Generalized Model for Lung Disease Classification and Severity Assessment.
Pub Date : 2024-06-19 DOI: 10.1109/TMI.2024.3416744
Saurabh Agarwal, K V Arya, Yogesh Kumar Meena

The high burden of lung diseases on healthcare necessitates effective detection methods. Current Computer-aided design (CAD) systems are limited by their focus on specific diseases and computationally demanding deep learning models. To overcome these challenges, we introduce CNN-O-ELMNet, a lightweight classification model designed to efficiently detect various lung diseases, surpassing the limitations of disease-specific CAD systems and the complexity of deep learning models. This model combines a convolutional neural network for deep feature extraction with an optimized extreme learning machine, utilizing the imperialistic competitive algorithm for enhanced predictions. We then evaluated the effectiveness of CNN-O-ELMNet using benchmark datasets for lung diseases: distinguishing pneumothorax vs. non-pneumothorax, tuberculosis vs. normal, and lung cancer vs. healthy cases. Our findings demonstrate that CNN-O-ELMNet significantly outperformed (p < 0.05) state-of-the-art methods in binary classifications for tuberculosis and cancer, achieving accuracies of 97.85% and 97.70%, respectively, while maintaining low computational complexity with only 2481 trainable parameters. We also extended the model to categorize lung disease severity based on Brixia scores. Achieving a 96.20% accuracy in multi-class assessment for mild, moderate, and severe cases, makes it suitable for deployment in lightweight healthcare devices.
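
The CNN-feature plus extreme learning machine (ELM) pattern named in the abstract can be sketched as follows: deep features feed a single random hidden layer whose output weights are solved in closed form with a pseudo-inverse. The feature matrix below is a random placeholder standing in for CNN embeddings, and the imperialistic competitive optimization used in the paper is not modeled.

```python
import torch
import torch.nn.functional as F

def elm_fit(features, labels, n_hidden=512):
    """features: (N, D); labels: (N, C) one-hot. Returns the ELM parameters."""
    d = features.shape[1]
    w_in = torch.randn(d, n_hidden)            # random, untrained input weights
    b = torch.randn(n_hidden)
    h = torch.tanh(features @ w_in + b)        # random hidden-layer activations
    beta = torch.linalg.pinv(h) @ labels       # closed-form output weights
    return w_in, b, beta

def elm_predict(features, w_in, b, beta):
    return torch.tanh(features @ w_in + b) @ beta

feats = torch.randn(100, 256)                  # placeholder CNN embeddings of lung images
labels = F.one_hot(torch.randint(0, 2, (100,)), num_classes=2).float()
params = elm_fit(feats, labels)
print(elm_predict(feats, *params).shape)       # torch.Size([100, 2])
```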

Citations: 0
PhraseAug: An Augmented Medical Report Generation Model with Phrasebook.
Pub Date : 2024-06-18 DOI: 10.1109/TMI.2024.3416190
Xin Mei, Libin Yang, Denghong Gao, Xiaoyan Cai, Junwei Han, Tianming Liu

Medical report generation is a valuable and challenging task, which automatically generates accurate and fluent diagnostic reports for medical images, reducing workload of radiologists and improving efficiency of disease diagnosis. Fine-grained alignment of medical images and reports facilitates the exploration of close correlations between images and texts, which is crucial for cross-modal generation. However, visual and linguistic biases caused by radiologists' writing styles make cross-modal image-text alignment difficult. To alleviate visual-linguistic bias, this paper discretizes medical reports and introduces an intermediate modality, i.e. phrasebook, consisting of key noun phrases. As discretized representation of medical reports, phrasebook contains both disease-related medical terms, and synonymous phrases representing different writing styles which can identify synonymous sentences, thereby promoting fine-grained alignment between images and reports. In this paper, an augmented two-stage medical report generation model with phrasebook (PhraseAug) is developed, which combines medical images, clinical histories and writing styles to generate diagnostic reports. In the first stage, phrasebook is used to extract semantically relevant important features and predict key phrases contained in the report. In the second stage, medical reports are generated according to the predicted key phrases which contain synonymous phrases, promoting our model to adapt to different writing styles and generating diverse medical reports. Experimental results on two public datasets, IU-Xray and MIMIC-CXR, demonstrate that our proposed PhraseAug outperforms state-of-the-art baselines.
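
A minimal sketch of the first stage described above: image features produce multi-label scores over a phrasebook of key noun phrases, and phrases above a threshold would condition the second-stage report generator (not shown). The phrasebook entries, feature dimensions, and threshold are illustrative placeholders, not the paper's.

```python
import torch
import torch.nn as nn

# Hypothetical phrasebook entries (illustrative only).
phrasebook = ["pleural effusion", "cardiomegaly", "lung opacity", "no acute findings"]

class PhrasePredictor(nn.Module):
    """Multi-label scoring of phrasebook entries from an image feature vector."""
    def __init__(self, feat_dim=512, n_phrases=len(phrasebook)):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, n_phrases)

    def forward(self, image_features):          # (B, feat_dim)
        return torch.sigmoid(self.classifier(image_features))

feats = torch.randn(1, 512)                     # placeholder image embedding
scores = PhrasePredictor()(feats)[0]
selected = [p for p, s in zip(phrasebook, scores.tolist()) if s > 0.5]
print(selected)                                 # key phrases handed to stage two
```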

Citations: 0
Pathological Asymmetry-Guided Progressive Learning for Acute Ischemic Stroke Infarct Segmentation.
Pub Date : 2024-06-14 DOI: 10.1109/TMI.2024.3414842
Jiarui Sun, Qiuxuan Li, Yuhao Liu, Yichuan Liu, Gouenou Coatrieux, Jean-Louis Coatrieux, Yang Chen, Jie Lu

Quantitative infarct estimation is crucial for diagnosis, treatment and prognosis in acute ischemic stroke (AIS) patients. As the early changes of ischemic tissue are subtle and easily confounded by normal brain tissue, it remains a very challenging task. However, existing methods often ignore or confuse the contribution of different types of anatomical asymmetry caused by intrinsic and pathological changes to segmentation. Further, inefficient domain knowledge utilization leads to mis-segmentation for AIS infarcts. Inspired by this idea, we propose a pathological asymmetry-guided progressive learning (PAPL) method for AIS infarct segmentation. PAPL mimics the step-by-step learning patterns observed in humans, including three progressive stages: knowledge preparation stage, formal learning stage, and examination improvement stage. First, knowledge preparation stage accumulates the preparatory domain knowledge of the infarct segmentation task, helping to learn domain-specific knowledge representations to enhance the discriminative ability for pathological asymmetries by constructed contrastive learning task. Then, formal learning stage efficiently performs end-to-end training guided by learned knowledge representations, in which the designed feature compensation module (FCM) can leverage the anatomy similarity between adjacent slices from the volumetric medical image to help aggregate rich anatomical context information. Finally, examination improvement stage encourages improving the infarct prediction from the previous stage, where the proposed perception refinement strategy (RPRS) further exploits the bilateral difference comparison to correct the mis-segmentation infarct regions by adaptively regional shrink and expansion. Extensive experiments on public and in-house NCCT datasets demonstrated the superiority of the proposed PAPL, which is promising to help better stroke evaluation and treatment.
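
The bilateral comparison underlying asymmetry-guided methods such as this one can be illustrated with the toy snippet below: a roughly midline-aligned slice is subtracted from its left-right mirror to expose asymmetric regions. Real pipelines first register the hemispheres and work on learned features; this sketch skips both and is not the authors' refinement strategy.

```python
import torch

def bilateral_difference(slice_2d):
    """slice_2d: (H, W) intensity image, assumed roughly centered on the midline."""
    mirrored = torch.flip(slice_2d, dims=[1])   # reflect across the vertical midline
    return slice_2d - mirrored                  # large values mark asymmetric regions

ct_slice = torch.rand(256, 256)                 # placeholder NCCT slice
asymmetry_map = bilateral_difference(ct_slice)
print(asymmetry_map.abs().mean())               # crude global asymmetry score
```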

Citations: 0
Assessing the capacity of a denoising diffusion probabilistic model to reproduce spatial context.
Pub Date : 2024-06-14 DOI: 10.1109/TMI.2024.3414931
Rucha Deshpande, Muzaffer Ozbey, Hua Li, Mark A Anastasio, Frank J Brooks

Diffusion models have emerged as a popular family of deep generative models (DGMs). In the literature, it has been claimed that one class of diffusion models-denoising diffusion probabilistic models (DDPMs)-demonstrate superior image synthesis performance as compared to generative adversarial networks (GANs). To date, these claims have been evaluated using either ensemble-based methods designed for natural images, or conventional measures of image quality such as structural similarity. However, there remains an important need to understand the extent to which DDPMs can reliably learn medical imaging domain-relevant information, which is referred to as 'spatial context' in this work. To address this, a systematic assessment of the ability of DDPMs to learn spatial context relevant to medical imaging applications is reported for the first time. A key aspect of the studies is the use of stochastic context models (SCMs) to produce training data. In this way, the ability of the DDPMs to reliably reproduce spatial context can be quantitatively assessed by use of post-hoc image analyses. Error-rates in DDPM-generated ensembles are reported, and compared to those corresponding to other modern DGMs. The studies reveal new and important insights regarding the capacity of DDPMs to learn spatial context. Notably, the results demonstrate that DDPMs hold significant capacity for generating contextually correct images that are 'interpolated' between training samples, which may benefit data-augmentation tasks in ways that GANs cannot.
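
The evaluation logic (generate data under a stochastic context model with a known rule, then run a post-hoc check for rule violations) can be illustrated with a toy SCM in which every image should contain three bright disks; a connected-component count then flags rule-breaking images. The SCM, the rule, and the "ensemble" below are invented placeholders, not the paper's models or data.

```python
import numpy as np

rng = np.random.default_rng(0)

def scm_sample(size=64, n_disks=3, radius=5):
    """Draw one image from a toy stochastic context model: n_disks bright disks."""
    img = np.zeros((size, size))
    ys, xs = np.mgrid[:size, :size]
    for _ in range(n_disks):
        cy, cx = rng.integers(radius, size - radius, size=2)
        img[(ys - cy) ** 2 + (xs - cx) ** 2 <= radius ** 2] = 1.0
    return img

def count_components(img):
    """Post-hoc context check: count 4-connected bright components."""
    visited, count = np.zeros(img.shape, dtype=bool), 0
    for y, x in zip(*np.nonzero(img > 0.5)):
        if visited[y, x]:
            continue
        count += 1
        stack = [(y, x)]
        while stack:
            cy, cx = stack.pop()
            if not (0 <= cy < img.shape[0] and 0 <= cx < img.shape[1]):
                continue
            if visited[cy, cx] or img[cy, cx] <= 0.5:
                continue
            visited[cy, cx] = True
            stack += [(cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)]
    return count

# Stand-in for a DDPM-generated ensemble; disks that happen to overlap merge,
# so even this reference generator registers occasional rule violations.
ensemble = [scm_sample() for _ in range(20)]
error_rate = np.mean([count_components(im) != 3 for im in ensemble])
print(f"context error rate: {error_rate:.2f}")
```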

Citations: 0
BrainMass: Advancing Brain Network Analysis for Diagnosis with Large-scale Self-Supervised Learning.
Pub Date : 2024-06-14 DOI: 10.1109/TMI.2024.3414476
Yanwu Yang, Chenfei Ye, Guinan Su, Ziyao Zhang, Zhikai Chang, Hairui Chen, Piu Chan, Yue Yu, Ting Ma

Foundation models pretrained on large-scale datasets via self-supervised learning demonstrate exceptional versatility across various tasks. Due to the heterogeneity and hard-to-collect medical data, this approach is especially beneficial for medical image analysis and neuroscience research, as it streamlines broad downstream tasks without the need for numerous costly annotations. However, there has been limited investigation into brain network foundation models, limiting their adaptability and generalizability for broad neuroscience studies. In this study, we aim to bridge this gap. In particular, (1) we curated a comprehensive dataset by collating images from 30 datasets, which comprises 70,781 samples of 46,686 participants. Moreover, we introduce pseudo-functional connectivity (pFC) to further generates millions of augmented brain networks by randomly dropping certain timepoints of the BOLD signal. (2) We propose the BrainMass framework for brain network self-supervised learning via mask modeling and feature alignment. BrainMass employs Mask-ROI Modeling (MRM) to bolster intra-network dependencies and regional specificity. Furthermore, Latent Representation Alignment (LRA) module is utilized to regularize augmented brain networks of the same participant with similar topological properties to yield similar latent representations by aligning their latent embeddings. Extensive experiments on eight internal tasks and seven external brain disorder diagnosis tasks show BrainMass's superior performance, highlighting its significant generalizability and adaptability. Nonetheless, BrainMass demonstrates powerful few/zero-shot learning abilities and exhibits meaningful interpretation to various diseases, showcasing its potential use for clinical applications.
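
The pseudo-functional connectivity (pFC) augmentation mentioned above can be sketched in a few lines: randomly drop a fraction of BOLD timepoints and recompute a correlation-based connectivity matrix, giving a new augmented network per draw. The ROI count, drop fraction, and random signal below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def pseudo_fc(bold, drop_frac=0.2):
    """bold: (n_rois, n_timepoints). Returns an (n_rois, n_rois) pFC matrix."""
    n_t = bold.shape[1]
    keep = rng.choice(n_t, size=int(n_t * (1 - drop_frac)), replace=False)
    return np.corrcoef(bold[:, np.sort(keep)])  # connectivity from the kept timepoints

bold_signal = rng.standard_normal((90, 200))    # placeholder: 90 ROIs x 200 timepoints
augmented_networks = [pseudo_fc(bold_signal) for _ in range(5)]
print(augmented_networks[0].shape)              # (90, 90)
```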

Citations: 0