
Latest publications in IEEE Transactions on Medical Imaging

Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery.
Pub Date : 2024-07-11 DOI: 10.1109/TMI.2024.3426953
Hongqiu Wang, Guang Yang, Shichen Zhang, Jing Qin, Yike Guo, Bo Xu, Yueming Jin, Lei Zhu

Surgical instrument segmentation is fundamentally important for facilitating cognitive intelligence in robot-assisted surgery. Although existing methods have achieved accurate instrument segmentation results, they simultaneously generate segmentation masks of all instruments, which lack the capability to specify a target object and allow an interactive experience. This paper focuses on a novel and essential task in robotic surgery, i.e., Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment the target surgical instruments from each video frame, referred to by a given language expression. This interactive feature offers enhanced user engagement and customized experiences, greatly benefiting the development of the next generation of surgical education systems. To achieve this, this paper constructs two surgical video datasets to promote RSVIS research. Then, we devise a novel Video-Instrument Synergistic Network (VIS-Net) to learn both video-level and instrument-level knowledge to boost performance, whereas previous work only utilized video-level information. Meanwhile, we design a Graph-based Relation-aware Module (GRM) to model the correlation between multi-modal information (i.e., textual description and video frame) to facilitate the extraction of instrument-level information. Extensive experimental results on two RSVIS datasets demonstrate that VIS-Net significantly outperforms existing state-of-the-art referring segmentation methods. We will release our code and dataset for future research (Git).
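To make the cross-modal design more concrete, the sketch below shows one way a relation-aware fusion between the referring expression and per-region frame features could be wired up in PyTorch; the module name, dimensions, and attention form are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class RelationAwareFusion(nn.Module):
    """Toy graph-style relation module: region nodes attend to word nodes,
    so instrument-level features become conditioned on the expression."""
    def __init__(self, dim=256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, text_feats, region_feats):
        # text_feats: (B, T, C) word embeddings of the referring expression
        # region_feats: (B, N, C) per-region visual features of one frame
        attn = torch.softmax(
            self.q(region_feats) @ self.k(text_feats).transpose(1, 2)
            / region_feats.shape[-1] ** 0.5, dim=-1)           # (B, N, T)
        fused = self.out(attn @ self.v(text_feats))             # (B, N, C)
        return region_feats + fused                             # residual update

# usage: condition candidate instrument regions on the expression before mask prediction
grm = RelationAwareFusion(dim=256)
txt = torch.randn(2, 12, 256)      # 12 words
reg = torch.randn(2, 20, 256)      # 20 candidate instrument regions
print(grm(txt, reg).shape)         # torch.Size([2, 20, 256])
```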

Citations: 0
Counterfactual Causal-Effect Intervention for Interpretable Medical Visual Question Answering.
Pub Date : 2024-07-09 DOI: 10.1109/TMI.2024.3425533
Linqin Cai, Haodu Fang, Nuoying Xu, Bo Ren

Medical Visual Question Answering (VQA-Med) is a challenging task that involves answering clinical questions related to medical images. However, most current VQA-Med methods ignore the causal correlation between specific lesion or abnormality features and answers, while also failing to provide accurate explanations for their decisions. To explore the interpretability of VQA-Med, this paper proposes a novel CCIS-MVQA model for VQA-Med based on a counterfactual causal-effect intervention strategy. This model consists of a modified ResNet for image feature extraction, a GloVe decoder for question feature extraction, a bilinear attention network for vision and language feature fusion, and an interpretability generator for producing the interpretability and prediction results. The proposed CCIS-MVQA introduces a layer-wise relevance propagation method to automatically generate counterfactual samples. Additionally, CCIS-MVQA applies counterfactual causal reasoning throughout the training phase to enhance interpretability and generalization. Extensive experiments on three benchmark datasets show that the proposed CCIS-MVQA model outperforms the state-of-the-art methods. Extensive visualization results are provided to analyze the interpretability and performance of CCIS-MVQA.
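The counterfactual-generation idea can be illustrated with a small sketch: assuming a per-pixel relevance map (e.g., from layer-wise relevance propagation) is already available, the most relevant pixels are suppressed to form a counterfactual input. The thresholding rule and keep ratio below are assumptions for illustration only, not the paper's exact procedure.

```python
import torch

def counterfactual_from_relevance(image, relevance, keep_ratio=0.9):
    """Given an image (C, H, W) and a relevance map (H, W), zero out the
    top (1 - keep_ratio) most relevant pixels to build a counterfactual sample."""
    flat = relevance.flatten()
    k = int(flat.numel() * keep_ratio)
    thresh = torch.kthvalue(flat, k).values           # relevance cut-off
    mask = (relevance <= thresh).float()               # 1 = keep, 0 = suppress
    return image * mask.unsqueeze(0), mask

img = torch.rand(3, 224, 224)
rel = torch.rand(224, 224)                             # placeholder relevance map
cf_img, cf_mask = counterfactual_from_relevance(img, rel)
print(cf_img.shape, cf_mask.mean().item())             # ~90% of pixels kept
```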

Citations: 0
Attribute Prototype-guided Iterative Scene Graph for Explainable Radiology Report Generation.
Pub Date : 2024-07-08 DOI: 10.1109/TMI.2024.3424505
Ke Zhang, Yan Yang, Jun Yu, Jianping Fan, Hanliang Jiang, Qingming Huang, Weidong Han

The potential of automated radiology report generation in alleviating the time-consuming tasks of radiologists is increasingly being recognized in medical practice. Existing report generation methods have evolved from using image-level features to the latest approach of utilizing anatomical regions, significantly enhancing interpretability. However, directly and simplistically using region features for report generation compromises the capability of relation reasoning and overlooks the common attributes potentially shared across regions. To address these limitations, we propose a novel region-based Attribute Prototype-guided Iterative Scene Graph generation framework (AP-ISG) for report generation, utilizing scene graph generation as an auxiliary task to further enhance interpretability and relational reasoning capability. The core components of AP-ISG are the Iterative Scene Graph Generation (ISGG) module and the Attribute Prototype-guided Learning (APL) module. Specifically, ISGG employs an autoregressive scheme for structural edge reasoning and a contextualization mechanism for relational reasoning. APL enhances intra-prototype matching and reduces inter-prototype semantic overlap in the visual space to fully model the potential attribute commonalities among regions. Extensive experiments on the MIMIC-CXR with Chest ImaGenome datasets demonstrate the superiority of AP-ISG across multiple metrics.
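As a rough illustration of the APL objective, the snippet below combines an intra-prototype matching term with an inter-prototype separation penalty; the cosine formulation, margin, and shapes are assumed for the sketch and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def attribute_prototype_loss(region_feats, attr_labels, prototypes, margin=0.5):
    """Toy prototype-guided objective: pull each region feature toward the
    prototype of its attribute (intra-prototype matching) and push different
    prototypes apart (reduce inter-prototype semantic overlap)."""
    protos = F.normalize(prototypes, dim=-1)            # (P, C)
    feats = F.normalize(region_feats, dim=-1)            # (N, C)
    intra = (1 - (feats * protos[attr_labels]).sum(-1)).mean()   # cosine pull
    sim = protos @ protos.t()                             # (P, P) prototype similarity
    off_diag = sim - torch.eye(len(protos), device=sim.device)
    inter = F.relu(off_diag - margin).mean()              # penalize overlap > margin
    return intra + inter

feats = torch.randn(32, 128)                  # region features
labels = torch.randint(0, 10, (32,))          # attribute index per region
protos = torch.randn(10, 128, requires_grad=True)
print(attribute_prototype_loss(feats, labels, protos))
```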

Citations: 0
Carotid Vessel Wall Segmentation Through Domain Aligner, Topological Learning, and Segment Anything Model for Sparse Annotation in MR Images.
Pub Date : 2024-07-08 DOI: 10.1109/TMI.2024.3424884
Xibao Li, Xi Ouyang, Jiadong Zhang, Zhongxiang Ding, Yuyao Zhang, Zhong Xue, Feng Shi, Dinggang Shen

Medical image analysis poses significant challenges due to the limited availability of clinical data, which is crucial for training accurate models. This limitation is further compounded by the specialized and labor-intensive nature of the data annotation process. For example, despite the popularity of computed tomography angiography (CTA) in diagnosing atherosclerosis with an abundance of annotated datasets, magnetic resonance (MR) images offer better visualization for soft plaque and vessel wall characterization. However, the higher cost and limited accessibility of MR, as well as the time-consuming nature of manual labeling, contribute to fewer annotated datasets. To address these issues, we formulate a multi-modal transfer learning network, named MT-Net, designed to learn from unpaired CTA and sparsely-annotated MR data. Additionally, we harness the Segment Anything Model (SAM) to synthesize additional MR annotations, enriching the training process. Specifically, our method first segments vessel lumen regions followed by precise characterization of carotid artery vessel walls, thereby ensuring both segmentation accuracy and clinical relevance. Validation of our method involved rigorous experimentation on publicly available datasets from the COSMOS and CARE-II challenges, demonstrating its superior performance compared to existing state-of-the-art techniques.
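One plausible way to use SAM for synthesizing extra MR annotations is sketched below, assuming the public segment-anything package, a locally downloaded checkpoint, and a rough bounding-box prompt derived from each sparse annotation; the actual prompting strategy used by MT-Net may differ.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Hypothetical checkpoint path; the box prompt stands in for whatever sparse
# annotation is available on an MR slice.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

def pseudo_label(mr_slice_rgb: np.ndarray, box_xyxy: np.ndarray) -> np.ndarray:
    """mr_slice_rgb: (H, W, 3) uint8 MR slice; box_xyxy: rough vessel box.
    Returns a dense boolean pseudo-mask to enrich the training set."""
    predictor.set_image(mr_slice_rgb)
    masks, scores, _ = predictor.predict(box=box_xyxy, multimask_output=False)
    return masks[0]                     # (H, W) boolean pseudo-annotation

mask = pseudo_label(np.zeros((256, 256, 3), dtype=np.uint8),
                    np.array([60, 60, 200, 200]))
print(mask.shape, mask.dtype)
```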

Citations: 0
Unified Multi-Modal Image Synthesis for Missing Modality Imputation.
Pub Date : 2024-07-08 DOI: 10.1109/TMI.2024.3424785
Yue Zhang, Chengtao Peng, Qiuli Wang, Dan Song, Kaiyan Li, S Kevin Zhou

Multi-modal medical images provide complementary soft-tissue characteristics that aid in the screening and diagnosis of diseases. However, limited scanning time, image corruption and various imaging protocols often result in incomplete multi-modal images, thus limiting the usage of multi-modal data for clinical purposes. To address this issue, in this paper, we propose a novel unified multi-modal image synthesis method for missing modality imputation. Our method adopts a generative adversarial architecture, which aims to synthesize missing modalities from any combination of available ones with a single model. To this end, we specifically design a Commonality- and Discrepancy-Sensitive Encoder for the generator to exploit both the modality-invariant and modality-specific information contained in the input modalities. The incorporation of both types of information facilitates the generation of images with consistent anatomy and realistic details of the desired distribution. In addition, we propose a Dynamic Feature Unification Module to integrate information from a varying number of available modalities, which enables the network to be robust to randomly missing modalities. The module performs both hard integration and soft integration, ensuring the effectiveness of feature combination while avoiding information loss. Verified on two public multi-modal magnetic resonance datasets, the proposed method is effective in handling various synthesis tasks and shows superior performance compared to previous methods.
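A minimal sketch of how a unification module might combine a variable set of available modality features is given below, using an availability mask with a hard (element-wise max) path and a soft (attention-weighted) path; all names, shapes, and the merge layer are assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class DynamicFeatureUnification(nn.Module):
    """Toy unification over a variable set of available modalities:
    'hard' integration takes an element-wise max across available features,
    'soft' integration takes an availability-masked attention average."""
    def __init__(self, dim=64):
        super().__init__()
        self.score = nn.Linear(dim, 1)
        self.merge = nn.Linear(2 * dim, dim)

    def forward(self, feats, avail):
        # feats: (B, M, C) one feature per modality; avail: (B, M) 1 if present
        mask = avail.unsqueeze(-1)                                   # (B, M, 1)
        hard = feats.masked_fill(mask == 0, float("-inf")).max(dim=1).values
        logits = self.score(feats).masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(logits, dim=1)                       # (B, M, 1)
        soft = (weights * feats).sum(dim=1)                          # (B, C)
        return self.merge(torch.cat([hard, soft], dim=-1))

dfum = DynamicFeatureUnification(dim=64)
feats = torch.randn(2, 4, 64)                                       # 4 possible modalities
avail = torch.tensor([[1, 0, 1, 1], [1, 1, 0, 0]]).float()          # random missing pattern
print(dfum(feats, avail).shape)                                      # torch.Size([2, 64])
```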

Citations: 0
HiDiff: Hybrid Diffusion Framework for Medical Image Segmentation.
Pub Date : 2024-07-08 DOI: 10.1109/TMI.2024.3424471
Tao Chen, Chenhui Wang, Zhihao Chen, Yiming Lei, Hongming Shan

Medical image segmentation has been significantly advanced with the rapid development of deep learning (DL) techniques. Existing DL-based segmentation models are typically discriminative; i.e., they aim to learn a mapping from the input image to segmentation masks. However, these discriminative methods neglect the underlying data distribution and intrinsic class characteristics, suffering from an unstable feature space. In this work, we propose to complement discriminative segmentation methods with the knowledge of underlying data distribution from generative models. To that end, we propose a novel hybrid diffusion framework for medical image segmentation, termed HiDiff, which can synergize the strengths of existing discriminative segmentation models and new generative diffusion models. HiDiff comprises two key components: a discriminative segmentor and a diffusion refiner. First, we utilize any conventionally trained segmentation model as the discriminative segmentor, which can provide a segmentation mask prior for the diffusion refiner. Second, we propose a novel binary Bernoulli diffusion model (BBDM) as the diffusion refiner, which can effectively, efficiently, and interactively refine the segmentation mask by modeling the underlying data distribution. Third, we train the segmentor and BBDM in an alternate-collaborative manner to mutually boost each other. Extensive experimental results on abdominal organ, brain tumor, polyp, and retinal vessel segmentation datasets, covering four widely used modalities, demonstrate the superior performance of HiDiff over existing medical segmentation algorithms, including the state-of-the-art transformer- and diffusion-based ones. In addition, HiDiff excels at segmenting small objects and generalizing to new datasets. Source codes are made available at https://github.com/takimailto/HiDiff.
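For intuition, the toy forward process below shows one common way a binary (Bernoulli) diffusion can gradually corrupt a segmentation mask toward uniform noise, which a refiner then learns to reverse; the exact BBDM parameterization in the paper may differ from this sketch.

```python
import torch

def bernoulli_forward_noise(mask0, alpha_bar_t):
    """One common binary-diffusion forward process: with probability
    alpha_bar_t the original mask bit survives, otherwise it is resampled
    uniformly, so the mask decays toward pure Bernoulli(0.5) noise."""
    p_one = mask0 * alpha_bar_t + 0.5 * (1.0 - alpha_bar_t)
    return torch.bernoulli(p_one)

mask0 = (torch.rand(1, 1, 64, 64) > 0.5).float()    # segmentor's prior mask
for a in (0.9, 0.5, 0.1):
    xt = bernoulli_forward_noise(mask0, a)
    print(a, (xt == mask0).float().mean().item())     # agreement drops as noise grows
```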

Citations: 0
Joint regional uptake quantification of thorium-227 and radium-223 using a multiple-energy-window projection-domain quantitative SPECT method.
Pub Date : 2024-07-05 DOI: 10.1109/TMI.2024.3420228
Zekun Li, Nadia Benabdallah, Richard Laforest, Richard L Wahl, Daniel L J Thorek, Abhinav K Jha

Thorium-227 (227Th)-based α-particle radiopharmaceutical therapies (α-RPTs) are currently being investigated in several clinical and pre-clinical studies. After administration, 227Th decays to 223Ra, another α-particle-emitting isotope, which redistributes within the patient. Reliable dose quantification of both 227Th and 223Ra is clinically important, and SPECT may perform this quantification as these isotopes also emit X- and γ-ray photons. However, reliable quantification is challenging for several reasons: activity that is orders of magnitude lower than in conventional SPECT, resulting in a very low number of detected counts; the presence of multiple photopeaks; substantial overlap in the emission spectra of these isotopes; and the image-degrading effects in SPECT. To address these issues, we propose a multiple-energy-window projection-domain quantification (MEW-PDQ) method that jointly estimates the regional activity uptake of both 227Th and 223Ra directly using the SPECT projection data from multiple energy windows. We evaluated the method with realistic simulation studies conducted with anthropomorphic digital phantoms, including a virtual imaging trial, in the context of imaging patients with bone metastases of prostate cancer who were treated with 227Th-based α-RPTs. The proposed method yielded reliable (accurate and precise) regional uptake estimates of both isotopes and outperformed state-of-the-art methods across different lesion sizes and contrasts, as well as in the virtual imaging trial. This reliable performance was also observed with moderate levels of intra-regional heterogeneous uptake as well as when there were moderate inaccuracies in the definitions of the support of various regions. Additionally, we demonstrated the effectiveness of using multiple energy windows, and the variance of the uptake estimated with the proposed method approached the theoretical limit defined by the Cramér-Rao lower bound. These results provide strong evidence in support of this method for reliable uptake quantification in 227Th-based α-RPTs.
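The projection-domain idea of jointly estimating both isotopes' regional uptake can be illustrated with a toy Poisson maximum-likelihood fit: expected counts in every energy-window/bin element are a linear combination of regional activities through isotope-specific system matrices. The matrices, region count, and optimizer below are placeholders, not the authors' estimator.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
R, n_meas = 4, 60          # regions; measurement elements (all windows/bins flattened)
# Hypothetical system matrices: expected counts per unit regional activity for
# each isotope in each measurement element (from the SPECT system model).
H_th = rng.uniform(0.5, 2.0, size=(n_meas, R))     # 227Th contributions
H_ra = rng.uniform(0.5, 2.0, size=(n_meas, R))     # 223Ra contributions
true = rng.uniform(1.0, 5.0, size=2 * R)            # [Th uptakes..., Ra uptakes...]
counts = rng.poisson(H_th @ true[:R] + H_ra @ true[R:])

def neg_loglik(x):
    lam = H_th @ x[:R] + H_ra @ x[R:] + 1e-9         # expected counts
    return np.sum(lam - counts * np.log(lam))          # Poisson NLL (constant dropped)

res = minimize(neg_loglik, x0=np.ones(2 * R), bounds=[(0.0, None)] * (2 * R))
print("true:", np.round(true, 2))
print("est.:", np.round(res.x, 2))
```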

Citations: 0
A denoising diffusion probabilistic model for metal artifact reduction in CT.
Pub Date : 2024-07-04 DOI: 10.1109/TMI.2024.3416398
Grigorios M Karageorgos, Jiayong Zhang, Nils Peters, Wenjun Xia, Chuang Niu, Harald Paganetti, Ge Wang, Bruno De Man

The presence of metal objects leads to corrupted CT projection measurements, resulting in metal artifacts in the reconstructed CT images. AI promises to offer improved solutions to estimate missing sinogram data for metal artifact reduction (MAR), as previously shown with convolutional neural networks (CNNs) and generative adversarial networks (GANs). Recently, denoising diffusion probabilistic models (DDPM) have shown great promise in image generation tasks, potentially outperforming GANs. In this study, a DDPM-based approach is proposed for inpainting of missing sinogram data for improved MAR. The proposed model is unconditionally trained, free from information on metal objects, which can potentially enhance its generalization capabilities across different types of metal implants compared to conditionally trained approaches. The performance of the proposed technique was evaluated and compared to the state-of-the-art normalized MAR (NMAR) approach as well as to CNN-based and GAN-based MAR approaches. The DDPM-based approach provided significantly higher SSIM and PSNR, as compared to NMAR (SSIM: p < 10⁻²⁶; PSNR: p < 10⁻²¹), the CNN (SSIM: p < 10⁻²⁵; PSNR: p < 10⁻⁹) and the GAN (SSIM: p < 10⁻⁶; PSNR: p < 0.05) methods. The DDPM-MAR technique was further evaluated based on clinically relevant image quality metrics on clinical CT images with virtually introduced metal objects and metal artifacts, demonstrating superior quality relative to the other three models. In general, the AI-based techniques showed improved MAR performance compared to the non-AI-based NMAR approach. The proposed methodology shows promise in enhancing the effectiveness of MAR, and therefore improving the diagnostic accuracy of CT.
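One way an unconditionally trained DDPM can inpaint a metal-corrupted sinogram is RePaint-style mask-guided reverse sampling, sketched below with a stand-in denoiser: at every step the metal-free sinogram entries are re-imposed from a forward-noised copy of the measurement, so only the metal trace is synthesized. This illustrates the principle rather than the authors' exact sampler.

```python
import torch

def inpaint_sinogram(x_meas, known, model, betas):
    """Mask-guided reverse diffusion with an unconditionally trained denoiser.
    x_meas: measured sinogram; known: 1 where data is trustworthy (no metal)."""
    alphas = 1.0 - betas
    a_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(x_meas)
    for t in reversed(range(len(betas))):
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        eps = model(x, t)                                            # predicted noise
        x = (x - betas[t] / (1 - a_bar[t]).sqrt() * eps) / alphas[t].sqrt() \
            + betas[t].sqrt() * z                                    # unconditional DDPM step
        x_known = a_bar[t].sqrt() * x_meas + (1 - a_bar[t]).sqrt() * torch.randn_like(x)
        x = known * x_known + (1 - known) * x                        # re-impose measured data
    return x

model = lambda x, t: torch.zeros_like(x)                             # stand-in denoiser
betas = torch.linspace(1e-4, 0.02, 50)
sino = torch.randn(1, 1, 64, 96)
trace = (torch.rand_like(sino) < 0.2).float()                        # 1 = metal-corrupted bins
print(inpaint_sinogram(sino, 1 - trace, model, betas).shape)
```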

Citations: 0
Mitigating Aberration-Induced Noise: A Deep Learning-Based Aberration-to-Aberration Approach.
Pub Date : 2024-07-03 DOI: 10.1109/TMI.2024.3422027
Mostafa Sharifzadeh, Sobhan Goudarzi, An Tang, Habib Benali, Hassan Rivaz

One of the primary sources of suboptimal image quality in ultrasound imaging is phase aberration. It is caused by spatial changes in sound speed over a heterogeneous medium, which disturbs the transmitted waves and prevents coherent summation of echo signals. Obtaining non-aberrated ground truths in real-world scenarios can be extremely challenging, if not impossible. This challenge hinders the performance of deep learning-based techniques due to the domain shift between simulated and experimental data. Here, for the first time, we propose a deep learning-based method that does not require ground truth to correct the phase aberration problem and, as such, can be directly trained on real data. We train a network wherein both the input and target output are randomly aberrated radio frequency (RF) data. Moreover, we demonstrate that a conventional loss function such as mean square error is inadequate for training such a network to achieve optimal performance. Instead, we propose an adaptive mixed loss function that employs both B-mode and RF data, resulting in more efficient convergence and enhanced performance. Finally, we publicly release our dataset, comprising over 180,000 aberrated single plane-wave images (RF data), wherein phase aberrations are modeled as near-field phase screens. Although not utilized in the proposed method, each aberrated image is paired with its corresponding aberration profile and the non-aberrated version, aiming to mitigate the data scarcity problem in developing deep learning-based techniques for phase aberration correction. Source code and trained model are also available along with the dataset at http://code.sonography.ai/main-aaa.
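The mixed RF/B-mode objective can be sketched as follows, with the B-mode term computed from a log-compressed envelope obtained via an FFT-based Hilbert transform; the fixed weights stand in for the adaptive weighting described in the abstract, and the shapes are placeholders.

```python
import torch

def bmode(rf, eps=1e-6):
    """Log-compressed envelope of RF lines via the analytic signal
    (FFT-based Hilbert transform along the axial dimension)."""
    n = rf.shape[-1]
    spec = torch.fft.fft(rf, dim=-1)
    h = torch.zeros(n, device=rf.device)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    analytic = torch.fft.ifft(spec * h, dim=-1)
    return 20 * torch.log10(analytic.abs() + eps)

def mixed_loss(rf_pred, rf_target, w_rf=1.0, w_bmode=1.0):
    """Sketch of a mixed RF + B-mode objective; the paper's adaptive weighting
    schedule is not reproduced here, w_rf / w_bmode are fixed placeholders."""
    rf_term = torch.mean((rf_pred - rf_target) ** 2)
    bm_term = torch.mean((bmode(rf_pred) - bmode(rf_target)) ** 2)
    return w_rf * rf_term + w_bmode * bm_term

rf_a, rf_b = torch.randn(2, 1, 128, 2048), torch.randn(2, 1, 128, 2048)
print(mixed_loss(rf_a, rf_b))
```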

Citations: 0
A Convolutional-Transformer Model for FFR and iFR Assessment From Coronary Angiography
Pub Date : 2024-07-02 DOI: 10.1109/TMI.2024.3383283
Raffaele Mineo;F. Proietto Salanitri;G. Bellitto;I. Kavasidis;O. De Filippo;M. Millesimo;G. M. De Ferrari;M. Aldinucci;D. Giordano;S. Palazzo;F. D’Ascenzo;C. Spampinato
The quantification of stenosis severity from X-ray catheter angiography is a challenging task. Indeed, it requires fully understanding the lesion’s geometry by analyzing the dynamics of the contrast material, relying only on visual observation by clinicians. To support decision making for cardiac intervention, we propose a hybrid CNN-Transformer model for the assessment of angiography-based non-invasive fractional flow-reserve (FFR) and instantaneous wave-free ratio (iFR) of intermediate coronary stenosis. Our approach predicts whether a coronary artery stenosis is hemodynamically significant and provides direct FFR and iFR estimates. This is achieved through a combination of regression and classification branches that forces the model to focus on the cut-off region of FFR (around 0.8 FFR value), which is highly critical for decision-making. We also propose a spatio-temporal factorization mechanism that redesigns the transformer’s self-attention mechanism to capture both local spatial and temporal interactions between vessel geometry, blood flow dynamics, and lesion morphology. The proposed method achieves state-of-the-art performance on a dataset of 778 exams from 389 patients. Unlike existing methods, our approach employs a single angiography view and does not require knowledge of the key frame; supervision at training time is provided by a classification loss (based on a threshold of the FFR/iFR values) and a regression loss for direct estimation. Finally, the analysis of model interpretability and calibration shows that, in spite of the complexity of angiographic imaging data, our method can robustly identify the location of the stenosis and correlate prediction uncertainty to the provided output scores.
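A toy version of the combined regression-classification objective, with extra weight on samples near the 0.8 FFR cut-off, is shown below; the Gaussian weighting function and its parameters are assumptions used only to illustrate how the cut-off region can be emphasized.

```python
import torch
import torch.nn.functional as F

def ffr_joint_loss(pred_ffr, logit_sig, true_ffr, cutoff=0.8, focus=5.0):
    """Sketch of a combined objective: regress FFR directly, classify
    hemodynamic significance (FFR <= 0.8), and up-weight samples whose
    ground-truth FFR lies close to the clinical cut-off."""
    target_cls = (true_ffr <= cutoff).float()
    cls_loss = F.binary_cross_entropy_with_logits(logit_sig, target_cls)
    w = 1.0 + focus * torch.exp(-((true_ffr - cutoff) / 0.05) ** 2)   # peak at 0.8
    reg_loss = (w * (pred_ffr - true_ffr) ** 2).mean()
    return reg_loss + cls_loss

pred = torch.tensor([0.78, 0.91, 0.83])    # regression-branch outputs
logit = torch.tensor([1.2, -2.0, -0.4])    # classification-branch logits
true = torch.tensor([0.76, 0.93, 0.79])    # invasive FFR ground truth
print(ffr_joint_loss(pred, logit, true))
```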
Citations: 0