
Latest articles in IEEE Transactions on Medical Imaging

Prior Knowledge-guided Triple-Domain Transformer-GAN for Direct PET Reconstruction from Low-Count Sinograms.
Pub Date: 2024-06-13 DOI: 10.1109/TMI.2024.3413832
Jiaqi Cui, Pinxian Zeng, Xinyi Zeng, Yuanyuan Xu, Peng Wang, Jiliu Zhou, Yan Wang, Dinggang Shen

To obtain high-quality positron emission tomography (PET) images while minimizing radiation exposure, numerous methods have been dedicated to acquiring standard-count PET (SPET) from low-count PET (LPET). However, current methods fail to take full advantage of the complementary information emphasized by different domains, i.e., the sinogram, image, and frequency domains, resulting in the loss of crucial details. Meanwhile, they overlook the unique inner-structure of the sinograms, thereby failing to fully capture their structural characteristics and relationships. To alleviate these problems, in this paper we propose a prior knowledge-guided transformer-GAN, namely PK-TriDo, that unites the sinogram, image, and frequency domains to directly reconstruct SPET images from LPET sinograms. Our PK-TriDo consists of a Sinogram Inner-Structure-based Denoising Transformer (SISD-Former) to denoise the input LPET sinogram, a Frequency-adapted Image Reconstruction Transformer (FaIR-Former) to reconstruct high-quality SPET images from the denoised sinograms guided by image-domain prior knowledge, and an Adversarial Network (AdvNet) to further enhance the reconstruction quality via adversarial training. To tailor the model to the PET imaging mechanism, we inject a sinogram embedding module that partitions the sinograms by rows and columns into 1D sequences of angles and distances, faithfully preserving the inner-structure of the sinograms. Moreover, to mitigate high-frequency distortions and enhance reconstruction details, we integrate global-local frequency parsers (GLFPs) into FaIR-Former to calibrate the distributions and proportions of different frequency bands, thus compelling the network to preserve high-frequency details. Evaluations on three datasets with different dose levels and imaging scenarios demonstrate that PK-TriDo outperforms state-of-the-art methods.
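The row-and-column partitioning behind the sinogram embedding module is easy to picture in code. Below is a minimal PyTorch sketch (our own illustration, not the authors' implementation); the module name, embedding dimension, and learned positional terms are all assumptions:

```python
import torch
import torch.nn as nn

class SinogramEmbedding(nn.Module):
    """Embed a sinogram as two 1D token sequences: one token per projection
    angle (rows) and one per radial distance (columns), preserving the
    sinogram's inner-structure for a downstream transformer."""

    def __init__(self, num_angles: int, num_dists: int, dim: int = 256):
        super().__init__()
        self.row_proj = nn.Linear(num_dists, dim)    # angle tokens
        self.col_proj = nn.Linear(num_angles, dim)   # distance tokens
        self.row_pos = nn.Parameter(torch.zeros(num_angles, dim))
        self.col_pos = nn.Parameter(torch.zeros(num_dists, dim))

    def forward(self, sino: torch.Tensor):
        # sino: (B, A, D) with A projection angles and D radial bins
        rows = self.row_proj(sino) + self.row_pos                   # (B, A, dim)
        cols = self.col_proj(sino.transpose(1, 2)) + self.col_pos   # (B, D, dim)
        return rows, cols

rows, cols = SinogramEmbedding(180, 128)(torch.randn(2, 180, 128))
print(rows.shape, cols.shape)  # torch.Size([2, 180, 256]) torch.Size([2, 128, 256])
```

Attention over these two sequences operates along the acquisition geometry (fixed angle or fixed distance), which is the sense in which the inner-structure is preserved.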

Citations: 0
Characterization of Polarimetric Properties in Various Brain Tumor Types Using Wide-Field Imaging Mueller Polarimetry.
Pub Date: 2024-06-12 DOI: 10.1109/TMI.2024.3413288
Romane Gros, Omar Rodriguez-Nunez, Leonard Felger, Stefano Moriconi, Richard McKinley, Angelo Pierangelo, Tatiana Novikova, Erik Vassella, Philippe Schucht, Ekkehard Hewer, Theoni Maragkou

Neuro-oncological surgery is the primary brain cancer treatment, yet it faces challenges with gliomas due to their invasiveness and the need to preserve neurological function. Hence, radical resection is often unfeasible, highlighting the importance of precise tumor margin delineation to prevent neurological deficits and improve prognosis. Imaging Mueller polarimetry, an effective modality for various organ tissues, is a promising approach for tumor delineation in neurosurgery. To further assess its use, we characterized the polarimetric properties of brain tumor tissue by analysing 45 polarimetric measurements of 27 fresh brain tumor samples, covering different tumor types with a strong focus on gliomas. Our study integrates a wide-field imaging Mueller polarimetric system with a novel neuropathology protocol, correlating polarimetric and histological data for accurate tissue identification. An image processing pipeline facilitated the alignment and overlay of polarimetric images and histological masks. Variations in depolarization values were observed for both grey and white matter of brain tumor tissue, while differences in linear retardance were seen only within white matter. Notably, we identified pronounced optical axis azimuth randomization within tumor regions. This study lays the foundation for machine learning-based brain tumor segmentation algorithms using polarimetric data, facilitating intraoperative diagnosis and decision making.
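For context on the depolarization metric, one standard scalar summary computable per pixel from a measured Mueller matrix is the Gil-Bernabeu depolarization index. The NumPy sketch below is our own illustration of that textbook quantity, not the authors' pipeline (which derives depolarization and retardance via polar decomposition of the Mueller matrix):

```python
import numpy as np

def depolarization_index(M: np.ndarray) -> np.ndarray:
    """Gil-Bernabeu depolarization index of 4x4 Mueller matrices (works on
    a single matrix or an HxWx4x4 image of them): 1 for a non-depolarizing
    medium, 0 for an ideal depolarizer."""
    m00 = M[..., 0, 0]
    frob2 = np.sum(M ** 2, axis=(-2, -1))        # squared Frobenius norm
    return np.sqrt((frob2 - m00 ** 2) / (3.0 * m00 ** 2))

print(depolarization_index(np.eye(4)))  # identity (no depolarization) -> 1.0
```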

Citations: 0
Video-based Soft Tissue Deformation Tracking for Laparoscopic Augmented Reality-based Navigation in Kidney Surgery.
Pub Date: 2024-06-12 DOI: 10.1109/TMI.2024.3413537
Enpeng Wang, Yueang Liu, Puxun Tu, Zeike A Taylor, Xiaojun Chen

Minimally invasive surgery (MIS) remains technically demanding due to the difficulty of tracking hidden critical structures within the moving anatomy of the patient. In this study, we propose a soft tissue deformation tracking augmented reality (AR) navigation pipeline for laparoscopic surgery of the kidneys. The proposed navigation pipeline addresses two main sub-problems: initial registration and deformation tracking. Our method utilizes preoperative MR or CT data and binocular laparoscopes without any additional interventional hardware. The initial registration is resolved through a probabilistic rigid registration algorithm and elastic compensation based on dense point cloud reconstruction. For deformation tracking, the sparse feature point displacement vector field continuously provides temporal boundary conditions for the biomechanical model. To enhance the accuracy of the displacement vector field, a novel feature point selection strategy based on deep learning is proposed. Moreover, an ex-vivo experimental method for assessing the error of internal structures is presented. The ex-vivo experiments indicate an external surface reprojection error of 4.07 ± 2.17 mm and a maximum mean absolute error of 2.98 mm for internal structures. In-vivo experiments indicate mean absolute errors of 3.28 ± 0.40 mm and 1.90 ± 0.24 mm, respectively. The combined qualitative and quantitative findings indicate the potential of our AR-assisted navigation system to improve the clinical application of laparoscopic kidney surgery.
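To illustrate how sparse tracked displacements can drive a dense deformation estimate, the sketch below spreads stereo-tracked feature-point displacements over mesh vertices with Gaussian weights. This is a deliberately simplified stand-in for the paper's biomechanical model, which instead consumes the sparse displacements as time-varying boundary conditions; the function names and kernel width are our assumptions:

```python
import numpy as np

def propagate_displacements(vertices, ctrl_pts, ctrl_disp, sigma=5.0):
    """Gaussian-weighted interpolation of sparse control-point displacements
    (mm) onto all mesh vertices, yielding a dense displacement field."""
    d2 = ((vertices[:, None, :] - ctrl_pts[None, :, :]) ** 2).sum(-1)  # (V, K)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True) + 1e-12     # normalize weights per vertex
    return w @ ctrl_disp                          # (V, 3)

verts = np.random.rand(1000, 3) * 50                      # kidney mesh vertices (mm)
ctrl = verts[np.random.choice(1000, 20, replace=False)]   # tracked feature points
disp = np.random.randn(20, 3) * 2.0                       # their per-frame displacements
deformed = verts + propagate_displacements(verts, ctrl, disp)
```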

Citations: 0
Sparse-view Spectral CT Reconstruction and Material Decomposition based on Multi-channel SGM.
Pub Date: 2024-06-12 DOI: 10.1109/TMI.2024.3413085
Yuedong Liu, Xuan Zhou, Cunfeng Wei, Qiong Xu

In medical applications, the diffusion of contrast agents in tissue can reflect the physiological function of organisms, so it is valuable to quantify the distribution and content of contrast agents in the body over time. Spectral CT has the advantages of multi-energy projection acquisition and material decomposition, which allow it to quantify K-edge contrast agents. However, multiple repeated spectral CT scans can cause excessive radiation doses. Sparse-view scanning is commonly used to reduce dose and scan time, but its reconstructed images are usually accompanied by streaking artifacts, which leads to inaccurate quantification of the contrast agents. To solve this problem, an unsupervised sparse-view spectral CT reconstruction and material decomposition algorithm based on a multi-channel score-based generative model (SGM) is proposed in this paper. First, multi-energy images and tissue images are used as multi-channel input data for SGM training. Second, the subject is scanned repeatedly with sparse views, and the trained SGM is used to generate multi-energy images and tissue images driven by the sparse-view projections. A material decomposition algorithm is then established that uses the SGM-generated tissue images as prior images for solving the contrast-agent images. Finally, the distribution and content of the contrast agents are obtained. Comparisons and evaluations of this method are given in the paper, and a series of mouse scanning experiments verify its effectiveness.
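The core of such projection-driven SGM generation is an annealed sampling loop that interleaves score-based denoising with projection-domain data consistency. The Python sketch below is schematic: the forward/adjoint projector handles `A`/`At`, the trained `score_model`, and the step-size and noise schedules are all assumed, and the paper's exact sampler may differ:

```python
import torch

def sgm_sparse_view_recon(score_model, A, At, y, steps=100, lam=0.5):
    """Annealed Langevin-style sampling: each step nudges the multi-channel
    image along the learned score, then pulls it toward agreement with the
    measured sparse-view projections y."""
    x = torch.randn_like(At(y))                          # start from noise
    for i in range(steps):
        sigma = 0.01 ** (i / (steps - 1))                # decays 1.0 -> 0.01
        step = 0.1 * sigma ** 2
        x = x + step * score_model(x, sigma) \
              + torch.sqrt(torch.tensor(2.0 * step)) * torch.randn_like(x)
        x = x - lam * At(A(x) - y)                       # data consistency
    return x

# Shape-only smoke test with identity "projectors" and a zero score:
x_hat = sgm_sparse_view_recon(lambda x, s: torch.zeros_like(x),
                              A=lambda x: x, At=lambda y: y,
                              y=torch.randn(1, 3, 64, 64))
```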

Citations: 0
R2D2-GAN: Robust Dual Discriminator Generative Adversarial Network for Microscopy Hyperspectral Image Super-Resolution.
Pub Date: 2024-06-11 DOI: 10.1109/TMI.2024.3412033
Jiaxuan Liu, Hui Zhang, Jiang-Huai Tian, Yingjian Su, Yurong Chen, Yaonan Wang

High-resolution microscopy hyperspectral (HS) images can provide highly detailed spatial and spectral information, enabling the identification and analysis of biological tissues at a microscale level. Recently, significant efforts have been devoted to enhancing the resolution of HS images by leveraging high spatial resolution multispectral (MS) images. However, inherent hardware constraints lead to a significant distribution gap between HS and MS images, posing challenges for image super-resolution within biomedical domains. This discrepancy may arise from various factors, including variations in camera imaging principles (e.g., snapshot and push-broom imaging), shooting positions, and the presence of noise interference. To address these challenges, we introduce a unique unsupervised super-resolution framework named R2D2-GAN. This framework utilizes a generative adversarial network (GAN) to efficiently merge the two data modalities and improve the resolution of microscopy HS images. Traditionally, supervised approaches have relied on intuitive but sensitive loss functions, such as mean squared error (MSE). Our method, trained in a real-world unsupervised setting, benefits from exploiting consistent information across the two modalities. It employs a game-theoretic strategy and a dynamic adversarial loss, rather than relying solely on a fixed training strategy for the reconstruction loss. Furthermore, we augment the proposed model with a central consistency regularization (CCR) module, aiming to further enhance the robustness of R2D2-GAN. Our experimental results show that the proposed method is accurate and robust for HS image super-resolution. We tested the method on both a real and a synthetic dataset, obtaining promising results in comparison to other state-of-the-art methods. Our code and datasets are accessible through Multimedia Content.
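The dual-discriminator idea can be sketched as a generator loss scored by two critics. In the sketch below, the split into a spatial critic and a per-band spectral critic is our illustrative assumption, and the fixed weights `w` stand in for R2D2-GAN's dynamic adversarial weighting:

```python
import torch
import torch.nn.functional as F

def dual_discriminator_g_loss(fake_hs, d_spatial, d_spectral, w=(1.0, 1.0)):
    """Non-saturating generator loss against two discriminators: one judging
    spatial realism of the super-resolved HS patch, one judging its per-band
    spectral statistics."""
    logits_a = d_spatial(fake_hs)                         # (B, 1)
    logits_b = d_spectral(fake_hs.mean(dim=(2, 3)))       # (B, 1) from (B, C)
    loss_a = F.binary_cross_entropy_with_logits(logits_a, torch.ones_like(logits_a))
    loss_b = F.binary_cross_entropy_with_logits(logits_b, torch.ones_like(logits_b))
    return w[0] * loss_a + w[1] * loss_b

# Smoke test with stand-in critics (real models would be CNN/MLP heads):
g_loss = dual_discriminator_g_loss(
    torch.randn(2, 31, 64, 64),                           # 31-band HS patch
    d_spatial=lambda x: x.mean(dim=(1, 2, 3)).unsqueeze(1),
    d_spectral=lambda v: v.sum(dim=1, keepdim=True))
```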

Citations: 0
Constraint-Aware Learning for Fractional Flow Reserve Pullback Curve Estimation from Invasive Coronary Imaging.
Pub Date: 2024-06-11 DOI: 10.1109/TMI.2024.3412935
Dong Zhang, Xiujian Liu, Anbang Wang, Hongwei Zhang, Guang Yang, Heye Zhang, Zhifan Gao

Estimation of the fractional flow reserve (FFR) pullback curve from invasive coronary imaging is important for the intraoperative guidance of coronary intervention. Machine/deep learning has proven effective for FFR pullback curve estimation. However, existing methods suffer from inadequate incorporation of intrinsic geometry associations and physics knowledge. In this paper, we propose a constraint-aware learning framework to improve the estimation of the FFR pullback curve from invasive coronary imaging. It incorporates both geometrical and physical constraints to approximate the relationships between the geometric structure and FFR values along the coronary artery centerline. Our method also leverages the power of synthetic data in model training to reduce the collection costs of clinical data. Moreover, to bridge the domain gap between synthetic and real data distributions when testing on real-world imaging data, we employ a diffusion-driven test-time data adaptation method that preserves the knowledge learned from synthetic data. Specifically, this method learns a diffusion model of the synthetic data distribution and then projects real data onto the synthetic data distribution at test time. Extensive experimental studies on a synthetic dataset and a real-world dataset of 382 patients covering three imaging modalities show that our method outperforms other machine/deep learning-based FFR estimation models and a computational fluid dynamics-based model for FFR estimation of stenotic coronary arteries. The results also show high agreement and correlation between the FFR predictions of our method and the invasively measured FFR values. The plausibility of FFR predictions along the coronary artery centerline is also validated.
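The test-time projection step has a compact diffusion reading: diffuse the real-domain input part-way along the forward process, then run the reverse process with a denoiser trained only on synthetic data, so the output lands on the synthetic distribution while retaining the input's coarse structure. The sketch below is our schematic under a standard DDPM noise schedule with a deterministic (DDIM-style) reverse pass; the paper's exact sampler and schedule are not specified here:

```python
import torch

def diffusion_project(x_real, denoiser, alphas_cumprod, t=200):
    """Forward-diffuse x_real to step t, then deterministically reverse
    (DDIM, eta=0) with an epsilon-predicting denoiser trained on synthetic
    data, projecting x_real onto the synthetic distribution."""
    a_t = alphas_cumprod[t]
    x = torch.sqrt(a_t) * x_real + torch.sqrt(1 - a_t) * torch.randn_like(x_real)
    for s in range(t, 0, -1):
        a_s, a_prev = alphas_cumprod[s], alphas_cumprod[s - 1]
        eps = denoiser(x, torch.tensor([s]))
        x0 = (x - torch.sqrt(1 - a_s) * eps) / torch.sqrt(a_s)  # predicted clean
        x = torch.sqrt(a_prev) * x0 + torch.sqrt(1 - a_prev) * eps
    return x

ac = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, 1001), dim=0)
proj = diffusion_project(torch.randn(1, 1, 64, 64),
                         denoiser=lambda x, s: torch.zeros_like(x),
                         alphas_cumprod=ac)
```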

Citations: 0
Constructing High-order Functional Connectivity Networks with Temporal Information from fMRI Data.
Pub Date: 2024-06-11 DOI: 10.1109/TMI.2024.3412399
Yingzhi Teng, Kai Wu, Jing Liu, Yifan Li, Xiangyi Teng

Conducting functional connectivity analysis on functional magnetic resonance imaging (fMRI) data presents a significant and intricate challenge. Contemporary studies typically analyze fMRI data by constructing high-order functional connectivity networks (FCNs) due to their strong interpretability. However, these approaches often overlook temporal information, resulting in suboptimal accuracy. Temporal information plays a vital role in reflecting changes in blood oxygenation level-dependent signals. To address this shortcoming, we have devised a framework for extracting temporal dependencies from fMRI data and inferring high-order functional connectivity among regions of interest (ROIs). Our approach postulates that the current state can be determined by the FCN and the state at the previous time point, effectively capturing temporal dependencies. Furthermore, we enhance the FCN by incorporating high-order features through hypergraph-based manifold regularization. Our algorithm involves causal modeling of the dynamic brain system, and the obtained directed FC reveals differences in the flow of information under different patterns. We have validated the significance of integrating temporal information into the FCN using four real-world fMRI datasets. On average, our framework achieves 12% higher accuracy than non-temporal hypergraph-based and low-order FCNs, all while maintaining a short processing time. Notably, our framework successfully identifies the most discriminative ROIs, aligning with previous research and thereby facilitating cognitive and behavioral studies.
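The postulate that the current state is determined by the FCN and the previous state is, at its simplest, a first-order autoregression x_t ≈ W x_{t-1} whose coefficient matrix acts as a directed FC. The sketch below fits that minimal version by least squares; it omits the paper's hypergraph-based manifold regularization and high-order terms:

```python
import numpy as np

def directed_fc_var1(ts: np.ndarray) -> np.ndarray:
    """Least-squares fit of x_t ~ W @ x_{t-1} over ROI time series ts of
    shape (R, T); returns W of shape (R, R), read as flow from the column
    ROI to the row ROI."""
    x_prev, x_curr = ts[:, :-1], ts[:, 1:]
    W, *_ = np.linalg.lstsq(x_prev.T, x_curr.T, rcond=None)  # prev.T @ W' = curr.T
    return W.T

W = directed_fc_var1(np.random.randn(90, 200))   # 90 ROIs, 200 time points
print(W.shape)  # (90, 90)
```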

Citations: 0
Prototype Correlation Matching and Class-Relation Reasoning for Few-Shot Medical Image Segmentation.
Pub Date: 2024-06-11 DOI: 10.1109/TMI.2024.3412420
Yumin Zhang, Hongliu Li, Yajun Gao, Haoran Duan, Yawen Huang, Yefeng Zheng

Few-shot medical image segmentation has achieved great progress in improving the accuracy and efficiency of medical analysis in the biomedical imaging field. However, most existing methods cannot explore inter-class relations among base and novel medical classes to reason about unseen novel classes. Moreover, the same kind of medical class exhibits large intra-class variations brought by diverse appearances, shapes, and scales, causing ambiguous visual characterization that degrades the generalization performance of existing methods on unseen novel classes. To address the above challenges, in this paper we propose a Prototype correlation Matching and Class-relation Reasoning (PMCR) model. The proposed model can effectively mitigate false pixel correlation matches caused by large intra-class variations while reasoning about inter-class relations among different medical classes. Specifically, to address false pixel correlation matches brought by large intra-class variations, we propose a prototype correlation matching module to mine representative prototypes that can characterize the diverse visual information of different appearances well. We explore prototype-level rather than pixel-level correlation matching between support and query features via an optimal transport algorithm to tackle false matches caused by intra-class variations. Meanwhile, to explore inter-class relations, we design a class-relation reasoning module to segment unseen novel medical objects by reasoning about inter-class relations between base and novel classes. Such inter-class relations can be well propagated to the semantic encoding of local query features to improve few-shot segmentation performance. Quantitative comparisons illustrate the large performance improvement of our model over other baseline methods.
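Prototype-level matching via optimal transport admits a compact sketch: entropic OT (Sinkhorn) between support and query prototype sets yields a soft matching that respects marginals, which is what damps spurious one-to-one matches under intra-class variation. The code below is a plain Sinkhorn illustration with assumed cost and hyperparameters, not the paper's exact formulation:

```python
import torch

def sinkhorn_match(protos_s, protos_q, eps=0.05, iters=50):
    """Entropic optimal transport between support prototypes (Ns, D) and
    query prototypes (Nq, D); returns a soft transport plan (Ns, Nq)."""
    cost = torch.cdist(protos_s, protos_q) ** 2
    K = torch.exp(-cost / eps)
    u = torch.full((cost.size(0),), 1.0 / cost.size(0))   # uniform marginals
    v = torch.full((cost.size(1),), 1.0 / cost.size(1))
    a, b = u.clone(), v.clone()
    for _ in range(iters):                                # Sinkhorn-Knopp scaling
        a = u / (K @ b)
        b = v / (K.t() @ a)
    return torch.diag(a) @ K @ torch.diag(b)

plan = sinkhorn_match(torch.randn(8, 64), torch.randn(8, 64))
print(plan.sum())  # ~1.0: mass-conserving soft matching
```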

Citations: 0
Dual-Reference Source-Free Active Domain Adaptation for Nasopharyngeal Carcinoma Tumor Segmentation across Multiple Hospitals.
Pub Date: 2024-06-11 DOI: 10.1109/TMI.2024.3412923
Hongqiu Wang, Jian Chen, Shichen Zhang, Yuan He, Jinfeng Xu, Mengwan Wu, Jinlan He, Wenjun Liao, Xiangde Luo

Nasopharyngeal carcinoma (NPC) is a prevalent and clinically significant malignancy that predominantly impacts the head and neck area. Precise delineation of the Gross Tumor Volume (GTV) plays a pivotal role in ensuring effective radiotherapy for NPC. Although recent methods have achieved promising results on GTV segmentation, they are still limited by the scarcity of carefully annotated data and by data from multiple hospitals being hard to access in clinical practice. Although some unsupervised domain adaptation (UDA) methods have been proposed to alleviate this problem, unconditionally mapping the distribution distorts the underlying structural information, leading to inferior performance. To address this challenge, we devise a novel Source-Free Active Domain Adaptation framework to facilitate domain adaptation for the GTV segmentation task. Specifically, we design a dual reference strategy to select domain-invariant and domain-specific representative samples from a specific target domain for annotation and model fine-tuning without relying on source-domain data. Our approach not only ensures data privacy but also reduces the workload for oncologists, as it requires annotating only a few representative samples from the target domain and does not need access to the source data. We collect a large-scale clinical dataset comprising 1057 NPC patients from five hospitals to validate our approach. Experimental results show that our method outperforms previous active learning (e.g., AADA and MHPL) and UDA (e.g., Tent and CPR) methods, and achieves results comparable to the fully supervised upper bound even with few annotations, highlighting the significant medical utility of our approach. In addition, as there is no public dataset for multi-center NPC segmentation, we will release our code and dataset for future research (Git).
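While the paper's dual reference criteria are its own, the flavor of selecting both kinds of samples from the target domain alone can be sketched with two simple stand-in rules: treat samples near their predicted-class centroid as "domain-invariant" candidates and high-entropy samples as "domain-specific" candidates. Everything in this sketch (the criteria, names, and k) is an assumption for illustration:

```python
import numpy as np

def dual_reference_select(feats, probs, k=5):
    """Return indices of k 'domain-invariant' candidates (closest to their
    predicted-class centroid) and k 'domain-specific' candidates (highest
    predictive entropy), computed from target data only."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    labels = probs.argmax(axis=1)
    centroids = np.stack([
        feats[labels == c].mean(axis=0) if np.any(labels == c) else feats.mean(axis=0)
        for c in range(probs.shape[1])])
    dist = np.linalg.norm(feats - centroids[labels], axis=1)
    return np.argsort(dist)[:k], np.argsort(-entropy)[:k]

inv_idx, spec_idx = dual_reference_select(np.random.randn(100, 32),
                                          np.random.dirichlet(np.ones(3), 100))
```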

Citations: 0
Token-Mixer: Bind Image and Text in One Embedding Space for Medical Image Reporting.
Pub Date: 2024-06-11 DOI: 10.1109/TMI.2024.3412402
Yan Yang, Jun Yu, Zhenqi Fu, Ke Zhang, Ting Yu, Xianyun Wang, Hanliang Jiang, Junhui Lv, Qingming Huang, Weidong Han

Medical image reporting, which focuses on automatically generating diagnostic reports from medical images, has garnered growing research attention. In this task, learning cross-modal alignment between images and reports is crucial. However, the exposure bias problem in autoregressive text generation poses a notable challenge, as the model is optimized by a word-level loss function using the teacher-forcing strategy. To this end, we propose a novel Token-Mixer framework that learns to bind image and text in one embedding space for medical image reporting. Concretely, Token-Mixer enhances the cross-modal alignment by matching image-to-text generation with text-to-text generation, which suffers less from exposure bias. The framework contains an image encoder, a text encoder and a text decoder. In training, images and paired reports are first encoded into image tokens and text tokens, and these tokens are randomly mixed to form mixed tokens. Then, the text decoder accepts image tokens, text tokens or mixed tokens as prompt tokens and conducts text generation for network optimization. Furthermore, we introduce a tailored text decoder and an alternative training strategy that integrate well with our Token-Mixer framework. Extensive experiments across three publicly available datasets demonstrate that Token-Mixer successfully enhances the image-text alignment and thereby attains state-of-the-art performance. Related codes are available at https://github.com/yangyan22/Token-Mixer.
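The token-mixing step itself is simple once both streams live in one embedding space: draw a per-position Bernoulli mask and interleave. The sketch below assumes both token streams were already projected to a common (B, N, D) shape and that the mixing ratio p is a hyperparameter; both are our assumptions rather than details from the paper:

```python
import torch

def mix_tokens(img_tokens: torch.Tensor, txt_tokens: torch.Tensor, p=0.5):
    """Randomly interleave image and text tokens position-by-position to
    form the mixed prompt sequence fed to the text decoder."""
    mask = (torch.rand(img_tokens.shape[:2]) < p).unsqueeze(-1)  # (B, N, 1)
    return torch.where(mask, img_tokens, txt_tokens)

mixed = mix_tokens(torch.randn(2, 77, 512), torch.randn(2, 77, 512))
print(mixed.shape)  # torch.Size([2, 77, 512])
```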

Citations: 0