Co-Learning Multimodality PET-CT Features via a Cascaded CNN-Transformer Network
Lei Bi; Xiaohang Fu; Qiufang Liu; Shaoli Song; David Dagan Feng; Michael Fulham; Jinman Kim
IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 8, no. 7, pp. 814-825
DOI: 10.1109/TRPMS.2024.3417901
Published: 2024-06-24
URL: https://ieeexplore.ieee.org/document/10570071/
Citations: 0
Abstract
Background:
Automated segmentation of multimodality positron emission tomography-computed tomography (PET-CT) data is a major challenge in the development of computer-aided diagnosis (CAD) systems. In this context, convolutional neural network (CNN)-based methods are considered the state of the art. These CNN-based methods, however, have difficulty co-learning the complementary PET-CT image features and capturing the global context, because they focus solely on local patterns.
Methods:
We propose a cascaded CNN-transformer network (CCNN-TN) tailored for PET-CT image segmentation. We employed a transformer network (TN) because of its ability to establish global context via self-attention over embedded image patches. We extended the TN design by cascading multiple TNs and CNNs to learn both the global and the local contexts. We also introduced a hyper-fusion branch that iteratively fuses the separately extracted complementary image features. We evaluated our approach against current state-of-the-art CNN methods on three datasets: two non-small cell lung cancer (NSCLC) datasets and one soft tissue sarcoma (STS) dataset.
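The global-context mechanism attributed to TNs here is scaled dot-product self-attention over embedded image patches. A minimal sketch is below; the learned query/key/value projection matrices are omitted (an assumption made for brevity), and this is not the authors' exact architecture:

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a sequence of patch embeddings.

    Each output token is a weighted mix of ALL input tokens, which is how a
    transformer captures global context that a local CNN kernel misses.
    Simplified: Q = K = V = x (learned projections omitted).
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                   # pairwise patch similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x                              # global mixing of patches

patches = np.random.default_rng(0).normal(size=(4, 8))  # 4 patches, dim 8
out = self_attention(patches)
print(out.shape)  # (4, 8): one globally contextualized vector per patch
```

In the cascaded design described above, blocks like this would alternate with convolutional stages, so each stage sees both locally filtered and globally mixed features.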
Results:
Our CCNN-TN method achieved Dice similarity coefficient (DSC) scores of 72.25% and 67.11% on the two NSCLC datasets and 66.36% on the STS dataset for tumor segmentation. Compared with the other methods, the DSC of our CCNN-TN was higher by 4.5%, 1.31%, and 3.44%, respectively.
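The DSC reported above measures overlap between the predicted and ground-truth tumor masks. A minimal reference implementation for binary masks (the function name and epsilon term are illustrative, not from the paper):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice similarity coefficient: 2*|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])   # predicted tumor mask
target = np.array([[1, 0, 0],
                   [0, 1, 1]])  # ground-truth tumor mask
print(round(dice_score(pred, target), 4))  # 2*2/(3+3) -> 0.6667
```

A DSC of 1.0 means perfect overlap and 0.0 means no overlap, so the 66-72% scores above indicate substantial but imperfect agreement with the expert delineations.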
Conclusion:
Our experimental results demonstrate that CCNN-TN, compared with existing methods, achieved more generalizable results across different datasets and maintained consistent performance across various image fusion strategies and network backbones.