Co-Learning Multimodality PET-CT Features via a Cascaded CNN-Transformer Network
Lei Bi; Xiaohang Fu; Qiufang Liu; Shaoli Song; David Dagan Feng; Michael Fulham; Jinman Kim
IEEE Transactions on Radiation and Plasma Medical Sciences, published 2024-06-24. DOI: 10.1109/TRPMS.2024.3417901
Abstract
Background:
Automated segmentation of multimodality positron emission tomography-computed tomography (PET-CT) data is a major challenge in the development of computer-aided diagnosis (CAD) systems. In this context, convolutional neural network (CNN)-based methods are considered the state of the art. These CNN-based methods, however, have difficulty co-learning the complementary PET-CT image features and capturing global context, because they focus solely on local patterns.
Methods:
We propose a cascaded CNN-transformer network (CCNN-TN) tailored for PET-CT image segmentation. We employed a transformer network (TN) because it establishes global context via self-attention over embedded image patches. We extended the TN design by cascading multiple TNs and CNNs to learn both global and local contexts. We also introduced a hyper-fusion branch that iteratively fuses the complementary image features extracted separately from each modality. We evaluated our approach against current state-of-the-art CNN methods on three datasets: two non-small cell lung cancer (NSCLC) datasets and one soft tissue sarcoma (STS) dataset.
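The paper's abstract does not include code, so the following is only a minimal, illustrative sketch of the general idea described above: modality-specific CNN encoders for PET and CT extract local features, a simple fusion branch combines them, and a transformer stage applies self-attention over the fused feature tokens for global context. All module names, layer sizes, depths, and the fusion rule are assumptions for demonstration, not the authors' CCNN-TN implementation.

```python
# Illustrative sketch only: simplified cascaded CNN + transformer with a
# fusion branch for two-modality (PET, CT) inputs. Sizes and the fusion
# rule are assumptions, not the published CCNN-TN architecture.
import torch
import torch.nn as nn


class CNNEncoder(nn.Module):
    """Small CNN extracting local, modality-specific feature maps."""
    def __init__(self, in_ch: int = 1, ch: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)  # (B, ch, H/4, W/4)


class TransformerStage(nn.Module):
    """Treats each spatial location as a token and applies self-attention
    to capture global context."""
    def __init__(self, dim: int = 64, depth: int = 2, heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, feat):
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)   # (B, H*W, C)
        tokens = self.encoder(tokens)              # global self-attention
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class ToyCascadedPETCTSegmenter(nn.Module):
    """PET and CT are encoded separately (local features), fused, and passed
    through a transformer stage (global context) before a segmentation head."""
    def __init__(self, ch: int = 32, num_classes: int = 2):
        super().__init__()
        self.pet_cnn = CNNEncoder(1, ch)
        self.ct_cnn = CNNEncoder(1, ch)
        self.fuse = nn.Conv2d(2 * ch, 2 * ch, 1)   # simple fusion branch
        self.transformer = TransformerStage(dim=2 * ch)
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(2 * ch, num_classes, 1),
        )

    def forward(self, pet, ct):
        fused = self.fuse(torch.cat([self.pet_cnn(pet), self.ct_cnn(ct)], dim=1))
        return self.head(self.transformer(fused))


if __name__ == "__main__":
    pet, ct = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
    print(ToyCascadedPETCTSegmenter()(pet, ct).shape)  # torch.Size([1, 2, 64, 64])
```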
Results:
Our CCNN-TN method achieved Dice similarity coefficient (DSC) scores of 72.25% (NSCLC), 67.11% (NSCLC), and 66.36% (STS) for tumor segmentation. Compared with the other methods, our CCNN-TN improved the DSC by 4.5%, 1.31%, and 3.44%, respectively.
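For reference, the Dice similarity coefficient measures voxel-wise overlap between a predicted and a reference segmentation, DSC = 2|P ∩ T| / (|P| + |T|), reported above as a percentage. A minimal sketch of its computation on binary masks (illustrative only, not the authors' evaluation code):

```python
import numpy as np


def dice_similarity_coefficient(pred: np.ndarray, target: np.ndarray,
                                eps: float = 1e-8) -> float:
    """DSC = 2 * |P intersect T| / (|P| + |T|) for two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))


# Example: a perfect prediction gives DSC ~ 1.0 (i.e., 100%).
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 2:5] = True
print(dice_similarity_coefficient(mask, mask))
```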
Conclusion:
Our experimental results demonstrate that CCNN-TN, compared with existing methods, achieves more generalizable results across different datasets and consistent performance across various image fusion strategies and network backbones.