Pub Date: 2024-06-18 | DOI: 10.1109/TETCI.2024.3413002
Xiaodan Zhang;Shixin Dou;Junzhong Ji;Ying Liu;Zheng Wang
Automatic generation of medical reports for Brain Computed Tomography (CT) imaging is crucial for helping radiologists make more accurate clinical diagnoses efficiently. Brain CT imaging typically contains rich pathological information, including common pathologies that often co-occur in one report and rare pathologies that appear in medical reports with lower frequency. However, current research ignores the potential co-occurrence between common pathologies and pays insufficient attention to rare pathologies, severely restricting the accuracy and diversity of the generated medical reports. In this paper, we propose a Co-occurrence Relationship Driven Hierarchical Attention Network (CRHAN) to improve Brain CT report generation by mining common and rare pathologies in Brain CT imaging. Specifically, the proposed CRHAN follows a general encoder-decoder framework with two novel attention modules. In the encoder, a co-occurrence relationship guided semantic attention (CRSA) module is proposed to extract the critical semantic features by embedding the co-occurrence relationship of common pathologies into semantic attention. In the decoder, a common-rare topic driven visual attention (CRVA) module is proposed to fuse the common and rare semantic features as sentence topic vectors, and then guide the visual attention to capture important lesion features for medical report generation. Experiments on the Brain CT dataset demonstrate the effectiveness of the proposed method.
{"title":"Co-Occurrence Relationship Driven Hierarchical Attention Network for Brain CT Report Generation","authors":"Xiaodan Zhang;Shixin Dou;Junzhong Ji;Ying Liu;Zheng Wang","doi":"10.1109/TETCI.2024.3413002","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3413002","url":null,"abstract":"Automatic generation of medical reports for Brain Computed Tomography (CT) imaging is crucial for helping radiologists make more accurate clinical diagnoses efficiently. Brain CT imaging typically contains rich pathological information, including common pathologies that often co-occur in one report and rare pathologies that appear in medical reports with lower frequency. However, current research ignores the potential co-occurrence between common pathologies and pays insufficient attention to rare pathologies, severely restricting the accuracy and diversity of the generated medical reports. In this paper, we propose a Co-occurrence Relationship Driven Hierarchical Attention Network (CRHAN) to improve Brain CT report generation by mining common and rare pathologies in Brain CT imaging. Specifically, the proposed CRHAN follows a general encoder-decoder framework with two novel attention modules. In the encoder, a co-occurrence relationship guided semantic attention (CRSA) module is proposed to extract the critical semantic features by embedding the co-occurrence relationship of common pathologies into semantic attention. In the decoder, a common-rare topic driven visual attention (CRVA) module is proposed to fuse the common and rare semantic features as sentence topic vectors, and then guide the visual attention to capture important lesion features for medical report generation. Experiments on the Brain CT dataset demonstrate the effectiveness of the proposed method.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"8 5","pages":"3643-3653"},"PeriodicalIF":5.3,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142376770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-12 | DOI: 10.1109/TETCI.2024.3409724
Man-Sheng Chen;Zhi-Yuan Li;Jia-Qi Lin;Chang-Dong Wang;Dong Huang
Multi-view spectral clustering has achieved impressive performance by learning multiple robust and meaningful similarity graphs for clustering. However, the existing literature typically constructs these similarity graphs with a fixed similarity measure (e.g., the Euclidean distance), which lacks the ability to learn the sparse, reliable connections that carry critical information in graph learning while preserving the low-rank structure. To address these challenges, this paper designs a novel Sparse Graph Tensor Learning for Multi-view Spectral Clustering (SGTL) method, in which multiple similarity graphs are seamlessly coupled with the cluster indicators and constrained by a low-rank graph tensor. Specifically, a novel graph learning paradigm is designed by establishing an explicit theoretical connection between the similarity matrices and the cluster indicator matrices, so that the constructed similarity graphs enjoy the desired block-diagonal and sparse properties needed to learn a small portion of reliable links. Multiple similarity matrices are then stacked into a low-rank graph tensor to better preserve the low-rank structure of the reliable links in graph learning, where the key knowledge conveyed by singular values from different views is explicitly considered. Extensive experiments on several benchmark datasets demonstrate the superiority of SGTL.
{"title":"Sparse Graph Tensor Learning for Multi-View Spectral Clustering","authors":"Man-Sheng Chen;Zhi-Yuan Li;Jia-Qi Lin;Chang-Dong Wang;Dong Huang","doi":"10.1109/TETCI.2024.3409724","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3409724","url":null,"abstract":"Multi-view spectral clustering has achieved impressive performance by learning multiple robust and meaningful similarity graphs for clustering. Generally, the existing literatures often construct multiple similarity graphs by certain similarity measure (e.g. the Euclidean distance), which lack the desired ability to learn sparse and reliable connections that carry critical information in graph learning while preserving the low-rank structure. Regarding the challenges, a novel Sparse Graph Tensor Learning for Multi-view Spectral Clustering (SGTL) method is designed in this paper, where multiple similarity graphs are seamlessly coupled with the cluster indicators and constrained with a low-rank graph tensor. Specifically, a novel graph learning paradigm is designed by establishing an explicit theoretical connection between the similarity matrices and the cluster indicator matrices, in order that the constructed similarity graphs enjoy the desired block diagonal and sparse property for learning a small portion of reliable links. Then, we stack multiple similarity matrices into a low-rank graph tensor to better preserve the low-rank structure of the reliable links in graph learning, where the key knowledge conveyed by singular values from different views is explicitly considered. Extensive experiments on several benchmark datasets demonstrate the superiority of SGTL.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"8 5","pages":"3534-3543"},"PeriodicalIF":5.3,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142377139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-30 | DOI: 10.1109/TETCI.2024.3393388
Hang Xu;Bing Xue;Mengjie Zhang
High dimensionality often challenges the efficiency and accuracy of a classifier, and evolutionary feature selection is an effective method for data preprocessing and dimensionality reduction. However, because the search space expands exponentially with the number of features, traditional evolutionary feature selection methods still struggle to find optimal or near-optimal solutions in large-scale search spaces. To overcome this issue, we propose a bi-search evolutionary algorithm (termed BSEA) for high-dimensional feature selection in classification, with two conflicting objectives: minimizing both the number of selected features and the classification error. In BSEA, a bi-search evolutionary mode combining forward and backward searching tasks enhances the search ability in the large-scale search space; in addition, an adaptive feature analysis mechanism is designed to explore promising features and efficiently reproduce more diverse offspring. In the experiments, BSEA is comprehensively compared with 9 recent or classic state-of-the-art MOEAs on 11 high-dimensional datasets with at least 2000 features. The empirical results suggest that BSEA generally performs best on most datasets across all performance metrics while remaining computationally efficient, and that each of its essential components contributes positively to the overall search ability.
{"title":"A Bi-Search Evolutionary Algorithm for High-Dimensional Bi-Objective Feature Selection","authors":"Hang Xu;Bing Xue;Mengjie Zhang","doi":"10.1109/TETCI.2024.3393388","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3393388","url":null,"abstract":"High dimensionality often challenges the efficiency and accuracy of a classifier, while evolutionary feature selection is an effective method for data preprocessing and dimensionality reduction. However, with the exponential expansion of search space along with the increase of features, traditional evolutionary feature selection methods could still find it difficult to search for optimal or near optimal solutions in the large-scale search space. To overcome the above issue, in this paper, we propose a bi-search evolutionary algorithm (termed BSEA) for tackling high-dimensional feature selection in classification, with two contradictory optimizing objectives (i.e., minimizing both selected features and classification errors). In BSEA, a bi-search evolutionary mode combining the forward and backward searching tasks is adopted to enhance the search ability in the large-scale search space; in addition, an adaptive feature analysis mechanism is also designed to the explore promising features for efficiently reproducing more diverse offspring. In the experiments, BSEA is comprehensively compared with 9 most recent or classic state-of-the-art MOEAs on a series of 11 high-dimensional datasets with no less than 2000 features. The empirical results suggest that BSEA generally performs the best on most of the datasets in terms of all performance metrics, along with high computational efficiency, while each of its essential components can take positive effect on boosting the search ability and together make the best contribution.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"8 5","pages":"3489-3502"},"PeriodicalIF":5.3,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142376998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-30 | DOI: 10.1109/TETCI.2024.3390058
Dahye Jeong;Eunbeen Choi;Hyeongjin Ahn;Ester Martinez-Martin;Eunil Park;Angel P. del Pobil
Authentication systems are crucial in the digital era, providing reliable protection of personal information. Most authentication systems rely on a single modality, such as the face, fingerprints, or a password. When the information of that single modality is occluded, however, authentication performance degrades; in particular, face identification works poorly when masks are worn, as during the COVID-19 pandemic. In this paper, we focus on a multi-modal approach to improve the performance of occluded face identification. Multi-modal authentication systems are crucial to building a robust authentication system because they can compensate for the missing modality of a uni-modal system. In this light, we propose DemoID, a multi-modal authentication system based on face and voice for human identification in a challenging environment. Moreover, we build a demographic module to efficiently handle the demographic information of individual faces. The experimental results showed an accuracy of 99% when using all modalities and an overall improvement of 5.41%–10.77% relative to uni-modal face models. Furthermore, our model demonstrated the highest performance compared to existing multi-modal models and also showed promising results on the real-world dataset constructed for this study.
{"title":"Multi-modal Authentication Model for Occluded Faces in a Challenging Environment","authors":"Dahye Jeong;Eunbeen Choi;Hyeongjin Ahn;Ester Martinez-Martin;Eunil Park;Angel P. del Pobil","doi":"10.1109/TETCI.2024.3390058","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3390058","url":null,"abstract":"Authentication systems are crucial in the digital era, providing reliable protection of personal information. Most authentication systems rely on a single modality, such as the face, fingerprints, or password sensors. In the case of an authentication system based on a single modality, there is a problem in that the performance of the authentication is degraded when the information of the corresponding modality is covered. Especially, face identification does not work well due to the mask in a COVID-19 situation. In this paper, we focus on the multi-modality approach to improve the performance of occluded face identification. Multi-modal authentication systems are crucial in building a robust authentication system because they can compensate for the lack of modality in the uni-modal authentication system. In this light, we propose DemoID, a multi-modal authentication system based on face and voice for human identification in a challenging environment. Moreover, we build a demographic module to efficiently handle the demographic information of individual faces. The experimental results showed an accuracy of 99% when using all modalities and an overall improvement of 5.41%–10.77% relative to uni-modal face models. Furthermore, our model demonstrated the highest performance compared to existing multi-modal models and also showed promising results on the real-world dataset constructed for this study.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"8 5","pages":"3463-3473"},"PeriodicalIF":5.3,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142368449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-29 | DOI: 10.1109/TETCI.2024.3389710
Yongxin Shao;Aihong Tan;Zhetao Sun;Enhui Zheng;Tianhong Yan;Peng Liao
3D object detection using LiDAR is critical for autonomous driving. However, the point cloud data in autonomous driving scenarios is sparse, and converting a sparse point cloud into regular data representations (voxels or projections) often loses information through downsampling or excessive compression of feature information. This information loss adversely affects detection accuracy, especially for objects with few reflective points, such as cyclists. This paper proposes a multi-modal point cloud 3D object detector based on projection features and voxel features, which consists of two branches. One, the voxel branch, extracts fine-grained local features. The other, the projection branch, extracts projection features from a bird's-eye view and attends to the correlations among local features in the voxel branch. By feeding voxel features into the projection branch, we compensate for the information loss in the projection branch while capturing the correlations between neighboring local features in the voxel features. To achieve comprehensive fusion of voxel features and projection features, we propose a multi-modal feature fusion module (MSSFA). To further mitigate the loss of crucial features caused by downsampling, we propose a voxel feature extraction method (VR-VFE) that samples feature points based on their importance for the detection task. To validate the effectiveness of our method, we tested it on the KITTI and ONCE datasets. The experimental results show that our method significantly improves detection accuracy for objects with few reflective points, such as cyclists.
{"title":"PV-SSD: A Multi-Modal Point Cloud 3D Object Detector Based on Projection Features and Voxel Features","authors":"Yongxin Shao;Aihong Tan;Zhetao Sun;Enhui Zheng;Tianhong Yan;Peng Liao","doi":"10.1109/TETCI.2024.3389710","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3389710","url":null,"abstract":"3D object detection using LiDAR is critical for autonomous driving. However, the point cloud data in autonomous driving scenarios is sparse. Converting the sparse point cloud into regular data representations (voxels or projection) often leads to information loss due to downsampling or excessive compression of feature information. This kind of information loss will adversely affect detection accuracy, especially for objects with fewer reflective points like cyclists. This paper proposes a multi-modal point cloud 3D object detector based on projection features and voxel features, which consists of two branches. One, called the voxel branch, is used to extract fine-grained local features. Another, called the projection branch, is used to extract projection features from a bird's-eye view and focus on the correlation of local features in the voxel branch. By feeding voxel features into the projection branch, we can compensate for the information loss in the projection branch while focusing on the correlation between neighboring local features in the voxel features. To achieve comprehensive feature fusion of voxel features and projection features, we propose a multi-modal feature fusion module (MSSFA). To further mitigate the loss of crucial features caused by downsampling, we propose a voxel feature extraction method (VR-VFE), which samples feature points based on their importance for the detection task. To validate the effectiveness of our method, we tested it on the KITTI dataset and ONCE dataset. The experimental results show that our method has achieved significant improvement in the detection accuracy of objects with fewer reflection points like cyclists.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"8 5","pages":"3436-3449"},"PeriodicalIF":5.3,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142368263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-26 | DOI: 10.1109/TETCI.2024.3389777
Hui Bai;Ran Cheng
Hyperparameter optimization plays a key role in the machine learning domain. Its significance is especially pronounced in reinforcement learning (RL), where agents continuously interact with and adapt to their environments, requiring dynamic adjustments of their learning trajectories. To cater to this dynamicity, Population-Based Training (PBT) was introduced, leveraging the collective intelligence of a population of agents learning simultaneously. However, PBT tends to favor high-performing agents, potentially neglecting the explorative potential of agents on the brink of significant advancements. To mitigate these limitations, we present Generalized Population-Based Training (GPBT), a refined framework designed for greater granularity and flexibility in hyperparameter adaptation. Complementing GPBT, we further introduce Pairwise Learning (PL). Instead of merely focusing on elite agents, PL employs a comprehensive pairwise strategy to identify performance differentials and provide holistic guidance to underperforming agents. By integrating the capabilities of GPBT and PL, our approach significantly improves upon traditional PBT in terms of adaptability and computational efficiency. Rigorous empirical evaluations across a range of RL benchmarks confirm that our approach consistently outperforms not only conventional PBT but also its Bayesian-optimized variant.
{"title":"Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning","authors":"Hui Bai;Ran Cheng","doi":"10.1109/TETCI.2024.3389777","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3389777","url":null,"abstract":"Hyperparameter optimization plays a key role in the machine learning domain. Its significance is especially pronounced in reinforcement learning (RL), where agents continuously interact with and adapt to their environments, requiring dynamic adjustments in their learning trajectories. To cater to this dynamicity, the Population-Based Training (PBT) was introduced, leveraging the collective intelligence of a population of agents learning simultaneously. However, PBT tends to favor high-performing agents, potentially neglecting the explorative potential of agents on the brink of significant advancements. To mitigate the limitations of PBT, we present the Generalized Population-Based Training (GPBT), a refined framework designed for enhanced granularity and flexibility in hyperparameter adaptation. Complementing GPBT, we further introduce Pairwise Learning (PL). Instead of merely focusing on elite agents, PL employs a comprehensive pairwise strategy to identify performance differentials and provide holistic guidance to underperforming agents. By integrating the capabilities of GPBT and PL, our approach significantly improves upon traditional PBT in terms of adaptability and computational efficiency. Rigorous empirical evaluations across a range of RL benchmarks confirm that our approach consistently outperforms not only the conventional PBT but also its Bayesian-optimized variant.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"8 5","pages":"3450-3462"},"PeriodicalIF":5.3,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142368264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-25 | DOI: 10.1109/TETCI.2024.3370032
Weihua Xu;Yufei Lin;Na Wang
In the era of Big Data, it is necessary to extract the essential information from large volumes of data. Single-source information systems are often affected by extreme values and outliers, so multi-source information systems are more common and their data more reliable; information fusion is a standard method for handling multi-source information systems. Compared with single-valued data, interval-valued data can describe the uncertainty and random variation of data more effectively. This article proposes a novel interval-valued multi-source information fusion method based on dependency intervals. The method constructs a dependency function that accounts for both the interval length and the number of data points within the interval, making the fused data more concentrated and eliminating the influence of outliers and extreme values. Because the boundaries of the dependency interval are not fixed, a median point within the interval is selected as a bridge to simplify computing the dependency interval. Furthermore, a multi-source information system fusion algorithm based on dependency intervals is proposed, and experiments on 9 UCI datasets compare the classification accuracy and quality of the proposed algorithm with traditional information fusion methods. The experimental results show that this method is more effective than the maximum interval, quartile interval, and mean interval methods, and the validity of the fused data is confirmed through hypothesis testing.
{"title":"A Novel Multi-Source Information Fusion Method Based on Dependency Interval","authors":"Weihua Xu;Yufei Lin;Na Wang","doi":"10.1109/TETCI.2024.3370032","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3370032","url":null,"abstract":"With the rapid development of Big Data era, it is necessary to extract necessary information from a large amount of information. Single-source information systems are often affected by extreme values and outliers, so multi-source information systems are more common and data more reasonable, information fusion is a common method to deal with multi-source information system. Compared with single-valued data, interval-valued data can describe the uncertainty and random change of data more effectively. This article proposes a novel interval-valued multi-source information fusion method: A multi-source information fusion method based on dependency interval. This method needs to construct a dependency function, which takes into account the interval length and the number of data points in the interval, so as to make the obtained data more centralized and eliminate the influence of outliers and extreme values. Due to the unfixed boundary of the dependency interval, a median point within the interval is selected as a bridge to simplify the acquisition of the dependency interval. Furthermore, a multi-source information system fusion algorithm based on dependency intervals was proposed, and experiments were conducted on 9 UCI datasets to compare the classification accuracy and quality of the proposed algorithm with traditional information fusion methods. The experimental results show that this method is more effective than the maximum interval method, quartile interval method, and mean interval method, and the validity of the data has been proven through hypothesis testing.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"8 4","pages":"3180-3194"},"PeriodicalIF":5.3,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141964674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-19 | DOI: 10.1109/TETCI.2024.3353624
Yinglin Zhang;Ruiling Xi;Wei Wang;Heng Li;Lingxi Hu;Huiyan Lin;Dave Towey;Ruibin Bai;Huazhu Fu;Risa Higashita;Jiang Liu
Low-contrast medical image segmentation is a challenging task that requires full use of local details and global context. However, existing convolutional neural networks (CNNs) cannot fully exploit global information due to limited receptive fields and local weight sharing. The transformer, on the other hand, effectively establishes long-range dependencies but lacks desirable properties for modeling local details. This paper proposes a Transformer-embedded Boundary perception Network (TBNet) that combines the advantages of the transformer and convolution for low-contrast medical image segmentation. First, the transformer-embedded module uses convolution at the low-level layers to model local details and uses the Enhanced TRansformer (ETR) to capture long-range dependencies at the high-level layers. This module extracts robust features with semantic context to infer the likely target location and basic structure under low-contrast conditions. Second, we utilize a decoupled body-edge branch to promote general feature learning and perceive precise boundary locations. The ETR establishes long-range dependencies across the whole feature map and is enhanced by introducing local information. We implement it in a parallel mode: a group of multi-head self-attention captures global relationships, while a group of convolutions retains local details. We compare TBNet with other state-of-the-art (SOTA) methods on cornea endothelial cell, ciliary body, and kidney segmentation tasks. TBNet improves segmentation performance, proving its effectiveness and robustness.
{"title":"Low-Contrast Medical Image Segmentation via Transformer and Boundary Perception","authors":"Yinglin Zhang;Ruiling Xi;Wei Wang;Heng Li;Lingxi Hu;Huiyan Lin;Dave Towey;Ruibin Bai;Huazhu Fu;Risa Higashita;Jiang Liu","doi":"10.1109/TETCI.2024.3353624","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3353624","url":null,"abstract":"Low-contrast medical image segmentation is a challenging task that requires full use of local details and global context. However, existing convolutional neural networks (CNNs) cannot fully exploit global information due to limited receptive fields and local weight sharing. On the other hand, the transformer effectively establishes long-range dependencies but lacks desirable properties for modeling local details. This paper proposes a Transformer-embedded Boundary perception Network (TBNet) that combines the advantages of transformer and convolution for low-contrast medical image segmentation. Firstly, the transformer-embedded module uses convolution at the low-level layer to model local details and uses the Enhanced TRansformer (ETR) to capture long-range dependencies at the high-level layer. This module can extract robust features with semantic contexts to infer the possible target location and basic structure in low-contrast conditions. Secondly, we utilize the decoupled body-edge branch to promote general feature learning and precept precise boundary locations. The ETR establishes long-range dependencies across the whole feature map range and is enhanced by introducing local information. We implement it in a parallel mode, i.e., the group of self-attention with multi-head captures the global relationship, and the group of convolution retains local details. We compare TBNet with other state-of-the-art (SOTA) methods on the cornea endothelial cell, ciliary body, and kidney segmentation tasks. The TBNet improves segmentation performance, proving its effectiveness and robustness.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"8 3","pages":"2297-2309"},"PeriodicalIF":5.3,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141094882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-18 | DOI: 10.1109/TETCI.2024.3386838
Xudong Wang;Xi'ai Chen;Weihong Ren;Zhi Han;Huijie Fan;Yandong Tang;Lianqing Liu
Most existing dehazing networks rely on synthetic hazy-clear image pairs for training, and thus fail to work well in real-world scenes. In this paper, we deduce a reformulated atmospheric scattering model for a hazy image and propose a novel lightweight two-branch dehazing network. In the model, we use a Transformation Map to represent the dehazing transformation and use a Compensation Map to represent variable illumination compensation. Based on this model, we design a T