Pub Date: 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412726
Yu Quan, Zhixin Li, Canlong Zhang, Huifang Ma
To improve the performance of two-stage object detection and account for the importance of scene and semantic information in visual recognition, this paper studies and analyzes the neural networks used in object detection algorithms. The main contributions are as follows. First, a scene-level region proposal self-attention object detection model based on depthwise separable convolution is proposed. To obtain stronger semantic and contextual information about the target scene, the scene-level region proposal self-attention module is reconstructed based on the region proposal recognition process. The feature map output by the feature pyramid network is fed into three parallel branches: a semantic segmentation module, a candidate region network module, and a region proposal self-attention module. Second, to improve the overall performance of the model, a depthwise separable convolutional network module is built on the backbone network, which has six stages; the module is integrated into the fifth and sixth stages. Finally, an object detection method based on an enhanced bounding-box regression network is proposed to achieve accurate target localization. The experimental results of each model are analyzed to verify its effectiveness.
{"title":"Object Detection Model Based on Scene-Level Region Proposal Self-Attention","authors":"Yu Quan, Zhixin Li, Canlong Zhang, Huifang Ma","doi":"10.1109/ICPR48806.2021.9412726","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412726","url":null,"PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"42 1","pages":"954-961"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87191731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
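The abstract for this entry builds its backbone module on depthwise separable convolution. As a rough illustration of why that choice keeps a network light, the sketch below compares weight counts for a standard convolution and a depthwise separable one. The layer sizes (256 channels, 3x3 kernel) are hypothetical and are not taken from the paper.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 256, 256, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```

For a 3x3 kernel the saving approaches a factor of nine as the number of output channels grows, because the pointwise term dominates the separable count.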
Pub Date: 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412385
Y. Tan, Y. Elovici, A. Binder
Adversarial attacks are a prevalent cause of misclassification in machine learning models, and stochasticity is a promising direction towards greater robustness. However, stochastic networks frequently underperform deterministic deep networks. In this work, we present a conceptually clear adaptive noise injection mechanism, combined with teacher-initialisation, which adjusts its degree of randomness dynamically by computing mini-batch statistics. This mechanism is embedded within a simple framework for obtaining stochastic networks from existing deterministic networks. Our experiments on CIFAR-10 and CIFAR-100 show that our method outperforms prior baselines under white-box settings. We then analyse in depth how varying different components of training with our approach affects robustness and accuracy, by studying the evolution of the decision boundary and the trend curves of clean accuracy and attack success over differing degrees of stochasticity. We also shed light on the effects of adversarial training on a pre-trained network through the lens of decision boundaries.
{"title":"Adaptive Noise Injection for Training Stochastic Student Networks from Deterministic Teachers","authors":"Y. Tan, Y. Elovici, A. Binder","doi":"10.1109/ICPR48806.2021.9412385","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412385","url":null,"PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"17 1","pages":"7587-7594"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90637028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
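The abstract for this entry describes noise whose strength follows mini-batch statistics but does not give the rule. The sketch below is one possible reading, stated as an assumption: Gaussian noise is added to each feature with a standard deviation proportional to that feature's spread over the mini-batch. The function name `inject_adaptive_noise` and the scale `alpha` are hypothetical and are not the authors' formulation.

```python
import random
import statistics


def inject_adaptive_noise(batch, alpha=0.1, rng=None):
    """Add zero-mean Gaussian noise scaled by per-feature batch spread.

    batch: list of samples, each a list of feature values.
    alpha: hypothetical scale factor (an assumption, not from the paper).
    """
    rng = rng or random.Random(0)
    n_features = len(batch[0])
    # Population standard deviation of each feature over the mini-batch.
    stds = [statistics.pstdev([row[j] for row in batch])
            for j in range(n_features)]
    return [[x + rng.gauss(0.0, alpha * stds[j]) for j, x in enumerate(row)]
            for row in batch]


# Feature 0 varies across the batch, feature 1 is constant.
batch = [[1.0, 5.0], [3.0, 5.0]]
noisy = inject_adaptive_noise(batch, alpha=0.1)
print(noisy)
```

Under this assumption a feature that is constant across the batch receives no noise, while widely spread features are perturbed more.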
Pub Date: 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412743
Yankai Huang, Tiecheng Song, Shuang Li, Yuanjing Han
Local binary pattern (LBP) based descriptors have proven effective for texture classification. However, most of them encode the intensity relationships between neighboring pixels and a central pixel in binary form, and thereby fail to capture the complete ordering information among neighbors. Several methods have explored intensity order information for feature description, but they do not address the grayscale-inversion problem. In this paper, we propose an image descriptor called local grouped invariant order pattern (LGIOP) for grayscale-inversion and rotation invariant texture classification. LGIOP is a histogram representation that jointly encodes neighboring order information and central pixels. In particular, two new order encoding methods, intensity order encoding and distance order encoding, are proposed to describe the neighboring relationships. These two order encoding methods are not only complementary but also invariant to grayscale inversion and rotation. Experiments on texture classification demonstrate that the proposed LGIOP descriptor is robust to (linear or nonlinear) grayscale inversion and image rotation.
{"title":"Local Grouped Invariant Order Pattern for Grayscale-Inversion and Rotation Invariant Texture Classification","authors":"Yankai Huang, Tiecheng Song, Shuang Li, Yuanjing Han","doi":"10.1109/ICPR48806.2021.9412743","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412743","url":null,"PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"2015 1","pages":"6632-6639"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73560863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
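To make the grayscale-inversion problem mentioned in this entry concrete, the sketch below computes a classic 8-neighbor LBP code and shows that inverting the intensities flips every bit of the code. It also shows one simple workaround (taking the smaller of a code and its complement), which is not the LGIOP encoding. The pixel values are hypothetical.

```python
def lbp_code(neighbors, center):
    """Classic LBP: bit i is set when neighbor i is >= the center pixel."""
    return sum(1 << i for i, p in enumerate(neighbors) if p >= center)


def invert(values, max_val=255):
    """Grayscale inversion of a list of intensities."""
    return [max_val - v for v in values]


# Hypothetical 8-neighborhood around a center pixel (no ties with the center).
neighbors = [10, 200, 30, 180, 50, 160, 70, 140]
center = 100

code = lbp_code(neighbors, center)
inv_code = lbp_code(invert(neighbors), 255 - center)
print(code, inv_code, code ^ 0xFF)

# Simple inversion-tolerant variant: map a code and its complement together.
invariant = min(code, code ^ 0xFF)
invariant_inv = min(inv_code, inv_code ^ 0xFF)
print(invariant == invariant_inv)
```

The complement trick halves the number of distinct codes, which is part of why richer order-based encodings such as the one proposed in the abstract are of interest.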
Pub Date: 2021-01-10 DOI: 10.1109/ICPR48806.2021.9413300
G. Orfanidis, K. Ioannidis, S. Vrochidis, A. Tefas, Y. Kompatsiaris
This work examines the performance of the Single Shot Detector (SSD) model in resource-restricted systems where maintaining the power of the full model is a significant prerequisite. The proposed SSD variations examine the behavior of lighter versions of SSD and propose measures to limit the unavoidable performance loss. The results of the research demonstrate a remarkable trade-off between performance loss, speed improvement, and required resources. The experimental results thus show that the presented SSD alterations achieve higher frame rates while retaining the performance of the original model.
{"title":"A modified Single-Shot multibox Detector for beyond Real-Time Object Detection","authors":"G. Orfanidis, K. Ioannidis, S. Vrochidis, A. Tefas, Y. Kompatsiaris","doi":"10.1109/ICPR48806.2021.9413300","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9413300","url":null,"PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"50 1","pages":"3977-3984"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78089484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412367
R. Barati, R. Safabakhsh, M. Rahmati
In this paper, we study the existence of adversarial examples and adversarial training from the standpoint of convergence, and provide evidence that pointwise convergence in ANNs can explain these observations. The main contribution of our proposal is that it relates the objectives of evasion attacks and adversarial training to concepts already defined in learning theory. We also extend and unify some of the other proposals in the literature and provide alternative explanations for the observations made in those proposals. Through different experiments, we demonstrate that the framework is valuable for studying the phenomenon and is applicable to real-world problems.
{"title":"Towards Explaining Adversarial Examples Phenomenon in Artificial Neural Networks","authors":"R. Barati, R. Safabakhsh, M. Rahmati","doi":"10.1109/ICPR48806.2021.9412367","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412367","url":null,"PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"38 1","pages":"7036-7042"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78359301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412821
Hongchao Lu, Zhidong Deng
In recent years, optical flow has often been estimated to reuse features and thereby accelerate video semantic segmentation. Adding an optical flow network, however, may incur extra cost, and accuracy may be degraded by repeated warping operations. In this paper, we propose a boundary-aware distillation network (BDNet) that replaces the optical flow network with block motion vectors encoded in the compressed video, resulting in negligible computational complexity. To make features salient, an auxiliary boundary-aware stream is added to the main stream to jointly estimate the silhouette and segmentation of objects. To further correct warped features, a well-trained teacher network is employed to transfer knowledge to the main stream. Both the boundary-aware stream and the teacher network are dropped at the inference stage, so the video segmentation network becomes faster without any added computational burden. By splitting the task into three components, BDNet achieves almost 10% time savings as well as a 1.6% accuracy improvement over the baseline on the Cityscapes dataset.
{"title":"A Boundary-aware Distillation Network for Compressed Video Semantic Segmentation","authors":"Hongchao Lu, Zhidong Deng","doi":"10.1109/ICPR48806.2021.9412821","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412821","url":null,"PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"325 1","pages":"5354-5359"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78412619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
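The abstract for this entry reuses features from a previous frame by warping them with block motion vectors taken from the compressed video. The sketch below illustrates the general idea on a tiny feature map in plain Python. The function name, the `(dy, dx)` sign convention, and border clamping are assumptions made for illustration; the paper's actual warping operator is not described here.

```python
def warp_with_block_mvs(prev, mvs, block):
    """Warp a 2D feature map using one motion vector per block.

    prev:  2D list of feature values from the previous frame.
    mvs:   2D list of (dy, dx) offsets, one per block (assumed convention:
           each output cell copies prev[y + dy][x + dx]).
    block: block size in feature-map cells.
    """
    h, w = len(prev), len(prev[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = mvs[y // block][x // block]
            # Clamp source coordinates to the map borders.
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = prev[sy][sx]
    return out


# 4 x 4 feature map with 2 x 2 blocks; every block points one cell right.
prev = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
mvs = [[(0, 1), (0, 1)], [(0, 1), (0, 1)]]
warped = warp_with_block_mvs(prev, mvs, block=2)
print(warped[0])
```

Because the vectors are read from the bitstream rather than estimated, the only per-frame cost in this sketch is the copy itself.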
Pub Date: 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412640
Xiaoqiang Zhou, Junjie Li, Zilei Wang, R. He, T. Tan
Image inpainting must satisfy two challenging requirements: structural reasonableness and texture coherence. In this paper, we propose a two-stage inpainting framework that addresses the two requirements in separate stages. In the first stage, a segmentation reconstruction network predicts the completed segmentation of the corrupted image; in the second stage, an image generator restores fine-grained image details. The two stages are connected in series, since the image details are generated under the guidance of the completed segmentation map predicted in the first stage. Specifically, in the second stage, we propose a novel graph-based relation network to model the relationships present in the corrupted image. The relation network considers both intra-relationships among pixels in the same semantic region and inter-relationships between different semantic parts, improving the consistency and compatibility of image textures. In addition, a contrastive loss is designed to facilitate training of the relation network. Such a framework not only simplifies the inpainting problem directly but also exploits the relationships in the corrupted image explicitly. Extensive experiments on various public datasets demonstrate, quantitatively and qualitatively, the superiority of our approach over the state of the art.
{"title":"Image Inpainting with Contrastive Relation Network","authors":"Xiaoqiang Zhou, Junjie Li, Zilei Wang, R. He, T. Tan","doi":"10.1109/ICPR48806.2021.9412640","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412640","url":null,"PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"272 1","pages":"4420-4427"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75776382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
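The abstract for this entry mentions a contrastive loss over features from the same or different semantic regions but does not state its form. As a hedged illustration only, the sketch below uses the well-known pairwise margin formulation: pairs from the same region are pulled together and pairs from different regions are pushed at least `margin` apart. Whether the paper uses this exact form is an assumption; the function name and margin value are hypothetical.

```python
import math


def contrastive_loss(a, b, same, margin=1.0):
    """Pairwise margin-based contrastive loss for two feature vectors.

    same:   True if both features come from the same semantic region.
    margin: hypothetical minimum distance for dissimilar pairs.
    """
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if same:
        return d ** 2                     # pull similar pairs together
    return max(0.0, margin - d) ** 2      # push dissimilar pairs apart


print(contrastive_loss([0.0, 0.0], [0.0, 0.0], same=True))
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False))
print(contrastive_loss([0.0, 0.0], [0.0, 0.0], same=False))
```

Dissimilar pairs that are already farther apart than the margin contribute nothing, so training effort concentrates on pairs that are still confusable.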
Pub Date: 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412853
Hanbin Hong, Wentao Bao, Yuan Hong, Yu Kong
Visual Privacy Attribute Classification (VPAC) identifies privacy information leakage via social media images. In VPAC, images containing privacy attributes such as skin color, face, or gender are classified into multiple privacy attribute categories. With limited prior work on this task, current methods often extract features from images and simply classify the extracted features into multiple privacy attribute classes. The dependencies between privacy attributes (e.g., skin color and face typically coexist in the same image) are usually ignored in classification, which degrades performance in VPAC. In this paper, we propose a novel end-to-end Privacy Attributes-aware Message Passing Neural Network (PA-MPNN) to address VPAC. Privacy attributes are treated as nodes on a graph, and an MPNN is introduced to model the privacy attribute dependencies. To generate representative features for the privacy attribute nodes, a class-wise encoder-decoder is proposed to learn a latent space for each attribute. An attention mechanism with multiple correlation matrices is also introduced in the MPNN to learn the privacy attribute graph automatically. Experimental results on the Privacy Attribute Dataset demonstrate that our framework achieves better performance than state-of-the-art methods for visual privacy attribute classification.
{"title":"Privacy Attributes-aware Message Passing Neural Network for Visual Privacy Attributes Classification","authors":"Hanbin Hong, Wentao Bao, Yuan Hong, Yu Kong","doi":"10.1109/ICPR48806.2021.9412853","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412853","url":null,"PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"74 1","pages":"4245-4251"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74802655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
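The abstract for this entry treats privacy attributes as graph nodes and passes messages between them. The sketch below shows a single, generic message passing step in plain Python: each node adds a weighted sum of its neighbors' features to its own. The update rule, the fixed adjacency weights, and the toy features are assumptions for illustration; PA-MPNN learns its graph with attention, which is not reproduced here.

```python
def message_passing_step(features, adjacency):
    """One round of sum-aggregation message passing.

    features:  list of node feature vectors (one per privacy attribute).
    adjacency: adjacency[i][j] is the weight of the message from j to i.
    """
    n, d = len(features), len(features[0])
    updated = []
    for i in range(n):
        # Aggregate weighted neighbor features, skipping the node itself.
        msg = [sum(adjacency[i][j] * features[j][k]
                   for j in range(n) if j != i)
               for k in range(d)]
        updated.append([features[i][k] + msg[k] for k in range(d)])
    return updated


# Three hypothetical attribute nodes; nodes 0 and 1 are connected.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 0.0],
             [0.0, 0.0, 0.0]]
print(message_passing_step(features, adjacency))
```

After one step the two connected nodes share information, while the isolated node keeps its original features.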
Pub Date: 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412063
Thomas Swearingen, A. Ross
A face identification system compares an unknown input probe image to a gallery of labeled face images in order to determine the identity of the probe image. The result of identification is a ranked match list with the most similar gallery face image at the top (rank 1) and the least similar gallery face image at the bottom. In many systems, the top-ranked gallery images may look very similar to the probe image as well as to each other, which can sometimes result in misidentification of the probe image. Such similar-looking faces belonging to different identities are referred to as lookalike faces. We hypothesize that a matcher specifically trained to disambiguate lookalike face images, when combined with a regular face matcher, will improve overall identification performance. This work proposes re-ranking the initial ranked match list using a disambiguator trained specifically for lookalike face pairs. This work also evaluates schemes for selecting the gallery images in the initial ranked match list that should be re-ranked. Experiments on the challenging TinyFace dataset show that the proposed approach improves the closed-set identification accuracy of a state-of-the-art face matcher.
{"title":"Lookalike Disambiguation: Improving Face Identification Performance at Top Ranks","authors":"Thomas Swearingen, A. Ross","doi":"10.1109/ICPR48806.2021.9412063","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9412063","url":null,"PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"33 1","pages":"10508-10515"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73158561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
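The re-ranking idea in this entry can be sketched in a few lines: keep the initial match list, but reorder only its top-k entries by a second score from a disambiguator. The selection rule (a fixed k), the function names, and the toy scores below are hypothetical; the paper evaluates several selection schemes that are not reproduced here.

```python
def rerank_top_k(ranked, probe, disambiguator, k=2):
    """Reorder the first k gallery entries by disambiguator score.

    ranked:        gallery identifiers, best match first.
    disambiguator: callable (probe, gallery_id) -> similarity score,
                   higher meaning more likely the same identity.
    """
    head = sorted(ranked[:k],
                  key=lambda g: disambiguator(probe, g),
                  reverse=True)
    return head + ranked[k:]   # entries below rank k are left untouched


# Toy example with made-up disambiguator scores.
initial = ["g3", "g7", "g1", "g9"]
scores = {"g3": 0.40, "g7": 0.90, "g1": 0.99, "g9": 0.10}
reranked = rerank_top_k(initial, "probe", lambda p, g: scores[g], k=2)
print(reranked)
```

Restricting the second matcher to the top of the list keeps its cost small and prevents it from promoting entries the regular matcher already ranked low.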
Pub Date: 2021-01-10 DOI: 10.1109/ICPR48806.2021.9413051
Yongpei Zhu, Zicong Zhou, G. Liao, Kehong Yuan
Recently, deep learning-based networks have achieved advanced performance in medical image segmentation. However, progress has been slow in magnetic resonance image (MRI) segmentation of normal brain tissues. In this paper, inspired by the channel attention module, we propose a new architecture, Binary Channel Attention U-Net (BCAU-Net), which introduces a novel Binary Channel Attention Module (BCAM) into the skip connections of U-Net and can take full advantage of the channel information extracted from the encoding path and the corresponding decoding path. To better aggregate the multiscale spatial information of the feature map, spatial pyramid pooling (SPP) modules with different pooling operations are used in BCAM instead of the original average-pooling and max-pooling operations. We verify this model on two datasets, IBSR and MRBrainS18, and obtain better performance on MRI brain segmentation than other methods. We believe the proposed method can advance the performance of brain segmentation and clinical diagnosis.
{"title":"BCAU-Net: A Novel Architecture with Binary Channel Attention Module for MRI Brain Segmentation","authors":"Yongpei Zhu, Zicong Zhou, G. Liao, Kehong Yuan","doi":"10.1109/ICPR48806.2021.9413051","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9413051","url":null,"PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"25 1","pages":"5690-5695"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74402102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
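The abstract for this entry starts from a channel attention module built on average-pooling and max-pooling, which BCAM then replaces with spatial pyramid pooling. The sketch below shows only the baseline idea in a deliberately simplified form: each channel is summarized by its average and maximum, and the sum is passed through a sigmoid to obtain a per-channel weight. The learned layers that a real channel attention module applies to the pooled values are omitted, so this is an assumption-laden illustration and not BCAM.

```python
import math


def channel_attention(fmap):
    """Scale each channel by a weight derived from its pooled statistics.

    fmap: list of channels, each a 2D list of activations.
    Returns (scaled feature map, per-channel weights).
    """
    weights = []
    for channel in fmap:
        flat = [v for row in channel for v in row]
        avg_pool = sum(flat) / len(flat)   # global average pooling
        max_pool = max(flat)               # global max pooling
        # Simplification: no learned layers, just a sigmoid of the sum.
        weights.append(1.0 / (1.0 + math.exp(-(avg_pool + max_pool))))
    scaled = [[[v * w for v in row] for row in channel]
              for channel, w in zip(fmap, weights)]
    return scaled, weights


# Two hypothetical 2 x 2 channels: one silent, one uniformly active.
fmap = [[[0.0, 0.0], [0.0, 0.0]],
        [[1.0, 1.0], [1.0, 1.0]]]
scaled, weights = channel_attention(fmap)
print(weights)
```

In this simplified form the more active channel receives the larger weight; a trained module would instead learn which channels to emphasize.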