Pub Date: 2021-06-18 DOI: 10.1109/IMCEC51613.2021.9482237
Jiaming Niu, Yu Yang, Tong Yue
Crowd counting is a challenging task that has attracted researchers' attention because of its wide application in smart video surveillance, smart city construction, and public safety. At the same time, the impact of factors such as occlusion, scale changes, and perspective distortion on counting accuracy remains a problem that urgently needs to be solved. Based on a review and summary of the relevant literature, this paper surveys the mainstream crowd counting methods to lay a foundation for more in-depth future research. First, it analyzes the research background, current status, and development trend of crowd counting methods as a whole. Second, it summarizes traditional counting methods from three perspectives: detection, regression, and density estimation. Third, it focuses on crowd counting methods based on convolutional neural networks (CNNs). Fourth, it briefly introduces commonly used counting datasets and explains ground-truth generation methods and mainstream evaluation indicators. Finally, based on these analyses, it summarizes the main characteristics and development prospects of crowd counting.
Title: Current Status and Development Trend of Crowd Counting
Published in: 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC)
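The crowd counting abstract above refers to mainstream evaluation indicators without naming them. A minimal sketch, assuming these are the mean absolute error (MAE) and root mean squared error (RMSE) over per-image counts, which are commonly reported in the crowd counting literature; the function names and the sample counts are hypothetical.

```python
import math

def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and annotated per-image counts."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)

def rmse(pred_counts, true_counts):
    """Root mean squared error between predicted and annotated per-image counts."""
    squared = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts))
    return math.sqrt(squared / len(true_counts))

# Hypothetical predictions (e.g. sums of predicted density maps) and annotated counts.
pred = [102.0, 48.5, 310.0]
true = [100, 50, 300]
print(mae(pred, true))   # 4.5
print(rmse(pred, true))
```

MAE reflects average accuracy, while RMSE penalizes large errors on individual images more heavily.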
To strengthen network security defenses, this paper proposes a verification code (CAPTCHA) recognition system based on a convolutional neural network. The system combines Internet and big data technologies with advanced CAPTCHA technology and can, to a certain extent, prevent brute-force cracking by hackers. In addition, by incorporating a convolutional neural network, the system supports verification codes that combine digits and letters, which increases the complexity of the verification code and improves the security of user accounts. On this basis, the system uses threshold segmentation and projection positioning to construct an 8-layer convolutional neural network model, which strengthens the security of the verification code input step. The results show that the system can increase CAPTCHA complexity, improve the CAPTCHA recognition rate, and improve the security of user accounts.
Title: Verification Code Recognition Based on Convolutional Neural Network
Authors: Q. Tian, Qishun Song, Hongbo Wang, Zhihong Hu, Siyu Zhu
Pub Date: 2021-06-18 DOI: 10.1109/IMCEC51613.2021.9482170
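The verification code abstract above mentions projection positioning after threshold segmentation. A minimal sketch of one common reading of that step, assuming it means splitting a binarized image into characters wherever the vertical projection (column sum of foreground pixels) drops to zero; the function name and the toy image are hypothetical, and the paper's actual procedure may differ.

```python
def segment_columns(binary):
    """Return (start, end) column ranges, end exclusive, where the vertical
    projection of a binary image (1 = foreground) is non-zero."""
    width = len(binary[0])
    projection = [sum(row[c] for row in binary) for c in range(width)]
    segments, start = [], None
    for c, value in enumerate(projection):
        if value > 0 and start is None:
            start = c                      # a character region begins
        elif value == 0 and start is not None:
            segments.append((start, c))    # the region ended at the previous column
            start = None
    if start is not None:                  # region touching the right border
        segments.append((start, width))
    return segments

# Toy 2 x 7 binarized image containing two separate "characters".
image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
]
print(segment_columns(image))  # [(1, 3), (5, 7)]
```

Each returned range could then be cropped and passed to a classifier; touching or overlapping characters would need a more robust splitting rule.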
Pub Date: 2021-06-18 DOI: 10.1109/IMCEC51613.2021.9482115
Xiaolin Ma, Yuying Xiao
Human action recognition is a research hotspot in computer vision. Because traditional recognition algorithms involve complex computation and are limited in the datasets they can process, action recognition algorithms based on deep learning have gradually attracted attention. Various network frameworks have been proposed that greatly improve recognition accuracy. To address some problems in current deep learning action recognition algorithms, this paper proposes a new R2.5D-GRU network. First, the 3D convolution is decomposed into a two-dimensional spatial convolution and a one-dimensional temporal convolution to extract low-level spatio-temporal features; a GRU is then used for temporal modeling to extract high-level temporal features. Experimental results show that the proposed algorithm outperforms some existing mainstream algorithms on the UCF101 dataset.
Title: Action recognition based on R2.5D-GRU networks
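The decomposition of a 3D convolution into a 2D spatial and a 1D temporal convolution, described in the R2.5D-GRU abstract above, can be illustrated by counting weights. This is a sketch under assumptions: the kernel sizes, channel counts, and the intermediate channel count `c_mid` are hypothetical and not taken from the paper, and biases are ignored.

```python
def conv3d_params(c_in, c_out, t, d):
    """Weights in a full 3D convolution with a t x d x d kernel."""
    return c_in * c_out * t * d * d

def factorized_params(c_in, c_mid, c_out, t, d):
    """Weights when the 3D kernel is split into a 1 x d x d spatial
    convolution followed by a t x 1 x 1 temporal convolution."""
    spatial = c_in * c_mid * d * d   # 2D spatial convolution
    temporal = c_mid * c_out * t     # 1D temporal convolution
    return spatial + temporal

full = conv3d_params(64, 64, t=3, d=3)
split = factorized_params(64, 64, 64, t=3, d=3)
print(full)   # 110592
print(split)  # 49152
```

With these example sizes the factorized layer uses fewer weights and gains an extra nonlinearity between the two convolutions if an activation is placed there; some designs instead raise `c_mid` so that both variants have a similar parameter budget.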
Pub Date: 2021-06-18 DOI: 10.1109/IMCEC51613.2021.9482216
Chao Jia, Jianjing Wei
Diseases of the abdominal organs have become one of the main categories of disease affecting people's health. MRI is a clinical diagnostic method for abdominal diseases: it gives doctors a more intuitive and detailed view of tissue lesions in the abdomen and thus supports accurate diagnosis. Accurate segmentation of MRI images therefore has important clinical value. Traditional segmentation methods are relatively inefficient for organs with large abdominal deformation, small volume, and blurry tissue edges. In this paper, we propose AMO-Net to overcome these limitations. First, we extend the single encoder-decoder architecture to 2 layers to learn richer feature representations. Second, we introduce a feature pyramid structure into the network, which handles multi-scale changes effectively, is well suited to recognizing small target objects, and can relate long-range feature information. Finally, we introduce a module called the Hierarchical-Split Block to improve CNN performance. We evaluate our model on the CHAOS challenge dataset, and the experiments show that it achieves better segmentation performance than other state-of-the-art segmentation networks.
Title: AMO-Net: abdominal multi-organ segmentation in MRI with a extend Unet
Pub Date: 2021-06-18 DOI: 10.1109/IMCEC51613.2021.9482176
Jing Jin, Keyi Wang, Wei Wang
In practical applications, QR codes are affected by the acquisition conditions, the environment, and the substrate surface, which causes defects such as noise pollution, local highlights, and geometric distortion. These defects reduce the recognition rate. Based on the principles of digital image processing, this work studies preprocessing, region detection, extraction, and correction for these defects and proposes methods to optimize the QR code image and improve the recognition rate. For threshold segmentation, a block-wise Otsu method is used to handle uneven illumination. The QR code is detected and extracted from the image using the special structural features of its position detection patterns. An inverse perspective transformation is used to correct QR codes with geometric distortion. A QR code printed on the cylindrical surface is fitted to the surface region to obtain the corrected geometry, and gray levels are then interpolated at the corresponding coordinates to obtain the corrected QR code. Jagged edges and spurious dots in the image are removed by a morphological closing operation. The results show that the Otsu method combined with threshold segmentation can process QR codes with uneven lighting and low brightness.
Title: Research on correction and recognition of QR code on cylinder
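The QR code abstract above relies on Otsu thresholding applied per block to cope with uneven illumination. A minimal sketch, assuming 8-bit gray levels stored as nested lists and a fixed square block size; the function names, the block size, and the toy image are hypothetical, and the paper's exact block scheme is not given in the abstract.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes the between-class
    variance; pixels <= t form the dark class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue                 # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break                    # no bright pixels left
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def block_otsu_binarize(image, block):
    """Binarize each block x block tile with its own Otsu threshold.
    Note: a tile with a single gray level gets threshold 0 in this sketch."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r0 in range(0, h, block):
        for c0 in range(0, w, block):
            rows = range(r0, min(r0 + block, h))
            cols = range(c0, min(c0 + block, w))
            t = otsu_threshold([image[r][c] for r in rows for c in cols])
            for r in rows:
                for c in cols:
                    out[r][c] = 1 if image[r][c] > t else 0
    return out

# Toy image: the left half is dimly lit, the right half brightly lit.
image = [[20, 60, 120, 220],
         [20, 60, 120, 220]]
print(block_otsu_binarize(image, block=2))  # [[0, 1, 0, 1], [0, 1, 0, 1]]
```

A single global threshold on this toy image would merge the dim foreground (60) with the background, which is the failure mode that per-block thresholds are meant to avoid.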
Pub Date: 2021-06-18 DOI: 10.1109/IMCEC51613.2021.9482335
Shuai Zhang, Jinlong Wang, Decai Wang
The rapid development of information technology and its in-depth application across industries have brought tremendous changes to how people live and learn. Before class, students often set aside time to log in to the cloud class platform, study the teaching resources provided by teachers (micro-class videos, animations, courseware, documents, websites, and others), and take part in discussions and interactions to complete the tasks on the learning task list. This study uses the statistical software package SPSS 22.0 to analyze the questionnaire. The validity analysis uses principal component analysis with varimax orthogonal rotation for confirmatory factor analysis, with forced extraction of four factors. The data show that 53.1% of learners believe that online learning based on cloud platforms can improve learning efficiency. The hybrid learning model based on the cloud platform can solve the problem of wasted time in traditional classrooms and meet students' needs for autonomous learning.
Title: Design of Hybrid Learning Mode in Higher Vocational Colleges Based on Cloud Platform
Pub Date: 2021-06-18 DOI: 10.1109/IMCEC51613.2021.9482145
Weizhe Wang
Semantic segmentation of complex traffic scenes is a challenging research topic in computer vision. To reduce the segmentation model's dependence on pixel-level annotations of traffic scenes, we propose a semi-supervised semantic segmentation model based on knowledge distillation. A self-correcting module iteratively refines the weakly labeled data and generates pseudo-labels. Collaborative learning among multiple student networks improves the students' ability to absorb latent knowledge online. The method uses the teacher-student knowledge distillation structure to transfer structured semantic information, which addresses the shortage of finely labeled samples in the Cityscapes dataset. Training with the original labeled data combined with the pseudo-labeled data further improves network performance.
Title: Semi-supervised Semantic Segmentation Network based on Knowledge Distillation
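The teacher-student transfer in the knowledge distillation abstract above can be illustrated with the generic soft-target loss for a single pixel. This is a sketch of the standard temperature-scaled formulation, not the structured distillation the paper describes; the function names, logits, and temperature value are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class probabilities,
    scaled by temperature squared as is conventional for soft targets."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return kl * temperature ** 2

# Hypothetical per-pixel class logits for three classes (e.g. road, car, sky).
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
print(distillation_loss(student, teacher))
```

In a segmentation setting this loss would be averaged over all pixels and combined with the supervised loss on labeled or pseudo-labeled data.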
Pub Date: 2021-06-18 DOI: 10.1109/IMCEC51613.2021.9482016
Keqing Guan, Xianli Kong
Personalized recommendation can effectively mitigate the negative impact of information overload on users and improve the user experience in a big data environment. How to build an effective personalized recommendation system has become a common concern of industry and academia. Based on the basic idea of the multi-layer perceptron, this paper constructs a personalized recommendation model that uses multi-source information. Relevant information about users and recommended items is introduced, and iterative learning is carried out to improve the accuracy of user preference prediction. An extended model is constructed using the multi-layer perceptron method and is trained with batched data flow in the TensorFlow framework. An implementation framework for the method is built, and its effectiveness is verified on the MovieLens dataset. Experimental results show that the proposed method effectively improves the accuracy of user preference prediction.
Title: Research on Personalized Recommendation Method Based on Multi-source Information Learning
Pub Date: 2021-06-18 DOI: 10.1109/IMCEC51613.2021.9482231
Lei Chen, Chengyao Tang, Kecheng Zhang, Jingyuan Li, Weihua Mou
In a transponding satellite communication (TSC) system, coarse synchronization is the key step by which the ground station receives and processes inbound signals, and it directly affects receiver sensitivity and detection probability. The power and Doppler effect of the transmitted signal vary from terminal to terminal because of differences in position, dynamic scenario, antenna status, and terminal type. Furthermore, satellite motion varies, especially in future highly dynamic low Earth orbit (LEO) satellite communication systems, in which the satellite antenna beam direction changes rapidly. All of these factors make coarse synchronization at the ground station challenging. At present, ground station coarse synchronization mostly uses fixed-threshold estimation or constant false alarm rate (CFAR) threshold estimation, neither of which is suitable for future LEO satellite communication systems. This paper proposes an adaptive threshold estimation method for coarse synchronization of inbound signals that neither changes the framework of the TSC system nor requires additional resources; it yields more stable acquisition sensitivity and a higher detection probability in dynamic scenarios. Theoretical analysis and simulation results are presented to verify the method.
Title: An Adaptive Threshold Estimation for Coarse Synchronization in Transponding Satellite Communication System
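The satellite synchronization abstract above names CFAR threshold estimation as one of the conventional approaches it improves on. A minimal sketch of textbook cell-averaging CFAR over a vector of correlation powers, assuming square-law detection with exponentially distributed noise power; this is the baseline, not the paper's adaptive method, and the function name, window sizes, and sample data are hypothetical.

```python
def ca_cfar(power, num_train, num_guard, pfa):
    """Cell-averaging CFAR: flag cells whose power exceeds a threshold scaled
    from the mean of num_train training cells on each side, skipping
    num_guard guard cells next to the cell under test."""
    n = 2 * num_train
    alpha = n * (pfa ** (-1.0 / n) - 1.0)   # scale factor for the target false alarm rate
    half = num_train + num_guard
    detections = []
    for i in range(half, len(power) - half):
        leading = power[i - half : i - num_guard]
        lagging = power[i + num_guard + 1 : i + half + 1]
        noise = (sum(leading) + sum(lagging)) / n
        if power[i] > alpha * noise:
            detections.append(i)
    return detections

# Hypothetical correlator output: flat noise floor with one strong peak at bin 20.
power = [1.0] * 40
power[20] = 50.0
print(ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3))  # [20]
```

Because the threshold follows the local noise estimate, it adapts to slow changes in the noise floor, but strong neighboring signals inside the training window raise the threshold and can mask weaker ones.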
Pub Date: 2021-06-18 DOI: 10.1109/IMCEC51613.2021.9482207
Rongrong Liu, Dongzhi He
In this paper, we propose the vertical attention and spatial attention network (VSANet), a semantic segmentation method based on Deeplabv3+ and attention modules, to improve semantic segmentation of road scene images for autonomous driving. The improvements are in two main aspects. First, we introduce a spatial attention module (SAM) after the atrous convolution, which effectively captures more spatial context information. Second, a study of road scene images shows considerable differences in the pixel-level distribution across the horizontally divided regions of the image; for this reason, we introduce a vertical attention module (VAM), which segments road scene images better. Extensive experiments indicate that the segmentation accuracy of the proposed model is 1.94% higher than that of the Deeplabv3+ network model on the test set of the Cityscapes dataset.
Title: Semantic Segmentation Based on Deeplabv3+ and Attention Mechanism