Improving Pedestrian Attribute Recognition with Dual Adaptive Fusion Attention
Wenbiao Xie, Chen Zou, Chengui Fu, Xiaomei Xie, Qiuming Liu, He Xiao
DOI: 10.1145/3581807.3581814
As an important field of computer vision research, pedestrian attribute recognition has received increasing attention from researchers worldwide. However, recognizing the attributes of distant pedestrians in real-world scenes suffers from missing information, incomplete feature extraction, and low recognition accuracy. To address these issues, we propose a Dual Adaptive Fusion Attention and Criss-Cross Attention Module (DAFCC) consisting of two sub-modules. First, the dual adaptive fusion attention module automatically adjusts the weights of attributes at different scales and then fuses the multi-scale features, making attribute extraction more complete. Second, we employ criss-cross attention to extract rich contextual information, which benefits visual understanding. Trained on the public PA-100K, RAP, and PETA datasets, the method achieves mean accuracies of 81.09%, 81.44%, and 85.94%, respectively. Extensive experimental results show that the method is strongly competitive with many current classical algorithms.
Instance-level Weighted Graph Learning for Incomplete Multi-view Clustering
J. Zhang, Lunke Fei, Yun Li, Fangqi Nie, Qiaoxian Jiang, Libing Liang, Pengcheng Yan
DOI: 10.1145/3581807.3581832
Incomplete multi-view clustering has attracted broad attention due to the frequent absence of some views of real-world objects. Existing incomplete multi-view clustering methods usually assign different weights to different views to learn a consensus graph of the multiple views, which, however, cannot properly preserve the non-noise information in views with lower weights. In this paper, unlike existing view-level weighted graph learning, we propose a simple yet effective instance-level weighted graph learning method for incomplete multi-view clustering. Specifically, we first use the similarity information of the available views to estimate and recover the missing views, so that the harmful impact of the missing views is reduced. Then, we adaptively assign weights to the similarities between different views, so that the negative effects of noise are reduced. Finally, by combining graph fusion with rank constraints, we learn a new consensus representation of the multi-view data for incomplete multi-view analysis. Experimental results on five widely used incomplete multi-view datasets clearly demonstrate the effectiveness of the proposed method.
Rice Disease Recognition and Feature Visualization Using a Convolutional Neural Network
Yan Wei, Zhibin Wang, Xiao-Jun Qiao
DOI: 10.1145/3581807.3581811
To achieve fast and accurate identification of rice diseases in the field, we propose an automatic rice disease classifier in which the process of characterizing rice diseases is visualized and analyzed by a deconvolutional neural network. An AlexNet model, pretrained on ImageNet, is constructed and trained on rice disease images to classify them. After training is completed, the activation signal is mapped back to the corresponding positions of the input image by a deconvolutional network that mirrors the AlexNet structure. The set of pixels contributing most to the prediction of the convolutional neural network is identified from the deconvolution visualization map. The experimental results demonstrate the effectiveness of the proposed method. The classifier achieved an accuracy of 90.03% on the rice disease dataset, which was 8.39 and 16.78 percentage points higher than the accuracies achieved by the LeNet and BP neural networks, respectively. The features of the middle layers of the convolutional neural network undergo a hierarchical transformation from low-level information, such as color, to high-level information, such as the contours and edges of disease spots. This transformation process matches the criteria used in the actual identification of rice diseases. The proposed method lays a foundation for the accurate identification of crop diseases and for the design and adjustment of deep convolutional neural network structures.
Principal component self-attention mechanism for melanoma hyperspectral image recognition
Hongbo Liang, Nanying Li, Jiaqi Xue, Yaqian Long, S. Jia
DOI: 10.1145/3581807.3581843
Early detection of melanoma and prompt treatment are key to reducing melanoma-related deaths. To improve early detection of melanoma, this paper introduces a set of hyperspectral images (HSIs) captured by dermoscopy using hyperspectral technology and, based on these data, proposes a principal component self-attention mechanism (PCSAM) for classifying dysplastic nevi and melanoma. The proposed method uses principal component analysis to amplify the differences in the spectral features of lesions and to extract new features that are convenient for classification. In addition, the attention mechanism fully exploits the spectral features of melanoma and utilizes the contextual spatial information between HSI blocks. Finally, a comparison experiment is carried out using RGB images and HSIs. Experimental results demonstrate that the spectral features of melanoma significantly improve classification accuracy and that hyperspectral technology effectively improves the recognition accuracy of dysplastic nevi and melanoma, reflecting the advantages of HSIs over traditional images.
Key Points Positioning: A Two-Stage Algorithm For Single-view Point Cloud of Human Back Based on Point-wise Network
Nan Dong, Xinfeng Zhang, Xiaomin Liu, Weifeng Guo, Fei Wang
DOI: 10.1145/3581807.3581846
Point cloud data is a collection of massive numbers of points recording the spatial position of each point on a target surface, and it contains abundant spatial information. It is now also applied to the digital modeling of the human surface in medical imaging, serving as the data basis for subsequent body measurement, morphology estimation, and data analysis. Key points are defined as landmark positions for surface morphology analysis: they provide reference positions for the analysis and, to a certain extent, reflect the symmetry and morphology of the body. Targeting back-shape analysis in clinical diagnosis, this paper proposes a two-stage key-point positioning scheme of coarse segmentation followed by fine positioning. We design and build a point-wise artificial neural network to roughly locate the body part; within it, we propose a max-pooling module based on spatial location coding to express local features more strongly. Further, we propose an operator based on gray distance and curvature to match the positions of key points. Experiments show that our method effectively enhances the distinctiveness of features while reducing the influence of the background.
Modeling and Analyzing the Multi-Information Network Propagation Dynamics on Hot Events
Yuwei She, Xinyi Jiang, Changyi Wu, Fulian Yin
DOI: 10.1145/3581807.3581897
As the largest online social platform in China, Weibo enables users to freely access and share information and plays an important role in the dissemination of public opinion. Hot topics on Weibo involve multiple pieces of information whose dissemination is not an isolated process; the pieces affect one another. Considering that multi-information propagation rules remain unclear and that the factors influencing public opinion in real networks are insufficiently analyzed, this paper studies the multi-information delayed-transmission scenario in complex network environments and constructs the Multiple-Information Delay-transmission Susceptible-Forwarding-Immune (MD-SFIFI) model, which accounts for the time interval between the first message of a hot event and the release of subsequent messages. Data fitting is conducted to demonstrate the validity of the model. By analyzing the correlations between model parameters and information dissemination indicators, the paper summarizes the laws of multi-information dissemination, aiming to provide theoretical and data support for government decision-making in public opinion response and governance.
Video Forgery Detection Using Spatio-Temporal Dual Transformer
Chenyu Liu, Jia Li, Junxian Duan, Huaibo Huang
DOI: 10.1145/3581807.3581847
Fake videos generated by deep generation technology pose a potential threat to social stability, which makes detecting them critical. Although previous detection methods have achieved high accuracy, they generalize poorly across datasets and in realistic scenes. We identify several novel temporal and spatial clues. In the frequency domain, the inter-frame differences between real and fake videos are significantly more pronounced than the intra-frame differences. In the shallow texture of the CbCr color channels, the forged areas of fake videos exhibit more distinct blurring than real videos. Moreover, the optical flow of a real video changes gradually, while that of a fake video changes drastically. This paper proposes a spatio-temporal dual Transformer network for video forgery detection that integrates these spatio-temporal clues with the temporal consistency of consecutive frames to improve generalization. Specifically, an EfficientNet is first used to extract spatial artifacts from shallow textures and high-frequency information; we add a new loss function to EfficientNet to extract more robust face features and introduce an attention mechanism to enhance the extracted features. Next, a Swin Transformer captures the subtle temporal artifacts in the inter-frame spectrum differences and the optical flow, and a feature interaction module fuses local features with global representations. Finally, another Swin Transformer classifies the videos according to the extracted spatio-temporal features. We evaluate our method on datasets such as FaceForensics++, Celeb-DF (v2), and DFDC. Extensive experiments show that the proposed framework achieves high accuracy and generalization, outperforming current state-of-the-art methods.
Chinese Electronic Medical Record Named Entity Recognition Based on Bi-RNN-LSTM-RNN-CRF
Chenquan Dai, Xiaobin Zhuang, Jiaxin Cai
DOI: 10.1145/3581807.3581892
Building on the mainstream deep learning model BiLSTM-CRF, we establish the electronic medical record named entity recognition model Bi-RNN-LSTM-RNN-CRF. We first collect an electronic medical record dataset and convert its characters into vectors with a word-vector tool; the vectors are fed into the bidirectional RNN-LSTM-RNN layers for training, the training results are passed to the CRF layer, and the loss function is computed to obtain the predictions, with the time taken by the process recorded. Finally, the above steps are repeated with the traditional BiLSTM-CRF model to compare the results of the two models. Experimental results show that the F1 value of the Bi-RNN-LSTM-RNN-CRF model reaches 97.80%, with recognition performance slightly inferior to that of BiLSTM-CRF.
Multi-Scale Channel Attention for Chinese Scene Text Recognition
Haiqing Liao, X. Du, Yun Wu, Da-Han Wang
DOI: 10.1145/3581807.3581808
Scene text recognition has proven highly effective in solving various computer vision tasks. Recently, numerous recognition algorithms based on the encoder-decoder framework have been proposed to handle scene text with perspective distortion and curved shapes. Nevertheless, most of these methods consider only single-scale features and do not take multi-scale features into account. Meanwhile, existing text recognition methods are mainly designed for English text, neglecting the pivotal role of Chinese text. In this paper, we propose an end-to-end method that integrates multi-scale features for Chinese scene text recognition (CSTR). Specifically, we adopt and customize Dense Atrous Spatial Pyramid Pooling (DenseASPP) in our backbone network to capture multi-scale features of the input image while enlarging the receptive fields. Moreover, we add Squeeze-and-Excitation (SE) blocks to capture attentional features with global information, further improving CSTR performance. Experimental results on Chinese scene text datasets demonstrate that the proposed method efficiently mitigates the loss of contextual information caused by varying text scales and outperforms state-of-the-art approaches.
Research on AIS Data Aided Ship Classification in Spaceborne SAR Images
Zhenguo Yan, Xin Song, Lei Yang
DOI: 10.1145/3581807.3581833
The continuous development of spaceborne synthetic aperture radar (SAR) technology has advanced research on ship classification, which plays an important role in maritime surveillance. Mainstream deep-learning-based ship classification in SAR images has achieved state-of-the-art performance, but it depends heavily on plentiful labeled samples. Compared with SAR images, the automatic identification system (AIS) provides a large amount of data that is relatively easy to obtain and contains rich ship information. Therefore, to solve the problem of ship classification in SAR images with limited samples, this paper proposes an AIS-data-aided ship classification method. Specifically, we first train the ship classification model SMOTEBoost on AIS data and then transfer the trained model to SAR images for ship-type prediction. Experimental results show that the proposed method achieves a classification accuracy as high as 93%, proving that transfer from AIS data can effectively address ship classification in SAR images with limited samples.