How to effectively extract discriminative spatial and temporal features is important for skeleton-based action recognition. However, current research on skeleton-based action recognition mainly focuses on the natural connections of the skeleton and the original temporal sequences of skeleton frames, ignoring the relations between non-adjacent but inter-related joints and the varying velocities of action instances. To overcome these limitations and thereby enhance spatial and temporal feature extraction for action recognition, we propose a novel Spatial Attention-Enhanced Multi-Timescale Graph Convolutional Network (SA-MTGCN) for skeleton-based action recognition. Specifically, as the relations between non-adjacent but inter-related joints are beneficial for action recognition, we propose an Attention-Enhanced Spatial Graph Convolutional Network (A-SGCN) that uses both the natural connections and the inter-related relations of joints. Furthermore, a Multi-Timescale (MT) structure is proposed to enhance temporal feature extraction by aggregating different network layers to model the different velocities of action instances. Experimental results on the two widely used NTU and Kinetics datasets demonstrate the effectiveness of our approach.
{"title":"A Spatial Attention-Enhanced Multi-Timescale Graph Convolutional Network for Skeleton-Based Action Recognition","authors":"Shuqiong Zhu, Xiaolu Ding, Kai Yang, Wai Chen","doi":"10.1145/3430199.3430213","DOIUrl":"https://doi.org/10.1145/3430199.3430213","url":null,"abstract":"How to effectively extract discriminative spatial and temporal features is important for skeleton-based action recognition. However, current researches on skeleton-based action recognition mainly focus on the natural connections of the skeleton and original temporal sequences of the skeleton frames, which ignore the inter-related relation of non-adjacent joints and the variant velocities of action instances. To overcome these limitations and therefore enhance the spatial and temporal features extraction for action recognition, we propose a novel Spatial Attention-Enhanced Multi-Timescale Graph Convolutional Network (SA-MTGCN) for skeleton-based action recognition. Specifically, as the relation of non-adjacent but inter-related joints is beneficial for action recognition, we propose an Attention-Enhanced Spatial Graph Convolutional Network (A-SGCN) to use both natural connection and inter-related relation of joints. Furthermore, a Multi-Timescale (MT) structure is proposed to enhance temporal feature extraction by gathering different network layers to model different velocities of action instances. Experimental results on the two widely used NTU and Kinetics datasets demonstrate the effectiveness of our approach.","PeriodicalId":371055,"journal":{"name":"Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126857525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate and reliable localization is necessary for autonomous driving. Existing localization systems based on GNSS cannot always provide lane-level accuracy. This paper proposes a method that improves vehicle localization by matching road lanes recognized from a camera against a digital map. Iterative Closest Point (ICP) matching is performed on the generated point clouds to minimize lateral error. A neural network is used for lane detection; the detections are post-processed and fitted to a polynomial. Changes that improve ICP matching are described. Finally, we perform an experiment with a GPS RTK signal as ground truth and demonstrate that the proposed method achieves a vehicle localization error of less than 0.5 m.
{"title":"Map relative localization based on road lane matching with Iterative Closest Point algorithm","authors":"A. Evlampev, I. Shapovalov, S. Gafurov","doi":"10.1145/3430199.3430229","DOIUrl":"https://doi.org/10.1145/3430199.3430229","url":null,"abstract":"Accurate and reliable localization is necessary for vehicle autonomous driving. Existing localization systems based on the GNSS cannot always provide lane-level accuracy. This paper proposes the method that improves vehicle localization by using road lanes recognized from a camera and a digital map. Iterative Closest Point (ICP) matching is performed for generated point clouds to minimize lateral error. The neural network is used for lane detection, detections are post-processed and fitted to the polynomial. Changes that allowed improving ICP matching are described. Finally, we perform an experiment with GPS RTK signal as ground truth and demonstrate that the proposed method has a position error of less than 0.5 m for vehicle localization.","PeriodicalId":371055,"journal":{"name":"Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124319968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A digital image watermarking algorithm based on the balanced multiwavelet transform and a voting mechanism is proposed in this paper. The algorithm embeds pre-processed binary watermark bits into the low-pass sub-band coefficients of the multiwavelet transform domain. Because the energies of the four low-pass sub-bands are virtually identical, the binary watermark bits are embedded into the coefficients of each of the four low-pass sub-bands, i.e., four times in total. To account for the different characteristics of each low-pass coefficient block, the largest singular value of each selected block is adaptively quantized with a block-specific quantization step to embed the watermark information. Finally, a voting mechanism is applied during watermark extraction. Experimental results show that the watermarking algorithm not only has good invisibility but is also robust against common image processing operations such as JPEG compression, noise addition, and filtering.
{"title":"An Image Watermarking Scheme Based on Voting Mechanism in Balanced Multiwavelet Domain","authors":"Shaobao Wu, Zhihua Wu, Guodong Wang, Dongsheng Shen","doi":"10.1145/3430199.3430240","DOIUrl":"https://doi.org/10.1145/3430199.3430240","url":null,"abstract":"A digital image watermarking algorithm based on balanced Multiwavelet transform and voting mechanism is proposed in this paper. The algorithm embeds the binary watermark image bits which have been pre-processed into low-pass sub-band coefficients in multiwavelet transform domain. According to the virtually identical quality of the energy of four low-pass subbands, the binary watermark image bits are embedded into four low-pass sub-bands coefficients four times respectively. Due to the different characteristics of each low-pass coefficients block, the largest singular value of the selected blocks is adaptively operated by different quantization step for embedding watermark information. Finally, the voting mechanism is introduced when the watermark extracting. Experimental results show that the watermarking algorithm not only has good invisibility, but also has robustness against some common image processing such as JPEG compression, noise addition, filtering, etc.","PeriodicalId":371055,"journal":{"name":"Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129019920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The need for Network Intrusion Detection Systems (NIDS) has risen since the usage of cloud technologies became mainstream. With ever-growing network traffic, network intrusion detection is a critical part of network security, and a highly efficient NIDS is a must, given that new varieties of attacks arise frequently. These intrusion detection systems are built on either pattern matching or AI/ML-based anomaly detection. Pattern matching methods usually have high false positive rates, whereas AI/ML-based methods rely on finding a metric/feature, or a correlation between a set of metrics/features, to predict the possibility of an attack. The most common of these, such as KNN and SVM, operate on a limited set of features, achieve lower accuracy, and still suffer from higher false positive rates. In this paper, we propose a deep learning model combining the distinct strengths of a Convolutional Neural Network and a bidirectional LSTM to learn both spatial and temporal features of the data. We use the publicly available NSL-KDD and UNSW-NB15 datasets to train and test the model. The proposed model offers a high detection rate with a comparatively lower false positive rate, and performs better than many state-of-the-art NIDS based on machine learning or deep learning models.
{"title":"Efficient Deep CNN-BiLSTM Model for Network Intrusion Detection","authors":"Jay Sinha, M. Manollas","doi":"10.1145/3430199.3430224","DOIUrl":"https://doi.org/10.1145/3430199.3430224","url":null,"abstract":"The need for Network Intrusion Detection systems has risen since usage of cloud technologies has become mainstream. With the ever growing network traffic, Network Intrusion Detection is a critical part of network security and a very efficient NIDS is a must, given new variety of attack arises frequently. These Intrusion Detection systems are built on either a pattern matching system or AI/ML based anomaly detection system. Pattern matching methods usually have a high False Positive Rates whereas the AI/ML based method, relies on finding metric/feature or correlation between set of metrics/features to predict the possibility of an attack. The most common of these is KNN, SVM etc., operate on a limited set of features and have less accuracy and still suffer from higher False Positive Rates. In this paper, we propose a deep learning model combining the distinct strengths of a Convolutional Neural Network and a Bi-directional LSTM to incorporate learning of spatial and temporal features of the data. For this paper, we use publicly available datasets NSL-KDD and UNSW-NB15 to train and test the model. The proposed model offers a high detection rate and comparatively lower False Positive Rate. The proposed model performs better than many state-of-the-art Network Intrusion Detection systems leveraging Machine Learning/Deep Learning models.","PeriodicalId":371055,"journal":{"name":"Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition","volume":"68 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130054647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In convolutional neural network based object detection, an unquantifiable portion of object information is lost during convolution. The reason is that as the network downsamples to obtain more abstract features, each pixel in the feature map corresponds to a larger region of the original image, so less content is available to refer to. To handle this problem, an improved object detection method based on YOLOv3 is presented. Our approach is composed of three stages: an initial detector, an adaptive chip generator, and a secondary detector. First, the method determines which chips of the image are worth detecting. Second, it screens the best associations to reduce the number of duplicate detections from these chips. Finally, detection runs on each chip and the outputs are aggregated. Benefiting from this design, the method achieves significant performance gains, especially on medium- and large-sized objects.
{"title":"An Improved Method of Object Detection Based on Chip","authors":"Ji-Xiang Wei, Tongwei Lu, Zhimeng Xin","doi":"10.1145/3430199.3430236","DOIUrl":"https://doi.org/10.1145/3430199.3430236","url":null,"abstract":"In spite of methods for object detection based on convolutional neural networks, there's a problem that the information of objects missing in the convolutional progress with an immeasurable proportion. The reason is that while the network downsample in order to further obtain the abstract features, a certain pixel point in the feature map corresponding to more original image area, so there're less content that can be referred to. To handle this problem, an improved object detection method based on YOLOv3 is demonstrated. Our approach is composed of three steps, initial detector, adaptive chip generator, secondary detector. Firstly, figuring out which chips are worth detecting in the image. Secondly, screening the best associations for reduce the number of duplicate detections from these chips. Finally, detection progress will run on each chip and summarize the output. Benefit from it, this method achieves a significant performance especially in medium and large size objects.","PeriodicalId":371055,"journal":{"name":"Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129147851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we prove mathematically that the geometric derivation of the fundamental matrix F in the two-view reconstruction problem is flawed. Although the fundamental matrix approach is quite classic, it is still taught in universities around the world; thus, analyzing the derivation of F remains a non-trivial subject. The geometric derivation of E is based on the cross product of vectors in R^3. The cross product (or vector product) of two vectors is x × y, where x = ⟨x1, x2, x3⟩ and y = ⟨y1, y2, y3⟩ in R^3. The relationship between the skew-symmetric matrix of a vector t in R^3 and the cross product is [t]×y = t × y for any vector y in R^3. In the derivation of the essential matrix we have E = [t]×R, which results from replacing t × R by [t]×R, the cross product of a vector t and a 3×3 matrix R. This is an undefined operation, and therefore the derivation of the essential matrix is flawed. The derivation of F is based on the assertion that the set of all points in the first image and their corresponding points in the second image are projectively equivalent, and that therefore there exists a homography Hπ between the two images; this assertion does not hold for 3D non-planar scenes.
{"title":"Experimental and Theoretical Scrutiny of the Geometric Derivation of the Fundamental Matrix","authors":"T. Basta","doi":"10.1145/3430199.3430227","DOIUrl":"https://doi.org/10.1145/3430199.3430227","url":null,"abstract":"In this paper, we prove mathematically that the geometric derivation of the fundamental matrix F of the two-view reconstruction problem is flawed. Although the fundamental matrix approach is quite classic, it is still taught in universities around the world. Thus, analyzing the derivation of F now is a non-trivial subject. The geometric derivation of E is based on the cross product of vectors in R3. The cross product (or vector product) of two vectors is x × y where x = ⟨x1, x2, x3⟩ and y = ⟨y1, y2, y3⟩ in R3. The relationship between the skew-matrix of a vector t in R3 and the cross product is [t]×y = t × y for any vector y in R3. In the derivation of the essential matrix we have E = [t]×R which is the result of replacing t × R by [t]×R, the cross product of a vector t and a 3×3 matrix R. This is an undefined operation and therefore the essential matrix derivation is flawed. The derivation of F, is based on the assertion that the set of all points in the first image and their corresponding points in the second image are protectively equivalent and therefore there exists a homography H&pgr; between the two images. An assertion that does not hold for 3D non-planar scenes.","PeriodicalId":371055,"journal":{"name":"Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126921033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aiming at the problems of low accuracy, slow computation, large storage requirements, and difficulty detecting multiple targets in existing bayonet (checkpoint) vehicle search, a multi-target staged image retrieval method based on Faster R-CNN preprocessing is proposed. First, a selective search network obtains the probability vectors for the picture; then compact semantic hash codes of the images are used as fingerprints for fast comparison, narrowing the search range to a candidate pool; finally, the quantized hash matrix of the query image is quickly compared against those of the images in the pool, and voting selects the most similar images from the pool as the output. Experimental results show that the design can be trained end-to-end. On the BIT-Vehicle dataset, the average accuracy (0.829) and retrieval response time (0.698 s) are significantly improved compared with conventional hash-based retrieval methods. This meets the image retrieval needs of the big-data era.
{"title":"Image Retrieval Method of Bayonet Vehicle Based on the Improvement of Deep Learning Network","authors":"Zilong Wang, Ling Xiong, Yang Chen","doi":"10.1145/3430199.3430209","DOIUrl":"https://doi.org/10.1145/3430199.3430209","url":null,"abstract":"Aiming at the problems of low accuracy, slow calculation speed, large storage space and difficult to detect multiple targets in the search of existing bayonet vehicles, a multi-target staged image retrieval method based on Faster R-CNN preprocessing was proposed. First, the selective search network is used to obtain the probability vectors in the picture; then, the image compact semantic hash code is used to perform fingerprint encoding to quickly compare and narrow the range to obtain a range candidate pool; finally, the image to be retrieved is compared to the image in the pool Quickly compare quantized hash matrices, and use voting to select the most similar images from the pool as the output. The experimental results show that the design can achieve end-to-end training. The average accuracy rate (0.829) and retrieval response time (0.698s) are significantly improved compared to the conventional hash-based retrieval method on the BIT-Vehicle dataset. This meets the era of big data Image retrieval needs.","PeriodicalId":371055,"journal":{"name":"Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120938377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To reduce the time and memory consumption of frequent itemset mining over stream data, and to weaken the impact of historical transactions on data patterns, this paper proposes SWFIUT-stream, a frequent itemset mining algorithm based on a sliding time-decay window. In this algorithm, a time attenuation factor assigns a different weight to each window unit to weaken older units' influence on the data patterns. To achieve fast stream processing, when mining frequent itemsets a two-dimensional table is used to scan and decompose the itemsets synchronously, mining all frequent itemsets in the window, and the computation is parallelized in a distributed fashion on the Storm framework. Experimental data show that the algorithm consumes less time and memory than conventional algorithms when mining frequent itemsets in stream data.
{"title":"A Mining Frequent Itemsets Algorithm in Stream Data Based on Sliding Time Decay Window","authors":"Xin Lu, Shaonan Jin, Xun Wang, Jiao Yuan, Kun Fu, Ke Yang","doi":"10.1145/3430199.3430226","DOIUrl":"https://doi.org/10.1145/3430199.3430226","url":null,"abstract":"In order to reduce the time and memory consumption of frequent itemsets mining in stream data, and weaken the impact of historical transactions on data patterns, this paper proposes a frequent itemsets mining algorithm SWFIUT-stream based on sliding decay time window. In this algorithm, the time attenuation factor is introduced to assign different weights to each window unit to weaken their influence on data mode. In order to realize the fast stream data mining processing, when mining the frequent itemsets, the two-dimensional table is used to scan and decompose the itemsets synchronously to mine all the frequent itemsets in the window, and the distributed parallel computing processing is carried out based on storm framework. Experimental data show that the algorithm consumes less time and consumes less memory space than conventional algorithms when mining frequent itemsets in stream data.","PeriodicalId":371055,"journal":{"name":"Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121834901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An improved OFDM time-frequency synchronization algorithm based on the CAZAC (Constant Amplitude Zero Auto-Correlation) sequence is proposed to solve the problem that traditional algorithms struggle to balance timing synchronization accuracy against computational complexity. The CAZAC sequence is introduced to improve the training-sequence structure of conventional algorithms, and the conjugate symmetry of the received training sequence in the time domain is used for timing estimation and fractional frequency offset estimation. The effect of the integer frequency offset on the CAZAC sequence is then analyzed, and integer frequency offset estimation is completed by computing over the CAZAC sequence. The algorithm achieves higher timing synchronization accuracy with lower computational complexity, and its frequency offset estimation is also more accurate than that of traditional algorithms. Theory and simulation show that the proposed algorithm has good timing and frequency offset estimation performance over multipath fading channels.
{"title":"An Improved OFDM Time-Frequency Synchronization Algorithm Based on CAZAC Sequence","authors":"Xinming Xie, Bowei Wang, Pengfei Han","doi":"10.1145/3430199.3430232","DOIUrl":"https://doi.org/10.1145/3430199.3430232","url":null,"abstract":"An improved OFDM time-frequency synchronization algorithm based on CAZAC (Constant Amplitude Zero Auto Correlation) sequence is proposed to solve the problem that the traditional algorithm is difficult to balance between timing synchronization accuracy and calculation complexity. The CAZAC sequence was introduced to improve the structure of the training sequence of conventional algorithms. The conjugate symmetry of the training sequence of the receiving end in the time domain was used for the timing estimation. Fractional frequency offset assessment Then the effect of the integral frequency offset on the CAZAC sequence was analyzed, and the integer frequency offset was completed by calculating the CAZAC sequence. The algorithm achieves higher timing synchronization accuracy with lower computational complexity, and the accuracy of frequency offset estimation is also higher than that of traditional algorithms. Theory and simulation prove that the proposed algorithm has good timing estimation and frequency offset estimation performance under the Multipath fading channel.","PeriodicalId":371055,"journal":{"name":"Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130049163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weakly supervised learning has attracted broad interest and research because of the large savings in labeling costs. To address the high cost of manual labeling in aurora image detection research, an aurora multi-scale network for aurora image datasets is proposed based on weakly supervised learning. First, the dynamic hierarchical mimicking feature-learning mechanism is adopted to improve the classification performance of the convolutional neural network on aurora images. Then, a multi-scale constraint is imposed on the network through multi-branch inputs and outputs of different sizes. The network finally outputs improved class activation maps for auroral images, realizing key-structure detection of auroral images based on image-level annotation. Experiments show that the algorithm effectively improves the class activation map results for auroral images and achieves an ideal detection effect on the key structures of auroral images.
{"title":"Detection of Key Structure of Auroral Images Based on Weakly Supervised Learning","authors":"Qian Wang, Tongxin Xue, Yi Wu, Fan Hu, Pengfei Han","doi":"10.1145/3430199.3430216","DOIUrl":"https://doi.org/10.1145/3430199.3430216","url":null,"abstract":"Weakly supervised learning is of interest and research by many people due to the large savings in labeling costs. To solve the high cost of manual labeling in the research of aurora image detection, an Aurora multi-scale network for aurora image dataset is proposed based on weakly-supervised learning. Firstly, the feature learning mechanism of dynamic hierarchical mimicking is adopted to improve the classification performance of the convolutional neural network based on the aurora image. Then, the multi-scale constraint is imposed on the network through the multi-branch input and output of different sizes. The final output of the auroral image class activation maps with more ideal results, the critical structure detection of auroral images based on imagelevel annotation is realized. Experiments show that the algorithm in this paper can effectively improve the class activation maps results of the auroral image, and has an ideal detection effect on the vital structure of the auroral image.","PeriodicalId":371055,"journal":{"name":"Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116109044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}