With the development of the Industrial Internet of Things, the types and functions of components are increasing, the application environment is becoming increasingly complex, and the quality management of components is becoming increasingly important. To make knowledge about component quality management easier to access and to build an intelligent system for component quality management, this paper proposes a method for constructing a component quality management knowledge graph based on the BERT word embedding model and a joint entity-relation extraction method based on an annotation strategy. Combining entity extraction and relation extraction into a single step not only reduces the consumption of computing resources but also reduces the propagation of wrongly extracted entities. The sequence-to-sequence model BERT-BiLSTM-CRF is adopted. The BERT word embedding layer makes better use of context information and improves extraction accuracy. Experimental results show that, compared with other classical deep learning term extraction models, this model achieves a significant improvement in accuracy, recall, and F1 score.
{"title":"Knowledge graph construction of component quality management","authors":"Haiming Zhang, Xiaoming Fan, Jiaqi Zhang, Chengzhi Jiang, Jiang Li, Hantian Gu, Bo-wen Li, Hao Hu, Chengxi Liu","doi":"10.1117/12.2667430","DOIUrl":"https://doi.org/10.1117/12.2667430","url":null,"abstract":"With the development of Industrial Internet of Things, the types and functions of components are increasing, the application environment is becoming more and more complex. Also, the quality management of components is becoming more and more important. In order to understand the knowledge related to component quality management more conveniently and build an intelligent system for component quality management, this paper proposes a method to construct component quality management knowledge graph based on BERT word embedding model and entity relationship joint extraction method based on annotation strategy. Combining entity extraction and relationship extraction parts into one not only reduces the consumption of computing resources, but also reduces the propagation of wrong entities. In this paper, the sequence to sequence model of Bert-BilSTm-CRF is adopted. Through the BERT word embedding layer, the context information can be better utilized and the accuracy of extraction can be improved. 
Experimental results show that compared with other classical deep learning term extraction models, this model has a significant improvement in accuracy, recall rate and F1 value.","PeriodicalId":345723,"journal":{"name":"Fifth International Conference on Computer Information Science and Artificial Intelligence","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117347551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
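The abstract above does not spell out its annotation strategy, so the following is only a minimal sketch of how a joint tagging scheme can yield triples directly from one tagging pass. It assumes a hypothetical tag format `POSITION-RELATION-ROLE` (for example `B-CausedBy-1`), where role `1` marks the head entity and role `2` the tail entity; the tag format, the relation name, and the function `decode_triples` are illustrative assumptions, not the authors' scheme.

```python
def decode_triples(tokens, tags):
    """Turn one tagged sentence into (head, relation, tail) triples.

    Assumed (hypothetical) tag format: "B-REL-1", "I-REL-1", "B-REL-2",
    "I-REL-2", or "O". Role 1 is the head entity, role 2 the tail entity.
    """
    spans = []      # each span is [relation, role, [tokens]]
    current = None  # span currently being extended, if any
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current = None
            continue
        position, rest = tag.split("-", 1)
        relation, role = rest.rsplit("-", 1)
        continues = (
            position == "I"
            and current is not None
            and current[0] == relation
            and current[1] == role
        )
        if continues:
            current[2].append(token)
        else:
            current = [relation, role, [token]]
            spans.append(current)

    # Collect relation types in order of first appearance.
    relations = []
    for relation, _, _ in spans:
        if relation not in relations:
            relations.append(relation)

    # Simplifying assumption: heads and tails of the same relation type
    # are paired in the order in which they appear in the sentence.
    triples = []
    for relation in relations:
        heads = [" ".join(t) for r, role, t in spans if r == relation and role == "1"]
        tails = [" ".join(t) for r, role, t in spans if r == relation and role == "2"]
        for head, tail in zip(heads, tails):
            triples.append((head, relation, tail))
    return triples
```

Because entity boundaries and relation roles are read from the same tag sequence, no separate entity-recognition stage feeds errors into a later relation classifier, which is the benefit the abstract attributes to joint extraction.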
With the frequent occurrence of global security problems, violent crowd behavior seriously endangers public security. Meanwhile, as more and more surveillance cameras are installed in public and sensitive areas, intelligent video surveillance technology can be applied to violent crowd behavior detection. In this paper, we propose a novel mean kinetic violent flow (MKViF) algorithm that detects violent crowd behavior by extracting the kinetic energy feature of video flow. Specifically, we first calculate the mean kinetic energy from the streak flow of each corner in each frame. Then, we obtain a binary indicator of kinetic energy change by calculating the amplitude change between consecutive frames. Finally, the MKViF vector for a sequence of frames is obtained by averaging these binary indicators for each pixel over all frames. Experimental results show that the proposed MKViF algorithm outperforms existing algorithms in both classification performance and real-time processing performance (45 frames per second).
{"title":"Detection of violent crowd behavior based on mean kinetic streak flow","authors":"Yin-Chang Zhou","doi":"10.1117/12.2667803","DOIUrl":"https://doi.org/10.1117/12.2667803","url":null,"abstract":"With the frequent occurrence of global security problems, violent crowd behavior endangers public security seriously. Meanwhile, intelligent surveillance video technology can be applied for violent crowd behavior detection as more and more surveillance cameras are installed in public and sensitive areas. In this paper, we propose a novel mean kinetic violent flow (MKViF) algorithm for violent crowd behavior detection by extracting the kinetic energy feature of video flow. Specifically, A is firstly calculating the mean kinetic energy by streak flow of each corner in each frame. Then, we obtain a binary indicator of kinetic energy change by calculating the amplitude change between sequence frames. Finally, the MKViF vector for a sequence of frames is obtained by averaging these binary indicators of each pixel in all frames. Experimental results show that the proposed MKViF algorithm behaves better in classification performance and real-time processing performance (45 frames per second) than the existing algorithms.","PeriodicalId":345723,"journal":{"name":"Fifth International Conference on Computer Information Science and Artificial Intelligence","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115249967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
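The last two steps of the MKViF description (binary change indicator, then per-pixel averaging) can be sketched as follows. This is a minimal illustration that assumes the per-pixel kinetic energy maps have already been computed from the streak flow; the function name `mkvif_vector`, the list-of-lists input layout, and the default adaptive threshold (the mean change of each frame pair) are assumptions, since the abstract does not state how the threshold is chosen.

```python
def mkvif_vector(energy_frames, threshold=None):
    """Average binary energy-change indicators per pixel over a frame sequence.

    energy_frames: list of H x W nested lists, one kinetic-energy map per frame
                   (assumed to be precomputed from the streak flow).
    threshold:     fixed change threshold; if None, the mean change of each
                   frame pair is used (an assumption, not from the source).
    """
    if len(energy_frames) < 2:
        raise ValueError("need at least two frames")
    height = len(energy_frames[0])
    width = len(energy_frames[0][0])
    counts = [[0.0] * width for _ in range(height)]
    n_pairs = len(energy_frames) - 1

    for prev, curr in zip(energy_frames, energy_frames[1:]):
        # Amplitude change of the kinetic energy between consecutive frames.
        change = [
            [abs(curr[y][x] - prev[y][x]) for x in range(width)]
            for y in range(height)
        ]
        if threshold is None:
            thr = sum(sum(row) for row in change) / (height * width)
        else:
            thr = threshold
        # Binary indicator: 1 where the change exceeds the threshold.
        for y in range(height):
            for x in range(width):
                if change[y][x] > thr:
                    counts[y][x] += 1.0

    # Averaging the indicators over all frame pairs gives the descriptor.
    return [[c / n_pairs for c in row] for row in counts]
```

The result is a map of values in [0, 1], where each entry is the fraction of frame pairs in which that pixel's energy changed significantly. A classifier would consume it after flattening, but that stage is outside this sketch.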
To address the problems that U-shaped networks for medical image segmentation cannot capture long-range dependencies and may lose detail information, a multi-scale context-aware segmentation network for medical images is proposed. The model extracts the features of the last three layers of the encoder and then introduces a global circular convolution transformer module, which captures long-range dependencies by modeling global context information. Next, an attention guidance module is introduced to fuse features of different scales, which recovers lost details while reducing the noise introduced by low-level features. Experiments on the Synapse multi-organ segmentation dataset indicate that the model produces more precise segmentation results.
{"title":"Multi-scale context-aware segmentation network for medical images","authors":"Qing Li, Yuqing Zhu","doi":"10.1117/12.2667684","DOIUrl":"https://doi.org/10.1117/12.2667684","url":null,"abstract":"Aiming at the problems that the method based on U-shaped network for medical image segmentation cannot capture the long-range dependencies and could lose some detail information, a multi-scale context-aware segmentation network for medical images is proposed. The model extracts the last three layer features of the encoder, and then introduces a global circular convolution transformer module to solve the problem of long-range dependencies capturing by modeling the global context information. Then, an attention guidance module is introduced to fuse features of different scales, so as to solve the problem of losing details while reducing the introduction of noise information in the low level features. The experimental performance on Synapse multi-organ segmentation datasets indicates that the model produces more precise segmentation results.","PeriodicalId":345723,"journal":{"name":"Fifth International Conference on Computer Information Science and Artificial Intelligence","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125496690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Li Huadu, Luo Renze, Tang Xiang, Wu Yong, Li Yalong
Manual evaluation of pipeline weld defects is affected by many subjective factors, recognizes defects poorly, and is inefficient. An intelligent identification method for pipeline weld defects based on an improved DenseNet network is proposed. The method first improves the DenseNet network with multi-channel convolutions of different scales, which improves the generalization ability of the network. Then, the feature extraction ability of the network is improved by stacking two convolutions of the same scale. Finally, an attention mechanism module is introduced into the dense connection block of the network to enhance beneficial features and suppress useless ones. The experimental results show that the method achieves 92% accuracy in identifying pipeline weld defects, about 13% higher than the original method, and is efficient enough for industrial application.
{"title":"Weld defect recognition method based on improved DenseNet","authors":"Li Huadu, Luo Renze, Tang Xiang, Wu Yong, Li Yalong","doi":"10.1117/12.2667731","DOIUrl":"https://doi.org/10.1117/12.2667731","url":null,"abstract":"There are many subjective influencing factors, poor recognition effect and low efficiency in manual evaluation of pipeline weld defects. An intelligent identification method of pipeline weld defects based on improved DenseNet network is proposed. This method firstly uses the form of multi-channel convolution of different scales to improve the DenseNet network, thereby improving the generalization ability of the network. Then, the feature extraction ability of the network is improved by stacking two convolutions of the same scale. Finally, an attention mechanism module is introduced into the dense connection block of the network to achieve the effect of improving beneficial features and suppressing useless features. The experimental results show that the method can achieve 92% accuracy in the identification of pipeline weld defects, which is about 13% higher than the original method, and has high efficiency, which can fully achieve the purpose of industrial application.","PeriodicalId":345723,"journal":{"name":"Fifth International Conference on Computer Information Science and Artificial Intelligence","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114582228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Xu, Lijun Wang, Jing Xu, Huan He, Jiaying Li, J. Liao
Entity extraction is an information extraction technique that aims to locate and classify named entities (e.g., organizations, locations, persons), and it is an important and fundamental problem in natural language processing. Many entity extraction models ignore the learning of grammatical structure. To address this shortcoming, this paper first proposes the PALC (POStag-Attention-LSTM-CRF) model, which adds POS (part-of-speech) features to entity extraction. Specifically, PALC fuses POS features with other features through a multi-layer bidirectional LSTM network and an attention mechanism to improve entity extraction. The experimental results show that the PALC model reaches an accuracy of 90.65% on the CONLL03 dataset, 84.86% on the CONLL03 dataset, and 86.99% on the OntoNote 5.0 English dataset.
{"title":"Entity extraction based on the parts of speech attention mechanism","authors":"J. Xu, Lijun Wang, Jing Xu, Huan He, Jiaying Li, J. Liao","doi":"10.1117/12.2667496","DOIUrl":"https://doi.org/10.1117/12.2667496","url":null,"abstract":"Entity extraction is an information extraction technique that aims to locate and classify named entities (e.g., organizations, locations, persons...), which is a very important and fundamental problem in natural language processing. On the research of entity extraction, numerous models ignore the learning of grammatical structure. Considering the shortcomings of previous models, this paper first proposes the PALC (POStag-Attention-LSTM-CRF) model, which adds POS (part of speech) features to entity extraction. Specially, PALC fuses POS features with other features through a multi-layer bidirectional LSTM network and attention mechanism to improve the effect of entity extraction. The experimental results show that the accuracy of the PALC model in this paper on the CONLL03 dataset can be 90.65%, on the CONLL03 dataset can be 84.86%, and on OntoNote 5.0 English dataset can be 86.99%.","PeriodicalId":345723,"journal":{"name":"Fifth International Conference on Computer Information Science and Artificial Intelligence","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128152907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qihang Zhao, Bin Zhou, Ben Wang, Jin Lu, Luxiao Zhu
With the development of satellite remote sensing technology, the quality and quantity of remote sensing images are constantly improving. Remote sensing feature classification is also playing an increasingly important role in urban planning, resource exploration and other fields. Early remote sensing feature classification mainly used machine learning algorithms such as SVM and K-means. Nowadays, with the expansion of deep learning, research in the computer vision field has proliferated, and remote sensing images are mostly classified by neural networks. Drawing on the characteristics and advantages of U-Net, the channel attention mechanism, ResNet, large convolution kernels and structural reparameterization, this paper proposes a network structure called RA-UNet. Experiments are conducted on LoveDA, a remote sensing ground object classification dataset. The results show that the proposed network classifies well, with mIoU reaching 59.4% and mPA reaching 72.6%. Comparative experiments are also conducted against four mainstream neural networks: FCN, SegNet, PSPNet and UNet. The comparative results show that the proposed network classifies better than these four networks.
{"title":"Research on remote sensing image classification based on RA-UNet","authors":"Qihang Zhao, Bin Zhou, Ben Wang, Jin Lu, Luxiao Zhu","doi":"10.1117/12.2667743","DOIUrl":"https://doi.org/10.1117/12.2667743","url":null,"abstract":"With the development of satellite remote sensing technology, the quality and quantity of remote sensing images are constantly improved. Remote sensing feature classification is also playing an increasingly important role in urban planning, resource exploration and other fields. In the early stage of remote sensing feature classification, machine learning algorithms such as SVM and K-means are mainly used. Nowadays, with the expansion of deep learning, various kinds of research in the computer vision field emerge in an endless manner. Remote sensing images are also mostly classified by different neural networks. According to the characteristics and advantages of U-NET, channel attention mechanism, ResNet, large convolution kernel and structural reparameterization, this paper proposes a network structure called RA-UNET. This paper uses the remote sensing ground object classification dataset LoveDA to conduct experiments. The results show that the network classification effect of this paper is better, with mIoU reaching 59.4% and mPA reaching 72.6%. And use the network in this paper and the four mainstream neural networks of FCN, SegNet, PSPNet and UNet to conduct comparative experiments. 
The comparative experimental results show that the classification effect of the network in this paper is better than the above four mainstream neural networks.","PeriodicalId":345723,"journal":{"name":"Fifth International Conference on Computer Information Science and Artificial Intelligence","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128396464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
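The two metrics reported for RA-UNet, mIoU and mPA, are both derived from a per-class confusion matrix. The sketch below uses the common definitions (per-class intersection over union and per-class pixel accuracy, each averaged over classes); the function name and the convention that rows are true classes and columns are predicted classes are assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
def mean_iou_and_mean_pa(confusion):
    """Compute (mIoU, mPA) from a square confusion matrix.

    confusion[i][j] = number of pixels of true class i predicted as class j
    (row/column convention assumed here).
    """
    n = len(confusion)
    ious, accuracies = [], []
    for i in range(n):
        tp = confusion[i][i]
        true_count = sum(confusion[i])                      # pixels of class i
        pred_count = sum(confusion[r][i] for r in range(n))  # pixels predicted i
        union = true_count + pred_count - tp
        # Classes absent from both labels and predictions are skipped.
        if union > 0:
            ious.append(tp / union)
        if true_count > 0:
            accuracies.append(tp / true_count)
    if not ious or not accuracies:
        raise ValueError("confusion matrix contains no labeled pixels")
    return sum(ious) / len(ious), sum(accuracies) / len(accuracies)
```

For a two-class matrix `[[3, 1], [1, 5]]`, the per-class IoU values are 3/5 and 5/7, and the per-class pixel accuracies are 3/4 and 5/6; the function returns the means of those pairs.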
Target tracking and object tracking are defined in this paper, and the difference between multi-target tracking and multi-object tracking is illustrated. The Bayes filter, Kalman filter, EKF, JPDA and Hungarian algorithm are introduced with formulas, and an example that uses a Kalman filter to track pedestrians from a moving camera is shown. In this example, a Kalman-filter-based multi-object tracking method that tracks pedestrians from a moving car equipped with a camera is analyzed step by step. The algorithm initializes bounding boxes to track the pedestrians and predicts each pedestrian's position from the previous position. It then updates the tracks and deletes the useless tracks. The final step is creating new tracks. The displayed results show that the Kalman-filter-based algorithm can successfully track the pedestrians with bounding boxes. However, when the camera is moving fast, some of the pedestrians cannot be recognized.
{"title":"Tracking pedestrians from a moving camera based on Kalman filter","authors":"Yingxu Wang","doi":"10.1117/12.2667813","DOIUrl":"https://doi.org/10.1117/12.2667813","url":null,"abstract":"The target tracking and object tracking are defined in this paper and the difference between multi-target tracking and multi-object tracking is also be illustrated. The Bayes filter, Kalman filter, EKF, JPDA and Hungarian Algorithm are introduced with formulars and an example of moving camera to track the pedestrians used by Kalman filter are shown. In this example, the method which is based on Kalman filter that track pedestrians from a moving car which is installed with camera in the field of the multi-object tracking is analyzed with steps. The algorithm initializes boundary boxes to track the pedestrians and predict the pedestrians based on the previous position. Then, update the tracks and delete the useless tracks. The final step is creating the tracks. After displaying the result, the algorithm based on Kalman filter can successfully track the pedestrians with boundary boxes. However, when the camera is moving fast, some of the pedestrians cannot be recognized.","PeriodicalId":345723,"journal":{"name":"Fifth International Conference on Computer Information Science and Artificial Intelligence","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127274906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
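The predict and update steps named in the abstract can be illustrated with a minimal one-dimensional constant-velocity Kalman filter. This is a sketch under stated assumptions, not the paper's tracker: the state is (position, velocity) along one image axis, only position is measured, and the process noise `q` and measurement noise `r` values are placeholders. A real pedestrian tracker would run a filter like this per bounding-box coordinate and add the track creation and deletion logic.

```python
def kf_predict(state, cov, dt=1.0, q=0.01):
    """Predict step for a constant-velocity model, F = [[1, dt], [0, 1]].

    state: (position, velocity); cov: 2x2 covariance as nested tuples.
    q is added to the covariance diagonal (simplified process noise, assumed).
    """
    p, v = state
    (p00, p01), (p10, p11) = cov
    new_state = (p + dt * v, v)
    # P' = F P F^T + Q, written out element by element.
    n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
    n01 = p01 + dt * p11
    n10 = p10 + dt * p11
    n11 = p11 + q
    return new_state, ((n00, n01), (n10, n11))


def kf_update(state, cov, z, r=1.0):
    """Update step with a position-only measurement z, H = [1, 0]."""
    p, v = state
    (p00, p01), (p10, p11) = cov
    s = p00 + r                  # innovation covariance
    k0, k1 = p00 / s, p10 / s    # Kalman gain
    y = z - p                    # innovation (measurement residual)
    new_state = (p + k0 * y, v + k1 * y)
    # P' = (I - K H) P
    new_cov = (
        ((1 - k0) * p00, (1 - k0) * p01),
        (p10 - k1 * p00, p11 - k1 * p01),
    )
    return new_state, new_cov


def track(measurements, dt=1.0):
    """Run predict/update over a list of position measurements."""
    state, cov = (0.0, 0.0), ((10.0, 0.0), (0.0, 10.0))
    for z in measurements:
        state, cov = kf_predict(state, cov, dt)
        state, cov = kf_update(state, cov, z)
    return state
```

Calling `track` on positions that grow by a fixed amount per frame yields a velocity estimate close to that amount. The limitation noted in the abstract (missed pedestrians under fast camera motion) is consistent with a constant-velocity prediction falling far from the next detection, although the abstract does not state the cause.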
In recent years, deep learning has been increasingly used to analyze financial data. However, deep learning models that predict the buy, sell, and hold points of stocks are prone to over-fitting, unreasonable feature extraction, and other issues. This paper builds a CBAM-CNN model based on a Convolutional Neural Network (CNN) and the Convolutional Block Attention Module (CBAM) to predict the buy, sell, and hold points. To verify the applicability and superiority of the proposed method, the shares of Dao 30 and SHH 50 from stock listing to August 11, 2021 are selected, and the accuracy of the deep learning algorithm is evaluated using the confusion matrix, weighted F1 score, and Kappa coefficient. The results show that the algorithm achieves high classification accuracy because it identifies most of the buy and sell instances. In addition, compared with a CNN that does not use the CBAM attention mechanism, classification performance is significantly improved. The results of this analysis can help investors determine better investment strategies.
{"title":"Stock market trend prediction using CBAM and CNN","authors":"Yong Wang, Zhiyu Xu, Yisheng Li","doi":"10.1117/12.2667378","DOIUrl":"https://doi.org/10.1117/12.2667378","url":null,"abstract":"In recent years, deep learning has been increasingly used to analyze financial data. For deep learning to predict the buy, sell, and hold points of stocks are prone to over-fitting, unreasonable feature extraction, and other issues. This paper builds a CBAM-CNN model based on Convolutional Neural Network (CNN) and Convolutional Block Attention Module (CBAM) to predict the buy, sell and hold points. In order to verify the applicability and superiority of the proposed method, the shares of Dao 30 and SHH 50 from stock listing to August 11, 2021 are selected, and the accuracy of the deep learning algorithm is evaluated using confusion matrix, weighted F1 score, and Kappa coefficient. The analysis results show that this algorithm has a high classification prediction accuracy because it can identify most of the buy and sell instances and therefore has a better effect. In addition, compared with CNN that do not use the CBAM attention mechanism, classification performance is significantly improved. The results from this analysis can help investors determine their better investment strategies.","PeriodicalId":345723,"journal":{"name":"Fifth International Conference on Computer Information Science and Artificial Intelligence","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127294774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
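Two of the evaluation measures named in the abstract, the weighted F1 score and the Kappa coefficient, can both be computed from the confusion matrix. The sketch below uses the standard definitions (support-weighted per-class F1 and Cohen's kappa); the function name and the row-is-true-class convention are assumptions for illustration, and the example matrix in the usage note is made up rather than taken from the paper.

```python
def weighted_f1_and_kappa(confusion):
    """Return (weighted F1, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j
    (row/column convention assumed here).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    weighted_f1 = 0.0
    observed = 0.0   # observed agreement p_o
    expected = 0.0   # chance agreement p_e
    for i in range(n):
        tp = confusion[i][i]
        true_count = sum(confusion[i])
        pred_count = sum(confusion[r][i] for r in range(n))
        # F1 = 2TP / (2TP + FP + FN) = 2TP / (true_count + pred_count)
        denom = true_count + pred_count
        f1 = 2 * tp / denom if denom > 0 else 0.0
        weighted_f1 += true_count * f1 / total
        observed += tp / total
        expected += (true_count * pred_count) / (total * total)

    # Degenerate case: all samples in one class for both labels and predictions.
    kappa = 0.0 if expected == 1.0 else (observed - expected) / (1 - expected)
    return weighted_f1, kappa
```

With the illustrative matrix `[[3, 1], [1, 5]]`, the weighted F1 is 0.8 and kappa is 0.28 / 0.48. Kappa is useful for buy, sell, and hold labels because hold points typically dominate, and kappa discounts the agreement a classifier would reach by chance on the majority class.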
Wi-Fi is a popular wireless local area network technology that offers convenient networking and easy expansion. Existing remote data monitoring systems mainly use ZigBee technology to transmit monitoring data, and their response time is long. Therefore, this paper proposes a remote monitoring system based on Wi-Fi technology. First, a framework consisting of an intelligent perception layer, a data communication layer and a data integration layer is designed to acquire Internet of Things data in real time. Then, a data communication mechanism with a high transmission rate is established using Wi-Fi technology to transmit the monitoring data wirelessly. Finally, an abnormal data judgment module is designed using a BP neural network to further analyze the real-time Internet of Things data. The anomaly monitoring results for the real-time data are obtained and presented through a visual interface. The system test results show that the total response time of the proposed system is 7440 ms, which is 37.2% and 42.89% lower than that of the CAN-based and PLC-based systems, respectively. The system also realizes intelligent analysis and efficient monitoring of Internet of Things data and promotes the development of remote data monitoring technology.
{"title":"Internet of Things real-time data remote monitoring system based on Wi-Fi technology","authors":"Feng Liu, Peiwei Wang, Peishun Ye","doi":"10.1117/12.2667926","DOIUrl":"https://doi.org/10.1117/12.2667926","url":null,"abstract":"Wi-Fi is a popular wireless local area network technology, which has the characteristics of convenient networking and easy expansion. The existing data remote monitoring system mainly uses ZigBee technology to transmit monitoring data, and the response of the monitoring system takes a long time. Therefore, this paper proposes a remote monitoring system based on Wi-Fi technology. Firstly, a framework including intelligent perception layer, data communication layer and data integration layer is designed to realize the real-time data acquisition of the Internet of Things. Then, a data communication mechanism with high transmission rate is established by the Wi-Fi technology to realize the wireless transmission of monitoring data. Finally, the abnormal data judgment module is designed by using BP neural network to further analyze the real-time data of the Internet of Things. The abnormal monitoring results of the real-time data of the Internet of Things are obtained, and the monitoring results are presented through a visual interface. The system test results show that the total response time of the proposed system is 7440ms, which is reduced by 37. 2% and 42. 89% compared with the CAN-based and PLC-based systems. 
At the same time, the system realizes the intelligent analysis and efficient monitoring of Internet of Things data and promotes the development of data remote monitoring technology.","PeriodicalId":345723,"journal":{"name":"Fifth International Conference on Computer Information Science and Artificial Intelligence","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127373351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existing vehicle detection methods lack a fine-grained vehicle detection algorithm. To improve the accuracy and applicability of anchor-based object detection models, a novel and practical fine-grained vehicle identification network (EFDet-SPP) based on EfficientDet is proposed. The improved network adds a Spatial Pyramid Pooling (SPP) module after the feature extraction network to concatenate features, which enhances the network's learning capability and extracts highly semantic image features at multiple scales. Anchor-based predictions are converted to pixel-based predictions by incorporating the FCOS head network, eliminating the hyperparameters associated with anchor boxes. In addition, the Mosaic and Copy-Paste data augmentation methods are used to scale small-object samples and balance the data samples. Experimental results show that the improved network achieves 94.8% on the collected fine-grained vehicle detection dataset, a large improvement over the EfficientDet network, without significantly increasing the number of training parameters or the computational cost of the network.
{"title":"EFDet-SPP: efficient anchor-free network for fine vehicle detection","authors":"Yongsheng Xie, Ming Ye, Zhe Zhang, He Liu","doi":"10.1117/12.2667701","DOIUrl":"https://doi.org/10.1117/12.2667701","url":null,"abstract":"Existing vehicle detection methods lack the fine vehicle detection algorithm. In order to improve the accuracy and applicability of anchor-based object detection models, a novel and practical vehicle Fine-grained identification network (EFDet-SPP) based on the EfficientDet is proposed. The improved network adds a Spatial Pyramid Pooling module (SPP) after the feature extraction network for concatenating features to enhance network learning capabilities, and multi-scale extraction of highly semantic features of images. Anchor-based predictions are converted to pixel-based predictions by combining FCOS's head network, eliminating the hyperparameters associated with anchor boxes. And with Mosaic, Copy-Paste data augmentation methods scale small object samples to achieve data sample balance. Experimental results show that the improved network has achieved 94.8% in the actual collected fine vehicle detection dataset, which is greatly improved compared with the EfficientDet network, and does not significantly increase the training parameters and calculation amount of the network.","PeriodicalId":345723,"journal":{"name":"Fifth International Conference on Computer Information Science and Artificial Intelligence","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122909171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
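The conversion from anchor-based to pixel-based prediction mentioned in the abstract relies on the FCOS formulation, in which every location inside a ground-truth box regresses its distances to the four box sides and a centerness score. The sketch below shows those per-location targets as defined for FCOS; the function name and the `(x0, y0, x1, y1)` box layout are assumptions for illustration, and how EFDet-SPP wires this head onto EfficientDet is not shown.

```python
import math


def fcos_targets(x, y, box):
    """FCOS-style regression targets for one location and one ground-truth box.

    box: (x0, y0, x1, y1) with x0 < x1 and y0 < y1 (layout assumed here).
    Returns ((l, t, r, b), centerness), or None if the location is not
    strictly inside the box and is therefore a negative sample.
    """
    x0, y0, x1, y1 = box
    left, top = x - x0, y - y0
    right, bottom = x1 - x, y1 - y
    if min(left, top, right, bottom) <= 0:
        return None
    # Centerness is 1 at the box center and decays toward the borders.
    centerness = math.sqrt(
        (min(left, right) / max(left, right))
        * (min(top, bottom) / max(top, bottom))
    )
    return (left, top, right, bottom), centerness
```

No anchor sizes, aspect ratios, or IoU matching thresholds appear in these targets, which is why this head removes the anchor-related hyperparameters. For the box `(0, 0, 10, 10)`, the center `(5, 5)` has centerness 1.0 and the off-center location `(2, 5)` has centerness 0.5.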