With the advancement of artificial intelligence, the smart home has attracted considerable attention from researchers. Human Activity Recognition (HAR) is a crucial foundation for many smart-home applications. In this paper, to improve the accuracy of HAR and promote the development of smart-home applications and services, we propose a Transformer-based approach that integrates multiple sensor sequence inputs. We integrate sequence features, collect contextual information, and employ a Transformer to recognize activities in the CASAS Aruba dataset, which is collected with environmental sensors. Validation on this real-world dataset demonstrates the approach's effectiveness compared with traditional machine learning and deep learning methods.
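As a rough illustration of how a Transformer gathers context across a sensor sequence, the sketch below implements scaled dot-product self-attention in plain Python. It is not the authors' model: the function names and the toy two-dimensional event embeddings are assumptions made for this example, and a real Transformer adds learned projections, multiple heads, and positional information.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Hypothetical embeddings of three consecutive sensor events.
events = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = self_attention(events, events, events)
```

Each row of `context` is a mixture of all event embeddings, weighted by similarity, which is the sense in which attention "collects contextual information" for every position in the sequence.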
{"title":"Human Activity Recognition based on Transformer in Smart Home","authors":"Xinmei Huang, Shenmin Zhang","doi":"10.1145/3590003.3590100","DOIUrl":"https://doi.org/10.1145/3590003.3590100","url":null,"abstract":"With the advancement of artificial intelligence, smart home has attracted much attention from scholars. Human Activity Recognition (HAR) is a crucial foundation for various applications in smart home. In this paper, to improve the accuracy of HAR and promote the development of applications and services in smart home, we propose a Transformer-based approach that integrates multiple sensor sequence inputs for HAR. We integrate sequence features, collect contextual information, and employ Transformer to recognize various activities for the CASAS Aruba dataset that uses environmental sensors. The validation results on real-world dataset demonstrate its effectiveness compared to traditional machine learning and deep learning methods.","PeriodicalId":340225,"journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124618582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RGBT tracking incorporates thermal infrared data to achieve more accurate visual tracking. However, its effectiveness may be diminished by bottlenecks such as thermal crossover, illumination variation, and occlusion. To address these problems, we propose a fully-convolutional Siamese-based Multi-modal Feature Fusion Network (SiamMFF) that integrates RGB and thermal features. In our work, visible and infrared images are first processed by the Multi-Modal Feature Fusion framework (MFF) on the search and template sides, respectively. Then, an attribute-aware fusion module is introduced to perform feature extraction and fusion for the major challenge attributes. In particular, we design a skip-connection guidance module to prevent the propagation of noise and to enrich the feature information, improving the tracker’s discriminative ability for modality-specific challenges. The proposed SiamMFF method was evaluated extensively on two benchmark datasets, GTOT and RGBT234, where its precision rate/success rate reach 90.5%/73.6% and 81.2%/57.3%, respectively, demonstrating the superiority of our method over existing state-of-the-art methods.
{"title":"Multi-Modal Fusion Object Tracking Based on Fully Convolutional Siamese Network","authors":"Ke Qi, Liji Chen, Yicong Zhou, Yutao Qi","doi":"10.1145/3590003.3590084","DOIUrl":"https://doi.org/10.1145/3590003.3590084","url":null,"abstract":"RGBT tracking incorporates thermal infrared data to achieve more accurate visual tracking. However, the efficiency of RGBT tracking may be diminished by some bottlenecks, such as thermal crossover, illumination variation and occlusion. To address the aforementioned problems, we propose a fully-convolutional Siamese-based Multi-modal Feature Fusion Network (SiamMFF) that integrates RGB and thermal features. In our work, visible and infrared images are initially processed by the Multi-Modal Feature Fusion framework (MFF) at the search and template sides, respectively. Then, the attribute-aware fusion module is introduced to conduct feature extraction and fusion for the major challenge attributes. In particular, we design a skip connections guidance module to prevent the propagation of noise and to enrich the feature information so that we can improve the tracker’s discriminative ability for modality-specific challenges. 
The proposed SiamMFF method has been evaluated in a great number of trials on two benchmark datasets GTOT and RGBT234, and the precision rate and success rate can reach 90.5%/73.6% and 81.2%/57.3%, respectively, demonstrating the superiority of our method over existing state-of-the-art methods.","PeriodicalId":340225,"journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121945837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Genetic Algorithm (GA) converts the problem-solving process into a process similar to the chromosomal changes of biological evolution, using mathematical methods and computer simulation. This meta-heuristic algorithm has been successfully applied to system logic and non-system logic programming. In this study, building on previous work on PRO2SAT, we explore the role of the Bipolar Genetic Algorithm (GA) in enhancing the learning process of the Hopfield neural network and in generating global solutions of the Probabilistic 2 Satisfiability model. The main purpose of the learning phase of the PRO2SAT model is to obtain consistent interpretations and calculate the optimal prominence weights. The GA is introduced to improve the ability of PRO2SAT to obtain consistent interpretations through its selection, crossover, and mutation operators, and thus to improve the ability of the logic programming model to reach a global solution. In the experimental phase, simulated data are used for testing, and three performance metrics are used to assess the proposed model's ability to obtain consistent interpretations and global solutions: mean absolute error, logic formula satisfaction ratio, and global minimum ratio. Experimental results show that GA, as a meta-heuristic algorithm, has better searching ability for the optimal solution and can effectively assist logic programming.
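The selection, crossover, and mutation operators named above can be sketched on bipolar chromosomes (entries of +1 or -1) searching for an assignment that satisfies as many 2-literal clauses as possible. This is a minimal illustration, not the PRO2SAT learning phase: the clause encoding, the function names, the truncation selection, the one-point crossover, and all parameter values are assumptions chosen for brevity.

```python
import random


def satisfied_clauses(assignment, clauses):
    """Count clauses (lit1 OR lit2) satisfied by a bipolar assignment.

    A literal is (variable_index, wanted_value) with wanted_value in {1, -1}.
    """
    return sum(
        1 for (i, a), (j, b) in clauses
        if assignment[i] == a or assignment[j] == b
    )


def genetic_search(clauses, n_vars, pop_size=20, generations=50,
                   mutation_rate=0.1, seed=0):
    """Tiny GA; assumes n_vars >= 2 and pop_size >= 4."""
    rng = random.Random(seed)

    def fitness(chromosome):
        return satisfied_clauses(chromosome, clauses)

    population = [[rng.choice((1, -1)) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == len(clauses):
            break  # every clause is satisfied
        parents = population[: pop_size // 2]  # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation flips the sign of a gene with small probability.
            child = [-g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Hypothetical formula: (x0 OR NOT x1) AND (x1 OR x2) AND (NOT x0 OR x2)
clauses = [((0, 1), (1, -1)), ((1, 1), (2, 1)), ((0, -1), (2, 1))]
best = genetic_search(clauses, n_vars=3)
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next.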
{"title":"Genetic algorithm in hopfield neural network with probabilistic 2 satisfiability","authors":"Ju Chen, Chengfeng Zheng, Yuan Gao, Yueling Guo","doi":"10.1145/3590003.3590024","DOIUrl":"https://doi.org/10.1145/3590003.3590024","url":null,"abstract":"Genetic Algorithm (GA) is to convert the problem-solving process into a process similar to the chromosomal changes in biological evolution using the mathematical method and computer simulation operation. This meta-heuristic algorithm has been successfully applied to system logic and non-system logic programming. In this study, we will explore the role of the Bipolar Genetic Algorithm (GA) in enhancing the learning process of the Hopfield neural network based on the previous study of PRO2SAT, and generate global solutions of the Probabilistic 2 Satisfiability model. The main purpose of the learning phase of the PRO2SAT model is to obtain consistent interpretations and calculate the optimal prominence weights, and the GA algorithm is introduced to improve the ability of PRO2SAT to obtain consistent interpretation using its selection, crossover, and mutation operators, and thus to improve the ability of the logic programming model to get a global solution. In the experimental phase, simulation data are used for result testing, and three performance metrics are used to test the consistency interpretation and global solution acquisition ability of the proposed model, including mean absolute error, logic formula satisfaction ratio, and global minimum ratio. 
Experimental results show that GA, as a meta-heuristic algorithm, has better searching ability for optimal solution and can effectively assist logic programming.","PeriodicalId":340225,"journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127231838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In a Distributed Denial of Service (DDoS) attack, the attacker uses a remotely controlled botnet to attack the target server simultaneously, preventing legitimate users from obtaining information services. Previous studies focused on detecting DDoS attacks in offline datasets but ignored the detection of specific DDoS types, while some reports show that the number of hybrid DDoS attacks is increasing significantly. In this paper, we propose an elastic detection mechanism (EDM), which can make economical use of the server’s idle computing power. The framework integrates a variety of pre-trained lightweight CNN detection models, which are suitable for rapid online detection of hybrid DDoS attacks. We focus on evaluating the response accuracy and the detection speed of the EDM. The experimental results show that the model achieves excellent hybrid attack detection performance and meets the practical requirements of real-time detection.
{"title":"Elastic Detection Mechanism Aimed at Hybrid DDoS Attack","authors":"Yubo Wang, Jinyu Wang","doi":"10.1145/3590003.3590031","DOIUrl":"https://doi.org/10.1145/3590003.3590031","url":null,"abstract":"In Distributed Denial of Service(DDoS) attack, the attacker uses a remotely controlled botnet to attack the target server at the same time to prevent legitimate users from obtaining information services. Previous studies focused on the detection of DDoS attacks on offline datasets, but ignored the detection of specific DDoS types, and some reports showed that the number of DDoS hybrid attacks was increasing significantly. In this paper, we propose an elastic detection mechanism(EDM), which can economize the server’s idle computing power. The framework integrates a variety of pre-trained lightweight CNN detect models, which are suitable for online rapid detection of DDoS hybrid attacks. We focus on evaluating the response accuracy and the detection speed of the EDM. The experimental results show that the model can achieve excellent hybrid attack detection performance, and meet the actual requirements of real-time detection.","PeriodicalId":340225,"journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130547319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
At present, deep learning methods are widely used for grayscale image colorization. Qinghai farmer painting has distinct national characteristics. These paintings have bright colors, high saturation, chaotic color distribution, and low color contrast, so it is difficult to restore their colors with high fidelity using general deep learning colorization methods. The Pix2Pix generative adversarial network method for grayscale image colorization uses the Leaky ReLU function as its activation function. The proposed algorithm replaces the max pooling layer with a convolution layer to retain more image feature information and further improve the color simulation effect. Meanwhile, because no relevant dataset of Qinghai farmer paintings exists, such a dataset is constructed to meet the needs of network training. The experimental results show that the improved method further improves the colorization effect and can generate high-quality color images of Qinghai farmer paintings with a more realistic color distribution.
{"title":"Research on Colorization of Qinghai Farmer Painting Image Based on Generative Adversarial Networks","authors":"Chunyan Peng, Xueya Zhao, Guangyou Xia","doi":"10.1145/3590003.3590094","DOIUrl":"https://doi.org/10.1145/3590003.3590094","url":null,"abstract":"At present, deep learning method is widely used in the field of gray image colorization. Qinghai farmer painting has distinct national characteristics. The farmer painting has bright colors, high saturation, chaotic color distribution and low color contrast, so it is difficult to restore the image color with high fidelity by using the general deep learning image colorization method. The Pix2Pix generation adversarial network of grayscale image colorization method uses the Leaky ReLU function as the activation function. The proposal algorithm replaces the maximum pooling layer with the convolution layer to retain more image feature information and further to improve the color simulation effect. Meanwhile, in view of the lack of relevant Qinghai farmer painting data set, the data set of Qinghai farmer paintings is constructed to meet the needs of network training. The experimental results show that the improved method further improves the color effect and can generate high quality color images of Qinghai farmer paintings with more real color distribution.","PeriodicalId":340225,"journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134371021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zeyu Du, Zhenhua Pan, Z. Xiong, Lei He, Haipeng Wang, Houda Zhu, Jiangbo Wang
Corrugated steel webs are suitable for the design of large-span extradosed cable-stayed bridges. The Live Load Structural Index (LLSI) is applied to evaluate the performance of a bridge with a corrugated steel web. Parametric numerical models were built and investigated to explore the effect of web height and weight on the structural performance of an extradosed cable-stayed bridge. A machine learning model based on a Particle Swarm Optimization (PSO) BP neural network was constructed to predict the correlation and validate the relationship between the structural variables and the LLSI.
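For readers unfamiliar with Particle Swarm Optimization, the sketch below shows the basic velocity and position update on a stand-in objective. It does not reproduce the paper's PSO-BP model: there the objective would be the training error of a BP neural network as a function of its weights, whereas here a simple sphere function is assumed, and the inertia and acceleration coefficients (`w`, `c1`, `c2`) are common textbook values rather than values from the paper.

```python
import random


def sphere(x):
    """Stand-in objective (assumption); minimum value 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim, n_particles=20, iterations=100,
        w=0.7, c1=1.5, c2=1.5, bound=5.0, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far
    history = [g_val]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
        history.append(g_val)
    return g_pos, g_val, history


g_pos, g_val, history = pso(sphere, dim=2)
```

In a PSO-BP setup, each particle position would typically encode a full set of network weights, and the best position found would serve as the starting point (or final setting) for the network.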
{"title":"Performance Evaluation of an Extradosed Cable-Stayed Bridge with Corrugated Web based on Machine Learning Algorithms","authors":"Zeyu Du, Zhenhua Pan, Z. Xiong, Lei He, Haipeng Wang, Houda Zhu, Jiangbo Wang","doi":"10.1145/3590003.3590050","DOIUrl":"https://doi.org/10.1145/3590003.3590050","url":null,"abstract":"Corrugated steel web is suitable for large-span extradosed cable-stayed bridge's design scheme. Live Load Structural Index (LLSI) is applied to evaluate the performance of the bridge with corrugated steel web. Parametric numeric models were built and investigated to explore the web height and weight's effect on the structural performance of an extradosed cable-stayed bridge. Machine learning model involving Particle Swarm Optimization BP neural network has been constructed to predict the correlation and validate the relationship between the structural variable and live load structural index.","PeriodicalId":340225,"journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122188270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Traditional automatic lane detection is mostly based on Hough detection. However, this category of methods has low robustness and is vulnerable to interference. To improve the accuracy of lane detection, this paper compares and analyzes end-to-end lane line detection networks based on deep learning, including Unet-base and DeepLabv3+. Solutions are also given for the gradient explosion and slow running speed encountered during model training. Ordered test sets are used to speed up training and validate the deep learning algorithms, and experiments are performed with Unet-base and DeepLabv3+ at different image resolutions. The experiments show that, at the same resolution, the Unet-base model, which has an FCN network structure and incorporates a better training strategy, outperforms the DeepLabv3+ model, which uses a classical ASPP module to address the downsampling layer problem, in terms of generalization capability. The mIoU of the improved Unet-base is also higher than that of DeepLabv3+. Therefore, the improved Unet-base model generalizes better than DeepLabv3+.
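The mIoU metric used in the comparison above is the per-class intersection-over-union averaged over classes. The following sketch computes it for flattened label lists; the function name and the toy labels are assumptions for illustration, and classes absent from both prediction and ground truth are skipped, which is one common convention among several.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for two equal-length label sequences."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes that appear in neither sequence
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy example: 0 = background pixel, 1 = lane pixel (hypothetical labels).
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)
```

Here the background class has IoU 1/2 and the lane class has IoU 2/3, so `score` is their average, 7/12.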
{"title":"An Unmanned Lane Detection Algorithm Using Deep Learning and Ordered Test Sets Strategy","authors":"Shenwei Zhang, Xiaoyan Lin, Mingwei Zhang, Zhen Zhang, Yun Hou, Honglong Ning, Tian Qiu","doi":"10.1145/3590003.3590087","DOIUrl":"https://doi.org/10.1145/3590003.3590087","url":null,"abstract":"The traditional method of automatic lane detection is mostly based on Hough detection. However, this category of methods has low robustness and is vulnerable to interference. In order to improve the accuracy of lane detection, the presented paper compares and analyzes the end-to-end lane line detection network based on deep learning, including Unet-base and Deeplabv3+, in view of gradient explosion and slow running speed during model training, solutions are also given. Ordered test sets are used to speed up the training processing and validate the deep learning algorithm, in the case of different image resolutions, uses Unet-base and Deeplabv3+ to perform experiments respectively. Experiments show that under the same resolution, the Unet-base model with FCN network structure incorporating a better training strategy outperforms the Deeplabv3+ algorithm model that uses a classical ASSP module to solve the downsampling layer problem in terms of model generalization capability. And the MIOU of improved Unet-base is higher than Deeplabv3+. 
Therefore, compared to DeepLabv3+, the improved Unet-base model is more generalized.","PeriodicalId":340225,"journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115123263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xianglong Guan, Li Ma, Yunyou Huang, Suqin Tang, Tinghui Li
The progression of Alzheimer’s disease (AD) is irreversible, but reasonable medical intervention in preclinical AD can delay its onset. Progressive mild cognitive impairment (pMCI) is the most critical stage for preclinical intervention. Therefore, accurate identification of pMCI will significantly benefit patients. Functional MRI is a neuroimaging modality that has been widely used to study brain activity related to AD. However, functional MRI data are difficult to obtain, and a small amount of data easily leads to overfitting of the identification model. In addition, current pMCI identification models lack interpretability, which makes them difficult for clinicians to accept. In this work, we propose an interpretable hybrid model based on a brain network atlas to identify pMCI subjects. First, the hybrid model uses a multi-layer perceptron to obtain categorical global features that help graph neural networks reduce overfitting. Second, an attention mechanism is introduced into the model to explain its recognition behavior. The results show that our model outperforms the comparison models on multiple metrics.
{"title":"An Interpretable Brain Network Atlas-Based Hybrid Model for Mild Cognitive Impairment Progression Prediction","authors":"Xianglong Guan, Li Ma, Yunyou Huang, Suqin Tang, Tinghui Li","doi":"10.1145/3590003.3590081","DOIUrl":"https://doi.org/10.1145/3590003.3590081","url":null,"abstract":"The process of Alzheimer’s disease (AD) is irreversible, but reasonable medical intervention for preclinical AD can delay AD’s onset. Progressive mild cognitive impairment (pMCI) is the most critical stage for AD preclinical intervention. Therefore, accurate identification of pMCI will significantly improve patient benefits. Functional MRI is a neuroimaging modality that has been widely utilized to study brain activity related to AD. However, it is challenging to obtain functional MRI data, and a small amount of data will easily lead to the overfitting of the identification model. In addition, the current pMCI identification model lack interpretability leads to difficulty in acceptance by clinicians. In this work, we propose an interpretable hybrid model based on a brain network atlas to identify pMCI subjects. First, the hybrid model utilizes multi-layer perceptron to obtain categorical global features to help graph neural networks reduce overfitting. Second, the attention mechanism is introduced into the model to explain the recognition behavior of the model. 
The results show that our model outperforms the comparison models on multiple metrics.","PeriodicalId":340225,"journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","volume":"4577 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114071189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haze degrades the visual quality of video, causing picture distortion, reduced image quality, and blur. To restore real and natural color images, a defogging method for haze video based on an optical flow threshold is proposed. First, the frame at time t is extracted, its features are tracked from time t + 1 to time t + n, and the frame at time t + n is extracted. The optical flow values of frame t and frame t + n are then calculated, and their difference is taken as the optical flow threshold value. This value is compared with a given threshold: if it is greater than or equal to the given threshold, the intermediate frame is taken, and the intermediate frame and frame t + n are processed with the Retinex algorithm. This operation is performed iteratively. Finally, the processed single frames are merged into a whole video sequence and output. Experiments show that the processing speed of the algorithm is 0.07, much lower than that of other processing methods, which verifies the effectiveness and novelty of the proposed algorithm.
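The frame-selection step described above can be sketched as follows. The abstract does not say how the optical flow of a frame is reduced to a single value or which intermediate frame is chosen, so this example assumes one scalar flow magnitude per frame (the `flow` list) and takes the midpoint frame `t + n // 2`; both choices, and the function name, are hypothetical. The Retinex enhancement itself is not shown.

```python
def select_frames(flow, n, threshold):
    """Return indices of frames that would be passed to Retinex enhancement.

    flow      -- one scalar optical-flow value per frame (assumed summary)
    n         -- spacing between compared frames t and t + n
    threshold -- the given threshold on the flow difference
    """
    selected = []
    t = 0
    while t + n < len(flow):
        diff = abs(flow[t + n] - flow[t])
        if diff >= threshold:
            # Enough motion: keep the intermediate frame and frame t + n.
            selected.extend([t + n // 2, t + n])
        t += n  # move on to the next window
    return selected


# Hypothetical per-frame flow magnitudes with a scene change at frame 4.
flow = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0]
frames = select_frames(flow, n=4, threshold=1.0)  # -> [2, 4]
```

Only windows whose flow difference reaches the threshold contribute frames, so static stretches of video are skipped, which is presumably where the processing-time saving comes from.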
{"title":"Haze video image Clarity Processing Based on Optical Flow Threshold","authors":"Ru Chen, Xijuan Wang","doi":"10.1145/3590003.3590075","DOIUrl":"https://doi.org/10.1145/3590003.3590075","url":null,"abstract":"In view of the problem of haze weather on the visual effect of video image, which causes the picture distortion, image quality degradation and definition blur of video image, a defogging processing method of haze video image based on optical flow threshold is proposed so as to restore the real and natural color image. Firstly, extract the image of the t frame at time t, track the characteristics of the image at time t + 1 to time t + n, extract the image of the t+n frame, then calculate the optical flow values of the t frame and the t + n frame, make a difference between the obtained optical flow values to obtain the optical flow threshold, compare the obtained optical flow threshold with the given threshold, if the value is greater than or equal to the given threshold, take the optical flow threshold intermediate frame image, and the middle frame and t+n frame images are processed by Retinex algorithm, and this operation is performed iteratively. Finally, the processed single frame video sequence is merged into a whole and output. 
The experiment shows that the processing speed of the algorithm is 0.07, much lower than other processing methods, which verifies the effectiveness and innovativeness of the proposed algorithm.","PeriodicalId":340225,"journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","volume":"208 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121196829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The vacuum glass tube is one of the most important materials in the physical industry, and its inspection rate during production is crucial to the production of subsequent products. We propose a CBAM-based target detection method built on YOLOv7 to detect defects in transparent glass tubes, which are hard to detect because of the tubes' transparent walls. We replace all pooling layers in YOLOv7 with CBAM so that the network better captures target features. The experimental results show that, in a simulated industrial inspection environment, the recall rate for defective product detection reaches 98.34% and the accuracy rate reaches 96.33%. The method can meet the accuracy requirements for detecting defects in transparent glass tubes at industrial sites.
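The abstract reports a recall rate and an accuracy rate without defining them; assuming the standard definitions from a binary confusion matrix (defective = positive), they can be computed as in the sketch below. The counts are made up for illustration and are not taken from the paper.

```python
def recall(tp, fn):
    """Fraction of truly defective tubes that were flagged."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Fraction of all tubes that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical inspection run: 10 defective tubes, 10 good tubes.
tp, fn = 9, 1   # defective tubes caught / missed
tn, fp = 8, 2   # good tubes passed / wrongly rejected
r = recall(tp, fn)            # 9 / 10
a = accuracy(tp, tn, fp, fn)  # 17 / 20
```

In defect inspection, recall is usually the more critical figure, because a missed defect (false negative) reaches downstream production while a false alarm only costs a re-inspection.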
{"title":"CBAM-based Method in YOLOv7 for Detecting Defective Vacuum Glass Tubes","authors":"Zeyu Sheng, Haiguang Chen, Zifeng Qi","doi":"10.1145/3590003.3590079","DOIUrl":"https://doi.org/10.1145/3590003.3590079","url":null,"abstract":"The vacuum glass tube is one of the most important materials in the physical industry, and the inspection rate of its production is crucial to the production of subsequent products. We propose a CBAM-based target detection method for YOLOv7 to detect defects in transparent glass tubes, which are not easily detectable due to their transparent walls. We replace all pooling layers in YOLOv7 with CBAM to enable it to better grasp target features. The experimental results show that the recall rate for defective product detection reaches 98.34% and the accuracy rate reaches 96.33% in the simulated industrial inspection environment. It can meet the accuracy requirements of detecting defects of transparent glass tubes in industrial sites.","PeriodicalId":340225,"journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116076588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}