{"title":"AL-MobileNet:基于多模态数据的智能驾驶舱二维手势识别新模型","authors":"Bin Wang, Liwen Yu, Bo Zhang","doi":"10.1007/s10462-024-10930-z","DOIUrl":null,"url":null,"abstract":"<div><p>As the degree of automotive intelligence increases, gesture recognition is gaining more attention in human-vehicle interaction. However, existing gesture recognition methods are computationally intensive and perform poorly in multi-modal sensor scenarios. This paper proposes a novel network structure, AL-MobileNet (MobileNet with Attention and Lightweight Modules), which can quickly and accurately estimate 2D gestures in RGB and infrared (IR) images. The innovations of this paper are as follows: Firstly, to enhance multi-modal data, we created a synthetic IR dataset based on real 2D gestures and employed a coarse-to-fine training approach. Secondly, to speed up the model's computation on edge devices, we introduced a new lightweight computational module called the Split Channel Attention Block (SCAB). Thirdly, to ensure the model maintains accuracy in large datasets, we incorporated auxiliary networks and Angle-Weighted Loss (AWL) into the backbone network. Experiments show that AL-MobileNet requires only 0.4 GFLOPs of computational power and 1.2 million parameters. This makes it 1.5 times faster than MobileNet and allows for quick execution on edge devices. AL-MobileNet achieved a running speed of up to 28 FPS on the Ambarella CV28. On both general datasets and our dataset, our algorithm achieved an average PCK0.2 score of 0.95. This indicates that the algorithm can quickly generate accurate 2D gestures. The demonstration of the algorithm can be reviewed in gesturebaolong.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"57 10","pages":""},"PeriodicalIF":10.7000,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10930-z.pdf","citationCount":"0","resultStr":"{\"title\":\"AL-MobileNet: a novel model for 2D gesture recognition in intelligent cockpit based on multi-modal data\",\"authors\":\"Bin Wang, Liwen Yu, Bo Zhang\",\"doi\":\"10.1007/s10462-024-10930-z\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>As the degree of automotive intelligence increases, gesture recognition is gaining more attention in human-vehicle interaction. However, existing gesture recognition methods are computationally intensive and perform poorly in multi-modal sensor scenarios. This paper proposes a novel network structure, AL-MobileNet (MobileNet with Attention and Lightweight Modules), which can quickly and accurately estimate 2D gestures in RGB and infrared (IR) images. The innovations of this paper are as follows: Firstly, to enhance multi-modal data, we created a synthetic IR dataset based on real 2D gestures and employed a coarse-to-fine training approach. Secondly, to speed up the model's computation on edge devices, we introduced a new lightweight computational module called the Split Channel Attention Block (SCAB). Thirdly, to ensure the model maintains accuracy in large datasets, we incorporated auxiliary networks and Angle-Weighted Loss (AWL) into the backbone network. Experiments show that AL-MobileNet requires only 0.4 GFLOPs of computational power and 1.2 million parameters. This makes it 1.5 times faster than MobileNet and allows for quick execution on edge devices. AL-MobileNet achieved a running speed of up to 28 FPS on the Ambarella CV28. 
On both general datasets and our dataset, our algorithm achieved an average PCK0.2 score of 0.95. This indicates that the algorithm can quickly generate accurate 2D gestures. The demonstration of the algorithm can be reviewed in gesturebaolong.</p></div>\",\"PeriodicalId\":8449,\"journal\":{\"name\":\"Artificial Intelligence Review\",\"volume\":\"57 10\",\"pages\":\"\"},\"PeriodicalIF\":10.7000,\"publicationDate\":\"2024-09-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://link.springer.com/content/pdf/10.1007/s10462-024-10930-z.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Artificial Intelligence Review\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10462-024-10930-z\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence Review","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10462-024-10930-z","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
AL-MobileNet: a novel model for 2D gesture recognition in intelligent cockpit based on multi-modal data
Abstract:
As the degree of automotive intelligence increases, gesture recognition is attracting more attention in human-vehicle interaction. However, existing gesture recognition methods are computationally intensive and perform poorly in multi-modal sensor scenarios. This paper proposes a novel network structure, AL-MobileNet (MobileNet with Attention and Lightweight modules), which can quickly and accurately estimate 2D gestures in RGB and infrared (IR) images. The innovations of this paper are as follows. First, to enrich the multi-modal data, we created a synthetic IR dataset based on real 2D gestures and employed a coarse-to-fine training approach. Second, to speed up the model's computation on edge devices, we introduced a new lightweight computational module, the Split Channel Attention Block (SCAB). Third, to ensure the model maintains accuracy on large datasets, we incorporated auxiliary networks and an Angle-Weighted Loss (AWL) into the backbone network. Experiments show that AL-MobileNet requires only 0.4 GFLOPs of computation and 1.2 million parameters, making it 1.5 times faster than MobileNet and well suited to edge devices; it reaches up to 28 FPS on the Ambarella CV28. On both public datasets and our own dataset, the algorithm achieved an average PCK0.2 score of 0.95, indicating that it can quickly produce accurate 2D gestures. A demonstration of the algorithm can be viewed at gesturebaolong.
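The abstract names the Split Channel Attention Block but does not describe its internals. As a rough, hypothetical illustration only of what a split-channel attention module generally looks like (channel split plus squeeze-and-excitation-style gating, a common recipe for cheap attention on edge devices), here is a minimal PyTorch sketch; the class name, reduction ratio, and the choice to gate only one half of the channels are all assumptions, not the authors' actual SCAB design:

```python
# Hypothetical sketch of a generic split-channel attention block.
# NOT the paper's SCAB: the abstract does not specify its structure.
import torch
import torch.nn as nn

class SplitChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.half = channels // 2
        # Gate computed from one half of the channels only, which keeps
        # the attention branch cheap (a typical edge-device trade-off).
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze: B x C/2 x 1 x 1
            nn.Conv2d(self.half, self.half // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(self.half // reduction, self.half, 1),
            nn.Sigmoid(),                                   # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = torch.split(x, self.half, dim=1)  # split channels into two halves
        b = b * self.gate(b)                     # attend over one half only
        return torch.cat([a, b], dim=1)          # recombine halves

x = torch.randn(1, 64, 56, 56)
print(SplitChannelAttention(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```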
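For readers unfamiliar with the reported metric: PCK0.2 is the standard Percentage of Correct Keypoints at a 0.2 threshold. A predicted keypoint counts as correct if its distance to the ground truth is at most 0.2 times a normalizing length (for hand pose this is commonly the hand bounding-box size; the exact normalization varies by benchmark and is not specified in the abstract):

```latex
\mathrm{PCK@0.2} \;=\; \frac{1}{N}\sum_{i=1}^{N}
  \mathbb{1}\!\left[\,\frac{\lVert \hat{p}_i - p_i \rVert_2}{L} \le 0.2\,\right]
```

where \(\hat{p}_i\) and \(p_i\) are the predicted and ground-truth positions of keypoint \(i\), \(N\) is the number of keypoints, and \(L\) is the normalizing length. A score of 0.95 therefore means 95% of keypoints fall within the threshold.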
Journal introduction:
Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.