AL-MobileNet: a novel model for 2D gesture recognition in intelligent cockpit based on multi-modal data

Artificial Intelligence Review (IF 10.7; Zone 2, Computer Science; JCR Q1, Computer Science, Artificial Intelligence) | Pub Date: 2024-09-05 | DOI: 10.1007/s10462-024-10930-z
Bin Wang, Liwen Yu, Bo Zhang
Citations: 0


AL-MobileNet: a novel model for 2D gesture recognition in intelligent cockpit based on multi-modal data

As vehicles become more intelligent, gesture recognition is drawing growing attention in human-vehicle interaction. However, existing gesture recognition methods are computationally intensive and perform poorly in multi-modal sensor scenarios. This paper proposes a novel network structure, AL-MobileNet (MobileNet with Attention and Lightweight Modules), which quickly and accurately estimates 2D gestures in RGB and infrared (IR) images. The paper makes three contributions. First, to enhance the multi-modal data, we created a synthetic IR dataset based on real 2D gestures and employed a coarse-to-fine training approach. Second, to speed up the model's computation on edge devices, we introduced a new lightweight computational module called the Split Channel Attention Block (SCAB). Third, to ensure the model maintains accuracy on large datasets, we incorporated auxiliary networks and an Angle-Weighted Loss (AWL) into the backbone network. Experiments show that AL-MobileNet requires only 0.4 GFLOPs of computation and 1.2 million parameters, which makes it 1.5 times faster than MobileNet and allows quick execution on edge devices. AL-MobileNet achieved a running speed of up to 28 FPS on the Ambarella CV28. On both general datasets and our dataset, our algorithm achieved an average PCK0.2 score of 0.95, indicating that it can quickly generate accurate 2D gestures. A demonstration of the algorithm can be reviewed in gesturebaolong.
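
For readers unfamiliar with the PCK0.2 score quoted in the abstract, the following is a minimal sketch of how a PCK (Percentage of Correct Keypoints) value at threshold 0.2 is commonly computed. It is not the authors' evaluation code. The function name `pck`, the `ref_size` normalization length (for example, a hand bounding-box side in pixels), and the sample coordinates are all assumptions made for illustration; the abstract does not say which normalization the paper uses.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(pred: Sequence[Point], truth: Sequence[Point],
        ref_size: float, alpha: float = 0.2) -> float:
    """Fraction of predicted keypoints within alpha * ref_size of the ground truth.

    ref_size is a hypothetical normalization length (e.g. hand bounding-box
    side in pixels); the paper's actual choice is not stated in the abstract.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and the same length")
    threshold = alpha * ref_size
    # Count keypoints whose Euclidean error is within the threshold.
    hits = sum(
        1
        for (px, py), (tx, ty) in zip(pred, truth)
        if math.hypot(px - tx, py - ty) <= threshold
    )
    return hits / len(pred)


# Illustrative (made-up) keypoints: four ground-truth points and predictions.
truth = [(10.0, 10.0), (20.0, 10.0), (30.0, 10.0), (40.0, 10.0)]
pred = [(11.0, 10.0), (20.0, 13.0), (30.0, 10.0), (40.0, 40.0)]

# With ref_size=50 the threshold is 10 px; three of four points fall inside it.
score = pck(pred, truth, ref_size=50.0)
print(score)
```

Under this definition, an average PCK0.2 of 0.95 would mean that about 95% of predicted keypoints land within 20% of the reference length from their ground-truth positions.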

Source journal: Artificial Intelligence Review
Category: Engineering & Technology, Computer Science: Artificial Intelligence
CiteScore: 22.00
Self-citation rate: 3.30%
Articles published: 194
Review time: 5.3 months
About the journal: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.