AL-MobileNet: a novel model for 2D gesture recognition in intelligent cockpit based on multi-modal data

Artificial Intelligence Review · Impact Factor 10.7 · Q1, Computer Science, Artificial Intelligence (CAS Tier 2) · Published: 2024-09-05 · DOI: 10.1007/s10462-024-10930-z
Bin Wang, Liwen Yu, Bo Zhang
{"title":"AL-MobileNet: a novel model for 2D gesture recognition in intelligent cockpit based on multi-modal data","authors":"Bin Wang,&nbsp;Liwen Yu,&nbsp;Bo Zhang","doi":"10.1007/s10462-024-10930-z","DOIUrl":null,"url":null,"abstract":"<div><p>As the degree of automotive intelligence increases, gesture recognition is gaining more attention in human-vehicle interaction. However, existing gesture recognition methods are computationally intensive and perform poorly in multi-modal sensor scenarios. This paper proposes a novel network structure, AL-MobileNet (MobileNet with Attention and Lightweight Modules), which can quickly and accurately estimate 2D gestures in RGB and infrared (IR) images. The innovations of this paper are as follows: Firstly, to enhance multi-modal data, we created a synthetic IR dataset based on real 2D gestures and employed a coarse-to-fine training approach. Secondly, to speed up the model's computation on edge devices, we introduced a new lightweight computational module called the Split Channel Attention Block (SCAB). Thirdly, to ensure the model maintains accuracy in large datasets, we incorporated auxiliary networks and Angle-Weighted Loss (AWL) into the backbone network. Experiments show that AL-MobileNet requires only 0.4 GFLOPs of computational power and 1.2 million parameters. This makes it 1.5 times faster than MobileNet and allows for quick execution on edge devices. AL-MobileNet achieved a running speed of up to 28 FPS on the Ambarella CV28. On both general datasets and our dataset, our algorithm achieved an average PCK0.2 score of 0.95. This indicates that the algorithm can quickly generate accurate 2D gestures. 
The demonstration of the algorithm can be reviewed in gesturebaolong.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":null,"pages":null},"PeriodicalIF":10.7000,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10930-z.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence Review","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10462-024-10930-z","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

As the degree of automotive intelligence increases, gesture recognition is gaining attention in human-vehicle interaction. However, existing gesture recognition methods are computationally intensive and perform poorly in multi-modal sensor scenarios. This paper proposes a novel network structure, AL-MobileNet (MobileNet with Attention and Lightweight modules), which can quickly and accurately estimate 2D gestures in RGB and infrared (IR) images. The innovations of this paper are as follows: first, to enrich multi-modal data, we created a synthetic IR dataset based on real 2D gestures and employed a coarse-to-fine training approach. Second, to speed up the model's computation on edge devices, we introduced a new lightweight computational module called the Split Channel Attention Block (SCAB). Third, to ensure the model maintains accuracy on large datasets, we incorporated auxiliary networks and an Angle-Weighted Loss (AWL) into the backbone network. Experiments show that AL-MobileNet requires only 0.4 GFLOPs of computation and 1.2 million parameters, making it 1.5 times faster than MobileNet and enabling fast execution on edge devices; it reaches up to 28 FPS on the Ambarella CV28. On both public datasets and our own dataset, the algorithm achieved an average PCK0.2 score of 0.95, indicating that it can quickly generate accurate 2D gesture estimates. A demonstration of the algorithm is available at gesturebaolong.
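The reported PCK0.2 metric (Percentage of Correct Keypoints at threshold 0.2) counts a predicted keypoint as correct when it lies within 0.2 of a reference size (commonly the hand bounding-box edge or a reference bone length; the exact reference used by the authors is not stated in the abstract). A minimal sketch of the metric, with hypothetical keypoint values:

```python
def pck(pred, gt, ref_size, alpha=0.2):
    """Percentage of Correct Keypoints: a predicted keypoint counts as
    correct if its Euclidean distance to the ground truth is within
    alpha * ref_size."""
    correct = 0
    for (px, py), (gx, gy) in zip(pred, gt):
        dist = ((px - gx) ** 2 + (py - gy) ** 2) ** 0.5
        if dist <= alpha * ref_size:
            correct += 1
    return correct / len(gt)

# Hypothetical example: with ref_size=100, the tolerance is 0.2 * 100 = 20 px;
# four of the five predictions fall within it, the last is 70.7 px off.
pred = [(10, 10), (50, 52), (100, 100), (30, 60), (200, 200)]
gt   = [(12, 11), (50, 50), (118, 100), (30, 61), (150, 150)]
print(pck(pred, gt, ref_size=100))  # 0.8
```

A PCK0.2 of 0.95, as reported for AL-MobileNet, means 95% of predicted keypoints fall within that tolerance on average.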

Source journal: Artificial Intelligence Review (Engineering & Technology — Computer Science: Artificial Intelligence)

CiteScore: 22.00
Self-citation rate: 3.30%
Articles per year: 194
Review time: 5.3 months

Journal description: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.