Benchmarking deep neural networks for gesture recognition on embedded devices*

2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). Pub Date: 2022-08-29. DOI: 10.1109/RO-MAN53752.2022.9900705

Stefano Bini, Antonio Greco, Alessia Saggese, M. Vento

Citations: 1

Abstract

Gestures are among the most common forms of human communication. In recent years, as factories adapt to the Industry 4.0 paradigm, the scientific community has shown growing interest in designing Gesture Recognition (GR) algorithms for Human-Robot Interaction (HRI) applications. In this context, a GR algorithm must run in real time on embedded platforms with limited resources. However, the neural networks proposed in the literature (i.e. 2D and 3D) and the modalities used to feed them (i.e. RGB, RGB-D, optical flow) typically aim to optimize accuracy, with little attention to feasibility on low-power hardware. Analyzing the trade-off between accuracy and computational burden, for both networks and modalities, is therefore important if GR algorithms are to work in industrial robotics applications. In this paper, we perform a wide benchmark that considers not only accuracy but also computational burden, involving two architectures (2D and 3D), two backbones (MobileNet, ResNeXt), and four types of input modality (RGB, Depth, Optical Flow, Motion History Image) and their combinations.
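The abstract names the Motion History Image among the input modalities but does not define it. The following is a minimal sketch, assuming the commonly used formulation in which a pixel that moved in the latest frame is set to a maximum value `tau` and every other pixel decays by one per frame. The function names, the `tau` and `threshold` parameters, and the frame-differencing motion test are illustrative assumptions, not the authors' implementation.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of 0-255 intensities


def update_mhi(mhi: Frame, prev: Frame, curr: Frame,
               tau: int = 5, threshold: int = 30) -> Frame:
    """Return the next motion history image (hypothetical helper).

    Pixels whose intensity changed by more than `threshold` between
    `prev` and `curr` are set to `tau`; all others decay by 1, floored at 0.
    """
    out = []
    for mhi_row, prev_row, curr_row in zip(mhi, prev, curr):
        out.append([
            tau if abs(c - p) > threshold else max(0, h - 1)
            for h, p, c in zip(mhi_row, prev_row, curr_row)
        ])
    return out


def motion_history_image(frames: List[Frame], tau: int = 5,
                         threshold: int = 30) -> Frame:
    """Fold a clip (a list of frames) into a single motion history image."""
    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0] * width for _ in range(height)]
    # Compare each frame with its predecessor and update the history.
    for prev, curr in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev, curr, tau, threshold)
    return mhi


# A bright pixel moving left to right across a 1x4 image.
clip = [
    [[255, 0, 0, 0]],
    [[0, 255, 0, 0]],
    [[0, 0, 255, 0]],
    [[0, 0, 0, 255]],
]
print(motion_history_image(clip, tau=5))  # [[3, 4, 5, 5]]
```

In the result, more recent motion has a higher value, so a single 2D image encodes the direction of the gesture. This is one reason such a representation could suit a 2D network on constrained hardware, although the abstract itself makes no claim about which modality performs best.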
