Biao Jin;Xiao Ma;Bojun Hu;Zhenkai Zhang;Zhuxian Lian;Biao Wang
{"title":"Gesture-mmWAVE: Compact and Accurate Millimeter-Wave Radar-Based Dynamic Gesture Recognition for Embedded Devices","authors":"Biao Jin;Xiao Ma;Bojun Hu;Zhenkai Zhang;Zhuxian Lian;Biao Wang","doi":"10.1109/THMS.2024.3385124","DOIUrl":null,"url":null,"abstract":"Dynamic gesture recognition using millimeter-wave radar is a promising contactless mode of human–computer interaction with wide-ranging applications in various fields, such as intelligent homes, automatic driving, and sign language translation. However, the existing models have too many parameters and are unsuitable for embedded devices. To address this issue, we propose a dynamic gesture recognition method (named “Gesture-mmWAVE”) using millimeter-wave radar based on the multilevel feature fusion (MLFF) and transformer model. We first arrange each frame of the original echo collected by the frequency-modulated continuously modulated millimeter-wave radar in the Chirps × Samples format. Then, we use a 2-D fast Fourier transform to obtain the range-time map and Doppler-time map of gestures while improving the echo signal-to-noise ratio by coherent accumulation. Furthermore, we build an MLFF-transformer network for dynamic gesture recognition. The MLFF-transformer network comprises an MLFF module and a transformer module. The MLFF module employs the residual strategies to fuse the shallow, middle, and deep features and reduce the parameter size of the model using depthwise-separable convolution. The transformer module captures the global features of dynamic gestures and focuses on essential features using the multihead attention mechanism. The experimental results demonstrate that our proposed model achieves an average recognition accuracy of 99.11% on a dataset with 10% random interference. The scale of the proposed model is only 0.42M, which is 25% of that of the MobileNet V3-samll model. Thus, this method has excellent potential for application in embedded devices due to its small parameter size and high recognition accuracy.","PeriodicalId":48916,"journal":{"name":"IEEE Transactions on Human-Machine Systems","volume":null,"pages":null},"PeriodicalIF":3.5000,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Human-Machine Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10509809/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Dynamic gesture recognition using millimeter-wave radar is a promising contactless mode of human–computer interaction with wide-ranging applications in various fields, such as intelligent homes, automatic driving, and sign language translation. However, the existing models have too many parameters and are unsuitable for embedded devices. To address this issue, we propose a dynamic gesture recognition method (named “Gesture-mmWAVE”) using millimeter-wave radar based on the multilevel feature fusion (MLFF) and transformer model. We first arrange each frame of the original echo collected by the frequency-modulated continuously modulated millimeter-wave radar in the Chirps × Samples format. Then, we use a 2-D fast Fourier transform to obtain the range-time map and Doppler-time map of gestures while improving the echo signal-to-noise ratio by coherent accumulation. Furthermore, we build an MLFF-transformer network for dynamic gesture recognition. The MLFF-transformer network comprises an MLFF module and a transformer module. The MLFF module employs the residual strategies to fuse the shallow, middle, and deep features and reduce the parameter size of the model using depthwise-separable convolution. The transformer module captures the global features of dynamic gestures and focuses on essential features using the multihead attention mechanism. The experimental results demonstrate that our proposed model achieves an average recognition accuracy of 99.11% on a dataset with 10% random interference. The scale of the proposed model is only 0.42M, which is 25% of that of the MobileNet V3-samll model. Thus, this method has excellent potential for application in embedded devices due to its small parameter size and high recognition accuracy.
期刊介绍:
The scope of the IEEE Transactions on Human-Machine Systems includes the fields of human machine systems. It covers human systems and human organizational interactions including cognitive ergonomics, system test and evaluation, and human information processing concerns in systems and organizations.