{"title":"基于TCN的按/持/放超声手势及低复杂度识别","authors":"Emad A. Ibrahim, Min Li, J. P. D. Gyvez","doi":"10.1109/SiPS47522.2019.9020579","DOIUrl":null,"url":null,"abstract":"Targeting ultrasound-based gesture recognition, this paper proposes a new universal PRESS/HOLD/RELEASE approach that leverages the diversity of gestures performed on smart devices such as mobile phones and IoT nodes. The new set of gestures are generated by interleaving PRESS/HOLD/RELEASE patterns; abbreviated as P/H/R, with gestures like sweeps between a number of microphones. P/H/R patterns are constructed by a hand as it approaches a top of a microphone to generate a virtual Press. After that, the hand settles for an undefined period of time to generate a virtual Hold and finally departs to generate a virtual Release. The same hand can sweep to a 2nd microphone and perform another P/H/R. Interleaving the P/H/R patterns expands the number of performed gestures. Assuming an on-board speaker transmitting ultrasonic signals, the detection is performed on Doppler shift readings generated by a hand as it approaches and departs a top of a microphone. The Doppler shift readings are presented in a sequence of down-mixed ultrasonic spectrogram frames. We train a Temporal Convolutional Network (TCN) to classify the P/H/R patterns under different environmental noises. Our experimental results show that such P/H/R patterns at a top of a microphone can be achieved with 96.6% accuracy under different noise conditions. A group of P/H/R based gestures has been tested on commercially off-the-shelf (COTS) Samsung Galaxy S7 Edge. Different P/H/R interleaved gestures (such as sweeps, long taps, etc.) are designed using two microphones and a single speaker while using as low as $\\sim 5\\mathrm{K}$ parameters and as low as $\\sim 0.15$ Million operations (MOPs) in compute power per inference. The P/H/R interleaved set of gestures are intuitive and hence are easy to learn by end users. This paves its way to be deployed by smartphones and smart speakers for mass production.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"PRESS/HOLD/RELEASE Ultrasonic Gestures and Low Complexity Recognition Based on TCN\",\"authors\":\"Emad A. Ibrahim, Min Li, J. P. D. Gyvez\",\"doi\":\"10.1109/SiPS47522.2019.9020579\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Targeting ultrasound-based gesture recognition, this paper proposes a new universal PRESS/HOLD/RELEASE approach that leverages the diversity of gestures performed on smart devices such as mobile phones and IoT nodes. The new set of gestures are generated by interleaving PRESS/HOLD/RELEASE patterns; abbreviated as P/H/R, with gestures like sweeps between a number of microphones. P/H/R patterns are constructed by a hand as it approaches a top of a microphone to generate a virtual Press. After that, the hand settles for an undefined period of time to generate a virtual Hold and finally departs to generate a virtual Release. The same hand can sweep to a 2nd microphone and perform another P/H/R. Interleaving the P/H/R patterns expands the number of performed gestures. 
Assuming an on-board speaker transmitting ultrasonic signals, the detection is performed on Doppler shift readings generated by a hand as it approaches and departs a top of a microphone. The Doppler shift readings are presented in a sequence of down-mixed ultrasonic spectrogram frames. We train a Temporal Convolutional Network (TCN) to classify the P/H/R patterns under different environmental noises. Our experimental results show that such P/H/R patterns at a top of a microphone can be achieved with 96.6% accuracy under different noise conditions. A group of P/H/R based gestures has been tested on commercially off-the-shelf (COTS) Samsung Galaxy S7 Edge. Different P/H/R interleaved gestures (such as sweeps, long taps, etc.) are designed using two microphones and a single speaker while using as low as $\\\\sim 5\\\\mathrm{K}$ parameters and as low as $\\\\sim 0.15$ Million operations (MOPs) in compute power per inference. The P/H/R interleaved set of gestures are intuitive and hence are easy to learn by end users. This paves its way to be deployed by smartphones and smart speakers for mass production.\",\"PeriodicalId\":256971,\"journal\":{\"name\":\"2019 IEEE International Workshop on Signal Processing Systems (SiPS)\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Workshop on Signal Processing Systems (SiPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SiPS47522.2019.9020579\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SiPS47522.2019.9020579","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
PRESS/HOLD/RELEASE Ultrasonic Gestures and Low Complexity Recognition Based on TCN
Targeting ultrasound-based gesture recognition, this paper proposes a new universal PRESS/HOLD/RELEASE approach that leverages the diversity of gestures performed on smart devices such as mobile phones and IoT nodes. The new set of gestures is generated by interleaving PRESS/HOLD/RELEASE patterns, abbreviated P/H/R, with gestures such as sweeps between a number of microphones. A P/H/R pattern is constructed by a hand as it approaches the top of a microphone to generate a virtual Press; the hand then hovers for an arbitrary period of time to generate a virtual Hold, and finally departs to generate a virtual Release. The same hand can sweep to a second microphone and perform another P/H/R. Interleaving P/H/R patterns expands the number of gestures that can be performed. With an on-board speaker transmitting an ultrasonic signal, detection is performed on the Doppler shift readings produced by a hand as it approaches and departs the top of a microphone. The Doppler shift readings are presented as a sequence of down-mixed ultrasonic spectrogram frames. We train a Temporal Convolutional Network (TCN) to classify the P/H/R patterns under different environmental noises. Our experimental results show that such P/H/R patterns at the top of a microphone can be recognized with 96.6% accuracy under different noise conditions. A group of P/H/R-based gestures has been tested on a commercial off-the-shelf (COTS) Samsung Galaxy S7 Edge. Different P/H/R-interleaved gestures (such as sweeps, long taps, etc.) are designed using two microphones and a single speaker, while requiring as few as ~5K parameters and as little as ~0.15 million operations (MOPs) of compute per inference. The P/H/R-interleaved set of gestures is intuitive and hence easy for end users to learn, paving the way for deployment in mass-produced smartphones and smart speakers.
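To make the front end concrete, the sketch below shows one way such down-mixed ultrasonic spectrogram frames could be computed. The 48 kHz sampling rate, 20 kHz carrier, ±500 Hz Doppler band, and 1024-sample frame length are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of down-mixed ultrasonic spectrogram frames.
# FS, F_CARRIER, the Doppler bandwidth, and nperseg are assumptions;
# the paper's abstract does not specify these values.
import numpy as np
from scipy.signal import butter, lfilter, stft

FS = 48_000         # assumed microphone sampling rate (Hz)
F_CARRIER = 20_000  # assumed ultrasonic tone emitted by the speaker (Hz)

def downmix_spectrogram(mic_samples: np.ndarray) -> np.ndarray:
    """Shift the ultrasonic band to baseband and return |STFT| frames.

    A hand approaching the microphone appears as energy above 0 Hz
    (positive Doppler shift); a departing hand as energy below 0 Hz.
    """
    t = np.arange(len(mic_samples)) / FS
    # Complex down-mixing: multiply by e^{-j 2*pi*f_c*t} to move the
    # carrier to 0 Hz while preserving the sign of the Doppler shift.
    baseband = mic_samples * np.exp(-2j * np.pi * F_CARRIER * t)
    # Low-pass filter to keep only the Doppler band around the carrier
    # (~117 Hz shift for a 1 m/s hand at 20 kHz, so +-500 Hz is ample).
    b, a = butter(4, 500 / (FS / 2))
    baseband = lfilter(b, a, baseband)
    # Short-time Fourier transform -> sequence of spectrogram frames
    # (two-sided, since the baseband signal is complex).
    _, _, frames = stft(baseband, fs=FS, nperseg=1024, return_onesided=False)
    return np.abs(frames).T  # shape: (num_frames, num_bins)
```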
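A dilated-causal-convolution TCN sized to the stated complexity budget might look like the following sketch. The channel width, dilation schedule, number of input frequency bins, and the fourth "background" class are assumptions chosen to land near ~5K parameters; this is not the authors' exact architecture.

```python
# Minimal TCN sketch (PyTorch) for per-frame P/H/R classification.
# Width, dilations, n_bins, and the extra "background" class are
# assumptions sized to approach the paper's ~5K-parameter budget.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1-D convolution padded on the left only, so frame t never sees t+1."""
    def __init__(self, c_in: int, c_out: int, kernel: int, dilation: int):
        super().__init__()
        self.pad = (kernel - 1) * dilation
        self.conv = nn.Conv1d(c_in, c_out, kernel, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(F.pad(x, (self.pad, 0)))

class TinyTCN(nn.Module):
    def __init__(self, n_bins: int = 16, n_classes: int = 4, width: int = 20):
        super().__init__()
        blocks, c_in = [], n_bins
        for dilation in (1, 2, 4, 8):  # receptive field grows exponentially
            blocks += [CausalConv1d(c_in, width, 3, dilation), nn.ReLU()]
            c_in = width
        self.tcn = nn.Sequential(*blocks)
        self.head = nn.Conv1d(c_in, n_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_bins, n_frames) of down-mixed spectrogram frames.
        return self.head(self.tcn(x))  # per-frame class logits

model = TinyTCN()
print(sum(p.numel() for p in model.parameters()))  # ~4.7K parameters
```

With 20 channels and dilations 1, 2, 4, 8 this sketch has roughly 4.7K weights, in line with the abstract's ~5K figure; per-frame P/H/R decisions from two microphones could then be interleaved downstream to decode composite gestures such as sweeps.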