Vision-based drone-view object detection suffers from severe performance degradation under adverse conditions (e.g., foggy weather, poor illumination). To remedy this, leveraging complementary mmWave radar has become a trend. However, existing fusion approaches seldom apply to drones due to i) the aggravated sparsity and noise of point clouds from low-cost commodity radars, and ii) explosive sensing data and intensive computations leading to high latency. To address these issues, we design Geryon, an edge-assisted object detection system for drones, which utilizes a suite of approaches to fully exploit the complementary advantages of camera and mmWave radar on three levels: (i) a novel multi-frame compositing approach uses the camera to assist the radar, addressing the aggravated sparsity and noise of radar point clouds; (ii) a saliency area extraction and encoding approach uses the radar to assist the camera, reducing bandwidth consumption and offloading latency; (iii) a parallel transmission and inference approach with a lightweight box enhancement scheme further reduces the offloading latency while preserving the edge-side accuracy-latency trade-off through parallelism and better camera-radar fusion. We implement and evaluate Geryon on four datasets we collect under foggy/rainy/snowy weather and poor illumination conditions, demonstrating its advantages over other state-of-the-art approaches in terms of both accuracy and latency.
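The saliency-area idea can be illustrated with a short sketch: radar detections, assumed already projected into image coordinates, mark salient regions that are JPEG-encoded at high quality while the rest of the frame is encoded coarsely before offloading. This is a minimal illustration, not Geryon's actual encoder; the function name, padding, and quality values below are assumptions.

```python
import cv2
import numpy as np

def encode_with_saliency(frame, radar_points_px, pad=40, q_roi=90, q_bg=30):
    """Encode radar-indicated regions at high JPEG quality, background at low quality.

    frame:           HxWx3 uint8 camera image
    radar_points_px: Nx2 array of radar detections projected to pixel coords (assumed given)
    """
    h, w = frame.shape[:2]

    # Coarse background: heavily compressed full frame.
    _, bg_bytes = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, q_bg])

    # One high-quality crop per radar detection (detections could be clustered first).
    roi_bytes = []
    for (u, v) in radar_points_px.astype(int):
        x0, y0 = max(u - pad, 0), max(v - pad, 0)
        x1, y1 = min(u + pad, w), min(v + pad, h)
        crop = frame[y0:y1, x0:x1]
        _, buf = cv2.imencode(".jpg", crop, [cv2.IMWRITE_JPEG_QUALITY, q_roi])
        roi_bytes.append(((x0, y0, x1, y1), buf))

    return bg_bytes, roi_bytes  # offload both; the edge reassembles before detection

# Example with a synthetic frame and two hypothetical radar detections.
frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
bg, rois = encode_with_saliency(frame, np.array([[120, 200], [500, 300]]))
print(len(bg), [len(b) for _, b in rois])
```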
{"title":"Geryon: Edge Assisted Real-time and Robust Object Detection on Drones via mmWave Radar and Camera Fusion","authors":"Kaikai Deng, Dong Zhao, Qiaoyue Han, Shuyue Wang, Zihan Zhang, Anfu Zhou, Huadong Ma","doi":"10.1145/3550298","DOIUrl":"https://doi.org/10.1145/3550298","url":null,"abstract":"Vision-based drone-view object detection suffers from severe performance degradation under adverse conditions (e.g., foggy weather, poor illumination). To remedy this, leveraging complementary mmWave radar has become a trend. However, existing fusion approaches seldom apply to drones due to i) the aggravated sparsity and noise of point clouds from low-cost commodity radars, and ii) explosive sensing data and intensive computations leading to high latency. To address these issues, we design Geryon , an edge assisted object detection system on drones, which utilizes a suit of approaches to fully exploit the complementary advantages of camera and mmWave radar on three levels: (i) a novel multi-frame compositing approach utilizes camera to assist radar to address the aggravated sparsity and noise of radar point clouds; (ii) a saliency area extraction and encoding approach utilizes radar to assist camera to reduce the bandwidth consumption and offloading latency; (iii) a parallel transmission and inference approach with a lightweight box enhancement scheme further reduces the offloading latency while ensuring the edge-side accuracy-latency trade-off by the parallelism and better camera-radar fusion. We implement and evaluate Geryon with four datasets we collect under foggy/rainy/snowy weather and poor illumination conditions, demonstrating its great advantages over other state-of-the-art approaches in terms of both accuracy and latency. CCS Concepts:","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"70 1","pages":"109:1-109:27"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72733418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose to explore teeth-clenching-based target selection in Augmented Reality (AR), as the subtlety of the interaction can benefit applications that occupy the user's hands or are sensitive to social norms. To support the investigation, we implemented an EMG-based teeth-clenching detection system (ClenchClick), which adopts customized thresholds for different users. We first explored and compared potential interaction designs combining head movements and teeth clenching, and finalized the interaction as a Point-and-Click design with clenches as the confirmation mechanism. We evaluated the taskload and performance of ClenchClick by comparing it with two baseline methods in target selection tasks. Results showed that ClenchClick outperformed hand gestures in workload, physical load, accuracy, and speed, and outperformed dwell in workload and temporal load. Lastly, through user studies, we demonstrated the advantages of ClenchClick in real-world tasks, including efficient and accurate hands-free target selection, natural and unobtrusive interaction in public, and robust head gesture input. We investigated the interaction design, the user experience in target selection tasks, and user performance in real-world tasks in a series of user studies. In our first user study, we explored nine potential designs and compared the three most promising ones (ClenchClick, ClenchCrossingTarget, ClenchCrossingEdge) with a hand-based (Hand Gesture) and a hands-free (Dwell) baseline in target selection tasks. ClenchClick had the best overall user experience with the lowest workload: it outperformed Hand Gesture in both physical and temporal load, and outperformed Dwell in temporal and mental load. In the second study, we evaluated the performance of ClenchClick with two detection methods (General and Personalized), in comparison with the same hand-based (Hand Gesture) and hands-free (Dwell) baselines. Results showed that ClenchClick outperformed Hand Gesture in accuracy (98.9% vs. 89.4%) and was comparable with Dwell in accuracy and efficiency. We further investigated users' behavioral characteristics by analyzing their cursor trajectories, which showed that ClenchClick is a smoother target selection method: it was more psychologically friendly and occupied less of the user's attention. Finally, we conducted user studies on three real-world tasks covering hands-free, social-friendly, and head-gesture interaction. Results revealed that ClenchClick is an efficient and accurate target selection method when both hands are occupied, is socially acceptable and satisfying when performed in public, and can serve as an activation mechanism for head gestures, significantly alleviating false-positive issues.
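The customized-threshold idea can be sketched as follows: compute a short-window RMS envelope of the EMG stream and flag a clench when the envelope exceeds a per-user threshold calibrated from rest data. The sampling rate, window length, and mean-plus-k-std threshold rule are illustrative assumptions, not ClenchClick's actual detector.

```python
import numpy as np

FS = 1000            # assumed EMG sampling rate (Hz)
WIN = int(0.1 * FS)  # 100 ms RMS window

def rms_envelope(emg, win=WIN):
    """Sliding-window RMS of a 1-D EMG signal."""
    sq = np.convolve(emg ** 2, np.ones(win) / win, mode="same")
    return np.sqrt(sq)

def calibrate_threshold(rest_emg, k=5.0):
    """Per-user threshold from relaxed-jaw data: mean + k * std of the rest envelope."""
    env = rms_envelope(rest_emg)
    return env.mean() + k * env.std()

def detect_clench(emg, threshold, min_len=int(0.05 * FS)):
    """Return True if the envelope stays above the user's threshold for >= min_len samples."""
    above = rms_envelope(emg) > threshold
    run = 0
    for a in above:
        run = run + 1 if a else 0
        if run >= min_len:
            return True
    return False

# Usage with synthetic data: quiet baseline vs. a burst of muscle activity.
rest = 0.01 * np.random.randn(5 * FS)
thr = calibrate_threshold(rest)
burst = np.concatenate([0.01 * np.random.randn(FS), 0.2 * np.random.randn(FS)])
print(detect_clench(burst, thr))   # expected: True
```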
{"title":"ClenchClick: Hands-Free Target Selection Method Leveraging Teeth-Clench for Augmented Reality","authors":"Xiyuan Shen, Yukang Yan, Chun Yu, Yuanchun Shi","doi":"10.1145/3550327","DOIUrl":"https://doi.org/10.1145/3550327","url":null,"abstract":"We propose to explore teeth-clenching-based target selection in Augmented Reality (AR), as the subtlety in the interaction can be beneficial to applications occupying the user’s hand or that are sensitive to social norms. To support the investigation, we implemented an EMG-based teeth-clenching detection system (ClenchClick), where we adopted customized thresholds for different users. We first explored and compared the potential interaction design leveraging head movements and teeth clenching in combination. We finalized the interaction to take the form of a Point-and-Click manner with clenches as the confirmation mechanism. We evaluated the taskload and performance of ClenchClick by comparing it with two baseline methods in target selection tasks. Results showed that ClenchClick outperformed hand gestures in workload, physical load, accuracy and speed, and outperformed dwell in work load and temporal load. Lastly, through user studies, we demonstrated the advantage of ClenchClick in real-world tasks, including efficient and accurate hands-free target selection, natural and unobtrusive interaction in public, and robust head gesture input. investigated the interaction design, user experience in target selection tasks, and user performance in real-world tasks in a series of user studies. In our first user study, we explored nine potential designs and compared the three most promising designs (ClenchClick, ClenchCross-ingTarget, ClenchCrossingEdge) with a hand-based (Hand Gesture) and a hands-free (Dwell) baseline in target selection tasks. ClenchClick had the best overall user experience with the lowest workload. It outperformed Hand Gesture in both physical and temporal load, and outperformed Dwell in temporal and mental load. In the second study, we evaluated the performance of ClenchClick with two detection methods (General and Personalized), in comparison with a hand-based (Hand Gesture) and a hands-free (Dwell) baseline. Results showed that ClenchClick outperformed Hand Gesture in accuracy (98.9% v.s. 89.4%), and was comparable with Dwell in accuracy and efficiency. We further investigated users’ behavioral characteristics by analyzing their cursor trajectories in the tasks, which showed that ClenchClick was a smoother target selection method. It was more psychologically friendly and occupied less of the user’s attention. Finally, we conducted user studies in three real-world tasks which supported hands-free, social-friendly, and head gesture interaction. Results revealed that ClenchClick is an efficient and accurate target selection method when both hands are occupied. It is social-friendly and satisfying when performing in public, and can serve as activation to head gestures which significantly alleviates false positive issues.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. 
Wearable Ubiquitous Technol.","volume":"27 1","pages":"139:1-139:26"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73774807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Visual Question Answering (VQA) is a relatively new task where a user can ask a natural question about an image and obtain an answer. VQA is useful for many applications and is widely popular among users with visual impairments. Our goal is to design a VQA application that works efficiently on mobile devices without requiring cloud support. Such a system allows users to ask visual questions privately, without sending their questions to the cloud, while also reducing cloud communication costs. However, existing VQA applications use deep learning models that significantly improve accuracy but are computationally heavy. Unfortunately, existing techniques that optimize deep learning for mobile devices cannot be applied to VQA because the VQA task is multi-modal: it requires processing both vision and text data. Existing mobile optimizations that work for vision-only or text-only neural networks cannot be applied here because of the dependencies between the two modalities. Instead, we design MobiVQA, a set of optimizations that leverage the multi-modal nature of VQA. We show, using extensive evaluation on two VQA testbeds and two mobile platforms, that MobiVQA significantly improves latency and energy with minimal accuracy loss compared to state-of-the-art VQA models. For instance, MobiVQA can answer a visual question in 163 milliseconds on the phone, compared to the over 20-second latency incurred by the most accurate state-of-the-art model, while incurring less than 1 point reduction in accuracy.
{"title":"MobiVQA: Efficient On-Device Visual Question Answering","authors":"Qingqing Cao","doi":"10.1145/3534619","DOIUrl":"https://doi.org/10.1145/3534619","url":null,"abstract":"Visual Question Answering (VQA) is a relatively new task where a user can ask a natural question about an image and obtain an answer. VQA is useful for many applications and is widely popular for users with visual impairments. Our goal is to design a VQA application that works efficiently on mobile devices without requiring cloud support. Such a system will allow users to ask visual questions privately, without having to send their questions to the cloud, while also reduce cloud communication costs. However, existing VQA applications use deep learning models that significantly improve accuracy, but is computationally heavy. Unfortunately, existing techniques that optimize deep learning for mobile devices cannot be applied for VQA because the VQA task is multi-modal—it requires both processing vision and text data. Existing mobile optimizations that work for vision-only or text-only neural networks cannot be applied here because of the dependencies between the two modes. Instead, we design MobiVQA, a set of optimizations that leverage the multi-modal nature of VQA. We show using extensive evaluation on two VQA testbeds and two mobile platforms, that MobiVQA significantly improves latency and energy with minimal accuracy loss compared to state-of-the-art VQA models. For instance, MobiVQA can answer a visual question in 163 milliseconds on the phone, compared to over 20-second latency incurred by the most accurate state-of-the-art model, while incurring less than 1 point reduction in accuracy.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"107 1","pages":"44:1-44:23"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74642437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WiFi-based gesture recognition systems have attracted enormous interest owing to the non-intrusive nature of WiFi signals and the wide adoption of WiFi for communication. Despite performance boosted by integrating advanced deep neural network (DNN) classifiers, there has been insufficient investigation of their security vulnerabilities, which are rooted in the open nature of the wireless medium and the inherent defects (e.g., susceptibility to adversarial attacks) of classifiers. To fill this gap, we study adversarial attacks on DNN-powered WiFi-based gesture recognition to encourage proper countermeasures. We design WiAdv to construct physically realizable adversarial examples that fool these systems. WiAdv features a signal synthesis scheme to craft adversarial signals with desired motion features based on the fundamental principle of WiFi-based gesture recognition, and a black-box attack scheme to handle the inconsistency between the perturbation space and the input space of the classifier caused by the non-differentiable processing modules in between. We realize and evaluate our attack strategies against a representative state-of-the-art system, Widar3.0, in realistic settings. The experimental results show that the adversarial wireless signals generated by WiAdv achieve over 70% attack success rate on average and remain robust and effective across different physical settings. Our attack case study and analysis reveal the vulnerability of WiFi-based gesture recognition systems, and we hope WiAdv can help promote the improvement of these systems.
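WiAdv's black-box scheme is not reproduced here, but the general query-based idea it builds on can be sketched: search over a small set of perturbation parameters (here, plain random search) by repeatedly querying the classifier end-to-end, so the non-differentiable processing between perturbation space and input space never needs gradients. The classifier stub, synthesis rule, and parameter ranges below are placeholders, not WiAdv's actual components.

```python
import numpy as np

def query_classifier(doppler_profile):
    """Placeholder for the end-to-end system (signal processing + DNN).
    Returns (predicted_label, confidence)."""
    score = doppler_profile.mean()               # stand-in decision rule
    return (1 if score > 0.5 else 0), abs(score - 0.5)

def synthesize(params, base_profile):
    """Placeholder adversarial synthesis: add a parameterized sinusoidal component."""
    amp, freq, phase = params
    t = np.linspace(0, 1, base_profile.size)
    return base_profile + amp * np.sin(2 * np.pi * freq * t + phase)

def black_box_attack(base_profile, true_label, n_queries=500, seed=0):
    """Random-search attack: keep the candidate that flips the label with the least perturbation."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_queries):
        params = rng.uniform([0.0, 1.0, 0.0], [0.5, 10.0, 2 * np.pi])
        adv = synthesize(params, base_profile)
        label, _ = query_classifier(adv)
        if label != true_label:
            cost = np.linalg.norm(adv - base_profile)
            if best is None or cost < best[0]:
                best = (cost, params)
    return best  # None if no successful perturbation was found

base = np.full(128, 0.4)                  # toy "gesture" profile classified as label 0
print(black_box_attack(base, true_label=0))
```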
{"title":"WiAdv: Practical and Robust Adversarial Attack against WiFi-based Gesture Recognition System","authors":"Yuxuan Zhou, Huangxun Chen, Chenyu Huang, Qian Zhang","doi":"10.1145/3534618","DOIUrl":"https://doi.org/10.1145/3534618","url":null,"abstract":"WiFi-based gesture recognition systems have attracted enormous interest owing to the non-intrusive of WiFi signals and the wide adoption of WiFi for communication. Despite boosted performance via integrating advanced deep neural network (DNN) classifiers, there lacks sufficient investigation on their security vulnerabilities, which are rooted in the open nature of the wireless medium and the inherent defects (e.g., adversarial attacks) of classifiers. To fill this gap, we aim to study adversarial attacks to DNN-powered WiFi-based gesture recognition to encourage proper countermeasures. We design WiAdv to construct physically realizable adversarial examples to fool these systems. WiAdv features a signal synthesis scheme to craft adversarial signals with desired motion features based on the fundamental principle of WiFi-based gesture recognition, and a black-box attack scheme to handle the inconsistency between the perturbation space and the input space of the classifier caused by the in-between non-differentiable processing modules. We realize and evaluate our attack strategies against a representative state-of-the-art system, Widar3.0 in realistic settings. The experimental results show that the adversarial wireless signals generated by WiAdv achieve over 70% attack success rate on average, and remain robust and effective across different physical settings. Our attack case study and analysis reveal the vulnerability of WiFi-based gesture recognition systems, and we hope WiAdv could help promote the improvement of the relevant systems.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"176 1","pages":"92:1-92:25"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73444985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hao-Shun Wei, Ziheng Li, Alexander D. Galvan, Zhuoran Su, Xiao Zhang, E. Solovey, K. Pahlavan
In this paper, we introduce IndexPen, a novel interaction technique for text input through two-finger in-air micro-gestures, enabling touch-free, effortless, tracking-based interaction designed to mirror real-world writing. Our system is based on millimeter-wave radar sensing and does not require instrumentation on the user. IndexPen can successfully identify 30 distinct gestures, representing the letters A-Z, as well as Space, Backspace, Enter, and a special Activation gesture to prevent unintentional input. Additionally, we include a noise class to differentiate gestures from non-gesture noise. We present our system design, including the radio frequency (RF) processing pipeline, classification model, and real-time detection algorithms. We further demonstrate our proof-of-concept system with data collected over ten days with five participants, yielding 95.89% cross-validation accuracy on 31 classes (including noise). Moreover, we explore the learnability and adaptability of our system for real-world text input with 16 first-time IndexPen users over five sessions. After each session, the model pre-trained in the previous five-user study is calibrated on the data collected so far for the new user through transfer learning. With calibration, the F-1 score increased by an average of 9.14% per session, reaching an average of 88.3% in the last session across the 16 users. Meanwhile, we show that users can type sentences with IndexPen at 86.2% accuracy, measured by string similarity. This work builds a foundation and vision for future interaction interfaces enabled by this paradigm.
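The per-user calibration step described above amounts to fine-tuning a pre-trained gesture classifier on the small amount of data collected from a new user. A minimal PyTorch sketch of that idea is shown below; the architecture, freezing policy, and hyperparameters are assumptions for illustration and not IndexPen's actual model.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 31  # A-Z, Space, Backspace, Enter, Activation, noise

class GestureNet(nn.Module):
    """Stand-in backbone + classifier head over radar feature vectors."""
    def __init__(self, feat_dim=256, num_classes=NUM_CLASSES):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                      nn.Linear(128, 64), nn.ReLU())
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))

def calibrate(model, user_x, user_y, epochs=5, lr=1e-3):
    """Transfer learning on a new user's data: freeze the backbone, adapt the head."""
    for p in model.backbone.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(model.head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(user_x), user_y)
        loss.backward()
        opt.step()
    return model

# Usage with random stand-in data for one new user.
model = GestureNet()                       # would be loaded from the pre-trained checkpoint
x = torch.randn(64, 256)                   # 64 calibration samples
y = torch.randint(0, NUM_CLASSES, (64,))
calibrate(model, x, y)
```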
{"title":"IndexPen: Two-Finger Text Input with Millimeter-Wave Radar","authors":"Hao‐Shun Wei, Worcester, Li Ziheng, Alexander D. Galvan, SU Zhuoran, Xiao Zhang, E. Solovey, Hao‐Shun Wei, Ziheng Li, Alexander D. Galvan, Zhuoran Su, Xiao Zhang, K. Pahlavan","doi":"10.1145/3534601","DOIUrl":"https://doi.org/10.1145/3534601","url":null,"abstract":"In this paper, we introduce IndexPen , a novel interaction technique for text input through two-finger in-air micro-gestures, enabling touch-free, effortless, tracking-based interaction, designed to mirror real-world writing. Our system is based on millimeter-wave radar sensing, and does not require instrumentation on the user. IndexPen can successfully identify 30 distinct gestures, representing the letters A-Z , as well as Space , Backspace , Enter , and a special Activation gesture to prevent unintentional input. Additionally, we include a noise class to differentiate gesture and non-gesture noise. We present our system design, including the radio frequency (RF) processing pipeline, classification model, and real-time detection algorithms. We further demonstrate our proof-of-concept system with data collected over ten days with five participants yielding 95.89% cross-validation accuracy on 31 classes (including noise ). Moreover, we explore the learnability and adaptability of our system for real-world text input with 16 participants who are first-time users to IndexPen over five sessions. After each session, the pre-trained model from the previous five-user study is calibrated on the data collected so far for a new user through transfer learning. The F-1 score showed an average increase of 9.14% per session with the calibration, reaching an average of 88.3% on the last session across the 16 users. Meanwhile, we show that the users can type sentences with IndexPen at 86.2% accuracy, measured by string similarity. This work builds a foundation and vision for future interaction interfaces that could be enabled with this paradigm.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"39 1","pages":"79:1-79:39"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87089522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Zhang, Yinian Zhou, Rui Xi, Shuai Li, Junchen Guo, Yuan He
Millimeter wave (mmWave) based sensing is a significant technique that enables innovative smart applications, e.g., voice recognition. Existing works in this area require direct sensing of the human's near-throat region and consequently have limited applicability in non-line-of-sight (NLoS) scenarios. This paper proposes AmbiEar, the first mmWave-based voice recognition approach applicable in NLoS scenarios. AmbiEar is based on the insight that the human voice causes correlated vibrations of surrounding objects, regardless of the human's position and posture. Therefore, AmbiEar regards the surrounding objects as ears that can perceive sound and realizes indirect sensing of the human voice by sensing the vibration of those objects. By incorporating designs such as common component extraction, signal superimposition, and an encoder-decoder network, AmbiEar tackles the challenges induced by low-SNR and distorted signals. We implement AmbiEar on a commercial mmWave radar and evaluate its performance under different settings. The experimental results show that AmbiEar achieves a word recognition accuracy of 87.21% in NLoS scenarios and reduces the recognition error by 35.1% compared to the direct sensing approach.
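The "common component" insight, i.e., that several surrounding objects vibrate coherently with the same voice, can be sketched by extracting the dominant component shared across the per-object (per-range-bin) vibration signals, here via the first principal component from an SVD. This is an illustrative stand-in, not AmbiEar's exact algorithm, and the signal shapes are synthetic.

```python
import numpy as np

def common_component(vibrations):
    """Extract the dominant component shared across reflecting objects.

    vibrations: (n_objects, n_samples) array, one vibration signal per object
                (e.g., per radar range bin). Returns a single (n_samples,) signal.
    """
    X = vibrations - vibrations.mean(axis=1, keepdims=True)   # remove per-object offsets
    # First right singular vector = time course of the strongest shared component.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

# Usage: three objects picking up the same "voice" with different gains and noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2000)
voice = np.sin(2 * np.pi * 200 * t) * np.exp(-2 * t)
obs = np.stack([g * voice + 0.3 * rng.standard_normal(t.size) for g in (1.0, 0.5, 0.8)])
recovered = common_component(obs)
print(np.corrcoef(recovered, voice)[0, 1])   # close to +/-1: shared component recovered
```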
{"title":"AmbiEar: mmWave Based Voice Recognition in NLoS Scenarios","authors":"J. Zhang, Yinian Zhou, Rui Xi, Shuai Li, Junchen Guo, Yuan He","doi":"10.1145/3550320","DOIUrl":"https://doi.org/10.1145/3550320","url":null,"abstract":"Millimeter wave (mmWave) based sensing is a significant technique that enables innovative smart applications, e.g., voice recognition. The existing works in this area require direct sensing of the human’s near-throat region and consequently have limited applicability in non-line-of-sight (NLoS) scenarios. This paper proposes AmbiEar, the first mmWave based voice recognition approach applicable in NLoS scenarios. AmbiEar is based on the insight that the human’s voice causes correlated vibrations of the surrounding objects, regardless of the human’s position and posture. Therefore, AmbiEar regards the surrounding objects as ears that can perceive sound and realizes indirect sensing of the human’s voice by sensing the vibration of the surrounding objects. By incorporating the designs like common component extraction, signal superimposition, and encoder-decoder network, AmbiEar tackles the challenges induced by low-SNR and distorted signals. We implement AmbiEar on a commercial mmWave radar and evaluate its performance under different settings. The experimental results show that AmbiEar has a word recognition accuracy of 87.21% in NLoS scenarios and reduces the recognition error by 35.1%, compared to the direct sensing approach.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"6 1","pages":"151:1-151:25"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87327725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lei Wang, Wei Li, Ke Sun, Fusang Zhang, Tao Gu, Chenren Xu, Daqing Zhang
Acoustic sensing has been explored in numerous applications leveraging the wide deployment of acoustic-enabled devices. However, most existing acoustic sensing systems work only at very short range due to the fast attenuation of ultrasonic signals, hindering their real-world deployment. In this paper, we present LoEar, a novel acoustic sensing system that uses only a single microphone and speaker to detect vital signs (respiration and heartbeat) with a significantly increased sensing range. We first develop a model, namely Carrierforming, to enhance the signal-to-noise ratio (SNR) via coherent superposition across multiple subcarriers on the target path. We then propose a novel technique called Continuous-MUSIC (Continuous-MUltiple SIgnal Classification) to detect dynamic reflections containing subtle motion and further identify the target user based on the frequency distribution to enable Carrierforming. Finally, we adopt an adaptive Infinite Impulse Response (IIR) comb notch filter to recover the heartbeat pattern from the Channel Frequency Response (CFR) measurements, which are dominated by respiration, and further develop a peak-based scheme to estimate respiration rate and heart rate. We conduct extensive experiments to evaluate our system, and the results show that it outperforms the state-of-the-art using commercial devices: the range of respiration sensing is increased from 2 m to 7 m, and the range of heartbeat sensing is increased from 1.2 m to 6.5 m.
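The last step, suppressing the respiration-dominated content with an IIR comb notch filter and then counting peaks, can be sketched with SciPy. Here the comb notch is placed at a fixed nominal respiration frequency rather than adapted as in LoEar, and the sampling rate, Q factor, and peak-spacing constraints are illustrative assumptions.

```python
import numpy as np
from scipy.signal import iircomb, filtfilt, find_peaks

FS = 50  # assumed CFR sampling rate (Hz); must be an exact multiple of the notch frequency

def heart_and_resp_rate(cfr, resp_hz=0.25, q=30):
    """Estimate respiration and heart rate (per minute) from a CFR stream.

    cfr:     1-D real-valued channel-frequency-response measurement dominated by respiration.
    resp_hz: nominal respiration frequency for the comb notch (fixed here, adaptive in LoEar).
    """
    t_minutes = len(cfr) / FS / 60.0

    # Respiration: the dominant low-frequency oscillation; count its peaks directly.
    resp_peaks, _ = find_peaks(cfr, distance=FS * 2)            # breaths >= 2 s apart
    resp_rate = len(resp_peaks) / t_minutes

    # Heartbeat: remove respiration and its harmonics with an IIR comb notch, then count peaks.
    b, a = iircomb(resp_hz, Q=q, ftype="notch", fs=FS)
    heart = filtfilt(b, a, cfr)
    heart_peaks, _ = find_peaks(heart, distance=int(FS * 0.4))  # beats >= 0.4 s apart
    heart_rate = len(heart_peaks) / t_minutes

    return resp_rate, heart_rate

# Synthetic 60 s example: 15 breaths/min plus a weaker 72 beats/min component.
t = np.arange(0, 60, 1 / FS)
cfr = np.sin(2 * np.pi * 0.25 * t) + 0.1 * np.sin(2 * np.pi * 1.2 * t)
print(heart_and_resp_rate(cfr))   # roughly (15, 72)
```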
{"title":"LoEar: Push the Range Limit of Acoustic Sensing for Vital Sign Monitoring","authors":"Lei Wang, Wei Li, Ke Sun, Fusang Zhang, Tao Gu, Chenren Xu, Daqing Zhang","doi":"10.1145/3550293","DOIUrl":"https://doi.org/10.1145/3550293","url":null,"abstract":"Acoustic sensing has been explored in numerous applications leveraging the wide deployment of acoustic-enabled devices. However, most of the existing acoustic sensing systems work in a very short range only due to fast attenuation of ultrasonic signals, hindering their real-world deployment. In this paper, we present a novel acoustic sensing system using only a single microphone and speaker, named LoEar, to detect vital signs (respiration and heartbeat) with a significantly increased sensing range. We first develop a model, namely Carrierforming , to enhance the signal-to-noise ratio (SNR) via coherent superposition across multiple subcarriers on the target path. We then propose a novel technique called Continuous-MUSIC (Continuous-MUltiple SIgnal Classification) to detect a dynamic reflections, containing subtle motion, and further identify the target user based on the frequency distribution to enable Carrierforming . Finally, we adopt an adaptive Infinite Impulse Response (IIR) comb notch filter to recover the heartbeat pattern from the Channel Frequency Response (CFR) measurements which are dominated by respiration and further develop a peak-based scheme to estimate respiration rate and heart rate. We conduct extensive experiments to evaluate our system, and results show that our system outperforms the state-of-the-art using commercial devices, i.e., the range of respiration sensing is increased from 2 m to 7 m, and the range of heartbeat sensing is increased from 1.2 m to 6.5 m.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"65 1","pages":"145:1-145:24"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76671850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Miguel Chávez Tapia, Talia Xu, Zehang Wu, M. Z. Zamalloa
A recent development in wireless communication is the use of optical shutters and smartphone cameras to create optical links solely from ambient light. At the transmitter, a liquid crystal display (LCD) modulates ambient light by changing its level of transparency. At the receiver, a smartphone camera decodes the optical pattern. This LCD-to-camera link requires low power at the transmitter and is easy to deploy because it does not require modifying the existing lighting infrastructure. The system, however, provides a low data rate of just a few tens of bps, because the LCDs used in the state of the art are slow single-pixel transmitters. To overcome this limitation, we introduce a novel multi-pixel display. Our display is similar to a simple screen, but instead of using embedded LEDs to radiate information, it uses only the surrounding ambient light. We build a prototype, called SunBox, and evaluate it indoors and outdoors with both artificial and natural ambient light. Our results show that SunBox can achieve a throughput between 2 kbps and 10 kbps using a low-end smartphone camera with just 30 FPS. To the best of our knowledge, this is the first screen-to-camera system that works solely with ambient light.
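The core receive path, turning per-frame cell brightness into bits, can be sketched in a few lines: split each camera frame into the transmitter's pixel grid, average the brightness of each cell, and threshold it against the midpoint of the observed levels. The grid size, thresholding rule, and perfect frame alignment are simplifying assumptions; SunBox's actual decoder also has to handle synchronization and camera artifacts.

```python
import numpy as np

def decode_frames(frames, grid=(4, 4)):
    """Decode one bit per transmitter cell per frame from grayscale camera frames.

    frames: (n_frames, H, W) grayscale images of the multi-pixel LCD surface.
    grid:   layout of the transmitter cells (rows, cols); assumed known and aligned.
    Returns an (n_frames, rows*cols) array of bits.
    """
    n, h, w = frames.shape
    rows, cols = grid
    cells = frames.reshape(n, rows, h // rows, cols, w // cols).mean(axis=(2, 4))
    cells = cells.reshape(n, rows * cols)
    # Per-cell threshold halfway between the darkest and brightest observed level.
    thresh = (cells.min(axis=0) + cells.max(axis=0)) / 2
    return (cells > thresh).astype(np.uint8)

# Toy example: a 4x4-cell transmitter observed at 30 FPS for 1 second.
rng = np.random.default_rng(1)
bits_tx = rng.integers(0, 2, size=(30, 16))
frames = np.kron(bits_tx.reshape(30, 4, 4) * 150 + 50, np.ones((1, 20, 20)))  # 80x80 frames
frames = frames + rng.normal(0, 5, frames.shape)
bits_rx = decode_frames(frames)
print((bits_rx == bits_tx).mean())   # bit accuracy; raw rate here is 16 cells x 30 FPS = 480 bps
```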
{"title":"SunBox: Screen-to-Camera Communication with Ambient Light","authors":"Miguel Chávez Tapia, Talia Xu, Zehang Wu, M. Z. Zamalloa","doi":"10.1145/3534602","DOIUrl":"https://doi.org/10.1145/3534602","url":null,"abstract":"A recent development in wireless communication is the use of optical shutters and smartphone cameras to create optical links solely from ambient light . At the transmitter, a liquid crystal display (LCD) modulates ambient light by changing its level of transparency. At the receiver, a smartphone camera decodes the optical pattern. This LCD-to-camera link requires low-power levels at the transmitter, and it is easy to deploy because it does not require modifying the existing lighting infrastructure. The system, however, provides a low data rate, of just a few tens of bps. This occurs because the LCDs used in the state-of-the-art are slow single-pixel transmitters. To overcome this limitation, we introduce a novel multi-pixel display. Our display is similar to a simple screen, but instead of using embedded LEDs to radiate information, it uses only the surrounding ambient light. We build a prototype, called SunBox, and evaluate it indoors and outdoors with both, artificial and natural ambient light. Our results show that SunBox can achieve a throughput between 2kbps and 10kbps using a low-end smartphone camera with just 30FPS. To the best of our knowledge, this is the first screen-to-camera system that works solely with ambient light. ;","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"124 1","pages":"46:1-46:26"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77342239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unmanned robots are increasingly used around humans in factories, malls, and hotels. As they navigate our space, it is important to ensure that such robots do not collide with people who suddenly appear as they turn a corner. Today, however, there is no practical solution for localizing people around corners. Optical solutions try to track hidden people through their visible shadows on the floor or a sidewall, but they can easily fail depending on the ambient light and the environment. More recent work has considered the use of radio frequency (RF) signals to track people and vehicles around street corners. However, past RF-based proposals rely on a simplistic ray-tracing model that fails in practical indoor scenarios. This paper introduces CornerRadar, an RF-based method that provides accurate around-corner indoor localization. CornerRadar addresses the limitations of the ray-tracing model used in past work. It does so through a novel encoding of how RF signals bounce off walls and occlusions. The encoding, which we call the hint map, is then fed to a neural network along with the radio signals to localize people around corners. Empirical evaluation with people moving around corners in 56 indoor environments shows that CornerRadar achieves a median error that is 3x to 12x smaller than past RF-based solutions for localizing people around corners.
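The single-bounce geometry that such around-corner reasoning builds on can be illustrated by mirroring the radar across a reflecting wall: a hidden person appears to the radar at the range of the path radar -> wall -> person, which equals the straight-line distance from the mirrored (virtual) radar position. This sketch only computes that textbook geometry; it is not CornerRadar's hint-map construction, and the wall coordinates are made up.

```python
import numpy as np

def mirror_across_wall(point, wall_p0, wall_p1):
    """Mirror a 2-D point across the infinite line through wall_p0 -> wall_p1."""
    p, a, b = map(np.asarray, (point, wall_p0, wall_p1))
    d = (b - a) / np.linalg.norm(b - a)
    proj = a + np.dot(p - a, d) * d      # foot of the perpendicular on the wall line
    return 2 * proj - p

def one_bounce_range(radar, target, wall_p0, wall_p1):
    """Path length radar -> wall -> target, via the virtual (mirrored) radar position."""
    virtual_radar = mirror_across_wall(radar, wall_p0, wall_p1)
    return np.linalg.norm(np.asarray(target) - virtual_radar)

# Example: radar in a corridor, a reflecting side wall along x = 3, person around the corner.
radar = (0.0, 0.0)
target = (2.0, 4.0)
wall_p0, wall_p1 = (3.0, -1.0), (3.0, 5.0)
print(one_bounce_range(radar, target, wall_p0, wall_p1))   # apparent range of the hidden person
```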
{"title":"CornerRadar: RF-Based Indoor Localization Around Corners","authors":"Shichao Yue, Hao He, Peng-Xia Cao, Kaiwen Zha, Masayuki Koizumi, D. Katabi","doi":"10.1145/3517226","DOIUrl":"https://doi.org/10.1145/3517226","url":null,"abstract":"Unmanned robots are increasingly used around humans in factories, malls, and hotels. As they navigate our space, it is important to ensure that such robots do not collide with people who suddenly appear as they turn a corner. Today, however, there is no practical solution for localizing people around corners. Optical solutions try to track hidden people through their visible shadows on the floor or a sidewall, but they can easily fail depending on the ambient light and the environment. More recent work has considered the use of radio frequency (RF) signals to track people and vehicles around street corners. However, past RF-based proposals rely on a simplistic ray-tracing model that fails in practical indoor scenarios. This paper introduces CornerRadar, an RF-based method that provides accurate around-corner indoor localization. CornerRadar addresses the limitations of the ray-tracing model used in past work. It does so through a novel encoding of how RF signals bounce off walls and occlusions. The encoding, which we call the hint map , is then fed to a neural network along with the radio signals to localize people around corners. Empirical evaluation with people moving around corners in 56 indoor environments shows that CornerRadar achieves a median error that is 3x to 12x smaller than past RF-based solutions for localizing people around corners.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"123 1","pages":"34:1-34:24"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79481741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}