Vision-based drone-view object detection suffers from severe performance degradation under adverse conditions (e.g., foggy weather, poor illumination). To remedy this, leveraging complementary mmWave radar has become a trend. However, existing fusion approaches seldom apply to drones due to i) the aggravated sparsity and noise of point clouds from low-cost commodity radars, and ii) explosive sensing data and intensive computations leading to high latency. To address these issues, we design Geryon, an edge-assisted object detection system for drones, which utilizes a suite of approaches to fully exploit the complementary advantages of camera and mmWave radar on three levels: (i) a novel multi-frame compositing approach uses the camera to assist the radar, addressing the aggravated sparsity and noise of radar point clouds; (ii) a saliency area extraction and encoding approach uses the radar to assist the camera, reducing bandwidth consumption and offloading latency; (iii) a parallel transmission and inference approach with a lightweight box enhancement scheme further reduces the offloading latency while preserving the edge-side accuracy-latency trade-off through parallelism and better camera-radar fusion. We implement and evaluate Geryon on four datasets we collect under foggy/rainy/snowy weather and poor illumination conditions, demonstrating its advantages over other state-of-the-art approaches in terms of both accuracy and latency.
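The saliency-area idea can be illustrated with a short sketch: radar detections, assumed already projected into image coordinates, mark salient regions that are JPEG-encoded at high quality while the rest of the frame is encoded coarsely before offloading. This is a minimal illustration, not Geryon's actual encoder; the function name, padding, and quality values below are assumptions.

```python
import cv2
import numpy as np

def encode_with_saliency(frame, radar_points_px, pad=40, q_roi=90, q_bg=30):
    """Encode radar-indicated regions at high JPEG quality, background at low quality.

    frame:           HxWx3 uint8 camera image
    radar_points_px: Nx2 array of radar detections projected to pixel coords (assumed given)
    """
    h, w = frame.shape[:2]

    # Coarse background: heavily compressed full frame.
    _, bg_bytes = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, q_bg])

    # One high-quality crop per radar detection (detections could be clustered first).
    roi_bytes = []
    for (u, v) in radar_points_px.astype(int):
        x0, y0 = max(u - pad, 0), max(v - pad, 0)
        x1, y1 = min(u + pad, w), min(v + pad, h)
        crop = frame[y0:y1, x0:x1]
        _, buf = cv2.imencode(".jpg", crop, [cv2.IMWRITE_JPEG_QUALITY, q_roi])
        roi_bytes.append(((x0, y0, x1, y1), buf))

    return bg_bytes, roi_bytes  # offload both; the edge reassembles before detection

# Example with a synthetic frame and two hypothetical radar detections.
frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
bg, rois = encode_with_saliency(frame, np.array([[120, 200], [500, 300]]))
print(len(bg), [len(b) for _, b in rois])
```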
{"title":"Geryon: Edge Assisted Real-time and Robust Object Detection on Drones via mmWave Radar and Camera Fusion","authors":"Kaikai Deng, Dong Zhao, Qiaoyue Han, Shuyue Wang, Zihan Zhang, Anfu Zhou, Huadong Ma","doi":"10.1145/3550298","DOIUrl":"https://doi.org/10.1145/3550298","url":null,"abstract":"Vision-based drone-view object detection suffers from severe performance degradation under adverse conditions (e.g., foggy weather, poor illumination). To remedy this, leveraging complementary mmWave radar has become a trend. However, existing fusion approaches seldom apply to drones due to i) the aggravated sparsity and noise of point clouds from low-cost commodity radars, and ii) explosive sensing data and intensive computations leading to high latency. To address these issues, we design Geryon , an edge assisted object detection system on drones, which utilizes a suit of approaches to fully exploit the complementary advantages of camera and mmWave radar on three levels: (i) a novel multi-frame compositing approach utilizes camera to assist radar to address the aggravated sparsity and noise of radar point clouds; (ii) a saliency area extraction and encoding approach utilizes radar to assist camera to reduce the bandwidth consumption and offloading latency; (iii) a parallel transmission and inference approach with a lightweight box enhancement scheme further reduces the offloading latency while ensuring the edge-side accuracy-latency trade-off by the parallelism and better camera-radar fusion. We implement and evaluate Geryon with four datasets we collect under foggy/rainy/snowy weather and poor illumination conditions, demonstrating its great advantages over other state-of-the-art approaches in terms of both accuracy and latency. CCS Concepts:","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"70 1","pages":"109:1-109:27"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72733418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose to explore teeth-clenching-based target selection in Augmented Reality (AR), as the subtlety of the interaction can benefit applications that occupy the user's hands or are sensitive to social norms. To support the investigation, we implemented an EMG-based teeth-clenching detection system (ClenchClick), which adopts customized thresholds for different users. We first explored and compared potential interaction designs combining head movements and teeth clenching, and finalized the interaction as a Point-and-Click design with clenches as the confirmation mechanism. We evaluated the taskload and performance of ClenchClick by comparing it with two baseline methods in target selection tasks. Results showed that ClenchClick outperformed hand gestures in workload, physical load, accuracy, and speed, and outperformed dwell in workload and temporal load. Lastly, through user studies, we demonstrated the advantages of ClenchClick in real-world tasks, including efficient and accurate hands-free target selection, natural and unobtrusive interaction in public, and robust head gesture input. We investigated the interaction design, the user experience in target selection tasks, and user performance in real-world tasks in a series of user studies. In our first user study, we explored nine potential designs and compared the three most promising ones (ClenchClick, ClenchCrossingTarget, ClenchCrossingEdge) with a hand-based (Hand Gesture) and a hands-free (Dwell) baseline in target selection tasks. ClenchClick had the best overall user experience with the lowest workload: it outperformed Hand Gesture in both physical and temporal load, and outperformed Dwell in temporal and mental load. In the second study, we evaluated the performance of ClenchClick with two detection methods (General and Personalized), in comparison with the same hand-based (Hand Gesture) and hands-free (Dwell) baselines. Results showed that ClenchClick outperformed Hand Gesture in accuracy (98.9% vs. 89.4%) and was comparable with Dwell in accuracy and efficiency. We further investigated users' behavioral characteristics by analyzing their cursor trajectories, which showed that ClenchClick is a smoother target selection method: it was more psychologically friendly and occupied less of the user's attention. Finally, we conducted user studies on three real-world tasks covering hands-free, social-friendly, and head-gesture interaction. Results revealed that ClenchClick is an efficient and accurate target selection method when both hands are occupied, is socially acceptable and satisfying when performed in public, and can serve as an activation mechanism for head gestures, significantly alleviating false-positive issues.
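The customized-threshold idea can be sketched as follows: compute a short-window RMS envelope of the EMG stream and flag a clench when the envelope exceeds a per-user threshold calibrated from rest data. The sampling rate, window length, and mean-plus-k-std threshold rule are illustrative assumptions, not ClenchClick's actual detector.

```python
import numpy as np

FS = 1000            # assumed EMG sampling rate (Hz)
WIN = int(0.1 * FS)  # 100 ms RMS window

def rms_envelope(emg, win=WIN):
    """Sliding-window RMS of a 1-D EMG signal."""
    sq = np.convolve(emg ** 2, np.ones(win) / win, mode="same")
    return np.sqrt(sq)

def calibrate_threshold(rest_emg, k=5.0):
    """Per-user threshold from relaxed-jaw data: mean + k * std of the rest envelope."""
    env = rms_envelope(rest_emg)
    return env.mean() + k * env.std()

def detect_clench(emg, threshold, min_len=int(0.05 * FS)):
    """Return True if the envelope stays above the user's threshold for >= min_len samples."""
    above = rms_envelope(emg) > threshold
    run = 0
    for a in above:
        run = run + 1 if a else 0
        if run >= min_len:
            return True
    return False

# Usage with synthetic data: quiet baseline vs. a burst of muscle activity.
rest = 0.01 * np.random.randn(5 * FS)
thr = calibrate_threshold(rest)
burst = np.concatenate([0.01 * np.random.randn(FS), 0.2 * np.random.randn(FS)])
print(detect_clench(burst, thr))   # expected: True
```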
{"title":"ClenchClick: Hands-Free Target Selection Method Leveraging Teeth-Clench for Augmented Reality","authors":"Xiyuan Shen, Yukang Yan, Chun Yu, Yuanchun Shi","doi":"10.1145/3550327","DOIUrl":"https://doi.org/10.1145/3550327","url":null,"abstract":"We propose to explore teeth-clenching-based target selection in Augmented Reality (AR), as the subtlety in the interaction can be beneficial to applications occupying the user’s hand or that are sensitive to social norms. To support the investigation, we implemented an EMG-based teeth-clenching detection system (ClenchClick), where we adopted customized thresholds for different users. We first explored and compared the potential interaction design leveraging head movements and teeth clenching in combination. We finalized the interaction to take the form of a Point-and-Click manner with clenches as the confirmation mechanism. We evaluated the taskload and performance of ClenchClick by comparing it with two baseline methods in target selection tasks. Results showed that ClenchClick outperformed hand gestures in workload, physical load, accuracy and speed, and outperformed dwell in work load and temporal load. Lastly, through user studies, we demonstrated the advantage of ClenchClick in real-world tasks, including efficient and accurate hands-free target selection, natural and unobtrusive interaction in public, and robust head gesture input. investigated the interaction design, user experience in target selection tasks, and user performance in real-world tasks in a series of user studies. In our first user study, we explored nine potential designs and compared the three most promising designs (ClenchClick, ClenchCross-ingTarget, ClenchCrossingEdge) with a hand-based (Hand Gesture) and a hands-free (Dwell) baseline in target selection tasks. ClenchClick had the best overall user experience with the lowest workload. It outperformed Hand Gesture in both physical and temporal load, and outperformed Dwell in temporal and mental load. In the second study, we evaluated the performance of ClenchClick with two detection methods (General and Personalized), in comparison with a hand-based (Hand Gesture) and a hands-free (Dwell) baseline. Results showed that ClenchClick outperformed Hand Gesture in accuracy (98.9% v.s. 89.4%), and was comparable with Dwell in accuracy and efficiency. We further investigated users’ behavioral characteristics by analyzing their cursor trajectories in the tasks, which showed that ClenchClick was a smoother target selection method. It was more psychologically friendly and occupied less of the user’s attention. Finally, we conducted user studies in three real-world tasks which supported hands-free, social-friendly, and head gesture interaction. Results revealed that ClenchClick is an efficient and accurate target selection method when both hands are occupied. It is social-friendly and satisfying when performing in public, and can serve as activation to head gestures which significantly alleviates false positive issues.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. 
Wearable Ubiquitous Technol.","volume":"27 1","pages":"139:1-139:26"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73774807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Visual Question Answering (VQA) is a relatively new task where a user can ask a natural question about an image and obtain an answer. VQA is useful for many applications and is widely popular among users with visual impairments. Our goal is to design a VQA application that works efficiently on mobile devices without requiring cloud support. Such a system allows users to ask visual questions privately, without sending their questions to the cloud, while also reducing cloud communication costs. However, existing VQA applications use deep learning models that significantly improve accuracy but are computationally heavy. Unfortunately, existing techniques that optimize deep learning for mobile devices cannot be applied to VQA because the VQA task is multi-modal: it requires processing both vision and text data. Existing mobile optimizations that work for vision-only or text-only neural networks cannot be applied here because of the dependencies between the two modalities. Instead, we design MobiVQA, a set of optimizations that leverage the multi-modal nature of VQA. We show, using extensive evaluation on two VQA testbeds and two mobile platforms, that MobiVQA significantly improves latency and energy with minimal accuracy loss compared to state-of-the-art VQA models. For instance, MobiVQA can answer a visual question in 163 milliseconds on the phone, compared to the over 20-second latency incurred by the most accurate state-of-the-art model, while incurring less than 1 point reduction in accuracy.
{"title":"MobiVQA: Efficient On-Device Visual Question Answering","authors":"Qingqing Cao","doi":"10.1145/3534619","DOIUrl":"https://doi.org/10.1145/3534619","url":null,"abstract":"Visual Question Answering (VQA) is a relatively new task where a user can ask a natural question about an image and obtain an answer. VQA is useful for many applications and is widely popular for users with visual impairments. Our goal is to design a VQA application that works efficiently on mobile devices without requiring cloud support. Such a system will allow users to ask visual questions privately, without having to send their questions to the cloud, while also reduce cloud communication costs. However, existing VQA applications use deep learning models that significantly improve accuracy, but is computationally heavy. Unfortunately, existing techniques that optimize deep learning for mobile devices cannot be applied for VQA because the VQA task is multi-modal—it requires both processing vision and text data. Existing mobile optimizations that work for vision-only or text-only neural networks cannot be applied here because of the dependencies between the two modes. Instead, we design MobiVQA, a set of optimizations that leverage the multi-modal nature of VQA. We show using extensive evaluation on two VQA testbeds and two mobile platforms, that MobiVQA significantly improves latency and energy with minimal accuracy loss compared to state-of-the-art VQA models. For instance, MobiVQA can answer a visual question in 163 milliseconds on the phone, compared to over 20-second latency incurred by the most accurate state-of-the-art model, while incurring less than 1 point reduction in accuracy.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"107 1","pages":"44:1-44:23"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74642437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WiFi-based gesture recognition systems have attracted enormous interest owing to the non-intrusive nature of WiFi signals and the wide adoption of WiFi for communication. Despite performance boosted by integrating advanced deep neural network (DNN) classifiers, there has been insufficient investigation of their security vulnerabilities, which are rooted in the open nature of the wireless medium and the inherent defects (e.g., susceptibility to adversarial attacks) of classifiers. To fill this gap, we study adversarial attacks on DNN-powered WiFi-based gesture recognition to encourage proper countermeasures. We design WiAdv to construct physically realizable adversarial examples that fool these systems. WiAdv features a signal synthesis scheme to craft adversarial signals with desired motion features based on the fundamental principle of WiFi-based gesture recognition, and a black-box attack scheme to handle the inconsistency between the perturbation space and the input space of the classifier caused by the non-differentiable processing modules in between. We realize and evaluate our attack strategies against a representative state-of-the-art system, Widar3.0, in realistic settings. The experimental results show that the adversarial wireless signals generated by WiAdv achieve over 70% attack success rate on average and remain robust and effective across different physical settings. Our attack case study and analysis reveal the vulnerability of WiFi-based gesture recognition systems, and we hope WiAdv can help promote the improvement of these systems.
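WiAdv's black-box scheme is not reproduced here, but the general query-based idea it builds on can be sketched: search over a small set of perturbation parameters (here, plain random search) by repeatedly querying the classifier end-to-end, so the non-differentiable processing between perturbation space and input space never needs gradients. The classifier stub, synthesis rule, and parameter ranges below are placeholders, not WiAdv's actual components.

```python
import numpy as np

def query_classifier(doppler_profile):
    """Placeholder for the end-to-end system (signal processing + DNN).
    Returns (predicted_label, confidence)."""
    score = doppler_profile.mean()               # stand-in decision rule
    return (1 if score > 0.5 else 0), abs(score - 0.5)

def synthesize(params, base_profile):
    """Placeholder adversarial synthesis: add a parameterized sinusoidal component."""
    amp, freq, phase = params
    t = np.linspace(0, 1, base_profile.size)
    return base_profile + amp * np.sin(2 * np.pi * freq * t + phase)

def black_box_attack(base_profile, true_label, n_queries=500, seed=0):
    """Random-search attack: keep the candidate that flips the label with the least perturbation."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_queries):
        params = rng.uniform([0.0, 1.0, 0.0], [0.5, 10.0, 2 * np.pi])
        adv = synthesize(params, base_profile)
        label, _ = query_classifier(adv)
        if label != true_label:
            cost = np.linalg.norm(adv - base_profile)
            if best is None or cost < best[0]:
                best = (cost, params)
    return best  # None if no successful perturbation was found

base = np.full(128, 0.4)                  # toy "gesture" profile classified as label 0
print(black_box_attack(base, true_label=0))
```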
{"title":"WiAdv: Practical and Robust Adversarial Attack against WiFi-based Gesture Recognition System","authors":"Yuxuan Zhou, Huangxun Chen, Chenyu Huang, Qian Zhang","doi":"10.1145/3534618","DOIUrl":"https://doi.org/10.1145/3534618","url":null,"abstract":"WiFi-based gesture recognition systems have attracted enormous interest owing to the non-intrusive of WiFi signals and the wide adoption of WiFi for communication. Despite boosted performance via integrating advanced deep neural network (DNN) classifiers, there lacks sufficient investigation on their security vulnerabilities, which are rooted in the open nature of the wireless medium and the inherent defects (e.g., adversarial attacks) of classifiers. To fill this gap, we aim to study adversarial attacks to DNN-powered WiFi-based gesture recognition to encourage proper countermeasures. We design WiAdv to construct physically realizable adversarial examples to fool these systems. WiAdv features a signal synthesis scheme to craft adversarial signals with desired motion features based on the fundamental principle of WiFi-based gesture recognition, and a black-box attack scheme to handle the inconsistency between the perturbation space and the input space of the classifier caused by the in-between non-differentiable processing modules. We realize and evaluate our attack strategies against a representative state-of-the-art system, Widar3.0 in realistic settings. The experimental results show that the adversarial wireless signals generated by WiAdv achieve over 70% attack success rate on average, and remain robust and effective across different physical settings. Our attack case study and analysis reveal the vulnerability of WiFi-based gesture recognition systems, and we hope WiAdv could help promote the improvement of the relevant systems.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"176 1","pages":"92:1-92:25"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73444985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hao-Shun Wei, Ziheng Li, Alexander D. Galvan, Zhuoran Su, Xiao Zhang, E. Solovey, K. Pahlavan
In this paper, we introduce IndexPen, a novel interaction technique for text input through two-finger in-air micro-gestures, enabling touch-free, effortless, tracking-based interaction designed to mirror real-world writing. Our system is based on millimeter-wave radar sensing and does not require instrumentation on the user. IndexPen can successfully identify 30 distinct gestures, representing the letters A-Z, as well as Space, Backspace, Enter, and a special Activation gesture to prevent unintentional input. Additionally, we include a noise class to differentiate gestures from non-gesture noise. We present our system design, including the radio frequency (RF) processing pipeline, classification model, and real-time detection algorithms. We further demonstrate our proof-of-concept system with data collected over ten days with five participants, yielding 95.89% cross-validation accuracy on 31 classes (including noise). Moreover, we explore the learnability and adaptability of our system for real-world text input with 16 first-time IndexPen users over five sessions. After each session, the model pre-trained in the previous five-user study is calibrated on the data collected so far for the new user through transfer learning. With calibration, the F-1 score increased by an average of 9.14% per session, reaching an average of 88.3% in the last session across the 16 users. Meanwhile, we show that users can type sentences with IndexPen at 86.2% accuracy, measured by string similarity. This work builds a foundation and vision for future interaction interfaces enabled by this paradigm.
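The per-user calibration step described above amounts to fine-tuning a pre-trained gesture classifier on the small amount of data collected from a new user. A minimal PyTorch sketch of that idea is shown below; the architecture, freezing policy, and hyperparameters are assumptions for illustration and not IndexPen's actual model.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 31  # A-Z, Space, Backspace, Enter, Activation, noise

class GestureNet(nn.Module):
    """Stand-in backbone + classifier head over radar feature vectors."""
    def __init__(self, feat_dim=256, num_classes=NUM_CLASSES):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                      nn.Linear(128, 64), nn.ReLU())
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))

def calibrate(model, user_x, user_y, epochs=5, lr=1e-3):
    """Transfer learning on a new user's data: freeze the backbone, adapt the head."""
    for p in model.backbone.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(model.head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(user_x), user_y)
        loss.backward()
        opt.step()
    return model

# Usage with random stand-in data for one new user.
model = GestureNet()                       # would be loaded from the pre-trained checkpoint
x = torch.randn(64, 256)                   # 64 calibration samples
y = torch.randint(0, NUM_CLASSES, (64,))
calibrate(model, x, y)
```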
{"title":"IndexPen: Two-Finger Text Input with Millimeter-Wave Radar","authors":"Hao‐Shun Wei, Worcester, Li Ziheng, Alexander D. Galvan, SU Zhuoran, Xiao Zhang, E. Solovey, Hao‐Shun Wei, Ziheng Li, Alexander D. Galvan, Zhuoran Su, Xiao Zhang, K. Pahlavan","doi":"10.1145/3534601","DOIUrl":"https://doi.org/10.1145/3534601","url":null,"abstract":"In this paper, we introduce IndexPen , a novel interaction technique for text input through two-finger in-air micro-gestures, enabling touch-free, effortless, tracking-based interaction, designed to mirror real-world writing. Our system is based on millimeter-wave radar sensing, and does not require instrumentation on the user. IndexPen can successfully identify 30 distinct gestures, representing the letters A-Z , as well as Space , Backspace , Enter , and a special Activation gesture to prevent unintentional input. Additionally, we include a noise class to differentiate gesture and non-gesture noise. We present our system design, including the radio frequency (RF) processing pipeline, classification model, and real-time detection algorithms. We further demonstrate our proof-of-concept system with data collected over ten days with five participants yielding 95.89% cross-validation accuracy on 31 classes (including noise ). Moreover, we explore the learnability and adaptability of our system for real-world text input with 16 participants who are first-time users to IndexPen over five sessions. After each session, the pre-trained model from the previous five-user study is calibrated on the data collected so far for a new user through transfer learning. The F-1 score showed an average increase of 9.14% per session with the calibration, reaching an average of 88.3% on the last session across the 16 users. Meanwhile, we show that the users can type sentences with IndexPen at 86.2% accuracy, measured by string similarity. This work builds a foundation and vision for future interaction interfaces that could be enabled with this paradigm.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"39 1","pages":"79:1-79:39"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87089522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Zhang, Yinian Zhou, Rui Xi, Shuai Li, Junchen Guo, Yuan He
Millimeter wave (mmWave) based sensing is a significant technique that enables innovative smart applications, e.g., voice recognition. Existing works in this area require direct sensing of the human's near-throat region and consequently have limited applicability in non-line-of-sight (NLoS) scenarios. This paper proposes AmbiEar, the first mmWave-based voice recognition approach applicable in NLoS scenarios. AmbiEar is based on the insight that the human voice causes correlated vibrations of surrounding objects, regardless of the human's position and posture. Therefore, AmbiEar regards the surrounding objects as ears that can perceive sound and realizes indirect sensing of the human voice by sensing the vibration of those objects. By incorporating designs such as common component extraction, signal superimposition, and an encoder-decoder network, AmbiEar tackles the challenges induced by low-SNR and distorted signals. We implement AmbiEar on a commercial mmWave radar and evaluate its performance under different settings. The experimental results show that AmbiEar achieves a word recognition accuracy of 87.21% in NLoS scenarios and reduces the recognition error by 35.1% compared to the direct sensing approach.
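The "common component" insight, i.e., that several surrounding objects vibrate coherently with the same voice, can be sketched by extracting the dominant component shared across the per-object (per-range-bin) vibration signals, here via the first principal component from an SVD. This is an illustrative stand-in, not AmbiEar's exact algorithm, and the signal shapes are synthetic.

```python
import numpy as np

def common_component(vibrations):
    """Extract the dominant component shared across reflecting objects.

    vibrations: (n_objects, n_samples) array, one vibration signal per object
                (e.g., per radar range bin). Returns a single (n_samples,) signal.
    """
    X = vibrations - vibrations.mean(axis=1, keepdims=True)   # remove per-object offsets
    # First right singular vector = time course of the strongest shared component.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

# Usage: three objects picking up the same "voice" with different gains and noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2000)
voice = np.sin(2 * np.pi * 200 * t) * np.exp(-2 * t)
obs = np.stack([g * voice + 0.3 * rng.standard_normal(t.size) for g in (1.0, 0.5, 0.8)])
recovered = common_component(obs)
print(np.corrcoef(recovered, voice)[0, 1])   # close to +/-1: shared component recovered
```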
{"title":"AmbiEar: mmWave Based Voice Recognition in NLoS Scenarios","authors":"J. Zhang, Yinian Zhou, Rui Xi, Shuai Li, Junchen Guo, Yuan He","doi":"10.1145/3550320","DOIUrl":"https://doi.org/10.1145/3550320","url":null,"abstract":"Millimeter wave (mmWave) based sensing is a significant technique that enables innovative smart applications, e.g., voice recognition. The existing works in this area require direct sensing of the human’s near-throat region and consequently have limited applicability in non-line-of-sight (NLoS) scenarios. This paper proposes AmbiEar, the first mmWave based voice recognition approach applicable in NLoS scenarios. AmbiEar is based on the insight that the human’s voice causes correlated vibrations of the surrounding objects, regardless of the human’s position and posture. Therefore, AmbiEar regards the surrounding objects as ears that can perceive sound and realizes indirect sensing of the human’s voice by sensing the vibration of the surrounding objects. By incorporating the designs like common component extraction, signal superimposition, and encoder-decoder network, AmbiEar tackles the challenges induced by low-SNR and distorted signals. We implement AmbiEar on a commercial mmWave radar and evaluate its performance under different settings. The experimental results show that AmbiEar has a word recognition accuracy of 87.21% in NLoS scenarios and reduces the recognition error by 35.1%, compared to the direct sensing approach.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"6 1","pages":"151:1-151:25"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87327725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lei Wang, Wei Li, Ke Sun, Fusang Zhang, Tao Gu, Chenren Xu, Daqing Zhang
Acoustic sensing has been explored in numerous applications leveraging the wide deployment of acoustic-enabled devices. However, most existing acoustic sensing systems work only at very short range due to the fast attenuation of ultrasonic signals, hindering their real-world deployment. In this paper, we present LoEar, a novel acoustic sensing system that uses only a single microphone and speaker to detect vital signs (respiration and heartbeat) with a significantly increased sensing range. We first develop a model, namely Carrierforming, to enhance the signal-to-noise ratio (SNR) via coherent superposition across multiple subcarriers on the target path. We then propose a novel technique called Continuous-MUSIC (Continuous-MUltiple SIgnal Classification) to detect dynamic reflections containing subtle motion and further identify the target user based on the frequency distribution to enable Carrierforming. Finally, we adopt an adaptive Infinite Impulse Response (IIR) comb notch filter to recover the heartbeat pattern from the Channel Frequency Response (CFR) measurements, which are dominated by respiration, and further develop a peak-based scheme to estimate respiration rate and heart rate. We conduct extensive experiments to evaluate our system, and the results show that it outperforms the state-of-the-art using commercial devices: the range of respiration sensing is increased from 2 m to 7 m, and the range of heartbeat sensing is increased from 1.2 m to 6.5 m.
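The last step, suppressing the respiration-dominated content with an IIR comb notch filter and then counting peaks, can be sketched with SciPy. Here the comb notch is placed at a fixed nominal respiration frequency rather than adapted as in LoEar, and the sampling rate, Q factor, and peak-spacing constraints are illustrative assumptions.

```python
import numpy as np
from scipy.signal import iircomb, filtfilt, find_peaks

FS = 50  # assumed CFR sampling rate (Hz); must be an exact multiple of the notch frequency

def heart_and_resp_rate(cfr, resp_hz=0.25, q=30):
    """Estimate respiration and heart rate (per minute) from a CFR stream.

    cfr:     1-D real-valued channel-frequency-response measurement dominated by respiration.
    resp_hz: nominal respiration frequency for the comb notch (fixed here, adaptive in LoEar).
    """
    t_minutes = len(cfr) / FS / 60.0

    # Respiration: the dominant low-frequency oscillation; count its peaks directly.
    resp_peaks, _ = find_peaks(cfr, distance=FS * 2)            # breaths >= 2 s apart
    resp_rate = len(resp_peaks) / t_minutes

    # Heartbeat: remove respiration and its harmonics with an IIR comb notch, then count peaks.
    b, a = iircomb(resp_hz, Q=q, ftype="notch", fs=FS)
    heart = filtfilt(b, a, cfr)
    heart_peaks, _ = find_peaks(heart, distance=int(FS * 0.4))  # beats >= 0.4 s apart
    heart_rate = len(heart_peaks) / t_minutes

    return resp_rate, heart_rate

# Synthetic 60 s example: 15 breaths/min plus a weaker 72 beats/min component.
t = np.arange(0, 60, 1 / FS)
cfr = np.sin(2 * np.pi * 0.25 * t) + 0.1 * np.sin(2 * np.pi * 1.2 * t)
print(heart_and_resp_rate(cfr))   # roughly (15, 72)
```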
{"title":"LoEar: Push the Range Limit of Acoustic Sensing for Vital Sign Monitoring","authors":"Lei Wang, Wei Li, Ke Sun, Fusang Zhang, Tao Gu, Chenren Xu, Daqing Zhang","doi":"10.1145/3550293","DOIUrl":"https://doi.org/10.1145/3550293","url":null,"abstract":"Acoustic sensing has been explored in numerous applications leveraging the wide deployment of acoustic-enabled devices. However, most of the existing acoustic sensing systems work in a very short range only due to fast attenuation of ultrasonic signals, hindering their real-world deployment. In this paper, we present a novel acoustic sensing system using only a single microphone and speaker, named LoEar, to detect vital signs (respiration and heartbeat) with a significantly increased sensing range. We first develop a model, namely Carrierforming , to enhance the signal-to-noise ratio (SNR) via coherent superposition across multiple subcarriers on the target path. We then propose a novel technique called Continuous-MUSIC (Continuous-MUltiple SIgnal Classification) to detect a dynamic reflections, containing subtle motion, and further identify the target user based on the frequency distribution to enable Carrierforming . Finally, we adopt an adaptive Infinite Impulse Response (IIR) comb notch filter to recover the heartbeat pattern from the Channel Frequency Response (CFR) measurements which are dominated by respiration and further develop a peak-based scheme to estimate respiration rate and heart rate. We conduct extensive experiments to evaluate our system, and results show that our system outperforms the state-of-the-art using commercial devices, i.e., the range of respiration sensing is increased from 2 m to 7 m, and the range of heartbeat sensing is increased from 1.2 m to 6.5 m.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"65 1","pages":"145:1-145:24"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76671850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Miguel Chávez Tapia, Talia Xu, Zehang Wu, M. Z. Zamalloa
A recent development in wireless communication is the use of optical shutters and smartphone cameras to create optical links solely from ambient light. At the transmitter, a liquid crystal display (LCD) modulates ambient light by changing its level of transparency. At the receiver, a smartphone camera decodes the optical pattern. This LCD-to-camera link requires low power at the transmitter and is easy to deploy because it does not require modifying the existing lighting infrastructure. The system, however, provides a low data rate of just a few tens of bps, because the LCDs used in the state of the art are slow single-pixel transmitters. To overcome this limitation, we introduce a novel multi-pixel display. Our display is similar to a simple screen, but instead of using embedded LEDs to radiate information, it uses only the surrounding ambient light. We build a prototype, called SunBox, and evaluate it indoors and outdoors with both artificial and natural ambient light. Our results show that SunBox can achieve a throughput between 2 kbps and 10 kbps using a low-end smartphone camera with just 30 FPS. To the best of our knowledge, this is the first screen-to-camera system that works solely with ambient light.
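The core receive path, turning per-frame cell brightness into bits, can be sketched in a few lines: split each camera frame into the transmitter's pixel grid, average the brightness of each cell, and threshold it against the midpoint of the observed levels. The grid size, thresholding rule, and perfect frame alignment are simplifying assumptions; SunBox's actual decoder also has to handle synchronization and camera artifacts.

```python
import numpy as np

def decode_frames(frames, grid=(4, 4)):
    """Decode one bit per transmitter cell per frame from grayscale camera frames.

    frames: (n_frames, H, W) grayscale images of the multi-pixel LCD surface.
    grid:   layout of the transmitter cells (rows, cols); assumed known and aligned.
    Returns an (n_frames, rows*cols) array of bits.
    """
    n, h, w = frames.shape
    rows, cols = grid
    cells = frames.reshape(n, rows, h // rows, cols, w // cols).mean(axis=(2, 4))
    cells = cells.reshape(n, rows * cols)
    # Per-cell threshold halfway between the darkest and brightest observed level.
    thresh = (cells.min(axis=0) + cells.max(axis=0)) / 2
    return (cells > thresh).astype(np.uint8)

# Toy example: a 4x4-cell transmitter observed at 30 FPS for 1 second.
rng = np.random.default_rng(1)
bits_tx = rng.integers(0, 2, size=(30, 16))
frames = np.kron(bits_tx.reshape(30, 4, 4) * 150 + 50, np.ones((1, 20, 20)))  # 80x80 frames
frames = frames + rng.normal(0, 5, frames.shape)
bits_rx = decode_frames(frames)
print((bits_rx == bits_tx).mean())   # bit accuracy; raw rate here is 16 cells x 30 FPS = 480 bps
```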
{"title":"SunBox: Screen-to-Camera Communication with Ambient Light","authors":"Miguel Chávez Tapia, Talia Xu, Zehang Wu, M. Z. Zamalloa","doi":"10.1145/3534602","DOIUrl":"https://doi.org/10.1145/3534602","url":null,"abstract":"A recent development in wireless communication is the use of optical shutters and smartphone cameras to create optical links solely from ambient light . At the transmitter, a liquid crystal display (LCD) modulates ambient light by changing its level of transparency. At the receiver, a smartphone camera decodes the optical pattern. This LCD-to-camera link requires low-power levels at the transmitter, and it is easy to deploy because it does not require modifying the existing lighting infrastructure. The system, however, provides a low data rate, of just a few tens of bps. This occurs because the LCDs used in the state-of-the-art are slow single-pixel transmitters. To overcome this limitation, we introduce a novel multi-pixel display. Our display is similar to a simple screen, but instead of using embedded LEDs to radiate information, it uses only the surrounding ambient light. We build a prototype, called SunBox, and evaluate it indoors and outdoors with both, artificial and natural ambient light. Our results show that SunBox can achieve a throughput between 2kbps and 10kbps using a low-end smartphone camera with just 30FPS. To the best of our knowledge, this is the first screen-to-camera system that works solely with ambient light. ;","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"124 1","pages":"46:1-46:26"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77342239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unmanned robots are increasingly used around humans in factories, malls, and hotels. As they navigate our space, it is important to ensure that such robots do not collide with people who suddenly appear as they turn a corner. Today, however, there is no practical solution for localizing people around corners. Optical solutions try to track hidden people through their visible shadows on the floor or a sidewall, but they can easily fail depending on the ambient light and the environment. More recent work has considered the use of radio frequency (RF) signals to track people and vehicles around street corners. However, past RF-based proposals rely on a simplistic ray-tracing model that fails in practical indoor scenarios. This paper introduces CornerRadar, an RF-based method that provides accurate around-corner indoor localization. CornerRadar addresses the limitations of the ray-tracing model used in past work. It does so through a novel encoding of how RF signals bounce off walls and occlusions. The encoding, which we call the hint map, is then fed to a neural network along with the radio signals to localize people around corners. Empirical evaluation with people moving around corners in 56 indoor environments shows that CornerRadar achieves a median error that is 3x to 12x smaller than past RF-based solutions for localizing people around corners.
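The single-bounce geometry that such around-corner reasoning builds on can be illustrated by mirroring the radar across a reflecting wall: a hidden person appears to the radar at the range of the path radar -> wall -> person, which equals the straight-line distance from the mirrored (virtual) radar position. This sketch only computes that textbook geometry; it is not CornerRadar's hint-map construction, and the wall coordinates are made up.

```python
import numpy as np

def mirror_across_wall(point, wall_p0, wall_p1):
    """Mirror a 2-D point across the infinite line through wall_p0 -> wall_p1."""
    p, a, b = map(np.asarray, (point, wall_p0, wall_p1))
    d = (b - a) / np.linalg.norm(b - a)
    proj = a + np.dot(p - a, d) * d      # foot of the perpendicular on the wall line
    return 2 * proj - p

def one_bounce_range(radar, target, wall_p0, wall_p1):
    """Path length radar -> wall -> target, via the virtual (mirrored) radar position."""
    virtual_radar = mirror_across_wall(radar, wall_p0, wall_p1)
    return np.linalg.norm(np.asarray(target) - virtual_radar)

# Example: radar in a corridor, a reflecting side wall along x = 3, person around the corner.
radar = (0.0, 0.0)
target = (2.0, 4.0)
wall_p0, wall_p1 = (3.0, -1.0), (3.0, 5.0)
print(one_bounce_range(radar, target, wall_p0, wall_p1))   # apparent range of the hidden person
```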
{"title":"CornerRadar: RF-Based Indoor Localization Around Corners","authors":"Shichao Yue, Hao He, Peng-Xia Cao, Kaiwen Zha, Masayuki Koizumi, D. Katabi","doi":"10.1145/3517226","DOIUrl":"https://doi.org/10.1145/3517226","url":null,"abstract":"Unmanned robots are increasingly used around humans in factories, malls, and hotels. As they navigate our space, it is important to ensure that such robots do not collide with people who suddenly appear as they turn a corner. Today, however, there is no practical solution for localizing people around corners. Optical solutions try to track hidden people through their visible shadows on the floor or a sidewall, but they can easily fail depending on the ambient light and the environment. More recent work has considered the use of radio frequency (RF) signals to track people and vehicles around street corners. However, past RF-based proposals rely on a simplistic ray-tracing model that fails in practical indoor scenarios. This paper introduces CornerRadar, an RF-based method that provides accurate around-corner indoor localization. CornerRadar addresses the limitations of the ray-tracing model used in past work. It does so through a novel encoding of how RF signals bounce off walls and occlusions. The encoding, which we call the hint map , is then fed to a neural network along with the radio signals to localize people around corners. Empirical evaluation with people moving around corners in 56 indoor environments shows that CornerRadar achieves a median error that is 3x to 12x smaller than past RF-based solutions for localizing people around corners.","PeriodicalId":20463,"journal":{"name":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.","volume":"123 1","pages":"34:1-34:24"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79481741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}