Objects engaged by users' hands carry rich contextual information because they correlate strongly with user activities. Tools such as toothbrushes and wipes indicate cleansing and sanitation, while mice and keyboards imply work. Much research has sought to sense hand-engaged objects in order to supply wearables with implicit interactions or ambient computing with personal informatics. We propose TextureSight, a smart-ring sensor that detects hand-engaged objects by sensing their distinctive surface textures using laser speckle imaging in a ring form factor. We conducted a two-day experience sampling study to investigate the uniqueness and repeatability of the object-texture combinations across routine objects. We grounded our sensing in a theoretical model and simulations, powered it with state-of-the-art deep neural network techniques, and evaluated it with a user study. TextureSight is a valuable addition to the literature because it can sense passive objects without emitting EMI or vibration and because it eliminates the lens to preserve user privacy, leading to a new, practical method for activity recognition and context-aware computing.
{"title":"TextureSight","authors":"Xue Wang, Yang Zhang","doi":"10.1145/3631413","DOIUrl":"https://doi.org/10.1145/3631413","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"12 6","pages":"1 - 27"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139437880"}
With the advancement of wireless sensing technologies, RF-based contactless liquid detection is attracting increasing attention. Compared with other RF devices, the mmWave radar has the advantages of large bandwidth and low cost. While existing radar-based liquid detection systems demonstrate promising performance, they share a shortcoming: the detection result depends on container-related factors (e.g., container placement, container caliber, and container material). In this paper, to enable container-independent liquid detection with a COTS mmWave radar, we propose a dual-reflection model that exploits reflections from different interfaces of the liquid container. Specifically, we design a pair of amplitude ratios based on the signals reflected from different interfaces, and theoretically demonstrate how the refractive index of a liquid can be estimated once the container's impact is eliminated. To validate the proposed approach, we implement a liquid detection system, LiqDetector. Experimental results show that LiqDetector achieves cross-container estimation of the liquid's refractive index with a mean absolute percentage error (MAPE) of about 4.4%. Moreover, the classification accuracies for 6 different liquids and for alcohol of different strengths (even a difference of 1%) exceed 96% and 95%, respectively. To the best of our knowledge, this is the first study that achieves container-independent liquid detection with a COTS mmWave radar using only one pair of Tx-Rx antennas.
{"title":"LiqDetector","authors":"Zhu Wang, Yifan Guo, Zhihui Ren, Wenchao Song, Zhuo Sun, Chaoxiong Chen, Bin Guo, Zhiwen Yu","doi":"10.1145/3631443","DOIUrl":"https://doi.org/10.1145/3631443","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"12 29","pages":"1 - 24"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139437681"}
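The LiqDetector abstract reports its refractive-index accuracy as a mean absolute percentage error (MAPE). As a reminder of what that metric measures, here is a minimal sketch of the standard MAPE formula. The function name `mape` and the sample refractive-index values are hypothetical illustrations, not data or code from the paper.

```python
def mape(actual, predicted):
    """Return the mean absolute percentage error, in percent.

    Standard definition: 100 * mean(|predicted - actual| / |actual|).
    Assumes every actual value is non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical ground-truth and estimated refractive indices (illustration only).
true_index = [2.0, 4.0, 5.0]
estimated_index = [2.1, 3.8, 5.0]
error_percent = mape(true_index, estimated_index)
```

Because each error is divided by the true value, MAPE is scale-free, which makes it convenient for comparing estimates across liquids whose refractive indices differ in magnitude.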
Yang Bai, Irtaza Shahid, Harshvardhan Takawale, Nirupam Roy
This paper presents the design and implementation of Scribe, a comprehensive voice processing and handwriting interface for voice assistants. Unlike prior work, Scribe is a precise tracking interface that can co-exist with the voice interface on voice assistants with low sampling rates. Scribe can be used for 3D free-form drawing, writing, and motion tracking for gaming. Taking handwriting as a specific application, it can also capture natural strokes and the individualized style of writing while occupying only a single frequency. The core technique includes an accurate acoustic ranging method called Cross Frequency Continuous Wave (CFCW) sonar, which enables voice assistants to use ultrasound as a ranging signal while using their regular microphone system as the receiver. We also design a new optimization algorithm that requires only a single frequency for time difference of arrival. The Scribe prototype achieves a median error of 73 μm for 1D ranging and a median error of 1.4 mm in 3D tracking of an acoustic beacon using the microphone array found in voice assistants. Our implementation of an in-air handwriting interface achieves 94.1% accuracy with automatic handwriting-to-text software, similar to writing on paper (96.6%). At the same time, the error rate of voice-based user authentication increases only from 6.26% to 8.28%.
{"title":"Scribe","authors":"Yang Bai, Irtaza Shahid, Harshvardhan Takawale, Nirupam Roy","doi":"10.1145/3631411","DOIUrl":"https://doi.org/10.1145/3631411","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"3 3","pages":"1 - 31"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139437899"}
Hankai Liu, Xiulong Liu, Xin Xie, Xinyu Tong, Keqiu Li
The difficulty of obtaining targets' identities poses a significant obstacle to personalized and customized millimeter-wave (mmWave) sensing. Existing solutions that learn individual differences from signal features have limitations in practical applications. This paper presents a Personalized mmWave-based human Tracking system, PmTrack, which introduces inertial measurement units (IMUs) as identity indicators. Widely available in portable devices such as smartwatches and smartphones, IMUs upload their identity and data over existing wireless networks, and can therefore assist radar target identification in a lightweight manner, with little deployment and carrying burden for users. PmTrack adopts orientation as the matching feature, thereby overcoming the data heterogeneity between radar and IMU while avoiding the effect of cumulative errors. In the implementation of PmTrack, we propose a comprehensive set of optimization methods for detection enhancement, interference suppression, continuity maintenance, and trajectory correction, which solve a series of practical problems caused by the three major challenges of weak reflection, point cloud overlap, and body-bounce ghosts in multi-person tracking. In addition, an orientation correction method is proposed to overcome IMU gimbal lock. Extensive experimental results demonstrate that PmTrack achieves identification accuracies of 98% and 95% with five people in the hall and meeting room, respectively.
{"title":"PmTrack","authors":"Hankai Liu, Xiulong Liu, Xin Xie, Xinyu Tong, Keqiu Li","doi":"10.1145/3631433","DOIUrl":"https://doi.org/10.1145/3631433","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"3 8","pages":"1 - 30"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139437972"}
Yuheng Wei, Jie Xiong, Hui Liu, Yingtao Yu, Jiangtao Pan, Junzhao Du
Streaming speech recognition aims to transcribe speech to text in a streaming manner, providing real-time speech interaction for smartphone users. However, it is not trivial to develop a high-performance streaming speech recognition system that runs purely on mobile platforms, because of complex real-world acoustic environments and the limited computational resources of smartphones. Most existing solutions generalize poorly to unseen environments and have difficulty working with streaming speech. In this paper, we design AdaStreamLite, an environment-adaptive streaming speech recognition tool for smartphones. AdaStreamLite interacts with its surroundings to capture the characteristics of the current acoustic environment, improving robustness against ambient noise in a lightweight manner. We design an environment representation extractor to model acoustic environments with compact feature vectors, and construct a representation lookup table to improve the generalization of AdaStreamLite to unseen environments. We train our system on large publicly available speech datasets covering different languages. We conduct experiments in a wide range of real acoustic environments with different smartphones. The results show that AdaStreamLite outperforms state-of-the-art methods in recognition accuracy, computational resource consumption, and robustness against unseen environments.
{"title":"AdaStreamLite","authors":"Yuheng Wei, Jie Xiong, Hui Liu, Yingtao Yu, Jiangtao Pan, Junzhao Du","doi":"10.1145/3631460","DOIUrl":"https://doi.org/10.1145/3631460","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"11 22","pages":"1 - 29"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139437854"}
Rishiraj Adhikary, M. Sadeh, N. Batra, Mayank Goel
Smartphones and smartwatches have contributed significantly to fitness monitoring by providing real-time statistics, thanks to accurate tracking of physiological indices such as heart rate. However, their estimates of calories burned during exercise are inaccurate and cannot be used for medical diagnosis. In this work, we present JoulesEye, a smartphone thermal camera-based system that can accurately estimate calorie burn by monitoring respiration rate. We evaluated JoulesEye on 54 participants who performed high-intensity cycling and running. The mean absolute percentage error (MAPE) of JoulesEye was 5.8%, which is significantly better than the MAPE of 37.6% observed with commercial smartwatch-based methods that use only heart rate. Finally, we show that an ultra-low-resolution thermal camera that is small enough to fit inside a watch or other wearable is sufficient for accurate calorie burn estimation. These results suggest that JoulesEye is a promising new method for accurate and reliable calorie burn estimation.
{"title":"JoulesEye","authors":"Rishiraj Adhikary, M. Sadeh, N. Batra, Mayank Goel","doi":"10.1145/3631422","DOIUrl":"https://doi.org/10.1145/3631422","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"10 51","pages":"1 - 29"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139437925"}
Harish Venugopalan, Z. Din, Trevor Carpenter, Jason Lowe-Power, Samuel T. King, Zubair Shafiq
Mobile app developers often rely on cameras to implement rich features. However, giving apps unfettered access to the mobile camera poses a privacy threat when camera frames capture sensitive information that is not needed for the app's functionality. To mitigate this threat, we present Aragorn, a novel privacy-enhancing mobile camera system that provides fine-grained control over what information can be present in camera frames before apps can access them. Aragorn automatically sanitizes camera frames by detecting regions that are essential to an app's functionality and blocking out everything else, protecting privacy while retaining app utility. Aragorn can cater to a wide range of camera apps and incorporates knowledge distillation and crowdsourcing to extend robust support to previously unsupported apps. In our evaluations, we see that, with no degradation in utility, Aragorn detects credit cards with 89% accuracy and faces with 100% accuracy in the context of credit card scanning and face recognition, respectively. We show that Aragorn's implementation in the Android camera subsystem suffers an average drop of only 0.01 frames per second in frame rate. Our evaluations show that the system performance overhead incurred by Aragorn is reasonable.
{"title":"Aragorn","authors":"Harish Venugopalan, Z. Din, Trevor Carpenter, Jason Lowe-Power, Samuel T. King, Zubair Shafiq","doi":"10.1145/3631406","DOIUrl":"https://doi.org/10.1145/3631406","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"4 4","pages":"1 - 31"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139437964"}
E. Yi, Fusang Zhang, Jie Xiong, Kai Niu, Zhiyun Yao, Daqing Zhang
The last few years have witnessed the rapid development of WiFi sensing, with a large spectrum of applications enabled. However, existing works mainly leverage obsolete 802.11n WiFi cards (i.e., Intel 5300 and Atheros AR9k series cards) for sensing. On the other hand, the mainstream WiFi protocols currently in use are 802.11ac/ax, and commodity WiFi products on the market are equipped with new-generation WiFi chips such as Broadcom BCM43794 and Qualcomm QCN5054. After conducting some benchmark experiments, we find that WiFi sensing has problems working on these new cards. The new communication features (e.g., MU-MIMO) designed to facilitate data transmission negatively impact WiFi sensing. Conventional CSI base signals, such as CSI amplitude and/or CSI phase difference between antennas, that worked well on the Intel 5300 802.11n WiFi card may fail on new cards. In this paper, we propose carefully designed signal processing schemes to make wireless sensing work well on these new WiFi cards. We employ two typical sensing applications, i.e., human respiration monitoring and human trajectory tracking, to demonstrate the effectiveness of the proposed schemes. We believe it is critical to ensure that WiFi sensing is compatible with the latest WiFi protocols, and this work takes one important step towards real-life adoption of WiFi sensing.
{"title":"Enabling WiFi Sensing on New-generation WiFi Cards","authors":"E. Yi, Fusang Zhang, Jie Xiong, Kai Niu, Zhiyun Yao, Daqing Zhang","doi":"10.1145/3633807","DOIUrl":"https://doi.org/10.1145/3633807","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"7 4","pages":"1 - 26"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139437586"}
Wireless earbuds have been gaining popularity, and using them to make phone calls or issue voice commands requires the earbud microphones to pick up human speech. When the speaker is in a noisy environment, speech quality degrades significantly and speech enhancement (SE) is required. In this paper, we present ClearSpeech, a novel deep-learning-based SE system designed for wireless earbuds. Specifically, by jointly using the earbud's in-ear and out-ear microphones, we devised a suite of techniques to effectively fuse the two signals and enhance the magnitude and phase of the speech spectrogram. We built an earbud prototype to evaluate ClearSpeech under various settings with data collected from 20 subjects. Our results suggest that ClearSpeech can improve SE performance significantly compared to conventional approaches that use the out-ear microphone only. We also show that ClearSpeech can process user speech in real time on smartphones.
{"title":"ClearSpeech","authors":"Dong Ma, Ting Dang, Ming Ding, Rajesh Balan","doi":"10.1145/3631409","DOIUrl":"https://doi.org/10.1145/3631409","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"3 6","pages":"1 - 25"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139437793"}
Tianyu Zhang, Dongheng Zhang, Guanzhong Wang, Yadong Li, Yang Hu, Qibin Sun, Yan Chen
In recent years, decimeter-level accuracy in WiFi indoor localization has become attainable within controlled environments. However, existing methods struggle to remain robust in more complex indoor environments: angle-based methods are compromised by significant localization errors due to unreliable Angle of Arrival (AoA) estimates, and fingerprint-based methods suffer from performance degradation due to environmental changes. In this paper, we propose RLoc, a learning-based system designed for reliable localization and tracking. The key design principle of RLoc lies in quantifying the uncertainty that arises in the AoA estimation task and then exploiting that uncertainty to enhance the reliability of localization and tracking. To this end, RLoc first manually extracts the underutilized beamwidth feature via signal processing techniques. Then, it integrates uncertainty quantification into the neural network design through a Kullback-Leibler (KL) divergence loss and ensemble techniques. Finally, these quantified uncertainties guide RLoc to optimally leverage the diversity of Access Points (APs) and the temporal continuity of AoAs. Our experiments on two datasets gathered from commercial off-the-shelf WiFi devices demonstrate that RLoc surpasses state-of-the-art approaches by an average of 36.27% in in-domain scenarios and 20.40% in cross-domain scenarios.
{"title":"RLoc","authors":"Tianyu Zhang, Dongheng Zhang, Guanzhong Wang, Yadong Li, Yang Hu, Qibin Sun, Yan Chen","doi":"10.1145/3631437","DOIUrl":"https://doi.org/10.1145/3631437","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"12 3","pages":"1 - 28"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139437883"}
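The RLoc abstract mentions a Kullback-Leibler (KL) divergence loss without giving its form. For readers unfamiliar with the quantity itself, the following is a minimal sketch of the textbook KL divergence between two discrete distributions. It is not RLoc's actual loss; the function name `kl_divergence` and the three-bin angle distributions are hypothetical and chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions.

    Both inputs are sequences of probabilities over the same bins.
    Bins where p is zero contribute nothing to the sum.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                raise ValueError("q must be positive wherever p is positive")
            total += pi * math.log(pi / qi)
    return total


# Hypothetical distributions over three angle bins (illustration only).
target = [0.1, 0.8, 0.1]          # mass concentrated on the middle bin
sharp_guess = [0.2, 0.6, 0.2]     # prediction that roughly agrees with the target
flat_guess = [1 / 3, 1 / 3, 1 / 3]  # prediction that expresses no preference

loss_sharp = kl_divergence(target, sharp_guess)
loss_flat = kl_divergence(target, flat_guess)
```

The divergence is zero only when the two distributions match, and it grows as the predicted distribution spreads its mass away from the target, which is what makes it usable as a training signal for models that output a distribution rather than a single value.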