The proliferation of the Internet of Things is calling for new modalities that enable human interaction with smart objects. Recent research has explored RFID tags as passive sensors to detect finger touch. However, existing approaches either rely on custom-built RFID readers or are limited to pre-trained finger-swiping gestures. In this paper, we introduce KeyStub, which can discriminate multiple discrete keystrokes on an RFID tag. KeyStub interfaces commodity RFID ICs with multiple microwave-band resonant stubs that serve as keys. Each stub's geometry is designed to create a predefined impedance mismatch to the RFID IC upon a keystroke, which in turn translates into a known amplitude and phase shift that is remotely detectable by an RFID reader. KeyStub combines two ICs' signals through a single common-mode antenna and performs differential detection to obviate the need for calibration and to ensure reliability in heavy multi-path environments. Our experiments using a commercial off-the-shelf RFID reader and ICs show that up to 8 buttons can be detected and decoded with accuracy greater than 95%. KeyStub points towards a novel way of using resonant stubs to augment RF antenna structures, thus enabling new passive wireless interaction modalities.
{"title":"KeyStub","authors":"John Nolan, Kun Qian, Xinyu Zhang","doi":"10.1145/3631442","DOIUrl":"https://doi.org/10.1145/3631442","url":null,"abstract":"The proliferation of the Internet of Things is calling for new modalities that enable human interaction with smart objects. Recent research has explored RFID tags as passive sensors to detect finger touch. However, existing approaches either rely on custom-built RFID readers or are limited to pre-trained finger-swiping gestures. In this paper, we introduce KeyStub, which can discriminate multiple discrete keystrokes on an RFID tag. KeyStub interfaces with commodity RFID ICs with multiple microwave-band resonant stubs as keys. Each stub's geometry is designed to create a predefined impedance mismatch to the RFID IC upon a keystroke, which in turn translates into a known amplitude and phase shift, remotely detectable by an RFID reader. KeyStub combines two ICs' signals through a single common-mode antenna and performs differential detection to evade the need for calibration and ensure reliability in heavy multi-path environments. Our experiments using a commercial-off-the-shelf RFID reader and ICs show that up to 8 buttons can be detected and decoded with accuracy greater than 95%. KeyStub points towards a novel way of using resonant stubs to augment RF antenna structures, thus enabling new passive wireless interaction modalities.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"3 6","pages":"1 - 23"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a study on the touch precision of an eyes-free, body-based interface using on-body and near-body touch methods, with and without skin contact. We evaluate user touch accuracy on four different button layouts. These layouts progressively increase the number of buttons between adjacent body joints, resulting in 12, 20, 28, and 36 touch buttons distributed across the body. Our study indicates that the on-body method achieved an accuracy beyond 95% for the 12- and 20-button layouts, whereas the near-body method did so only for the 12-button layout. Investigating user touch patterns, we applied SVM classifiers, which, by learning individual touch patterns, extend both the on-body and near-body methods to support up to the 28-button layout. However, using generalized touch patterns did not significantly improve accuracy for more complex layouts, highlighting considerable differences in individual touch habits. When evaluating user experience metrics such as workload perception, confidence, convenience, and willingness to use, users consistently favored the 20-button layout regardless of the touch technique used. Remarkably, the 20-button layout, when applied to on-body touch, does not necessitate personal touch patterns, showcasing an optimal balance of practicality, effectiveness, and user experience without the need for trained models. In contrast, near-body touch targeting the 20-button layout needs a personalized model; otherwise, the 12-button layout offers the best immediate practicality.
{"title":"BodyTouch","authors":"Wen-Wei Cheng, Liwei Chan","doi":"10.1145/3631426","DOIUrl":"https://doi.org/10.1145/3631426","url":null,"abstract":"This paper presents a study on the touch precision of an eye-free, body-based interface using on-body and near-body touch methods with and without skin contact. We evaluate user touch accuracy on four different button layouts. These layouts progressively increase the number of buttons between adjacent body joints, resulting in 12, 20, 28, and 36 touch buttons distributed across the body. Our study indicates that the on-body method achieved an accuracy beyond 95% for the 12- and 20-button layouts, whereas the near-body method only for the 12-button layout. Investigating user touch patterns, we applied SVM classifiers, which boost both the on-body and near-body methods to support up to the 28-button layouts by learning individual touch patterns. However, using generalized touch patterns did not significantly improve accuracy for more complex layouts, highlighting considerable differences in individual touch habits. When evaluating user experience metrics such as workload perception, confidence, convenience, and willingness-to-use, users consistently favored the 20-button layout regardless of the touch technique used. Remarkably, the 20-button layout, when applied to on-body touch methods, does not necessitate personal touch patterns, showcasing an optimal balance of practicality, effectiveness, and user experience without the need for trained models. In contrast, the near-body touch targeting the 20-button layout needs a personalized model; otherwise, the 12-button layout offers the best immediate practicality.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"2 2","pages":"1 - 22"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A variety of consumer Augmented Reality (AR) applications have been released on mobile devices and novel immersive headsets over the last five years, creating a breadth of new AR-enabled experiences. However, these applications, particularly those designed for immersive headsets, require users to employ unfamiliar gestural input and adopt novel interaction paradigms. To better understand how everyday users discover gestures, and to classify the types of interaction challenges they face, we observed how 25 novices with diverse backgrounds and levels of technical knowledge used four different AR applications requiring a range of interaction techniques. A detailed analysis of gesture interaction traces showed that users struggled to discover the correct gestures, with the majority of errors occurring when participants could not determine the correct sequence of actions to perform or could not evaluate their actions. To further reflect on the prevalence of our findings, we carried out an expert validation study with eight professional AR designers, engineers, and researchers. We discuss implications for designing discoverable gestural input techniques that align with users' mental models, inventing AR-specific onboarding and help systems, and enhancing system-level machine recognition.
{"title":"Do I Just Tap My Headset?","authors":"Anjali Khurana, Michael Glueck, Parmit K. Chilana","doi":"10.1145/3631451","DOIUrl":"https://doi.org/10.1145/3631451","url":null,"abstract":"A variety of consumer Augmented Reality (AR) applications have been released on mobile devices and novel immersive headsets over the last five years, creating a breadth of new AR-enabled experiences. However, these applications, particularly those designed for immersive headsets, require users to employ unfamiliar gestural input and adopt novel interaction paradigms. To better understand how everyday users discover gestures and classify the types of interaction challenges they face, we observed how 25 novices from diverse backgrounds and technical knowledge used four different AR applications requiring a range of interaction techniques. A detailed analysis of gesture interaction traces showed that users struggled to discover the correct gestures, with the majority of errors occurring when participants could not determine the correct sequence of actions to perform or could not evaluate their actions. To further reflect on the prevalence of our findings, we carried out an expert validation study with eight professional AR designers, engineers, and researchers. We discuss implications for designing discoverable gestural input techniques that align with users' mental models, inventing AR-specific onboarding and help systems, and enhancing system-level machine recognition.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"4 3","pages":"1 - 28"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Human mesh reconstruction is essential for various applications, including virtual reality, motion capture, sports performance analysis, and healthcare monitoring. In healthcare contexts such as nursing homes, it is crucial to employ plausible and non-invasive methods for human mesh reconstruction that preserve privacy and dignity. Traditional vision-based techniques encounter challenges related to occlusion, viewpoint limitations, lighting conditions, and privacy concerns. In this research, we present CAvatar, a real-time human mesh reconstruction approach that innovatively utilizes pressure maps recorded by a tactile carpet as input. This advanced, non-intrusive technology obviates the need for cameras during usage, thereby safeguarding privacy. Our approach addresses several challenges, such as the limited spatial resolution of tactile sensors, extracting meaningful information from noisy pressure maps, and accommodating user variations and multiple users. We have developed an attention-based deep learning network, complemented by a discriminator network, to predict 3D human pose and shape from 2D pressure maps with notable accuracy. Our model demonstrates promising results, with a mean per joint position error (MPJPE) of 5.89 cm and a per vertex error (PVE) of 6.88 cm. To the best of our knowledge, we are the first to generate 3D meshes of human activities solely from tactile carpet signals, offering a novel approach that addresses privacy concerns and surpasses the limitations of existing vision-based and wearable solutions. A demonstration of CAvatar is available at https://youtu.be/ZpO3LEsgV7Y.
{"title":"CAvatar","authors":"Wenqiang Chen, Yexin Hu, Wei Song, Yingcheng Liu, Antonio Torralba, Wojciech Matusik","doi":"10.1145/3631424","DOIUrl":"https://doi.org/10.1145/3631424","url":null,"abstract":"Human mesh reconstruction is essential for various applications, including virtual reality, motion capture, sports performance analysis, and healthcare monitoring. In healthcare contexts such as nursing homes, it is crucial to employ plausible and non-invasive methods for human mesh reconstruction that preserve privacy and dignity. Traditional vision-based techniques encounter challenges related to occlusion, viewpoint limitations, lighting conditions, and privacy concerns. In this research, we present CAvatar, a real-time human mesh reconstruction approach that innovatively utilizes pressure maps recorded by a tactile carpet as input. This advanced, non-intrusive technology obviates the need for cameras during usage, thereby safeguarding privacy. Our approach addresses several challenges, such as the limited spatial resolution of tactile sensors, extracting meaningful information from noisy pressure maps, and accommodating user variations and multiple users. We have developed an attention-based deep learning network, complemented by a discriminator network, to predict 3D human pose and shape from 2D pressure maps with notable accuracy. Our model demonstrates promising results, with a mean per joint position error (MPJPE) of 5.89 cm and a per vertex error (PVE) of 6.88 cm. To the best of our knowledge, we are the first to generate 3D mesh of human activities solely using tactile carpet signals, offering a novel approach that addresses privacy concerns and surpasses the limitations of existing vision-based and wearable solutions. The demonstration of CAvatar is shown at https://youtu.be/ZpO3LEsgV7Y.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"3 7","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Objects engaged by users' hands carry rich contextual information because of their strong correlation with user activities. Tools such as toothbrushes and wipes indicate cleansing and sanitation, while mice and keyboards imply work. Much research has endeavored to sense hand-engaged objects to supply wearables with implicit interactions or ambient computing with personal informatics. We propose TextureSight, a smart-ring sensor that detects hand-engaged objects by sensing their distinctive surface textures using laser speckle imaging in a ring form factor. We conducted a two-day experience sampling study to investigate the unicity and repeatability of object-texture combinations across routine objects. We grounded our sensing in a theoretical model and simulations, powered it with state-of-the-art deep neural network techniques, and evaluated it with a user study. TextureSight constitutes a valuable addition to the literature for its capability to sense passive objects without emitting EMI or vibration and its elimination of a lens, preserving user privacy, leading to a new, practical method for activity recognition and context-aware computing.
{"title":"TextureSight","authors":"Xue Wang, Yang Zhang","doi":"10.1145/3631413","DOIUrl":"https://doi.org/10.1145/3631413","url":null,"abstract":"Objects engaged by users' hands contain rich contextual information for their strong correlation with user activities. Tools such as toothbrushes and wipes indicate cleansing and sanitation, while mice and keyboards imply work. Much research has been endeavored to sense hand-engaged objects to supply wearables with implicit interactions or ambient computing with personal informatics. We propose TextureSight, a smart-ring sensor that detects hand-engaged objects by detecting their distinctive surface textures using laser speckle imaging on a ring form factor. We conducted a two-day experience sampling study to investigate the unicity and repeatability of the object-texture combinations across routine objects. We grounded our sensing with a theoretical model and simulations, powered it with state-of-the-art deep neural net techniques, and evaluated it with a user study. TextureSight constitutes a valuable addition to the literature for its capability to sense passive objects without emission of EMI or vibration and its elimination of lens for preserving user privacy, leading to a new, practical method for activity recognition and context-aware computing.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"12 6","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the advancement of wireless sensing technologies, RF-based contactless liquid detection has attracted increasing attention. Compared with other RF devices, the mmWave radar has the advantages of large bandwidth and low cost. While existing radar-based liquid detection systems demonstrate promising performance, they share a shortcoming: the detection result depends on container-related factors (e.g., container placement, container caliber, and container material). In this paper, to enable container-independent liquid detection with a COTS mmWave radar, we propose a dual-reflection model that exploits reflections from different interfaces of the liquid container. Specifically, we design a pair of amplitude ratios based on the signals reflected from different interfaces, and theoretically demonstrate how the refractive index of liquids can be estimated by eliminating the container's impact. To validate the proposed approach, we implement a liquid detection system, LiqDetector. Experimental results show that LiqDetector achieves cross-container estimation of the liquid's refractive index with a mean absolute percentage error (MAPE) of about 4.4%. Moreover, the classification accuracies for 6 different liquids and for alcohol of different strengths (even with a difference of 1%) exceed 96% and 95%, respectively. To the best of our knowledge, this is the first study to achieve container-independent liquid detection with a COTS mmWave radar using only one pair of Tx-Rx antennas.
{"title":"LiqDetector","authors":"Zhu Wang, Yifan Guo, Zhihui Ren, Wenchao Song, Zhuo Sun, Chaoxiong Chen, Bin Guo, Zhiwen Yu","doi":"10.1145/3631443","DOIUrl":"https://doi.org/10.1145/3631443","url":null,"abstract":"With the advancement of wireless sensing technologies, RF-based contact-less liquid detection attracts more and more attention. Compared with other RF devices, the mmWave radar has the advantages of large bandwidth and low cost. While existing radar-based liquid detection systems demonstrate promising performance, they still have a shortcoming that in the detection result depends on container-related factors (e.g., container placement, container caliber, and container material). In this paper, to enable container-independent liquid detection with a COTS mmWave radar, we propose a dual-reflection model by exploring reflections from different interfaces of the liquid container. Specifically, we design a pair of amplitude ratios based on the signals reflected from different interfaces, and theoretically demonstrate how the refractive index of liquids can be estimated by eliminating the container's impact. To validate the proposed approach, we implement a liquid detection system LiqDetector. Experimental results show that LiqDetector achieves cross-container estimation of the liquid's refractive index with a mean absolute percentage error (MAPE) of about 4.4%. Moreover, the classification accuracies for 6 different liquids and alcohol with different strengths (even a difference of 1%) exceed 96% and 95%, respectively. To the best of our knowledge, this is the first study that achieves container-independent liquid detection based on the COTS mmWave radar by leveraging only one pair of Tx-Rx antennas.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"12 29","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents the design and implementation of Scribe, a comprehensive voice processing and handwriting interface for voice assistants. Distinct from prior works, Scribe is a precise tracking interface that can co-exist with the voice interface on low-sampling-rate voice assistants. Scribe can be used for 3D free-form drawing, writing, and motion tracking for gaming. Taking handwriting as a specific application, it can also capture natural strokes and the individualized style of writing while occupying only a single frequency. The core technique includes an accurate acoustic ranging method called Cross Frequency Continuous Wave (CFCW) sonar, enabling voice assistants to use ultrasound as a ranging signal while using the regular microphone system of voice assistants as a receiver. We also design a new optimization algorithm that requires only a single frequency for time-difference-of-arrival estimation. The Scribe prototype achieves 73 μm of median error for 1D ranging and 1.4 mm of median error in 3D tracking of an acoustic beacon using the microphone array used in voice assistants. Our implementation of an in-air handwriting interface achieves 94.1% accuracy with automatic handwriting-to-text software, similar to writing on paper (96.6%). At the same time, the error rate of voice-based user authentication only increases from 6.26% to 8.28%.
{"title":"Scribe","authors":"Yang Bai, Irtaza Shahid, Harshvardhan Takawale, Nirupam Roy","doi":"10.1145/3631411","DOIUrl":"https://doi.org/10.1145/3631411","url":null,"abstract":"This paper presents the design and implementation of Scribe, a comprehensive voice processing and handwriting interface for voice assistants. Distinct from prior works, Scribe is a precise tracking interface that can co-exist with the voice interface on low sampling rate voice assistants. Scribe can be used for 3D free-form drawing, writing, and motion tracking for gaming. Taking handwriting as a specific application, it can also capture natural strokes and the individualized style of writing while occupying only a single frequency. The core technique includes an accurate acoustic ranging method called Cross Frequency Continuous Wave (CFCW) sonar, enabling voice assistants to use ultrasound as a ranging signal while using the regular microphone system of voice assistants as a receiver. We also design a new optimization algorithm that only requires a single frequency for time difference of arrival. Scribe prototype achieves 73 μm of median error for 1D ranging and 1.4 mm of median error in 3D tracking of an acoustic beacon using the microphone array used in voice assistants. Our implementation of an in-air handwriting interface achieves 94.1% accuracy with automatic handwriting-to-text software, similar to writing on paper (96.6%). At the same time, the error rate of voice-based user authentication only increases from 6.26% to 8.28%.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"3 3","pages":"1 - 31"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The difficulty of obtaining targets' identities poses a significant obstacle to personalized and customized millimeter-wave (mmWave) sensing. Existing solutions that learn individual differences from signal features have limitations in practical applications. This paper presents a Personalized mmWave-based human Tracking system, PmTrack, which introduces inertial measurement units (IMUs) as identity indicators. Widely available in portable devices such as smartwatches and smartphones, IMUs can upload identity information and sensor data over existing wireless networks, and can therefore assist radar target identification in a lightweight manner, with little deployment or carrying burden for users. PmTrack innovatively adopts orientation as the matching feature, thereby overcoming the data heterogeneity between radar and IMU while avoiding the effect of cumulative errors. In implementing PmTrack, we propose a comprehensive set of optimization methods for detection enhancement, interference suppression, continuity maintenance, and trajectory correction, which solve a series of practical problems caused by the three major challenges of weak reflection, point-cloud overlap, and body-bounce ghosts in multi-person tracking. In addition, an orientation correction method is proposed to overcome IMU gimbal lock. Extensive experimental results demonstrate that PmTrack achieves identification accuracy of 98% and 95% with five people in a hall and a meeting room, respectively.
{"title":"PmTrack","authors":"Hankai Liu, Xiulong Liu, Xin Xie, Xinyu Tong, Keqiu Li","doi":"10.1145/3631433","DOIUrl":"https://doi.org/10.1145/3631433","url":null,"abstract":"The difficulty in obtaining targets' identity poses a significant obstacle to the pursuit of personalized and customized millimeter-wave (mmWave) sensing. Existing solutions that learn individual differences from signal features have limitations in practical applications. This paper presents a Personalized mmWave-based human Tracking system, PmTrack, by introducing inertial measurement units (IMUs) as identity indicators. Widely available in portable devices such as smartwatches and smartphones, IMUs utilize existing wireless networks for data uploading of identity and data, and are therefore able to assist in radar target identification in a lightweight manner with little deployment and carrying burden for users. PmTrack innovatively adopts orientation as the matching feature, thus well overcoming the data heterogeneity between radar and IMU while avoiding the effect of cumulative errors. In the implementation of PmTrack, we propose a comprehensive set of optimization methods in detection enhancement, interference suppression, continuity maintenance, and trajectory correction, which successfully solved a series of practical problems caused by the three major challenges of weak reflection, point cloud overlap, and body-bounce ghost in multi-person tracking. In addition, an orientation correction method is proposed to overcome the IMU gimbal lock. Extensive experimental results demonstrate that PmTrack achieves an identification accuracy of 98% and 95% with five people in the hall and meeting room, respectively.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"3 8","pages":"1 - 30"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Streaming speech recognition aims to transcribe speech to text in a streaming manner, providing real-time speech interaction for smartphone users. However, it is not trivial to develop a high-performance streaming speech recognition system that runs purely on mobile platforms, due to complex real-world acoustic environments and the limited computational resources of smartphones. Most existing solutions do not generalize to unseen environments and have difficulty working with streaming speech. In this paper, we design AdaStreamLite, an environment-adaptive streaming speech recognition tool for smartphones. AdaStreamLite interacts with its surroundings to capture the characteristics of the current acoustic environment and improve robustness against ambient noise in a lightweight manner. We design an environment representation extractor to model acoustic environments with compact feature vectors, and construct a representation lookup table to improve the generalization of AdaStreamLite to unseen environments. We train our system using large, publicly available speech datasets covering different languages, and conduct experiments across a wide range of real acoustic environments with different smartphones. The results show that AdaStreamLite outperforms state-of-the-art methods in terms of recognition accuracy, computational resource consumption, and robustness against unseen environments.
{"title":"AdaStreamLite","authors":"Yuheng Wei, Jie Xiong, Hui Liu, Yingtao Yu, Jiangtao Pan, Junzhao Du","doi":"10.1145/3631460","DOIUrl":"https://doi.org/10.1145/3631460","url":null,"abstract":"Streaming speech recognition aims to transcribe speech to text in a streaming manner, providing real-time speech interaction for smartphone users. However, it is not trivial to develop a high-performance streaming speech recognition system purely running on mobile platforms, due to the complex real-world acoustic environments and the limited computational resources of smartphones. Most existing solutions lack the generalization to unseen environments and have difficulty to work with streaming speech. In this paper, we design AdaStreamLite, an environment-adaptive streaming speech recognition tool for smartphones. AdaStreamLite interacts with its surroundings to capture the characteristics of the current acoustic environment to improve the robustness against ambient noise in a lightweight manner. We design an environment representation extractor to model acoustic environments with compact feature vectors, and construct a representation lookup table to improve the generalization of AdaStreamLite to unseen environments. We train our system using large speech datasets publicly available covering different languages. We conduct experiments in a large range of real acoustic environments with different smartphones. The results show that AdaStreamLite outperforms the state-of-the-art methods in terms of recognition accuracy, computational resource consumption and robustness against unseen environments.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"11 22","pages":"1 - 29"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Smartphones and smartwatches have contributed significantly to fitness monitoring by providing real-time statistics, thanks to accurate tracking of physiological indices such as heart rate. However, the estimation of calories burned during exercise remains inaccurate and cannot be used for medical diagnosis. In this work, we present JoulesEye, a smartphone thermal-camera-based system that can accurately estimate calorie burn by monitoring respiration rate. We evaluated JoulesEye on 54 participants who performed high-intensity cycling and running. The mean absolute percentage error (MAPE) of JoulesEye was 5.8%, significantly better than the MAPE of 37.6% observed with commercial smartwatch-based methods that use only heart rate. Finally, we show that an ultra-low-resolution thermal camera small enough to fit inside a watch or other wearable is sufficient for accurate calorie burn estimation. These results suggest that JoulesEye is a promising new method for accurate and reliable calorie burn estimation.
{"title":"JoulesEye","authors":"Rishiraj Adhikary, M. Sadeh, N. Batra, Mayank Goel","doi":"10.1145/3631422","DOIUrl":"https://doi.org/10.1145/3631422","url":null,"abstract":"Smartphones and smartwatches have contributed significantly to fitness monitoring by providing real-time statistics, thanks to accurate tracking of physiological indices such as heart rate. However, the estimation of calories burned during exercise is inaccurate and cannot be used for medical diagnosis. In this work, we present JoulesEye, a smartphone thermal camera-based system that can accurately estimate calorie burn by monitoring respiration rate. We evaluated JoulesEye on 54 participants who performed high intensity cycling and running. The mean absolute percentage error (MAPE) of JoulesEye was 5.8%, which is significantly better than the MAPE of 37.6% observed with commercial smartwatch-based methods that only use heart rate. Finally, we show that an ultra-low-resolution thermal camera that is small enough to fit inside a watch or other wearables is sufficient for accurate calorie burn estimation. These results suggest that JoulesEye is a promising new method for accurate and reliable calorie burn estimation.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"10 51","pages":"1 - 29"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}