Zhenyu Xu, Hailin Xu, Zhouyang Lu, Yingying Zhao, Rui Zhu, Yujiang Wang, Mingzhi Dong, Yuhu Chang, Qin Lv, Robert P. Dick, Fan Yang, Tun Lu, Ning Gu, L. Shang
Developing chatbots as personal companions has long been a goal of artificial intelligence researchers. Recent advances in Large Language Models (LLMs) have delivered a practical solution for endowing chatbots with anthropomorphic language capabilities. However, it takes more than LLMs to enable chatbots that can act as companions. Humans use their understanding of individual personalities to drive conversations. Chatbots also require this capability to enable human-like companionship. They should act based on personalized, real-time, and time-evolving knowledge of their users. We define such essential knowledge as the common ground between chatbots and their users, and we propose OS-1, a common-ground-aware dialogue system built around an LLM-based module, to enable chatbot companionship. Hosted by eyewear, OS-1 senses the visual and audio signals the user receives and extracts real-time contextual semantics. These semantics are categorized and recorded to form historical contexts, from which the user's profile is distilled and evolves over time; that is, OS-1 gradually learns about its user. OS-1 combines knowledge from real-time semantics, historical contexts, and user-specific profiles to produce a common-ground-aware prompt for the LLM module. The LLM's output is converted to audio and spoken to the wearer when appropriate. We conduct laboratory and in-field studies to assess OS-1's ability to build common ground between the chatbot and its user, and we evaluate the system's technical feasibility and capabilities. Our results show that by utilizing personal context, OS-1 progressively develops a better understanding of its users. This enhances user satisfaction and potentially enables various personal service scenarios, such as emotional support and assistance.
{"title":"Can Large Language Models Be Good Companions?","authors":"Zhenyu Xu, Hailin Xu, Zhouyang Lu, Yingying Zhao, Rui Zhu, Yujiang Wang, Mingzhi Dong, Yuhu Chang, Qin Lv, Robert P. Dick, Fan Yang, Tun Lu, Ning Gu, L. Shang","doi":"10.1145/3659600","DOIUrl":"https://doi.org/10.1145/3659600","url":null,"abstract":"Developing chatbots as personal companions has long been a goal of artificial intelligence researchers. Recent advances in Large Language Models (LLMs) have delivered a practical solution for endowing chatbots with anthropomorphic language capabilities. However, it takes more than LLMs to enable chatbots that can act as companions. Humans use their understanding of individual personalities to drive conversations. Chatbots also require this capability to enable human-like companionship. They should act based on personalized, real-time, and time-evolving knowledge of their users. We define such essential knowledge as the common ground between chatbots and their users, and we propose to build a common-ground-aware dialogue system from an LLM-based module, named OS-1, to enable chatbot companionship. Hosted by eyewear, OS-1 can sense the visual and audio signals the user receives and extract real-time contextual semantics. Those semantics are categorized and recorded to formulate historical contexts from which the user's profile is distilled and evolves over time, i.e., OS-1 gradually learns about its user. OS-1 combines knowledge from real-time semantics, historical contexts, and user-specific profiles to produce a common-ground-aware prompt input into the LLM module. The LLM's output is converted to audio, spoken to the wearer when appropriate. We conduct laboratory and in-field studies to assess OS-1's ability to build common ground between the chatbot and its user. The technical feasibility and capabilities of the system are also evaluated. Our results show that by utilizing personal context, OS-1 progressively develops a better understanding of its users. This enhances user satisfaction and potentially leads to various personal service scenarios, such as emotional support and assistance.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140985040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junqi Ma, Fusang Zhang, Beihong Jin, C. Su, Siheng Li, Zhi Wang, Jiazhi Ni
Ranging plays a crucial role in many wireless sensing applications. Among the wireless techniques employed for ranging, Ultra-Wideband (UWB) has received much attention due to its excellent performance and widespread integration into consumer-level electronics. However, the ranging accuracy of current UWB systems is limited to the centimeter level due to bandwidth limitations, hindering their use in applications that require very high resolution. This paper proposes a novel system that, for the first time, achieves sub-millimeter-level ranging accuracy on commercial UWB devices. Our approach leverages the fine-grained phase information of commercial UWB devices. To eliminate phase drift, we design a fine-grained phase recovery method that utilizes the bi-directional messages in UWB two-way ranging. We further present a dual-frequency switching method to resolve phase ambiguity. Building upon this, we design and implement the ranging system on commercial UWB modules. Extensive experiments demonstrate that our system achieves a median ranging error of just 0.77 mm, reducing the error by 96.54% compared to the state-of-the-art method. We also present three real-life applications to showcase the fine-grained sensing capabilities of our system: i) smart speaker control, ii) free-style user handwriting, and iii) 3D tracking for virtual-reality (VR) controllers.
{"title":"Push the Limit of Highly Accurate Ranging on Commercial UWB Devices","authors":"Junqi Ma, Fusang Zhang, Beihong Jin, C. Su, Siheng Li, Zhi Wang, Jiazhi Ni","doi":"10.1145/3659602","DOIUrl":"https://doi.org/10.1145/3659602","url":null,"abstract":"Ranging plays a crucial role in many wireless sensing applications. Among the wireless techniques employed for ranging, Ultra-Wideband (UWB) has received much attention due to its excellent performance and widespread integration into consumer-level electronics. However, the ranging accuracy of the current UWB systems is limited to the centimeter level due to bandwidth limitation, hindering their use for applications that require a very high resolution. This paper proposes a novel system that achieves sub-millimeter-level ranging accuracy on commercial UWB devices for the first time. Our approach leverages the fine-grained phase information of commercial UWB devices. To eliminate the phase drift, we design a fine-grained phase recovery method by utilizing the bi-directional messages in UWB two-way ranging. We further present a dual-frequency switching method to resolve phase ambiguity. Building upon this, we design and implement the ranging system on commercial UWB modules. Extensive experiments demonstrate that our system achieves a median ranging error of just 0.77 mm, reducing the error by 96.54% compared to the state-of-the-art method. We also present three real-life applications to showcase the fine-grained sensing capabilities of our system, including i) smart speaker control, ii) free-style user handwriting, and iii) 3D tracking for virtual-reality (VR) controllers.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140984729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christopher Kraemer, William Gelder, Josiah D. Hester
The time for battery-free computing is now. Lithium mining depletes and pollutes local water supplies, and dead batteries in landfills leak toxic metals into the ground [20][12]. Battery-free devices represent a probable future for sustainable ubiquitous computing, and we will need many more new devices and programmers to bring that future into reality. Yet energy-harvesting, battery-free devices that frequently fail are challenging to program. The maker movement has organically developed a considerable variety of platforms to prototype and program ubiquitous sensing and computing devices, but only a few (such as Microsoft's MakeCode and Adafruit's CircuitPython) have been modified to work with energy harvesting and to hide the power failures that are the norm under variable energy availability. Many platforms, especially Arduino (the first and most famous maker platform), do not support energy-harvesting devices and intermittent computing. To bridge this gap and lay a strong foundation for potential new maker programming platforms, we build a tool called BOOTHAMMER: a lightweight assembly rewriter for ARM Thumb. BOOTHAMMER analyzes and rewrites low-level assembly to insert careful checkpoint and restore operations that enable programs to persist through power failures. The approach is easily insertable into existing toolchains and is general-purpose enough to be resilient to future platforms and devices/chipsets. We close the loop with the user by designing a small set of program annotations that any maker can use to provide extra information to this low-level tool, significantly increasing checkpoint efficiency and resolution. These optional extensions include the user in decision-making about energy harvesting while ensuring the tool continues to support existing platforms. We conduct an extensive evaluation using various program benchmarks, with Arduino as our chosen evaluation platform. We also demonstrate the usability of this approach by evaluating BOOTHAMMER in a user study, showing that makers feel very confident in their ability to write intermittent computing programs using this tool. With this new tool, we enable maker hardware and software for sustainable, energy-harvesting-based computing for all.
{"title":"User-directed Assembly Code Transformations Enabling Efficient Batteryless Arduino Applications","authors":"Christopher Kraemer, William Gelder, Josiah D. Hester","doi":"10.1145/3659590","DOIUrl":"https://doi.org/10.1145/3659590","url":null,"abstract":"The time for battery-free computing is now. Lithium mining depletes and pollutes local water supplies and dead batteries in landfills leak toxic metals into the ground[20][12]. Battery-free devices represent a probable future for sustainable ubiquitous computing and we will need many more new devices and programmers to bring that future into reality. Yet, energy harvesting and battery-free devices that frequently fail are challenging to program. The maker movement has organically developed a considerable variety of platforms to prototype and program ubiquitous sensing and computing devices, but only a few have been modified to be usable with energy harvesting and to hide those pesky power failures that are the norm from variable energy availability (platforms like Microsoft's Makecode and AdaFruit's CircuitPython). Many platforms, especially Arduino (the first and most famous maker platform), do not support energy harvesting devices and intermittent computing. To bridge this gap and lay a strong foundation for potential new platforms for maker programming, we build a tool called BOOTHAMMER: a lightweight assembly re-writer for ARM Thumb. BOOTHAMMER analyzes and rewrites the low-level assembly to insert careful checkpoint and restore operations to enable programs to persist through power failures. The approach is easily insertable in existing toolchains and is general-purpose enough to be resilient to future platforms and devices/chipsets. We close the loop with the user by designing a small set of program annotations that any maker coder can use to provide extra information to this low-level tool that will significantly increase checkpoint efficiency and resolution. These optional extensions represent a way to include the user in decision-making about energy harvesting while ensuring the tool supports existing platforms. We conduct an extensive evaluation using various program benchmarks with Arduino as our chosen evaluation platform. We also demonstrate the usability of this approach by evaluating BOOTHAMMER with a user study and show that makers feel very confident in their ability to write intermittent computing programs using this tool. With this new tool, we enable maker hardware and software for sustainable, energy-harvesting-based computing for all.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140982739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ziyu Wu, Fangting Xie, Yiran Fang, Zhen Liang, Quan Wan, Yufan Xiong, Xiaohui Cai
Humans spend about one-third of their lives resting. Reconstructing human dynamics in in-bed scenarios is of considerable significance in sleep studies, bedsore monitoring, and biomedical factor extraction. However, mainstream human pose and shape estimation methods mainly focus on visual cues and face serious issues in non-line-of-sight environments. Since in-bed scenarios contain complicated human-environment contact, pressure-sensing bedsheets provide a non-invasive and privacy-preserving approach to capturing the pressure distribution on the contact surface, and have shown promise in many downstream tasks. However, few studies focus on in-bed human mesh recovery. To explore the potential of reconstructing human meshes from the sensed pressure distribution, we first build a high-quality temporal human in-bed pose dataset, TIP, with 152K multi-modality synchronized images. We then propose a label generation pipeline for in-bed scenarios to generate reliable 3D mesh labels with a SMPLify-based optimizer. Finally, we present PIMesh, a simple yet effective temporal human shape estimator that directly generates human meshes from pressure image sequences. We conduct various experiments to evaluate PIMesh's performance, showing that PIMesh achieves a joint position error of 79.17 mm on our TIP dataset. The results demonstrate that pressure-sensing bedsheets could be a promising alternative for long-term in-bed human shape estimation.
{"title":"Seeing through the Tactile","authors":"Ziyu Wu, Fangting Xie, Yiran Fang, Zhen Liang, Quan Wan, Yufan Xiong, Xiaohui Cai","doi":"10.1145/3659612","DOIUrl":"https://doi.org/10.1145/3659612","url":null,"abstract":"Humans spend about one-third of their lives resting. Reconstructing human dynamics in in-bed scenarios is of considerable significance in sleep studies, bedsore monitoring, and biomedical factor extractions. However, the mainstream human pose and shape estimation methods mainly focus on visual cues, facing serious issues in non-line-of-sight environments. Since in-bed scenarios contain complicated human-environment contact, pressure-sensing bedsheets provide a non-invasive and privacy-preserving approach to capture the pressure distribution on the contact surface, and have shown prospects in many downstream tasks. However, few studies focus on in-bed human mesh recovery. To explore the potential of reconstructing human meshes from the sensed pressure distribution, we first build a high-quality temporal human in-bed pose dataset, TIP, with 152K multi-modality synchronized images. We then propose a label generation pipeline for in-bed scenarios to generate reliable 3D mesh labels with a SMPLify-based optimizer. Finally, we present PIMesh, a simple yet effective temporal human shape estimator to directly generate human meshes from pressure image sequences. We conduct various experiments to evaluate PIMesh's performance, showing that PIMesh archives 79.17mm joint position errors on our TIP dataset. The results demonstrate that the pressure-sensing bedsheet could be a promising alternative for long-term in-bed human shape estimation.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140983342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christoph Gebhardt, Andreas Brombach, Tiffany Luong, Otmar Hilliges, Christian Holz
The widespread use of social media significantly impacts users' emotions. Negative emotions, in particular, are frequently produced, which can drastically affect mental health. Recognizing these emotional states is essential for implementing effective warning systems for social networks. However, detecting emotions during passive social media use---the predominant mode of engagement---is challenging. We introduce the first predictive model that estimates user emotions during passive social media consumption alone. We conducted a study with 29 participants who interacted with a controlled social media feed. Our apparatus captured participants' behavior and physiological signals while they browsed the feed and filled out self-reports from two validated emotion models. Using this data for supervised training, our emotion classifier robustly detected up to 8 emotional states and achieved a peak accuracy of 83% in classifying affect. Our analysis shows that behavioral features were sufficient to robustly recognize participants' emotions. It further highlights that within 8 seconds of a change in media content, objective features reveal a participant's new emotional state. We show that grounding labels in a componential emotion model outperforms dimensional models for higher-resolution state detection. Our findings also demonstrate that using emotional properties of images, predicted by a deep learning model, further improves emotion recognition.
{"title":"Detecting Users' Emotional States during Passive Social Media Use","authors":"Christoph Gebhardt, Andreas Brombach, Tiffany Luong, Otmar Hilliges, Christian Holz","doi":"10.1145/3659606","DOIUrl":"https://doi.org/10.1145/3659606","url":null,"abstract":"The widespread use of social media significantly impacts users' emotions. Negative emotions, in particular, are frequently produced, which can drastically affect mental health. Recognizing these emotional states is essential for implementing effective warning systems for social networks. However, detecting emotions during passive social media use---the predominant mode of engagement---is challenging. We introduce the first predictive model that estimates user emotions during passive social media consumption alone. We conducted a study with 29 participants who interacted with a controlled social media feed. Our apparatus captured participants' behavior and their physiological signals while they browsed the feed and filled out self-reports from two validated emotion models. Using this data for supervised training, our emotion classifier robustly detected up to 8 emotional states and achieved 83% peak accuracy to classify affect. Our analysis shows that behavioral features were sufficient to robustly recognize participants' emotions. It further highlights that within 8 seconds following a change in media content, objective features reveal a participant's new emotional state. We show that grounding labels in a componential emotion model outperforms dimensional models in higher-resolutional state detection. Our findings also demonstrate that using emotional properties of images, predicted by a deep learning model, further improves emotion recognition.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140982847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matthew Clark, Afsaneh Doryab
Our research investigated whether music can communicate physical activity levels in daily life. Past studies have shown that simple musical tunes can provide wellness information, but no study has examined whether musical feedback can affect daily behavior or lead to healthier habits. We conducted a within-subject study with 62 participants over a period of 76 days, providing either musical or text-based feedback on their daily physical activity. The music was built and personalized based on participants' step counts and baseline wellness perceptions. Results showed that participants were marginally more active during the music feedback compared to their baseline period, and significantly more active compared to the text-based feedback (p < 0.001). We also found that a participant's average activity may influence which musical features they find most inspiring within a song. Finally, context influenced how musical feedback was interpreted, and specific musical features correlated with higher activity levels regardless of baseline perceptions. We discuss lessons learned for designing music-based feedback systems for health communication.
{"title":"Changing Your Tune: Lessons for Using Music to Encourage Physical Activity","authors":"Matthew Clark, Afsaneh Doryab","doi":"10.1145/3659611","DOIUrl":"https://doi.org/10.1145/3659611","url":null,"abstract":"Our research investigated whether music can communicate physical activity levels in daily life. Past studies have shown that simple musical tunes can provide wellness information, but no study has examined whether musical feedback can affect daily behavior or lead to healthier habits. We conducted a within-subject study with 62 participants over a period of 76 days, providing either musical or text-based feedback on their daily physical activity. The music was built and personalized based on participants' step counts and baseline wellness perceptions. Results showed that participants were marginally more active during the music feedback compared to their baseline period, and significantly more active compared to the text-based feedback (p = 0.000). We also find that the participant's average activity may influence the musical features they find most inspiration within a song. Finally, context influenced how musical feedback was interpreted, and specific musical features correlated with higher activity levels regardless of baseline perceptions. We discuss lessons learned for designing music-based feedback systems for health communication.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140985657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taewoo Jo, Dohyeon Yeo, Gwangbin Kim, Seokhyun Hwang, SeungJun Kim
Individuals with low vision (LV) frequently face challenges in scanning performance, which in turn complicates daily activities requiring visual recognition. Although individuals with peripheral vision loss (PVL) can theoretically compensate for these scanning deficiencies through active head movements, few practical applications have sought to capitalize on this potential, especially during visual recognition tasks. In this paper, we present WatchCap, a novel device that leverages the hanger reflex phenomenon to naturally elicit head movements through stimulation feedback. Our user studies, conducted with both sighted individuals in a simulated environment and people with glaucoma-related PVL, demonstrated that WatchCap's scanning-contingent stimulation enhances visual exploration. This improvement is evidenced by fixation- and saccade-related features and by positive feedback from participants, and the stimulation did not cause discomfort to the users. This study highlights the promise of facilitating head movements to aid people with LV in visual recognition tasks. Critically, since WatchCap functions independently of predefined or task-specific cues, it has a wide scope of applicability, even in ambient task situations. This independence positions WatchCap to complement existing tools aimed at detailed visual information acquisition, allowing integration with those tools and facilitating a comprehensive approach to assisting individuals with LV.
{"title":"WatchCap: Improving Scanning Efficiency in People with Low Vision through Compensatory Head Movement Stimulation","authors":"Taewoo Jo, Dohyeon Yeo, Gwangbin Kim, Seokhyun Hwang, SeungJun Kim","doi":"10.1145/3659592","DOIUrl":"https://doi.org/10.1145/3659592","url":null,"abstract":"Individuals with low vision (LV) frequently face challenges in scanning performance, which in turn complicates daily activities requiring visual recognition. Although those with PVL can theoretically compensate for these scanning deficiencies through the use of active head movements, few practical applications have sought to capitalize on this potential, especially during visual recognition tasks. In this paper, we present WatchCap, a novel device that leverages the hanger reflex phenomenon to naturally elicit head movements through stimulation feedback. Our user studies, conducted with both sighted individuals in a simulated environment and people with glaucoma-related PVL, demonstrated that WatchCap's scanning-contingent stimulation enhances visual exploration. This improvement is evidenced by the fixation and saccade-related features and positive feedback from participants, which did not cause discomfort to the users. This study highlights the promise of facilitating head movements to aid those with LVs in visual recognition tasks. Critically, since WatchCap functions independently of predefined or task-specific cues, it has a wide scope of applicability, even in ambient task situations. This independence positions WatchCap to complement existing tools aimed at detailed visual information acquisition, allowing integration with existing tools and facilitating a comprehensive approach to assisting individuals with LV.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140984758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yexu Zhou, Hai-qiang Zhao, Yiran Huang, Tobias Röddiger, Murat Kurnaz, T. Riedel, M. Beigl
Sensor-based HAR models face challenges in cross-subject generalization due to the complexities of data collection and annotation, which impact the size and representativeness of datasets. While data augmentation has been successfully employed in domains such as natural language and image processing, its application in HAR remains underexplored. This study presents AutoAugHAR, an innovative two-stage gradient-based data augmentation optimization framework. AutoAugHAR is designed to take into account the unique attributes of candidate augmentation operations and the unique nature and challenges of HAR tasks. Notably, it optimizes the augmentation pipeline during HAR model training without substantially extending the training duration. In evaluations on eight inertial-measurement-unit-based benchmark datasets using five HAR models, AutoAugHAR demonstrated superior robustness and effectiveness compared to other leading data augmentation frameworks. A salient feature of AutoAugHAR is its model-agnostic design, allowing seamless integration with any HAR model without the need for structural modifications. Furthermore, we also demonstrate the generalizability and flexible extensibility of AutoAugHAR on four datasets from other adjacent domains. We strongly recommend its integration as a standard protocol in HAR model training and will release it as an open-source tool.
{"title":"AutoAugHAR: Automated Data Augmentation for Sensor-based Human Activity Recognition","authors":"Yexu Zhou, Hai-qiang Zhao, Yiran Huang, Tobias Röddiger, Murat Kurnaz, T. Riedel, M. Beigl","doi":"10.1145/3659589","DOIUrl":"https://doi.org/10.1145/3659589","url":null,"abstract":"Sensor-based HAR models face challenges in cross-subject generalization due to the complexities of data collection and annotation, impacting the size and representativeness of datasets. While data augmentation has been successfully employed in domains like natural language and image processing, its application in HAR remains underexplored. This study presents AutoAugHAR, an innovative two-stage gradient-based data augmentation optimization framework. AutoAugHAR is designed to take into account the unique attributes of candidate augmentation operations and the unique nature and challenges of HAR tasks. Notably, it optimizes the augmentation pipeline during HAR model training without substantially extending the training duration. In evaluations on eight inertial-measurement-units-based benchmark datasets using five HAR models, AutoAugHAR has demonstrated superior robustness and effectiveness compared to other leading data augmentation frameworks. A salient feature of AutoAugHAR is its model-agnostic design, allowing for its seamless integration with any HAR model without the need for structural modifications. Furthermore, we also demonstrate the generalizability and flexible extensibility of AutoAugHAR on four datasets from other adjacent domains. We strongly recommend its integration as a standard protocol in HAR model training and will release it as an open-source tool1.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140985058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zimo Liao, Meng Jin, Shun An, Chaoyue Niu, Fan Wu, Tao Deng, Guihai Chen
Gases in the environment can significantly affect our health and safety. As mobile devices gain popularity, we explore a human-centered gas detection system that can be integrated into commercial mobile devices to realize ubiquitous gas detection. However, existing gas sensors either have long response delays or are too cumbersome. This paper shows the feasibility of performing gas sensing by shining infrared (IR) signals emitted from our hands through the gas, allowing the system to rely on a single IR detector. The core opportunity arises from the fact that the human hand can provide stable, broadband, and omnidirectional IR radiation. Because IR signals experience distinct attenuation when passing through different gases, or through the same gas at different concentrations, we can integrate the human hand into the gas sensing system to enable extremely low-power and sustainable gas sensing. Yet it is challenging to build a robust system that directly utilizes the hand's IR radiation. Practical issues include low IR radiation from the hand, an unstable optical path, and the impact of environmental factors such as ambient temperature. To tackle these issues, on the one hand, we modulate the IR radiation from the hand by leveraging the controllability of the human hand, which improves the usable IR signal. On the other hand, we provide a dual-channel IR detector design that filters out the impact of environmental factors and background gases. Extensive experiments show that our system can detect ethanol, gaseous water, and CO2 with accuracies of 96.7%, 92.1%, and 94.2%, respectively.
{"title":"Waving Hand as Infrared Source for Ubiquitous Gas Sensing","authors":"Zimo Liao, Meng Jin, Shun An, Chaoyue Niu, Fan Wu, Tao Deng, Guihai Chen","doi":"10.1145/3659605","DOIUrl":"https://doi.org/10.1145/3659605","url":null,"abstract":"Gases in the environment can significantly affect our health and safety. As mobile devices gain popularity, we consider to explore a human-centered gas detection system that can be integrated into commercial mobile devices to realize ubiquitous gas detection. However, existing gas sensors either have too long response delays or are too cumbersome. This paper shows the feasibility of performing gas sensing by shining infrared (IR) signals emitted from our hands through the gas, allowing the system to rely on a single IR detector. The core opportunity arises from the fact that the human hand can provide stable, broadband, and omnidirectional IR radiation. Considering that IR signals experience distinct attenuation when passing through different gases or gases with different concentrations, we can integrate the human hand into the gas sensing system to enable extremely low-power and sustainable gas sensing. Yet, it is challenging to build up a robust system directly utilizing the hand's IR radiation. Practical issues include low IR radiation from the hand, unstable optical path, impact of environmental factors such as ambient temperature, etc. To tackle these issues, we on one hand modulate the IR radiation from the hand leveraging the controllability of the human hand, which improves the hand's IR radiation. On the other hand, we provide a dual-channel IR detector design to filter out the impact of environmental factors and gases in the environment. Extensive experiments show that our system can realize ethanol, gaseous water, and CO2 detection with 96.7%, 92.1% and 94.2%, respectively.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140982362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zeyu Wang, Yuanchun Shi, Yuntao Wang, Yuchen Yao, Kun Yan, Yuhan Wang, Lei Ji, Xuhai Xu, Chun Yu
Modern information querying systems are progressively incorporating multimodal inputs such as vision and audio. However, the integration of gaze --- a modality deeply linked to user intent and increasingly accessible via gaze-tracking wearables --- remains underexplored. This paper introduces a novel gaze-facilitated information querying paradigm, named G-VOILA, which synergizes users' gaze, visual field, and voice-based natural language queries to facilitate a more intuitive querying process. In a user-enactment study involving 21 participants in 3 daily scenarios (p = 21, scene = 3), we revealed the ambiguity in users' query language and a gaze-voice coordination pattern in users' natural query behaviors with G-VOILA. Based on the quantitative and qualitative findings, we developed a design framework for the G-VOILA paradigm that effectively integrates gaze data with the in-situ querying context. We then implemented a G-VOILA proof-of-concept using cutting-edge deep learning techniques. A follow-up user study (p = 16, scene = 2) demonstrated its effectiveness, achieving both higher objective and subjective scores compared to a baseline without gaze data. We further conducted interviews and provide insights for future gaze-facilitated information querying systems.
{"title":"G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios","authors":"Zeyu Wang, Yuanchun Shi, Yuntao Wang, Yuchen Yao, Kun Yan, Yuhan Wang, Lei Ji, Xuhai Xu, Chun Yu","doi":"10.1145/3659623","DOIUrl":"https://doi.org/10.1145/3659623","url":null,"abstract":"Modern information querying systems are progressively incorporating multimodal inputs like vision and audio. However, the integration of gaze --- a modality deeply linked to user intent and increasingly accessible via gaze-tracking wearables --- remains underexplored. This paper introduces a novel gaze-facilitated information querying paradigm, named G-VOILA, which synergizes users' gaze, visual field, and voice-based natural language queries to facilitate a more intuitive querying process. In a user-enactment study involving 21 participants in 3 daily scenarios (p = 21, scene = 3), we revealed the ambiguity in users' query language and a gaze-voice coordination pattern in users' natural query behaviors with G-VOILA. Based on the quantitative and qualitative findings, we developed a design framework for the G-VOILA paradigm, which effectively integrates the gaze data with the in-situ querying context. Then we implemented a G-VOILA proof-of-concept using cutting-edge deep learning techniques. A follow-up user study (p = 16, scene = 2) demonstrates its effectiveness by achieving both higher objective score and subjective score, compared to a baseline without gaze data. We further conducted interviews and provided insights for future gaze-facilitated information querying systems.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140982957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}