Wentao Xie, Huangxun Chen, Jing Wei, Jin Zhang, Qian Zhang
Smart eyewear's interaction mode has attracted significant research attention. While most commercial devices have adopted touch panels situated on the temple front of the eyeglasses for interaction, this paper identifies a drawback: the touch panel and the display lie in non-parallel planes, which disrupts the direct mapping between gestures and the manipulated objects on the display. Therefore, this paper proposes RimSense, a proof-of-concept design for smart eyewear, to introduce an alternative realm for interaction: touch gestures on the eyewear rim. RimSense leverages piezoelectric (PZT) transducers to convert the eyeglass rim into a touch-sensitive surface. When users touch the rim, the change in the eyeglass's structural signal shows up in the channel frequency response (CFR). This allows RimSense to recognize the executed touch gestures based on the collected CFR patterns. Technically, we employ a buffered chirp as the probe signal to fulfil the sensing granularity and noise resistance requirements. Additionally, we present a deep learning-based gesture recognition framework tailored for fine-grained time sequence prediction and further integrated with a Finite-State Machine (FSM) algorithm for event-level prediction to suit the interaction experience for gestures of varying durations. We implement a functional eyewear prototype with two commercial PZT transducers. RimSense can recognize eight touch gestures on the eyeglass rim and estimate gesture durations simultaneously, allowing gestures of varying lengths to serve as distinct inputs. We evaluate the performance of RimSense on 30 subjects and show that it can sense eight gestures and an additional negative class with an F1-score of 0.95 and a relative duration estimation error of 11%. We further make the system work in real time and conduct a user study on 14 subjects to assess the practicability of RimSense through interactions with two demo applications. The user study demonstrates RimSense's good performance, high usability, learnability and enjoyability. Additionally, we conduct interviews with the subjects, and their comments provide valuable insight for future eyewear design.
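The step from frame-level predictions to gesture events with durations can be illustrated with a small state machine. The following is a minimal sketch, not the authors' FSM: the two states, the `NEGATIVE` label value, and the `min_frames` and `max_gap` thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

NEGATIVE = 0  # assumed label for "no gesture" frames (hypothetical encoding)


@dataclass
class GestureEvent:
    label: int     # gesture class of the event
    start: int     # index of the first frame of the event
    n_frames: int  # event duration, in frames


def frames_to_events(frames: List[int], min_frames: int = 3,
                     max_gap: int = 1) -> List[GestureEvent]:
    """Turn per-frame class predictions into events with a two-state FSM.

    IDLE   -> ACTIVE when a non-negative frame appears.
    ACTIVE -> closes the event once more than `max_gap` consecutive frames
              disagree with the active label; events shorter than
              `min_frames` are discarded as noise.
    """
    events: List[GestureEvent] = []
    state, label, start, last_hit = "IDLE", NEGATIVE, 0, 0

    def close() -> None:
        n = last_hit - start + 1
        if n >= min_frames:
            events.append(GestureEvent(label, start, n))

    for i, f in enumerate(frames):
        if state == "IDLE":
            if f != NEGATIVE:
                state, label, start, last_hit = "ACTIVE", f, i, i
        elif f == label:
            last_hit = i  # gesture continues
        elif i - last_hit > max_gap:
            close()  # too many disagreeing frames: the gesture has ended
            if f != NEGATIVE:
                state, label, start, last_hit = "ACTIVE", f, i, i
            else:
                state = "IDLE"
    if state == "ACTIVE":
        close()  # flush an event still open at the end of the stream
    return events
```

Multiplying `n_frames` by the frame hop of the classifier gives a duration estimate, which is what would let short and long versions of the same gesture act as distinct inputs.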
{"title":"RimSense","authors":"Wentao Xie, Huangxun Chen, Jing Wei, Jin Zhang, Qian Zhang","doi":"10.1145/3631456","DOIUrl":"https://doi.org/10.1145/3631456","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"12 9","pages":"1 - 24"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article"}
Dong-Sig Kang, Eunsu Baek, S. Son, Youngki Lee, Taesik Gong, Hyung-Sin Kim
We present MIRROR, an on-device video virtual try-on (VTO) system that provides realistic, private, and rapid experiences in mobile clothes shopping. Despite recent advancements in generative adversarial networks (GANs) for VTO, designing MIRROR involves two challenges: (1) data discrepancy due to restricted training data that miss various poses, body sizes, and backgrounds and (2) local computation overhead that uses up 24% of battery for converting only a single video. To alleviate the problems, we propose a generalizable VTO GAN that not only discerns intricate human body semantics but also captures domain-invariant features without requiring additional training data. In addition, we craft lightweight, reliable clothes/pose-tracking that generates refined pixel-wise warping flow without neural-net computation. As a holistic system, MIRROR integrates the new VTO GAN and tracking method with meticulous pre/post-processing, operating in two distinct phases (on/offline). Our results on Android smartphones and real-world user videos show that compared to a cutting-edge VTO GAN, MIRROR achieves 6.5× better accuracy with 20.1× faster video conversion and 16.9× less energy consumption.
{"title":"MIRROR","authors":"Dong-Sig Kang, Eunsu Baek, S. Son, Youngki Lee, Taesik Gong, Hyung-Sin Kim","doi":"10.1145/3631420","DOIUrl":"https://doi.org/10.1145/3631420","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"11 51","pages":"1 - 27"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article"}
Meagan B. Loerakker, Jasmin Niess, Marit Bentvelzen, Paweł W. Woźniak
Wearable personal trackers offer exciting opportunities to contribute to one's well-being, but they also can foster negative experiences. It remains a challenge to understand how we can design personal informatics experiences that help users frame their data in a positive manner and foster self-compassion. To explore this, we conducted a study where we compared different visualisations for user-generated screen time data. We examined positive, neutral and negative framings of the data and whether or not a point of reference was provided in a visualisation. The results show that framing techniques have a significant effect on reflection, rumination and self-compassion. We contribute insights into what design features of data representations can support positive experiences in personal informatics.
{"title":"Designing Data Visualisations for Self-Compassion in Personal Informatics","authors":"Meagan B. Loerakker, Jasmin Niess, Marit Bentvelzen, Paweł W. Woźniak","doi":"10.1145/3631448","DOIUrl":"https://doi.org/10.1145/3631448","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"3 8","pages":"1 - 22"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article"}
Yunpeng Song, Yiheng Bian, Xiaorui Wang, Zhongmin Cai
Enabling smart devices to learn to automate actions as users expect is a crucial yet challenging task. The traditional Trigger-Action rule approach for device automation is prone to ambiguity in complex scenarios. To address this issue, we propose a data-driven approach that leverages recorded user-driven event sequences to predict potential actions users may take and generate fine-grained device automation sequences. Our key intuition is that user-driven event sequences, like human-written articles and programs, are governed by consistent semantic contexts and contain regularities that can be modeled to generate sequences that express the user's preferences. We introduce ASGen, a deep learning framework that combines sequential information, event attributes, and external knowledge to form the event representation and output sequences of arbitrary length to facilitate automation. To evaluate our approach from both quantitative and qualitative perspectives, we conduct two studies using a realistic dataset containing over 4.4 million events. Our results show that our approach surpasses other methods by providing more accurate recommendations, and that the automation sequences generated by our model are perceived as equally or even more rational and useful compared to those generated by humans.
{"title":"Learning from User-driven Events to Generate Automation Sequences","authors":"Yunpeng Song, Yiheng Bian, Xiaorui Wang, Zhongmin Cai","doi":"10.1145/3631427","DOIUrl":"https://doi.org/10.1145/3631427","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"11 4","pages":"1 - 22"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article"}
Liqiong Chang, Xiaofeng Yang, Ruyue Liu, Guodong Xie, Fuwei Wang, Ju Wang
Material sensing is crucial in many emerging applications, such as waste classification and hazardous material detection. Although existing Radio Frequency (RF) signal based systems have achieved great success, they have limited identification accuracy when either RF signals cannot penetrate a target or a target has different outer and inner materials. This paper introduces a Frequency Selective Surface (FSS) tag based high accuracy material identification system, namely FSS-Tag, which utilises both the penetrating signals and the coupling effect. Specifically, we design and attach an FSS tag to a target, and use frequency responses of the tag for material sensing, since different target materials have different frequency responses. The key advantage of our system is that, when RF signals pass through a target with the FSS tag, the penetrating signal responds more to the inner material, and the coupling effect (between the target and the tag) reflects more of the outer material; thus, one can achieve a higher sensing accuracy. The challenge lies in how to find optimal tag design parameters so that the frequency responses of different target materials can be clearly distinguished. We address this challenge by establishing a tag parameter optimization model. Real-world experiments show that FSS-Tag achieves more than 91% accuracy on identifying eight common materials, and improves the accuracy by up to 38% and 8% compared with the state of the art (SOTA) penetrating signal based method TagScan and the SOTA coupling effect based method Tagtag.
{"title":"FSS-Tag","authors":"Liqiong Chang, Xiaofeng Yang, Ruyue Liu, Guodong Xie, Fuwei Wang, Ju Wang","doi":"10.1145/3631457","DOIUrl":"https://doi.org/10.1145/3631457","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"10 42","pages":"1 - 24"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article"}
Yan Liu, Anlan Yu, Leye Wang, Bin Guo, Yang Li, E. Yi, Daqing Zhang
In recent years, considerable endeavors have been devoted to exploring Wi-Fi-based sensing technologies by modeling the intricate mapping between received signals and corresponding human activities. However, the inherent complexity of Wi-Fi signals poses significant challenges for practical applications due to their pronounced susceptibility to deployment environments. To address this challenge, we delve into the distinctive characteristics of Wi-Fi signals and distill three pivotal factors that can be leveraged to enhance generalization capabilities of deep learning-based Wi-Fi sensing models: 1) effectively capture valuable input to mitigate the adverse impact of noisy measurements; 2) adaptively fuse complementary information from multiple Wi-Fi devices to boost the distinguishability of signal patterns associated with different activities; 3) extract generalizable features that can overcome the inconsistent representations of activities under different environmental conditions (e.g., locations, orientations). Leveraging these insights, we design a novel and unified sensing framework based on Wi-Fi signals, dubbed UniFi, and use gesture recognition as an application to demonstrate its effectiveness. UniFi achieves robust and generalizable gesture recognition in real-world scenarios by extracting discriminative and consistent features unrelated to environmental factors from pre-denoised signals collected by multiple transceivers. To achieve this, we first introduce an effective signal preprocessing approach that captures the applicable input data from noisy received signals for the deep learning model. Second, we propose a multi-view deep network based on spatio-temporal cross-view attention that integrates multi-carrier and multi-device signals to extract distinguishable information. 
Finally, we present the mutual information maximization as a regularizer to learn environment-invariant representations via contrastive loss without requiring access to any signals from unseen environments for practical adaptation. Extensive experiments on the Widar 3.0 dataset demonstrate that our proposed framework significantly outperforms state-of-the-art approaches in different settings (99% and 90%-98% accuracy for in-domain and cross-domain recognition without additional data collection and model training).
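One common way to maximize mutual information through a contrastive loss is the InfoNCE objective, which lower-bounds the mutual information between two views of the same sample. The sketch below shows that generic objective in plain Python; the abstract does not state the exact formulation used in UniFi, so the function name `info_nce`, the cosine similarity, and the `temperature` value are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(views_a: Sequence[Sequence[float]],
             views_b: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Generic InfoNCE loss over a batch of paired embeddings.

    Row i of `views_a` and row i of `views_b` are treated as a positive pair
    (for example, the same gesture seen under two conditions); every other
    row of `views_b` serves as a negative for row i.
    """
    a = [_normalize(v) for v in views_a]
    b = [_normalize(v) for v in views_b]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of a_i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a_i, b_j)) / temperature for b_j in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # negative log-probability of the positive
    return total / len(a)
```

The loss is small when each embedding is closest to its own pair and large when pairs are mismatched, which is the pressure that pushes representations of the same activity together regardless of the condition under which it was recorded.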
{"title":"UniFi","authors":"Yan Liu, Anlan Yu, Leye Wang, Bin Guo, Yang Li, E. Yi, Daqing Zhang","doi":"10.1145/3631429","DOIUrl":"https://doi.org/10.1145/3631429","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"14 8","pages":"1 - 29"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article"}
Marvin Martin, Etienne Meunier, P. Moreau, Jean-Eudes Gadenne, J. Dautel, Félicien Catherin, Eugene Pinsky, Reza Rawassizadeh
Due to global warming, sharks are moving closer to beaches, which affects the risk both to humans and to the sharks themselves. Within the past decade, several technologies have been developed to reduce the risks for swimmers and surfers. This study proposes a robust method based on computer vision to detect sharks using an underwater camera monitoring system to secure coastlines. The system is autonomous, environment-friendly, and requires low maintenance. We used 43,679 images extracted from 175 hours of videos of marine life to train our algorithms. Our approach allows the collection and analysis of videos in real time using an autonomous underwater camera connected to a smart buoy charged with solar panels. The videos are processed by a Domain Adversarial Convolutional Neural Network to discern sharks regardless of the background environment with an F2-score of 83.2% and a recall of 90.9%, while human experts have an F2-score of 94% and a recall of 95.7%.
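The F2-score weights recall more heavily than precision, which suits a detector where a missed shark costs more than a false alarm. The sketch below uses the standard F-beta definition, F = (1 + β²)·P·R / (β²·P + R); the helper names are hypothetical, and the inversion helper is only a convenience for reading the reported figures.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta > 1 weights recall above precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom


def precision_from_f_beta(f: float, recall: float, beta: float = 2.0) -> float:
    """Solve the F-beta definition for precision, given F and recall.

    From F * (b2 * P + R) = (1 + b2) * P * R it follows that
    P = F * R / ((1 + b2) * R - b2 * F).
    """
    b2 = beta * beta
    return f * recall / ((1 + b2) * recall - b2 * f)
```

Under this definition, the model's reported F2-score of 0.832 and recall of 0.909 imply a precision of roughly 0.62. That value is derived here from rounded figures and is not stated in the abstract.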
{"title":"ADA-SHARK","authors":"Marvin Martin, Etienne Meunier, P. Moreau, Jean-Eudes Gadenne, J. Dautel, Félicien Catherin, Eugene Pinsky, Reza Rawassizadeh","doi":"10.1145/3631416","DOIUrl":"https://doi.org/10.1145/3631416","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"12 2","pages":"1 - 25"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article"}
Wearable sensor-based human activity recognition (HAR) has gained significant attention due to the widespread use of smart wearable devices. However, variations across subjects can cause a domain shift that impedes the scaling of the recognition model. Unsupervised domain adaptation has been proposed as a solution to recognize activities in new, unlabeled target domains by training on the source and target data together. However, the need to access source data raises privacy concerns. Source-free domain adaptation has emerged as a practical setting, where only a pre-trained source model is provided for the unlabeled target domain. This setup aligns with the need for personalized activity model adaptation on target local devices. As edge devices are resource-constrained with limited memory, it is crucial to take computational efficiency, i.e., memory cost, into consideration. In this paper, we develop a source-free domain adaptation framework for wearable sensor-based HAR, with a focus on computational efficiency for target edge devices. Firstly, we design a lightweight add-on module called an adapter to adapt the frozen pre-trained model to the unlabeled target domain. Secondly, to optimize the adapter, we adopt a simple yet effective model adaptation method that leverages local representation similarity and prediction consistency. Additionally, we design a set of sample selection optimization strategies to select samples effective for adaptation and further enhance computational efficiency while maintaining adaptation performance. Our extensive experiments on three datasets demonstrate that our method achieves comparable recognition accuracy to the state-of-the-art source-free domain adaptation methods with fewer than 1% of the parameters updated and saves up to 4.99X memory cost.
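To see why an add-on adapter can keep the share of updated parameters very small, consider a residual bottleneck adapter attached to a frozen layer. This is a generic sketch of that common design, not the SF-Adapter architecture, which the abstract does not specify; the function names, layer sizes, and backbone size used below are hypothetical.

```python
from typing import List, Sequence


def adapter_param_count(d_model: int, bottleneck: int) -> int:
    """Trainable parameters of one bottleneck adapter (weights plus biases)."""
    down = d_model * bottleneck + bottleneck  # down-projection
    up = bottleneck * d_model + d_model       # up-projection
    return down + up


def adapter_forward(h: Sequence[float],
                    w_down: Sequence[Sequence[float]], b_down: Sequence[float],
                    w_up: Sequence[Sequence[float]], b_up: Sequence[float]) -> List[float]:
    """Residual adapter: h + W_up * relu(W_down * h + b_down) + b_up.

    `w_down` has `bottleneck` rows of length d_model; `w_up` has d_model rows
    of length `bottleneck`. The frozen backbone's features `h` pass through
    unchanged whenever the up-projection is all zeros.
    """
    z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
         for row, b in zip(w_down, b_down)]
    delta = [sum(w * x for w, x in zip(row, z)) + b
             for row, b in zip(w_up, b_up)]
    return [x + d for x, d in zip(h, delta)]
```

With a hypothetical 128-dimensional feature and a bottleneck of 8, `adapter_param_count(128, 8)` is 2184; next to an assumed frozen backbone of one million parameters, that is about 0.2% of the total. Zero-initializing the up-projection is a common choice because adaptation then starts from exactly the pre-trained model's behaviour.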
{"title":"SF-Adapter","authors":"Hua Kang, Qingyong Hu, Qian Zhang","doi":"10.1145/3631428","DOIUrl":"https://doi.org/10.1145/3631428","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"1 4","pages":"1 - 23"},"publicationDate":"2024-01-12","publicationTypes":"Journal Article"}
We propose a novel sensorless approach to indoor localization that leverages natural language conversations with users, which we call conversational localization. To show its feasibility, we develop a proof-of-concept system that guides users to describe their surroundings in a chat and estimates their position from the information they provide. The system has a modular architecture with four modules. First, we construct an entity database from available image-based floor maps. Second, an utterance processing module dynamically identifies and scores the information users provide. Third, a conversational agent strategically guides the interaction to elicit information that is valuable for localization. Finally, we employ visibility catchment area and line-of-sight heuristics to generate spatial estimates of the user's location. We conduct two user studies to design and test the system. In an online crowdsourcing study, we collect 800 natural language descriptions of unfamiliar indoor spaces to assess the feasibility of extracting entities useful for localization from user utterances. We then conduct a field study with 10 participants at 10 locations to evaluate the feasibility and performance of conversational localization. The results show that conversational localization can achieve localization accuracy within 10 meters at eight of the ten study sites, demonstrating the technique's utility for classes of indoor location-based services.
{"title":"Conversational Localization","authors":"Smitha Sheshadri, Kotaro Hara","doi":"10.1145/3631404","DOIUrl":"https://doi.org/10.1145/3631404","url":null,"PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"11 7","pages":"1 - 32"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
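The Conversational Localization abstract names visibility catchment areas and line-of-sight heuristics as the basis for its spatial estimates. The sketch below shows one possible reading of that idea on a toy grid map, and is not the authors' implementation: a user who mentions an entity is assumed to stand somewhere that entity can be seen from, so the candidate region is the intersection of the catchment areas of all mentioned entities. The map, the entity names, and every function name are hypothetical.

```python
def line_cells(a, b):
    """Grid cells on the straight line from a to b (Bresenham's algorithm)."""
    (y0, x0), (y1, x1) = a, b
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    cells = []
    while True:
        cells.append((y0, x0))
        if (y0, x0) == (y1, x1):
            return cells
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def visible(grid, a, b):
    """True if no wall cell ('#') lies strictly between a and b."""
    return all(grid[r][c] != "#" for r, c in line_cells(a, b)[1:-1])


def catchment(grid, entity_pos):
    """All free cells with an unobstructed line of sight to the entity."""
    return {
        (r, c)
        for r, row in enumerate(grid)
        for c, ch in enumerate(row)
        if ch != "#" and visible(grid, (r, c), entity_pos)
    }


def estimate(grid, entities, mentioned):
    """Centroid of the cells that can see every mentioned entity.

    `mentioned` must contain at least one entity name.
    Returns (centroid, candidates); centroid is None if no cell qualifies.
    """
    candidates = set.intersection(*(catchment(grid, entities[m]) for m in mentioned))
    if not candidates:
        return None, candidates
    n = len(candidates)
    centroid = (sum(r for r, _ in candidates) / n, sum(c for _, c in candidates) / n)
    return centroid, candidates


# Toy floor map: '.' is free space, '#' is a wall segment.
FLOOR = [
    "..........",
    "....#.....",
    "....#.....",
    "....#.....",
    "..........",
]
ENTITIES = {"vending machine": (2, 0), "exit sign": (2, 9)}

centroid, cells = estimate(FLOOR, ENTITIES, ["vending machine", "exit sign"])
print(centroid, len(cells))
```

Each additional entity the user mentions can only shrink the candidate set, which is one way to see why a conversational agent would steer the chat toward further landmarks.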
Ke He, Chentao Li, Yongjie Duan, Jianjiang Feng, Jie Zhou
Several studies have explored the estimation of finger pose/angle to enhance the expressiveness of touchscreens. However, previous algorithms suffer from large estimation errors, and their sequential output angles are unstable, making it difficult to meet the demands of practical applications. We believe these defects arise from improper rotation representation, the lack of time-series modeling, and the difficulty of accommodating individual differences among users. To address these issues, we conduct an in-depth study of rotation representation for the 2D pose problem by minimizing the errors between the representation space and the original space. We propose TrackPose, a deep learning model that uses a self-attention mechanism for time-series modeling to improve the accuracy and stability of finger pose estimation. We also develop a registration application on a mobile phone to collect touchscreen images of each new user without an optical tracking device. Together, these three measures reduce the angle estimation error by 33%, and by 47% for the yaw angle in particular. Additionally, the instability of sequential estimations, measured by the proposed metric MAEΔ, is reduced by 62%. A user study further confirms the effectiveness of the proposed algorithm.
{"title":"TrackPose","authors":"Ke He, Chentao Li, Yongjie Duan, Jianjiang Feng, Jie Zhou","doi":"10.1145/3631459","DOIUrl":"https://doi.org/10.1145/3631459","url":null,"PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"11 7","pages":"1 - 22"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
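The TrackPose abstract attributes part of the estimation error to improper rotation representation. The sketch below illustrates the general problem with a raw angle in degrees, which jumps at ±180°, and shows the common (cos, sin) encoding that removes the jump. It is a generic illustration and not the representation the paper derives; all function names are assumptions.

```python
import math


def wrap_deg(d):
    """Map any angle difference to the interval [-180, 180)."""
    return (d + 180.0) % 360.0 - 180.0


def encode(theta_deg):
    """Represent an angle as a point on the unit circle."""
    t = math.radians(theta_deg)
    return (math.cos(t), math.sin(t))


def decode(vec):
    """Recover the angle in degrees from a (cos, sin)-like vector."""
    return math.degrees(math.atan2(vec[1], vec[0]))


def circular_mean(angles_deg):
    """Average angles in the encoded space, then decode."""
    vecs = [encode(a) for a in angles_deg]
    mean_vec = (sum(v[0] for v in vecs) / len(vecs),
                sum(v[1] for v in vecs) / len(vecs))
    return decode(mean_vec)


def angular_mae(pred_deg, truth_deg):
    """Mean absolute error that respects the wrap-around at +/-180 degrees."""
    errors = [abs(wrap_deg(p - t)) for p, t in zip(pred_deg, truth_deg)]
    return sum(errors) / len(errors)


# Two estimates that are only 2 degrees apart, on either side of the seam.
print(abs(179.0 - (-179.0)))            # raw difference
print(abs(wrap_deg(179.0 - (-179.0))))  # wrap-aware difference
print(sum([179.0, -179.0]) / 2)         # naive mean
print(circular_mean([179.0, -179.0]))   # mean taken in the encoded space
```

The naive mean of 179° and -179° is 0°, which points in the opposite direction from both inputs, while the mean taken in the encoded space stays at the seam. A model that regresses the raw angle faces the same discontinuity in its loss.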