Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies: Latest Articles, Page 6

RimSense

Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Pub Date : 2024-01-12 DOI: 10.1145/3631456

Wentao Xie, Huangxun Chen, Jing Wei, Jin Zhang, Qian Zhang

Smart eyewear's interaction mode has attracted significant research attention. While most commercial devices place a touch panel on the front of the eyeglass temple for interaction, this paper identifies a drawback: the touch panel and the display lie in non-parallel planes, which disrupts the direct mapping between gestures and the objects manipulated on the display. This paper therefore proposes RimSense, a proof-of-concept design for smart eyewear that introduces an alternative interaction surface: touch gestures on the eyewear rim. RimSense uses piezoelectric (PZT) transducers to turn the eyeglass rim into a touch-sensitive surface. When a user touches the rim, the change in the eyeglass's structural signal shows up in the channel frequency response (CFR), which allows RimSense to recognize the executed touch gesture from the collected CFR patterns. Technically, we employ a buffered chirp as the probe signal to meet the sensing granularity and noise resistance requirements. Additionally, we present a deep learning-based gesture recognition framework tailored for fine-grained time sequence prediction and integrate it with a finite-state machine (FSM) algorithm for event-level prediction, so that the interaction suits gestures of varying durations. We implement a functional eyewear prototype with two commercial PZT transducers. RimSense can recognize eight touch gestures on the eyeglass rim and estimate gesture durations simultaneously, allowing gestures of varying lengths to serve as distinct inputs. We evaluate RimSense on 30 subjects and show that it can sense eight gestures and an additional negative class with an F1-score of 0.95 and a relative duration estimation error of 11%. We further make the system work in real time and conduct a user study with 14 subjects to assess the practicability of RimSense through interactions with two demo applications. The user study demonstrates RimSense's good performance, high usability, learnability and enjoyability. Additionally, we conduct interviews with the subjects, and their comments provide valuable insight for future eyewear design.
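
The abstract does not spell out how the FSM turns fine-grained predictions into gesture events, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the recognizer emits one class label per fixed-length frame; the names `frames_to_events`, `NEGATIVE`, `frame_ms` and the debouncing threshold `min_frames` are hypothetical.

```python
from typing import List, Tuple

NEGATIVE = 0  # hypothetical id of the "no gesture" class


def frames_to_events(labels: List[int], frame_ms: int,
                     min_frames: int = 3) -> List[Tuple[int, int]]:
    """Collapse per-frame labels into (gesture_id, duration_ms) events.

    The machine has two states: idle (current == NEGATIVE) and active
    (a run of identical gesture labels is in progress). A run shorter
    than min_frames is treated as noise and dropped (assumed rule).
    """
    events: List[Tuple[int, int]] = []
    current, run = NEGATIVE, 0
    # A trailing NEGATIVE sentinel flushes a gesture that ends the stream.
    for label in list(labels) + [NEGATIVE]:
        if label == current:
            run += 1
            continue
        if current != NEGATIVE and run >= min_frames:
            events.append((current, run * frame_ms))
        current, run = label, 1
    return events


frame_labels = [0, 0, 1, 1, 1, 1, 0, 2, 0, 3, 3, 3]
print(frames_to_events(frame_labels, frame_ms=100))  # [(1, 400), (3, 300)]
```

Because each event carries its duration, a short and a long instance of the same gesture can be mapped to different inputs, which is the behaviour the abstract describes.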

Citations: 0

MIRROR

Pub Date : 2024-01-12 DOI: 10.1145/3631420

Dong-Sig Kang, Eunsu Baek, S. Son, Youngki Lee, Taesik Gong, Hyung-Sin Kim

We present MIRROR, an on-device video virtual try-on (VTO) system that provides realistic, private, and rapid experiences in mobile clothes shopping. Despite recent advancements in generative adversarial networks (GANs) for VTO, designing MIRROR involves two challenges: (1) data discrepancy due to restricted training data that miss various poses, body sizes, and backgrounds and (2) local computation overhead that uses up 24% of battery for converting only a single video. To alleviate the problems, we propose a generalizable VTO GAN that not only discerns intricate human body semantics but also captures domain-invariant features without requiring additional training data. In addition, we craft lightweight, reliable clothes/pose-tracking that generates refined pixel-wise warping flow without neural-net computation. As a holistic system, MIRROR integrates the new VTO GAN and tracking method with meticulous pre/post-processing, operating in two distinct phases (on/offline). Our results on Android smartphones and real-world user videos show that compared to a cutting-edge VTO GAN, MIRROR achieves 6.5× better accuracy with 20.1× faster video conversion and 16.9× less energy consumption.

Citations: 0

Designing Data Visualisations for Self-Compassion in Personal Informatics

Pub Date : 2024-01-12 DOI: 10.1145/3631448

Meagan B. Loerakker, Jasmin Niess, Marit Bentvelzen, Paweł W. Woźniak

Wearable personal trackers offer exciting opportunities to contribute to one's well-being, but they also can foster negative experiences. It remains a challenge to understand how we can design personal informatics experiences that help users frame their data in a positive manner and foster self-compassion. To explore this, we conducted a study where we compared different visualisations for user-generated screen time data. We examined positive, neutral and negative framings of the data and whether or not a point of reference was provided in a visualisation. The results show that framing techniques have a significant effect on reflection, rumination and self-compassion. We contribute insights into what design features of data representations can support positive experiences in personal informatics.

Citations: 0

Learning from User-driven Events to Generate Automation Sequences

Pub Date : 2024-01-12 DOI: 10.1145/3631427

Yunpeng Song, Yiheng Bian, Xiaorui Wang, Zhongmin Cai

Enabling smart devices to learn to automate actions as users expect is a crucial yet challenging task. The traditional Trigger-Action rule approach to device automation is prone to ambiguity in complex scenarios. To address this issue, we propose a data-driven approach that leverages recorded user-driven event sequences to predict the actions users may take and to generate fine-grained device automation sequences. Our key intuition is that user-driven event sequences, like human-written articles and programs, are governed by consistent semantic contexts and contain regularities that can be modeled to generate sequences that express the user's preferences. We introduce ASGen, a deep learning framework that combines sequential information, event attributes, and external knowledge to form the event representation and outputs sequences of arbitrary length to facilitate automation. To evaluate our approach from both quantitative and qualitative perspectives, we conduct two studies using a realistic dataset containing over 4.4 million events. Our results show that our approach surpasses other methods by providing more accurate recommendations. Moreover, the automation sequences generated by our model are perceived as equally or even more rational and useful compared to those generated by humans.
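
To make the "regularities in event sequences" intuition concrete, here is a deliberately simple stand-in: a bigram (first-order) frequency model, not the ASGen deep learning framework, which the abstract does not describe in enough detail to reproduce. The event names and the functions `fit_bigrams` and `generate` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def fit_bigrams(sequences: List[List[str]]) -> Dict[str, Counter]:
    """Count which event follows which across recorded user sequences."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts: Dict[str, Counter], start: str,
             max_len: int = 5) -> List[str]:
    """Greedily append the most frequent follower until none is known."""
    out = [start]
    while len(out) < max_len and counts.get(out[-1]):
        out.append(counts[out[-1]].most_common(1)[0][0])
    return out


# Hypothetical recorded sequences of user-driven events.
logs = [
    ["door_open", "light_on", "tv_on"],
    ["door_open", "light_on", "ac_on"],
    ["door_open", "light_on", "tv_on"],
]
model = fit_bigrams(logs)
print(generate(model, "door_open"))  # ['door_open', 'light_on', 'tv_on']
```

A model like this sees only the previous event. The abstract's point is that richer context (event attributes, external knowledge, longer history) is what resolves the ambiguity that simple rules leave.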

Citations: 0

FSS-Tag

Pub Date : 2024-01-12 DOI: 10.1145/3631457

Liqiong Chang, Xiaofeng Yang, Ruyue Liu, Guodong Xie, Fuwei Wang, Ju Wang

Material sensing is crucial in many emerging applications, such as waste classification and hazardous material detection. Although existing Radio Frequency (RF) signal-based systems have achieved great success, their identification accuracy is limited when RF signals cannot penetrate a target or when a target has different outer and inner materials. This paper introduces FSS-Tag, a high-accuracy material identification system based on a Frequency Selective Surface (FSS) tag, which utilises both the penetrating signals and the coupling effect. Specifically, we design an FSS tag, attach it to a target, and use the tag's frequency responses for material sensing, since different target materials produce different frequency responses. The key advantage of our system is that, when RF signals pass through a target carrying the FSS tag, the penetrating signal responds more to the inner material, while the coupling effect (between the target and the tag) reflects more of the outer material; thus, one can achieve a higher sensing accuracy. The challenge lies in finding optimal tag design parameters so that the frequency responses of different target materials can be clearly distinguished. We address this challenge by establishing a tag parameter optimization model. Real-world experiments show that FSS-Tag achieves more than 91% accuracy in identifying eight common materials, and improves the accuracy by up to 38% and 8% compared with TagScan, the state-of-the-art (SOTA) penetrating-signal-based method, and Tagtag, the SOTA coupling-effect-based method, respectively.

Citations: 0

UniFi

Pub Date : 2024-01-12 DOI: 10.1145/3631429

Yan Liu, Anlan Yu, Leye Wang, Bin Guo, Yang Li, E. Yi, Daqing Zhang

In recent years, considerable effort has been devoted to exploring Wi-Fi-based sensing technologies by modeling the intricate mapping between received signals and the corresponding human activities. However, the inherent complexity of Wi-Fi signals poses significant challenges for practical applications because the signals are highly susceptible to the deployment environment. To address this challenge, we delve into the distinctive characteristics of Wi-Fi signals and distill three pivotal factors that can be leveraged to enhance the generalization capabilities of deep learning-based Wi-Fi sensing models: 1) effectively capturing valuable input to mitigate the adverse impact of noisy measurements; 2) adaptively fusing complementary information from multiple Wi-Fi devices to boost the distinguishability of signal patterns associated with different activities; 3) extracting generalizable features that can overcome the inconsistent representations of activities under different environmental conditions (e.g., locations, orientations). Leveraging these insights, we design a novel and unified sensing framework based on Wi-Fi signals, dubbed UniFi, and use gesture recognition as an application to demonstrate its effectiveness. UniFi achieves robust and generalizable gesture recognition in real-world scenarios by extracting discriminative and consistent features unrelated to environmental factors from pre-denoised signals collected by multiple transceivers. To achieve this, we first introduce an effective signal preprocessing approach that captures the applicable input data from noisy received signals for the deep learning model. Second, we propose a multi-view deep network based on spatio-temporal cross-view attention that integrates multi-carrier and multi-device signals to extract distinguishable information. Finally, we use mutual information maximization as a regularizer to learn environment-invariant representations via a contrastive loss, without requiring access to any signals from unseen environments for practical adaptation. Extensive experiments on the Widar 3.0 dataset demonstrate that our proposed framework significantly outperforms state-of-the-art approaches in different settings (99% and 90%-98% accuracy for in-domain and cross-domain recognition, respectively, without additional data collection and model training).
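
The abstract names a contrastive loss used for mutual information maximization but does not give its form. One widely used loss of this kind is InfoNCE, sketched below in plain Python under the assumption that `view_a[i]` and `view_b[i]` are non-zero embeddings of the same sample (for example, the same gesture observed under two conditions). Whether UniFi uses exactly this loss is not stated; the function names and the `temperature` default are assumptions.

```python
import math
from typing import List, Sequence


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(view_a: List[Sequence[float]], view_b: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss; the positive for view_a[i] is view_b[i]."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [_cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denom - logits[i]  # -log softmax of the positive pair
    return total / len(view_a)


aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
# Matching pairs give a much lower loss than mismatched ones.
print(info_nce(aligned, aligned) < info_nce(aligned, swapped))  # True
```

Minimizing this loss pulls embeddings of the same sample together and pushes different samples apart, which is how a contrastive objective can encourage features that stay consistent across environments.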

Citations: 1

ADA-SHARK

Pub Date : 2024-01-12 DOI: 10.1145/3631416

Marvin Martin, Etienne Meunier, P. Moreau, Jean-Eudes Gadenne, J. Dautel, Félicien Catherin, Eugene Pinsky, Reza Rawassizadeh

Due to global warming, sharks are moving closer to beaches, which affects the risk both to humans and to the sharks themselves. Within the past decade, several technologies have been developed to reduce the risks for swimmers and surfers. This study proposes a robust computer vision method that detects sharks using an underwater camera monitoring system to secure coastlines. The system is autonomous, environment-friendly, and requires little maintenance. We used 43,679 images extracted from 175 hours of marine life videos to train our algorithms. Our approach allows videos to be collected and analysed in real time using an autonomous underwater camera connected to a smart buoy powered by solar panels. The videos are processed by a Domain Adversarial Convolutional Neural Network that discerns sharks regardless of the background environment with an F2-score of 83.2% and a recall of 90.9%, while human experts achieve an F2-score of 94% and a recall of 95.7%.
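
For readers unfamiliar with the F2-score, the sketch below states the standard F-beta formula, which for beta = 2 weights recall more heavily than precision, and inverts it to recover the precision implied by a reported score and recall. The abstract does not report precision, so the implied values are back-of-the-envelope figures that assume each score and recall pair describes the same operating point and ignore rounding. The function names are hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score: (1 + b^2) * P * R / (b^2 * P + R)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def implied_precision(f_score: float, recall: float,
                      beta: float = 2.0) -> float:
    """Solve the F-beta formula for precision: P = F*R / ((1+b^2)*R - b^2*F)."""
    b2 = beta * beta
    return f_score * recall / ((1 + b2) * recall - b2 * f_score)


model_p = implied_precision(0.832, 0.909)  # figures from the abstract
human_p = implied_precision(0.94, 0.957)
print(round(model_p, 2), round(human_p, 2))  # 0.62 0.88
```

Under those assumptions the detector's implied precision is about 0.62 against about 0.88 for human experts, so most of the gap in F2 would come from false alarms rather than missed sharks.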

Citations: 0

SF-Adapter

Pub Date : 2024-01-12 DOI: 10.1145/3631428

Hua Kang, Qingyong Hu, Qian Zhang

Wearable sensor-based human activity recognition (HAR) has gained significant attention due to the widespread use of smart wearable devices. However, variation across subjects can cause a domain shift that impedes the scaling of the recognition model. Unsupervised domain adaptation has been proposed as a solution for recognizing activities in new, unlabeled target domains by training on the source and target data together. However, the need to access source data raises privacy concerns. Source-free domain adaptation has emerged as a practical setting in which only a pre-trained source model is provided for the unlabeled target domain. This setup aligns with the need for personalized activity model adaptation on target local devices. As edge devices are resource-constrained with limited memory, it is crucial to take computational efficiency, i.e., memory cost, into consideration. In this paper, we develop a source-free domain adaptation framework for wearable sensor-based HAR, with a focus on computational efficiency for target edge devices. First, we design a lightweight add-on module, called an adapter, to adapt the frozen pre-trained model to the unlabeled target domain. Second, to optimize the adapter, we adopt a simple yet effective model adaptation method that leverages local representation similarity and prediction consistency. Additionally, we design a set of sample selection optimization strategies that select samples effective for adaptation and further enhance computational efficiency while maintaining adaptation performance. Our extensive experiments on three datasets demonstrate that our method achieves recognition accuracy comparable to state-of-the-art source-free domain adaptation methods while updating fewer than 1% of the parameters and saving up to 4.99× memory cost.
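
The abstract does not describe the adapter's internal structure. As an illustration of why an add-on module can leave the pre-trained model frozen and still be cheap to train, the sketch below uses a residual bottleneck, a common adapter design that is assumed here rather than taken from the paper. The class name, the dimensions, and the zero initialisation are all hypothetical choices.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class BottleneckAdapter:
    """Residual bottleneck: h + W_up @ relu(W_down @ h).

    Only w_down and w_up would be trained; the backbone that produces h
    stays frozen.
    """

    def __init__(self, dim: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Zero-initialised up-projection: before any adaptation step the
        # adapter is the identity, so the frozen model's output is unchanged.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, a) for a in matvec(self.w_down, h)]
        return [x + d for x, d in zip(h, matvec(self.w_up, z))]

    def num_params(self) -> int:
        return (sum(len(r) for r in self.w_down)
                + sum(len(r) for r in self.w_up))


adapter = BottleneckAdapter(dim=128, bottleneck=4)
features = [0.5] * 128  # stand-in for a frozen layer's output
print(adapter(features) == features, adapter.num_params())  # True 1024
```

With a 128-dimensional feature and a bottleneck of 4, the adapter adds 1,024 trainable weights. Only these weights need gradients and optimizer state, which is the kind of saving that matters when adaptation memory on an edge device is the constraint.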

Citations: 0

Conversational Localization

Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Pub Date : 2024-01-12 DOI: 10.1145/3631404

Smitha Sheshadri, Kotaro Hara

We propose a novel sensorless approach to indoor localization that leverages natural language conversations with users, which we call conversational localization. To show the feasibility of conversational localization, we develop a proof-of-concept system that guides users to describe their surroundings in a chat and estimates their position based on the information they provide. The system has a modular architecture with four modules. First, we construct an entity database from available image-based floor maps. Second, our utterance processing module dynamically identifies and scores the information provided by users. Third, we implement a conversational agent that can intelligently strategize and guide the interaction to elicit information from users that is valuable for localization. Finally, we employ visibility catchment area and line-of-sight heuristics to generate spatial estimates of the user's location. We conduct two user studies in designing and testing the system. In an online crowdsourcing study, we collect 800 natural language descriptions of unfamiliar indoor spaces to learn the feasibility of extracting entities useful for localization from user utterances. We then conduct a field study with 10 participants at 10 locations to evaluate the feasibility and performance of conversational localization. The results show that conversational localization can achieve localization accuracy within 10 meters at eight of the ten study sites, showing the technique's utility for classes of indoor location-based services.
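The abstract does not specify how the visibility catchment heuristic is computed. As a rough illustration only, the sketch below assumes that each entity in the database is visible from anywhere within a fixed radius, and estimates the user's position as the centroid of the grid cells from which every mentioned entity would be visible. The entity names, coordinates, radii, and the functions `can_see` and `estimate_position` are all hypothetical, and the line-of-sight check against walls is omitted.

```python
import math

# Hypothetical entity database: name -> (x, y, visibility radius), in meters.
# These values are invented for illustration and do not come from the paper.
ENTITIES = {
    "cafe": (5.0, 5.0, 8.0),
    "elevator": (12.0, 5.0, 6.0),
    "atm": (9.0, 10.0, 5.0),
    "exit": (19.0, 14.0, 2.0),
}


def can_see(x, y, name):
    """Assumed catchment test: the entity is visible within its radius."""
    ex, ey, radius = ENTITIES[name]
    return math.hypot(x - ex, y - ey) <= radius


def estimate_position(mentioned, width=20, height=15):
    """Return the centroid of the 1 m grid cells from which all mentioned
    entities are visible, or None if no cell satisfies every constraint."""
    cells = [
        (float(x), float(y))
        for x in range(width + 1)
        for y in range(height + 1)
        if all(can_see(x, y, name) for name in mentioned)
    ]
    if not cells:
        return None
    cx = sum(c[0] for c in cells) / len(cells)
    cy = sum(c[1] for c in cells) / len(cells)
    return (cx, cy)
```

Calling `estimate_position(["cafe", "elevator", "atm"])` returns a point inside the overlap of the three catchment areas, while a contradictory description such as `["cafe", "exit"]` returns `None`, which a conversational agent could treat as a cue to ask a follow-up question.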

Citations: 0

TrackPose

Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Pub Date : 2024-01-12 DOI: 10.1145/3631459

Ke He, Chentao Li, Yongjie Duan, Jianjiang Feng, Jie Zhou

Several studies have explored the estimation of finger pose/angle to enhance the expressiveness of touchscreens. However, previous algorithms suffer from large estimation errors, and their sequential output angles are unstable, making it difficult to meet the demands of practical applications. We believe these defects arise from improper rotation representation, the lack of time-series modeling, and the difficulty of accommodating individual differences among users. To address these issues, we conduct an in-depth study of rotation representation for the 2D pose problem by minimizing the errors between the representation space and the original space. We propose TrackPose, a deep learning model that uses a self-attention mechanism for time-series modeling to improve the accuracy and stability of finger pose estimation. We also develop a registration application on a mobile phone that collects touchscreen images of each new user without an optical tracking device. Together, these three measures reduce the angle estimation error by 33%, and by 47% for the yaw angle in particular. Additionally, the instability of sequential estimations, measured by the proposed metric MAEΔ, is reduced by 62%. A user study further confirms the effectiveness of the proposed algorithm.
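The abstract defines neither the rotation representation nor MAEΔ. As a rough illustration only, the sketch below assumes that a yaw angle is encoded as a (cos, sin) pair, a common way to avoid the discontinuity at ±180°, and that MAEΔ is the mean absolute error between the frame-to-frame changes of the predicted and ground-truth angles. Both choices, and the names `encode`, `decode`, `angle_diff`, and `mae_delta`, are assumptions rather than the paper's definitions.

```python
import math


def encode(deg):
    """Map an angle in degrees to a point on the unit circle."""
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))


def decode(vec):
    """Recover the angle in degrees from a (cos, sin) pair."""
    return math.degrees(math.atan2(vec[1], vec[0]))


def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def mae_delta(pred, truth):
    """Assumed instability metric: mean absolute error between the
    frame-to-frame angle changes of the prediction and the ground truth."""
    if len(pred) != len(truth) or len(pred) < 2:
        raise ValueError("need two equal-length sequences of at least 2 frames")
    errors = []
    for i in range(1, len(pred)):
        d_pred = angle_diff(pred[i], pred[i - 1])
        d_true = angle_diff(truth[i], truth[i - 1])
        errors.append(abs(angle_diff(d_pred, d_true)))
    return sum(errors) / len(errors)
```

With this wrapping, angles of 179° and -179° are treated as 2° apart rather than 358°, so a regression loss or stability metric built on `angle_diff` does not penalize a prediction for crossing the ±180° boundary.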

Citations: 0