Reenvisioning Patient Education with Smart Hospital Patient Rooms
Joshua Dawson, K. J. Phanich, Jason Wiese
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2024), pp. 1-23. https://doi.org/10.1145/3631419

Smart hospital patient rooms incorporate various smart devices to allow digital control of the entertainment --- such as TV and soundbar --- and the environment --- including lights, blinds, and thermostat. This technology can benefit patients by providing a more accessible, engaging, and personalized approach to their care. Many patients arrive at a rehabilitation hospital because they suffered a life-changing event such as a spinal cord injury or stroke. It can be challenging for patients to learn to cope with the changed abilities that are the new norm in their lives. This study explores ways smart patient rooms can support rehabilitation education to prepare patients for life outside the hospital's care. We conducted 20 contextual inquiries and four interviews with rehabilitation educators as they performed education sessions with patients and informal caregivers. Using thematic analysis, our findings offer insights into how smart patient rooms could revolutionize patient education by fostering better engagement with educational content, reducing interruptions during sessions, providing more agile education content management, and customizing therapy elements for each patient's unique needs. Lastly, we discuss design opportunities for future smart patient room implementations for a better educational experience in any healthcare context.
{"title":"Reenvisioning Patient Education with Smart Hospital Patient Rooms","authors":"Joshua Dawson, K. J. Phanich, Jason Wiese","doi":"10.1145/3631419","DOIUrl":"https://doi.org/10.1145/3631419","url":null,"abstract":"Smart hospital patient rooms incorporate various smart devices to allow digital control of the entertainment --- such as TV and soundbar --- and the environment --- including lights, blinds, and thermostat. This technology can benefit patients by providing a more accessible, engaging, and personalized approach to their care. Many patients arrive at a rehabilitation hospital because they suffered a life-changing event such as a spinal cord injury or stroke. It can be challenging for patients to learn to cope with the changed abilities that are the new norm in their lives. This study explores ways smart patient rooms can support rehabilitation education to prepare patients for life outside the hospital's care. We conducted 20 contextual inquiries and four interviews with rehabilitation educators as they performed education sessions with patients and informal caregivers. Using thematic analysis, our findings offer insights into how smart patient rooms could revolutionize patient education by fostering better engagement with educational content, reducing interruptions during sessions, providing more agile education content management, and customizing therapy elements for each patient's unique needs. Lastly, we discuss design opportunities for future smart patient room implementations for a better educational experience in any healthcare context.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"2 8","pages":"1 - 23"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Investigating Generalizability of Speech-based Suicidal Ideation Detection Using Mobile Phones
Arvind Pillai, Trevor Cohen, Dror Ben-Zeev, Subigya Nepal, Weichen Wang, M. Nemesure, Michael Heinz, George Price, D. Lekkas, Amanda C. Collins, Tess Z. Griffin, Benjamin Buck, S. Preum, Nicholas Jacobson
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2024), pp. 1-38. https://doi.org/10.1145/3631452

Speech-based diaries from mobile phones can capture paralinguistic patterns that help detect mental illness symptoms such as suicidal ideation. However, previous studies have primarily evaluated machine learning models on a single dataset, leaving their performance under distribution shift unknown. In this paper, we investigate the generalizability of speech-based suicidal ideation detection using mobile phones through cross-dataset experiments on four datasets covering N=786 individuals experiencing major depressive disorder, auditory verbal hallucinations, persecutory thoughts, and students with suicidal thoughts. Our results show that machine and deep learning methods generalize poorly in many cases. Thus, we evaluate unsupervised domain adaptation (UDA) and semi-supervised domain adaptation (SSDA) to mitigate performance decreases owing to distribution shifts. While SSDA approaches showed superior performance, they are often impractical, requiring large target datasets with limited labels for adversarial and contrastive training. Therefore, we propose sinusoidal similarity sub-sampling (S3), a method that selects optimal source subsets for the target domain by computing pair-wise scores using sinusoids. Compared to prior approaches, S3 does not use labeled target data or transform features. Fine-tuning using S3 improves the cross-dataset performance of deep models across the datasets, with implications for ubiquitous technology, mental health, and machine learning.
{"title":"Investigating Generalizability of Speech-based Suicidal Ideation Detection Using Mobile Phones","authors":"Arvind Pillai, Trevor Cohen, Dror Ben-Zeev, Subigya Nepal, Weichen Wang, M. Nemesure, Michael Heinz, George Price, D. Lekkas, Amanda C. Collins, Tess Z Griffin, Benjamin Buck, S. Preum, Dror Nicholas Jacobson","doi":"10.1145/3631452","DOIUrl":"https://doi.org/10.1145/3631452","url":null,"abstract":"Speech-based diaries from mobile phones can capture paralinguistic patterns that help detect mental illness symptoms such as suicidal ideation. However, previous studies have primarily evaluated machine learning models on a single dataset, making their performance unknown under distribution shifts. In this paper, we investigate the generalizability of speech-based suicidal ideation detection using mobile phones through cross-dataset experiments using four datasets with N=786 individuals experiencing major depressive disorder, auditory verbal hallucinations, persecutory thoughts, and students with suicidal thoughts. Our results show that machine and deep learning methods generalize poorly in many cases. Thus, we evaluate unsupervised domain adaptation (UDA) and semi-supervised domain adaptation (SSDA) to mitigate performance decreases owing to distribution shifts. While SSDA approaches showed superior performance, they are often ineffective, requiring large target datasets with limited labels for adversarial and contrastive training. Therefore, we propose sinusoidal similarity sub-sampling (S3), a method that selects optimal source subsets for the target domain by computing pair-wise scores using sinusoids. Compared to prior approaches, S3 does not use labeled target data or transform features. Fine-tuning using S3 improves the cross-dataset performance of deep models across the datasets, thus having implications in ubiquitous technology, mental health, and machine learning.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"2 1","pages":"1 - 38"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

EarSE
Di Duan, Yongliang Chen, Weitao Xu, Tianxing Li
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2024), pp. 1-33. https://doi.org/10.1145/3631447

Speech enhancement is regarded as key to the quality of digital communication and is gaining increasing attention in the research field of audio processing. In this paper, we present EarSE, the first robust, hands-free, multi-modal speech enhancement solution using commercial off-the-shelf headphones. The key idea of EarSE is a novel hardware setting---leveraging the form factor of headphones equipped with a boom microphone to establish a stable acoustic sensing field across the user's face. Furthermore, we design a sensing methodology based on Frequency-Modulated Continuous-Wave (FMCW) ultrasound, a modality sensitive enough to capture the subtle facial articulatory gestures of users when speaking. Moreover, we design a fully attention-based deep neural network that self-adaptively addresses the user diversity problem by introducing the Vision Transformer network. We enhance the collaboration between the speech and ultrasonic modalities using a multi-head attention mechanism and a Factorized Bilinear Pooling gate. Extensive experiments demonstrate that EarSE achieves remarkable performance, increasing SiSDR by 14.61 dB and reducing the word error rate of user speech recognition by 22.45--66.41% in real-world applications. EarSE not only outperforms seven baselines by 38.0% in SiSNR, 12.4% in STOI, and 20.5% in PESQ on average but also maintains practicality.
{"title":"EarSE","authors":"Di Duan, Yongliang Chen, Weitao Xu, Tianxing Li","doi":"10.1145/3631447","DOIUrl":"https://doi.org/10.1145/3631447","url":null,"abstract":"Speech enhancement is regarded as the key to the quality of digital communication and is gaining increasing attention in the research field of audio processing. In this paper, we present EarSE, the first robust, hands-free, multi-modal speech enhancement solution using commercial off-the-shelf headphones. The key idea of EarSE is a novel hardware setting---leveraging the form factor of headphones equipped with a boom microphone to establish a stable acoustic sensing field across the user's face. Furthermore, we designed a sensing methodology based on Frequency-Modulated Continuous-Wave, which is an ultrasonic modality sensitive to capture subtle facial articulatory gestures of users when speaking. Moreover, we design a fully attention-based deep neural network to self-adaptively solve the user diversity problem by introducing the Vision Transformer network. We enhance the collaboration between the speech and ultrasonic modalities using a multi-head attention mechanism and a Factorized Bilinear Pooling gate. Extensive experiments demonstrate that EarSE achieves remarkable performance as increasing SiSDR by 14.61 dB and reducing the word error rate of user speech recognition by 22.45--66.41% in real-world application. EarSE not only outperforms seven baselines by 38.0% in SiSNR, 12.4% in STOI, and 20.5% in PESQ on average but also maintains practicality.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"10 9","pages":"1 - 33"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Laser-Powered Vibrotactile Rendering
Yuning Su, Yuhua Jin, Zhengqing Wang, Yonghao Shi, Da-Yuan Huang, Teng Han, Xing-Dong Yang
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2024), pp. 1-25. https://doi.org/10.1145/3631449

We investigate the feasibility of a vibrotactile device that is both battery-free and electronic-free. Our approach leverages lasers as a wireless power transfer and haptic control mechanism, which can drive small actuators commonly used in AR/VR and mobile applications with DC or AC signals. To validate the feasibility of our method, we developed a proof-of-concept prototype that includes low-cost eccentric rotating mass (ERM) motors and linear resonant actuators (LRAs) connected to photovoltaic (PV) cells. This prototype enabled us to capture laser energy from any distance across a room and analyze the impact of critical parameters on the effectiveness of our approach. Through a user study, testing 16 different vibration patterns rendered using either a single motor or two motors, we demonstrate the effectiveness of our approach in generating vibration patterns of comparable quality to a baseline, which rendered the patterns using a signal generator.

Effects of Uncertain Trajectory Prediction Visualization in Highly Automated Vehicles on Trust, Situation Awareness, and Cognitive Load
Mark Colley, Oliver Speidel, Jan Strohbeck, J. Rixen, Janina Belz, Enrico Rukzio
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2024), pp. 1-23. https://doi.org/10.1145/3631408

Automated vehicles are expected to improve safety, mobility, and inclusion. User acceptance is required for the successful introduction of this technology. One essential prerequisite for acceptance is appropriately trusting the vehicle's capabilities. System transparency via visualization of internal information could calibrate this trust by enabling surveillance of the vehicle's detection and prediction capabilities, including its failures. Additionally, concurrently increased situation awareness could improve take-overs in case of emergency. This work reports the results of two online comparative video-based studies on visualizing prediction and maneuver-planning information. Effects on trust, cognitive load, and situation awareness were measured in a simulation (N=280) and with state-of-the-art road-user prediction and maneuver planning applied to pre-recorded real-world video from a real prototype (N=238). Results show that color conveys uncertainty best, that the planned trajectory increased trust, and that the visualization of other predicted trajectories improved perceived safety.
{"title":"Effects of Uncertain Trajectory Prediction Visualization in Highly Automated Vehicles on Trust, Situation Awareness, and Cognitive Load","authors":"Mark Colley, Oliver Speidel, Jan Strohbeck, J. Rixen, Janina Belz, Enrico Rukzio","doi":"10.1145/3631408","DOIUrl":"https://doi.org/10.1145/3631408","url":null,"abstract":"Automated vehicles are expected to improve safety, mobility, and inclusion. User acceptance is required for the successful introduction of this technology. One essential prerequisite for acceptance is appropriately trusting the vehicle's capabilities. System transparency via visualizing internal information could calibrate this trust by enabling the surveillance of the vehicle's detection and prediction capabilities, including its failures. Additionally, concurrently increased situation awareness could improve take-overs in case of emergency. This work reports the results of two online comparative video-based studies on visualizing prediction and maneuver-planning information. Effects on trust, cognitive load, and situation awareness were measured using a simulation (N=280) and state-of-the-art road user prediction and maneuver planning on a pre-recorded real-world video using a real prototype (N=238). Results show that color conveys uncertainty best, that the planned trajectory increased trust, and that the visualization of other predicted trajectories improved perceived safety.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"3 12","pages":"1 - 23"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

LoCal
Duo Zhang, Xusheng Zhang, Yaxiong Xie, Fusang Zhang, Xuanzhi Wang, Yang Li, Daqing Zhang
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2024), pp. 1-27. https://doi.org/10.1145/3631436

Millimeter wave (mmWave) radar excels at accurately estimating the distance, speed, and angle of signal reflectors relative to the radar. However, for the diverse sensing applications that rely on radar's tracking capability, these estimates must be transformed from radar to room coordinates. This transformation hinges on the mmWave radar's location attribute, encompassing its position and orientation in room coordinates. Traditional outdoor calibration solutions for autonomous driving utilize corner reflectors as static reference points to derive the location attribute. When deployed in an indoor environment, it is challenging, even for a mmWave radar with GHz bandwidth and a large antenna array, to separate the static reference points from other multipath reflectors. To tackle the static multipath, we propose to deploy a moving reference point (a moving robot) to fully harness the velocity resolution of mmWave radar. Specifically, we select a SLAM-capable robot that can accurately obtain its locations in room coordinates during motion, without requiring human intervention. Accurately pairing the locations of the robot under the two coordinate systems requires tight synchronization between the mmWave radar and the robot. We therefore propose a novel trajectory-correspondence-based calibration algorithm that takes the estimated trajectories of the two systems as input, decoupling the operations of the two systems as much as possible. Extensive experimental results demonstrate that the proposed calibration solution achieves very high accuracy (1.74 cm for location and 0.43° for orientation) and ensures outstanding performance in three representative applications: fall detection, point cloud fusion, and long-distance human tracking.

SweatSkin
Chi-Jung Lee, David Yang, P. Ku, Hsin-Liu (Cindy) Kao
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2024), pp. 1-30. https://doi.org/10.1145/3631425

Sweat sensing affords monitoring of essential bio-signals for various well-being inspections. We present SweatSkin, a fabrication approach for customizable sweat-sensing on-skin interfaces. SweatSkin is unique in exploiting on-skin microfluidic channels to access bio-fluids secreted within the skin for personalized health monitoring. To lower the barrier to creating skin-conformable microfluidics capable of collecting and analyzing sweat, we propose four fabrication methods utilizing accessible materials. Technical characterizations of paper- and polymer-based devices indicate that colorimetric analysis can effectively visualize sweat loss, chloride, glucose, and pH values. To support general to extreme sweating scenarios, we consulted five athletic experts on the SweatSkin devices' customization guidelines, application potential, and envisioned usages. A two-session fabrication workshop study with ten participants verified that the four fabrication methods are easy to learn and easy to carry out. Overall, SweatSkin is an extensible and user-friendly platform for designing and creating customizable on-skin sweat-sensing interfaces for UbiComp and HCI, affording ubiquitous personalized health sensing.
{"title":"SweatSkin","authors":"Chi-Jung Lee, David Yang, P. Ku, Hsin-Liu (Cindy) Kao","doi":"10.1145/3631425","DOIUrl":"https://doi.org/10.1145/3631425","url":null,"abstract":"Sweat sensing affords monitoring essential bio-signals tailored for various well-being inspections. We present SweatSkin, the fabrication approach for customizable sweat-sensing on-skin interfaces. SweatSkin is unique in exploiting on-skin microfluidic channels to access bio-fluid secretes within the skin for personalized health monitoring. To lower the barrier to creating skin-conformable microfluidics capable of collecting and analyzing sweat, four fabrication methods utilizing accessible materials are proposed. Technical characterizations of paper- and polymer-based devices indicate that colorimetric analysis can effectively visualize sweat loss, chloride, glucose, and pH values. To support general to extreme sweating scenarios, we consulted five athletic experts on the SweatSkin devices' customization guidelines, application potential, and envisioned usages. The two-session fabrication workshop study with ten participants verified that the four fabrication methods are easy to learn and easy to make. Overall, SweatSkin is an extensible and user-friendly platform for designing and creating customizable on-skin sweat-sensing interfaces for UbiComp and HCI, affording ubiquitous personalized health sensing.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"1 7","pages":"1 - 30"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Driver Maneuver Interaction Identification with Anomaly-Aware Federated Learning on Heterogeneous Feature Representations
Mahan Tabatabaie, Suining He
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2024), pp. 1-28. https://doi.org/10.1145/3631421

Driver maneuver interaction learning (DMIL) refers to the classification task of identifying different driver-vehicle maneuver interactions (e.g., left/right turns). Existing studies have largely focused on the centralized collection of sensor data from drivers' smartphones (say, inertial measurement units or IMUs, like the accelerometer and gyroscope). Such a centralized mechanism might be precluded by data regulatory constraints. Furthermore, enabling an adaptive and accurate DMIL framework remains challenging due to (i) the complexity of heterogeneous driver maneuver patterns, and (ii) the impact of anomalous driver maneuvers caused by, for instance, aggressive driving styles and behaviors. To overcome the above challenges, we propose AF-DMIL, an Anomaly-aware Federated Driver Maneuver Interaction Learning system. We focus on real-world IMU sensor datasets (e.g., collected by smartphones) for our pilot case study. In particular, we design three heterogeneous representations for AF-DMIL covering spectral, time-series, and statistical features derived from the IMU sensor readings. We design a novel heterogeneous representation attention network (HetRANet) based on spectral channel attention, temporal sequence attention, and statistical feature learning mechanisms, jointly capturing and identifying the complex patterns within driver maneuver behaviors. We further design a densely-connected convolutional neural network in HetRANet to enable complex feature extraction and enhance HetRANet's computational efficiency. In addition, we design within AF-DMIL a novel anomaly-aware federated learning approach for decentralized DMIL in response to anomalous maneuver data. To ease extraction of the maneuver patterns and evaluation of their mutual differences, we design an embedding projection network that projects the high-dimensional driver maneuver features into a low-dimensional space, and further derives exemplars that represent the driver maneuver patterns for mutual comparison. AF-DMIL then leverages the mutual differences of the exemplars to identify those that exhibit anomalous patterns and deviate from the others, and mitigates their impact on the federated DMIL.

We conducted extensive driver data analytics and experimental studies on three real-world datasets (one harvested on our own) to evaluate the AF-DMIL prototype, demonstrating AF-DMIL's accuracy and effectiveness compared to state-of-the-art DMIL baselines (on average more than 13% improvement in DMIL accuracy), as well as fewer communication rounds (on average 29.20% fewer than existing distributed learning mechanisms).
{"title":"Driver Maneuver Interaction Identification with Anomaly-Aware Federated Learning on Heterogeneous Feature Representations","authors":"Mahan Tabatabaie, Suining He","doi":"10.1145/3631421","DOIUrl":"https://doi.org/10.1145/3631421","url":null,"abstract":"Driver maneuver interaction learning (DMIL) refers to the classification task with the goal of identifying different driver-vehicle maneuver interactions (e.g., left/right turns). Existing conventional studies largely focused on the centralized collection of sensor data from the drivers' smartphones (say, inertial measurement units or IMUs, like accelerometer and gyroscope). Such a centralized mechanism might be precluded by data regulatory constraints. Furthermore, how to enable an adaptive and accurate DMIL framework remains challenging due to (i) complexity in heterogeneous driver maneuver patterns, and (ii) impacts of anomalous driver maneuvers due to, for instance, aggressive driving styles and behaviors. To overcome the above challenges, we propose AF-DMIL, an Anomaly-aware Federated Driver Maneuver Interaction Learning system. We focus on the real-world IMU sensor datasets (e.g., collected by smartphones) for our pilot case study. In particular, we have designed three heterogeneous representations for AF-DMIL regarding spectral, time series, and statistical features that are derived from the IMU sensor readings. We have designed a novel heterogeneous representation attention network (HetRANet) based on spectral channel attention, temporal sequence attention, and statistical feature learning mechanisms, jointly capturing and identifying the complex patterns within driver maneuver behaviors. Furthermore, we have designed a densely-connected convolutional neural network in HetRANet to enable the complex feature extraction and enhance the computational efficiency of HetRANet. In addition, we have designed within AF-DMIL a novel anomaly-aware federated learning approach for decentralized DMIL in response to anomalous maneuver data. To ease extraction of the maneuver patterns and evaluation of their mutual differences, we have designed an embedding projection network that projects the high-dimensional driver maneuver features into low-dimensional space, and further derives the exemplars that represent the driver maneuver patterns for mutual comparison. Then, AF-DMIL further leverages the mutual differences of the exemplars to identify those that exhibit anomalous patterns and deviate from others, and mitigates their impacts upon the federated DMIL. 
We have conducted extensive driver data analytics and experimental studies on three real-world datasets (one is harvested on our own) to evaluate the prototype of AF-DMIL, demonstrating AF-DMIL's accuracy and effectiveness compared to the state-of-the-art DMIL baselines (on average by more than 13% improvement in terms of DMIL accuracy), as well as fewer communication rounds (on average 29.20% fewer than existing distributed learning mechanisms).","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"8 4","pages":"1 - 28"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139438009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
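
The three heterogeneous representations named above (spectral, time-series, and statistical) can be pictured as three views computed from the same IMU window. The sketch below is one plausible construction under assumed window length, channel count, and summary statistics; the paper's exact feature definitions may differ.

```python
import numpy as np

def heterogeneous_features(window):
    """Build three views of a (T, C) IMU window: T samples, C channels
    (e.g., 3-axis accelerometer + 3-axis gyroscope gives C = 6)."""
    # 1) Spectral view: per-channel magnitude spectrum.
    spectral = np.abs(np.fft.rfft(window, axis=0))   # (T//2 + 1, C)
    # 2) Time-series view: the raw window itself, for a sequence model.
    temporal = window
    # 3) Statistical view: simple per-channel summary statistics.
    stats = np.concatenate([window.mean(axis=0),
                            window.std(axis=0),
                            window.min(axis=0),
                            window.max(axis=0)])     # (4 * C,)
    return spectral, temporal, stats

# Toy usage: 2 s of 6-channel IMU data at an assumed 50 Hz sampling rate.
rng = np.random.default_rng(2)
imu = rng.normal(size=(100, 6))
spec, seq, stat = heterogeneous_features(imu)
print(spec.shape, seq.shape, stat.shape)  # (51, 6) (100, 6) (24,)
```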

SurfShare
Xincheng Huang, Robert Xiao
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2024), pp. 1-24. https://doi.org/10.1145/3631418

Shared Mixed Reality experiences allow two co-located users to collaborate on both physical and digital tasks with familiar social protocols. However, extending the same to remote collaboration is limited by cumbersome setups for aligning distinct physical environments and the lack of access to remote physical artifacts. We present SurfShare, a general-purpose symmetric remote collaboration system with mixed-reality head-mounted displays (HMDs). Our system shares a spatially consistent physical-virtual workspace between two remote users, anchored on a physical plane in each environment (e.g., a desk or wall). The video feed of each user's physical surface is overlaid virtually on the other side, creating a shared view of the physical space. We integrate the physical and virtual workspace through virtual replication. Users can transmute physical objects to the virtual space as virtual replicas. Our system is lightweight, implemented using only the capabilities of the headset, without requiring any modifications to the environment (e.g. cameras or motion tracking hardware). We discuss the design, implementation, and interaction capabilities of our prototype, and demonstrate the utility of SurfShare through four example applications. In a user experiment with a comprehensive prototyping task, we found that SurfShare provides a physical-virtual workspace that supports low-fi prototyping with flexible proxemics and fluid collaboration dynamics.
{"title":"SurfShare","authors":"Xincheng Huang, Robert Xiao","doi":"10.1145/3631418","DOIUrl":"https://doi.org/10.1145/3631418","url":null,"abstract":"Shared Mixed Reality experiences allow two co-located users to collaborate on both physical and digital tasks with familiar social protocols. However, extending the same to remote collaboration is limited by cumbersome setups for aligning distinct physical environments and the lack of access to remote physical artifacts. We present SurfShare, a general-purpose symmetric remote collaboration system with mixed-reality head-mounted displays (HMDs). Our system shares a spatially consistent physical-virtual workspace between two remote users, anchored on a physical plane in each environment (e.g., a desk or wall). The video feed of each user's physical surface is overlaid virtually on the other side, creating a shared view of the physical space. We integrate the physical and virtual workspace through virtual replication. Users can transmute physical objects to the virtual space as virtual replicas. Our system is lightweight, implemented using only the capabilities of the headset, without requiring any modifications to the environment (e.g. cameras or motion tracking hardware). We discuss the design, implementation, and interaction capabilities of our prototype, and demonstrate the utility of SurfShare through four example applications. In a user experiment with a comprehensive prototyping task, we found that SurfShare provides a physical-virtual workspace that supports low-fi prototyping with flexible proxemics and fluid collaboration dynamics.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"1 6","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139438032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Wi-Painter
Dawei Yan, Panlong Yang, Fei Shang, Weiwei Jiang, Xiang-Yang Li
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. (2024), pp. 1-25. https://doi.org/10.1145/3633809

WiFi has gradually developed into one of the main candidate technologies for indoor environment sensing. In this paper, we are interested in using COTS WiFi devices to identify material details, including the location, material type, and shape of stationary objects in the surrounding environment, which may open up new opportunities for many applications. Specifically, we present Wi-Painter, a model-driven system that accurately detects smooth-surfaced material types and their edges using unmodified COTS WiFi devices. Different from previous work on material identification, Wi-Painter subdivides the target into individual 2D pixels and simultaneously forms a 2D image by identifying the material type of each pixel. The key idea of Wi-Painter is to exploit the complex permittivity of the object surface, which can be estimated from the different reflectivity of signals with different polarization directions. In particular, we construct a multi-incident-angle model to characterize the material, using only the power ratios of the vertically and horizontally polarized signals measured at several different incident angles, which avoids the use of inaccurate WiFi signal phases. We implement and evaluate Wi-Painter in the real world, showing an average classification accuracy of 93.4% for different material types including metal, wood, rubber, and plastic of different sizes and thicknesses, and across different environments. In addition, Wi-Painter can accurately detect the material type and edges of the word "LOVE" spliced together from different materials, with an average size of 60 cm × 80 cm, and material edges with different orientations.
{"title":"Wi-Painter","authors":"Dawei Yan, Panlong Yang, Fei Shang, Weiwei Jiang, Xiang-Yang Li","doi":"10.1145/3633809","DOIUrl":"https://doi.org/10.1145/3633809","url":null,"abstract":"WiFi has gradually developed into one of the main candidate technologies for indoor environment sensing. In this paper, we are interested in using COTS WiFi devices to identify material details, including location, material type, and shape, of stationary objects in the surrounding environment, which may open up new opportunities for many applications. Specifically, we present Wi-Painter, a model-driven system that can accurately detects smooth-surfaced material types and their edges using COTS WiFi devices without modification. Different from previous arts for material identification, Wi-Painter subdivides the target into individual 2D pixels, and simultaneously forms a 2D image based on identifying the material type of each pixel. The key idea of Wi-Painter is to exploit the complex permittivity of the object surface which can be estimated by the different reflectivity of signals with different polarization directions. In particular, we construct the multi-incident angle model to characterize the material, using only the power ratios of the vertically and horizontally polarized signals measured at several different incident angles, which avoids the use of inaccurate WiFi signal phases. We implement and evaluate Wi-Painter in the real world, showing an average classification accuracy of 93.4% for different material types including metal, wood, rubber and plastic of different sizes and thicknesses, and across different environments. In addition, Wi-Painter can accurately detect the material type and edge of the word \"LOVE\" spliced with different materials, with an average size of 60cm × 80cm, and material edges with different orientations.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"1 4","pages":"1 - 25"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139438034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}