Ge Wang, Lubing Han, Yuance Chang, Yuting Shi, Chen Qian, Cong Zhao, Han Ding, Wei Xi, Cui Zhao, Jizhong Zhao
The ubiquity of illumination facilities enables the versatile development of Visible Light Communication (VLC). VLC-based research has achieved high-speed wireless access and decimeter-level indoor localization, albeit with complex equipment. However, it remains unclear whether VLC is applicable to widely used battery-free Internet-of-Things nodes, e.g., passive RFID tags. This paper proposes LightSign, the first cross-technology system that enables passive RFID tags to receive visible light messages. LightSign is compatible with commercial protocols, transparent to routine RFID communications, and invisible to human eyes. We propose a pseudo-timing instruction to achieve microsecond-level light switching to modulate the VLC message. To make it perceptible to passive RFIDs, we design an augmented RFID tag and prove its effectiveness theoretically and experimentally. With only one reply from an augmented tag, LightSign can decode 100-bit-long VLC messages. We evaluate LightSign in real industrial environments and test its performance with two use cases. The results show that LightSign achieves up to 99.2% decoding accuracy across varying scenarios.
{"title":"Cross-technology Communication between Visible Light and Battery-free RFIDs","authors":"Ge Wang, Lubing Han, Yuance Chang, Yuting Shi, Chen Qian, Cong Zhao, Han Ding, Wei Xi, Cui Zhao, Jizhong Zhao","doi":"10.1145/3610883","DOIUrl":"https://doi.org/10.1145/3610883","url":null,"abstract":"The ubiquity of illumination facilities enables the versatile development of Visible Light Communication (VLC). VLC-based research achieved high-speed wireless access and decimeter-level indoor localization with complex equipment. However, it is still unclear whether the VLC is applicable for widely-used battery-free Internet-of-Things nodes, e.g., passive RFIDs. This paper proposes LightSign, the first cross-technology system that enables passive RFID tags to receive visible light messages. LightSign is compatible with commercial protocols, transparent to routine RFID communications, and invisible to human eyes. We propose a pseudo-timing instruction to achieve microsecond-level light switching to modulate the VLC message. To make it perceptible to passive RFIDs, we design an augmented RFID tag and prove its effectiveness theoretically and experimentally. With only one reply from an augmented tag, LightSign can decode 100-bit-long VLC messages. We evaluate LightSign in real industry environments and test its performance with two use cases. The results show that LightSign achieves up to 99.2% decoding accuracy in varying scenarios.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inside-out tracking of human body poses using wearable sensors holds significant potential for AR/VR applications, such as remote communication through 3D avatars with expressive body language. Current inside-out systems often rely on vision-based methods with handheld controllers or on densely distributed body-worn IMU sensors. The former limits hands-free and occlusion-robust interaction, while the latter suffers from inadequate accuracy and jitter. We introduce MI-Poser, a body tracking system that employs AR glasses and two wrist-worn electromagnetic field (EMF) sensors to achieve high-fidelity upper-body pose estimation while mitigating metal interference. Our lightweight system achieves a low mean joint position error of 6.6 cm on real-world data collected from 10 participants. It remains robust against various upper-body movements and operates efficiently at 60 Hz. Furthermore, by incorporating an IMU sensor co-located with the EMF sensor, MI-Poser counteracts the effects of metal interference, which inherently disrupts the EMF signal during tracking. Our evaluation demonstrates successful detection and correction of interference using our EMF-IMU fusion approach across environments with diverse metal profiles. Ultimately, MI-Poser offers a practical pose tracking system, particularly suited for body-centric AR applications.
{"title":"MI-Poser","authors":"Riku Arakawa, Bing Zhou, Gurunandan Krishnan, Mayank Goel, Shree K. Nayar","doi":"10.1145/3610891","DOIUrl":"https://doi.org/10.1145/3610891","url":null,"abstract":"Inside-out tracking of human body poses using wearable sensors holds significant potential for AR/VR applications, such as remote communication through 3D avatars with expressive body language. Current inside-out systems often rely on vision-based methods utilizing handheld controllers or incorporating densely distributed body-worn IMU sensors. The former limits hands-free and occlusion-robust interactions, while the latter is plagued by inadequate accuracy and jittering. We introduce a novel body tracking system, MI-Poser, which employs AR glasses and two wrist-worn electromagnetic field (EMF) sensors to achieve high-fidelity upper-body pose estimation while mitigating metal interference. Our lightweight system demonstrates a minimal error (6.6 cm mean joint position error) with real-world data collected from 10 participants. It remains robust against various upper-body movements and operates efficiently at 60 Hz. Furthermore, by incorporating an IMU sensor co-located with the EMF sensor, MI-Poser presents solutions to counteract the effects of metal interference, which inherently disrupts the EMF signal during tracking. Our evaluation effectively showcases the successful detection and correction of interference using our EMF-IMU fusion approach across environments with diverse metal profiles. Ultimately, MI-Poser offers a practical pose tracking system, particularly suited for body-centric AR applications.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiyang Li, Lin Huang, Siddharth Shah, Sean J. Jones, Yincheng Jin, Dingran Wang, Adam Russell, Seokmin Choi, Yang Gao, Junsong Yuan, Zhanpeng Jin
Sign language is a natural language widely used by Deaf and hard of hearing (DHH) individuals. Advanced wearables have been developed to recognize sign language automatically, but they are limited by the lack of labeled data, which leads to small vocabularies and unsatisfactory performance despite laborious data collection. Here we propose SignRing, an IMU-based system that goes beyond traditional data augmentation by using online videos to generate virtual IMU (v-IMU) data, pushing the boundary of wearable-based systems to a vocabulary size of 934 with sentences of up to 16 glosses. The v-IMU data is generated by reconstructing 3D hand movements from two-view videos and calculating 3-axis acceleration data. With this data, we achieve a word error rate (WER) of 6.3% with a mix of half v-IMU and half IMU training data (2339 samples each) and a WER of 14.7% with 100% v-IMU training data (6048 samples), compared with a baseline WER of 8.3% (trained with 2339 samples of IMU data). We have conducted comparisons between v-IMU and IMU data to demonstrate the reliability and generalizability of the v-IMU data. This interdisciplinary work spans wearable sensor development, computer vision, deep learning, and linguistics, and can provide valuable insights to researchers with similar objectives.
{"title":"SignRing","authors":"Jiyang Li, Lin Huang, Siddharth Shah, Sean J. Jones, Yincheng Jin, Dingran Wang, Adam Russell, Seokmin Choi, Yang Gao, Junsong Yuan, Zhanpeng Jin","doi":"10.1145/3610881","DOIUrl":"https://doi.org/10.1145/3610881","url":null,"abstract":"Sign language is a natural language widely used by Deaf and hard of hearing (DHH) individuals. Advanced wearables are developed to recognize sign language automatically. However, they are limited by the lack of labeled data, which leads to a small vocabulary and unsatisfactory performance even though laborious efforts are put into data collection. Here we propose SignRing, an IMU-based system that breaks through the traditional data augmentation method, makes use of online videos to generate the virtual IMU (v-IMU) data, and pushes the boundary of wearable-based systems by reaching the vocabulary size of 934 with sentences up to 16 glosses. The v-IMU data is generated by reconstructing 3D hand movements from two-view videos and calculating 3-axis acceleration data, by which we are able to achieve a word error rate (WER) of 6.3% with a mix of half v-IMU and half IMU training data (2339 samples for each), and a WER of 14.7% with 100% v-IMU training data (6048 samples), compared with the baseline performance of the 8.3% WER (trained with 2339 samples of IMU data). We have conducted comparisons between v-IMU and IMU data to demonstrate the reliability and generalizability of the v-IMU data. This interdisciplinary work covers various areas such as wearable sensor development, computer vision techniques, deep learning, and linguistics, which can provide valuable insights to researchers with similar research objectives.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sejal Bhalla, Salaar Liaqat, Robert Wu, Andrea S. Gershon, Eyal de Lara, Alex Mariakakis
Prior work has shown the utility of acoustic analysis in controlled settings for assessing chronic obstructive pulmonary disease (COPD), one of the most common respiratory diseases, affecting millions of people worldwide. However, such assessments require active user input and may not represent the true characteristics of a patient's voice. We propose PulmoListener, an end-to-end speech processing pipeline that identifies segments of the patient's speech from smartwatch audio collected during daily living and analyzes them to classify COPD symptom severity. To evaluate our approach, we conducted a study with 8 COPD patients monitored for 164 ± 92 days on average. We found that PulmoListener achieved an average sensitivity of 0.79 ± 0.03 and a specificity of 0.83 ± 0.05 per patient when classifying their symptom severity on the same day. PulmoListener can also predict the severity level up to 4 days in advance with an average sensitivity of 0.75 ± 0.02 and a specificity of 0.74 ± 0.07. The results of our study demonstrate the feasibility of leveraging natural speech for monitoring COPD in real-world settings, offering a promising solution for disease management and even diagnosis.
{"title":"PulmoListener","authors":"Sejal Bhalla, Salaar Liaqat, Robert Wu, Andrea S. Gershon, Eyal de Lara, Alex Mariakakis","doi":"10.1145/3610889","DOIUrl":"https://doi.org/10.1145/3610889","url":null,"abstract":"Prior work has shown the utility of acoustic analysis in controlled settings for assessing chronic obstructive pulmonary disease (COPD) --- one of the most common respiratory diseases that impacts millions of people worldwide. However, such assessments require active user input and may not represent the true characteristics of a patient's voice. We propose PulmoListener, an end-to-end speech processing pipeline that identifies segments of the patient's speech from smartwatch audio collected during daily living and analyzes them to classify COPD symptom severity. To evaluate our approach, we conducted a study with 8 COPD patients over 164 ± 92 days on average. We found that PulmoListener achieved an average sensitivity of 0.79 ± 0.03 and a specificity of 0.83 ± 0.05 per patient when classifying their symptom severity on the same day. PulmoListener can also predict the severity level up to 4 days in advance with an average sensitivity of 0.75 ± 0.02 and a specificity of 0.74 ± 0.07. The results of our study demonstrate the feasibility of leveraging natural speech for monitoring COPD in real-world settings, offering a promising solution for disease management and even diagnosis.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The accuracy of wireless fingerprint-based indoor localization largely depends on the precision and density of radio maps. Although many research efforts have been devoted to incrementally updating radio maps, few consider the laborious initial construction at a new site. In this work, we propose an accurate and generalizable framework for efficient radio map construction, which takes advantage of readily available fine-grained radio maps and constructs a fine-grained radio map of a new site from only a small proportion of measurements in it. Specifically, we regard radio maps as domains and propose a Radio Map construction approach based on Domain Adaptation (RMDA). We first employ a domain disentanglement feature extractor to learn domain-invariant features for aligning the source domains (available radio maps) with the target domain (initial radio map) in a domain-invariant latent space. Furthermore, we propose a dynamic weighting strategy, which learns the relevancy of each source domain to the target domain during domain adaptation. Then, we extract domain-specific features based on the site's floorplan and use them to constrain the super-resolution of the domain-invariant features. Experimental results demonstrate that RMDA constructs a fine-grained initial radio map of a target site efficiently with a limited number of measurements. Meanwhile, localization accuracy with the refined radio map improves by about 41.35% after construction and is comparable with that of a densely surveyed radio map (the reduction is less than 8%).
{"title":"Fast Radio Map Construction with Domain Disentangled Learning for Wireless Localization","authors":"Weina Jiang, Lin Shi, Qun Niu, Ning Liu","doi":"10.1145/3610922","DOIUrl":"https://doi.org/10.1145/3610922","url":null,"abstract":"The accuracy of wireless fingerprint-based indoor localization largely depends on the precision and density of radio maps. Although many research efforts have been devoted to incremental updating of radio maps, few consider the laborious initial construction of a new site. In this work, we propose an accurate and generalizable framework for efficient radio map construction, which takes advantage of readily-available fine-grained radio maps and constructs fine-grained radio maps of a new site with a small proportion of measurements in it. Specifically, we regard radio maps as domains and propose a Radio Map construction approach based on Domain Adaptation (RMDA). We first employ the domain disentanglement feature extractor to learn domain-invariant features for aligning the source domains (available radio maps) with the target domain (initial radio map) in the domain-invariant latent space. Furthermore, we propose a dynamic weighting strategy, which learns the relevancy of the source and target domain in the domain adaptation. Then, we extract the domain-specific features based on the site's floorplan and use them to constrain the super-resolution of the domain-invariant features. Experimental results demonstrate that RMDA constructs a fine-grained initial radio map of a target site efficiently with a limited number of measurements. Meanwhile, the localization accuracy of the refined radio map with RMDA significantly improved by about 41.35% after construction and is comparable with the dense surveyed radio map (the reduction is less than 8%).","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John Mamish, Amy Guo, Thomas Cohen, Julian Richey, Yang Zhang, Josiah Hester
Whenever a user interacts with a device, mechanical work is performed to actuate the user interface elements; the resulting energy is typically wasted, dissipated as sound and heat. Previous work has shown that many devices can be powered entirely from this otherwise wasted user interface energy. For these devices, wires and batteries, along with the related hassles of replacement and charging, become unnecessary. So far, such work has been restricted to proof-of-concept demonstrations, where a bespoke harvesting and sensing circuit is constructed for the application at hand. The challenge of harvesting energy while simultaneously sensing fine-grained input signals from diverse modalities makes prototyping new devices difficult. To fill this gap, we present a hardware toolkit which provides a common electrical interface for harvesting energy from user interface elements, facilitating exploration of the composability, utility, and breadth of applications enabled by interaction-powered smart devices. We design a set of "energy as input" harvesting circuits, a standard connective interface with 3D-printed enclosures, and software libraries to enable the exploration of devices where the user action generates the energy needed to perform the device's primary function. This exploration culminated in a demonstration campaign in which we prototyped several exemplar toys and gadgets, including a battery-free Bop It (a popular '90s rhythm game), an electronic Etch A Sketch, a "Simon Says"-style memory game, and a service rating device. We ran exploratory user studies to understand how generativity, creativity, and composability are hampered or facilitated by these devices. These demonstrations, user study takeaways, and the toolkit itself provide a foundation for building interactive and user-focused gadgets whose usability is not affected by battery charge and whose service lifetime is not limited by battery wear.
{"title":"Interaction Harvesting","authors":"John Mamish, Amy Guo, Thomas Cohen, Julian Richey, Yang Zhang, Josiah Hester","doi":"10.1145/3610880","DOIUrl":"https://doi.org/10.1145/3610880","url":null,"abstract":"Whenever a user interacts with a device, mechanical work is performed to actuate the user interface elements; the resulting energy is typically wasted, dissipated as sound and heat. Previous work has shown that many devices can be powered entirely from this otherwise wasted user interface energy. For these devices, wires and batteries, along with the related hassles of replacement and charging, become unnecessary and onerous. So far, these works have been restricted to proof-of-concept demonstrations; a specific bespoke harvesting and sensing circuit is constructed for the application at hand. The challenge of harvesting energy while simultaneously sensing fine-grained input signals from diverse modalities makes prototyping new devices difficult. To fill this gap, we present a hardware toolkit which provides a common electrical interface for harvesting energy from user interface elements. This facilitates exploring the composability, utility, and breadth of enabled applications of interaction-powered smart devices. We design a set of \"energy as input\" harvesting circuits, a standard connective interface with 3D printed enclosures, and software libraries to enable the exploration of devices where the user action generates the energy needed to perform the device's primary function. This exploration culminated in a demonstration campaign where we prototype several exemplar popular toys and gadgets, including battery-free Bop-It--- a popular 90s rhythm game, an electronic Etch-a-sketch, a \"Simon-Says\"-style memory game, and a service rating device. We run exploratory user studies to understand how generativity, creativity, and composability are hampered or facilitated by these devices. These demonstrations, user study takeaways, and the toolkit itself provide a foundation for building interactive and user-focused gadgets whose usability is not affected by battery charge and whose service lifetime is not limited by battery wear.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jingyu Xiao, Qingsong Zou, Qing Li, Dan Zhao, Kang Li, Zixuan Weng, Ruoyu Li, Yong Jiang
With the booming smart home market, intelligent Internet of Things (IoT) devices are increasingly involved in home life. To improve the user experience of smart homes, prior work has explored how to use machine learning to predict interactions between users and devices. However, existing solutions suffer from inferior User Device Interaction (UDI) prediction accuracy because they ignore three key factors of human behavior: routine, intent, and multi-level periodicity. In this paper, we present SmartUDI, a novel and accurate UDI prediction approach for smart homes. First, we propose a Message-Passing-based Routine Extraction (MPRE) algorithm to mine routine behaviors, and apply a contrastive loss to pull together representations of behaviors from the same routine and push apart representations of behaviors from different routines. Second, we propose an Intent-aware Capsule Graph Attention Network (ICGAT) to encode users' multiple intents while considering complex transitions between behaviors. Third, we design a Cluster-based Historical Attention Mechanism (CHAM) to capture multi-level periodicity by aggregating the current sequence and the semantically nearest historical sequence representations through attention. SmartUDI can be seamlessly deployed on the cloud infrastructures of IoT device vendors and on edge nodes, enabling the delivery of personalized device service recommendations to users. Comprehensive experiments on four real-world datasets show that SmartUDI consistently outperforms state-of-the-art baselines with more accurate and highly interpretable results.
{"title":"I Know Your Intent","authors":"Jingyu Xiao, Qingsong Zou, Qing Li, Dan Zhao, Kang Li, Zixuan Weng, Ruoyu Li, Yong Jiang","doi":"10.1145/3610906","DOIUrl":"https://doi.org/10.1145/3610906","url":null,"abstract":"With the booming of smart home market, intelligent Internet of Things (IoT) devices have been increasingly involved in home life. To improve the user experience of smart homes, some prior works have explored how to use machine learning for predicting interactions between users and devices. However, the existing solutions have inferior User Device Interaction (UDI) prediction accuracy, as they ignore three key factors: routine, intent and multi-level periodicity of human behaviors. In this paper, we present SmartUDI, a novel accurate UDI prediction approach for smart homes. First, we propose a Message-Passing-based Routine Extraction (MPRE) algorithm to mine routine behaviors, then the contrastive loss is applied to narrow representations among behaviors from the same routines and alienate representations among behaviors from different routines. Second, we propose an Intent-aware Capsule Graph Attention Network (ICGAT) to encode multiple intents of users while considering complex transitions between different behaviors. Third, we design a Cluster-based Historical Attention Mechanism (CHAM) to capture the multi-level periodicity by aggregating the current sequence and the semantically nearest historical sequence representations through the attention mechanism. SmartUDI can be seamlessly deployed on cloud infrastructures of IoT device vendors and edge nodes, enabling the delivery of personalized device service recommendations to users. Comprehensive experiments on four real-world datasets show that SmartUDI consistently outperforms the state-of-the-art baselines with more accurate and highly interpretable results.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning human-mobility interaction (HMI) in interactive scenes (e.g., how a vehicle turns at an intersection in response to traffic lights and other oncoming vehicles) can enhance the safety, efficiency, and resilience of smart mobility systems (e.g., autonomous vehicles) and many other ubiquitous computing applications. Toward ubiquitous and understandable HMI learning, this paper considers both "spoken language" (e.g., human textual annotations) and "unspoken language" (e.g., visual and sensor-based behavioral mobility information related to the HMI scenes) as the information modalities drawn from real-world HMI scenarios. We aim to extract the important but possibly implicit HMI concepts (as named entities) from the textual annotations (provided by human annotators) through a novel human language and sensor data co-learning design. To this end, we propose CG-HMI, a novel Cross-modality Graph fusion approach for extracting important human-mobility interaction concepts from the co-learning of textual annotations as well as visual and behavioral sensor data. To fuse the unspoken and spoken "languages", we design a unified representation called the human-mobility interaction graph (HMIG) for each modality related to the HMI scenes, i.e., textual annotations, visual video frames, and behavioral sensor time series (e.g., from on-board or smartphone inertial measurement units). The nodes of the HMIG in these modalities correspond to the textual words (tokenized for ease of processing) related to HMI concepts, the detected traffic participant/environment categories, and the vehicle maneuver behavior types determined from the behavioral sensor time series. To extract the inter- and intra-modality semantic correspondences and interactions in the HMIG, we design a novel graph interaction fusion approach with differentiable pooling-based graph attention. The resulting graph embeddings are then processed to identify and retrieve the HMI concepts within the annotations, which can benefit downstream human-computer interaction and ubiquitous computing applications. We have developed and implemented CG-HMI in a system prototype and performed extensive studies on three real-world HMI datasets (two on car driving and one on e-scooter riding). We corroborate the excellent performance of CG-HMI (on average 13.11% higher than the other baselines in precision, recall, and F1 measure) and its effectiveness in recognizing and extracting the important HMI concepts through cross-modality learning. Our CG-HMI studies also provide real-world implications (e.g., for road safety and driving behaviors) regarding the interactions between drivers and other traffic participants.
{"title":"Cross-Modality Graph-based Language and Sensor Data Co-Learning of Human-Mobility Interaction","authors":"Mahan Tabatabaie, Suining He, Kang G. Shin","doi":"10.1145/3610904","DOIUrl":"https://doi.org/10.1145/3610904","url":null,"abstract":"Learning the human--mobility interaction (HMI) on interactive scenes (e.g., how a vehicle turns at an intersection in response to traffic lights and other oncoming vehicles) can enhance the safety, efficiency, and resilience of smart mobility systems (e.g., autonomous vehicles) and many other ubiquitous computing applications. Towards the ubiquitous and understandable HMI learning, this paper considers both \"spoken language\" (e.g., human textual annotations) and \"unspoken language\" (e.g., visual and sensor-based behavioral mobility information related to the HMI scenes) in terms of information modalities from the real-world HMI scenarios. We aim to extract the important but possibly implicit HMI concepts (as the named entities) from the textual annotations (provided by human annotators) through a novel human language and sensor data co-learning design. To this end, we propose CG-HMI, a novel Cross-modality Graph fusion approach for extracting important Human-Mobility Interaction concepts from co-learning of textual annotations as well as the visual and behavioral sensor data. In order to fuse both unspoken and spoken \"languages\", we have designed a unified representation called the human--mobility interaction graph (HMIG) for each modality related to the HMI scenes, i.e., textual annotations, visual video frames, and behavioral sensor time-series (e.g., from the on-board or smartphone inertial measurement units). The nodes of the HMIG in these modalities correspond to the textual words (tokenized for ease of processing) related to HMI concepts, the detected traffic participant/environment categories, and the vehicle maneuver behavior types determined from the behavioral sensor time-series. To extract the inter- and intra-modality semantic correspondences and interactions in the HMIG, we have designed a novel graph interaction fusion approach with differentiable pooling-based graph attention. The resulting graph embeddings are then processed to identify and retrieve the HMI concepts within the annotations, which can benefit the downstream human-computer interaction and ubiquitous computing applications. We have developed and implemented CG-HMI into a system prototype, and performed extensive studies upon three real-world HMI datasets (two on car driving and the third one on e-scooter riding). We have corroborated the excellent performance (on average 13.11% higher accuracy than the other baselines in terms of precision, recall, and F1 measure) and effectiveness of CG-HMI in recognizing and extracting the important HMI concepts through cross-modality learning. 
Our CG-HMI studies also provide real-world implications (e.g., road safety and driving behaviors) about the interactions between the drivers and other traffic participants.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
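As a rough illustration of graph attention over HMIG-style nodes, the sketch below implements a generic single-head GAT-like layer in numpy; the differentiable-pooling fusion described in the abstract is more involved, and the projection sizes, tanh scoring, and toy graph here are assumptions.

```python
import numpy as np

def graph_attention_layer(node_feats, adjacency, w, a):
    """One single-head graph-attention pass over an interaction graph.

    node_feats: (N, D) features of HMIG nodes (tokens, detected objects, maneuvers).
    adjacency:  (N, N) 0/1 matrix of edges, self-loops included.
    w: (D, H) projection matrix; a: (2*H,) attention vector.
    A generic GAT-style layer for illustration only.
    """
    h = node_feats @ w                                    # project node features
    n = h.shape[0]
    scores = np.full((n, n), -np.inf)
    for i in range(n):
        for j in range(n):
            if adjacency[i, j]:
                scores[i, j] = np.tanh(np.concatenate([h[i], h[j]]) @ a)
    scores -= scores.max(axis=1, keepdims=True)           # stabilize the softmax
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)             # attention over neighbors
    return alpha @ h                                       # attention-weighted update

# Toy usage: 3 nodes with 4-D features projected to a 2-D hidden size.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
adj = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
print(graph_attention_layer(x, adj, rng.normal(size=(4, 2)), rng.normal(size=(4,))))
```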
Translating fine-grained activity detections (e.g., phone ringing, talking interspersed with silence and walking) into semantically meaningful and richer contextual information (e.g., on a phone call for 20 minutes while exercising) is essential for enabling a range of healthcare and human-computer interaction applications. Prior work has proposed building ontologies or applying temporal analysis of activity patterns, with limited success in capturing complex real-world context patterns. We present TAO, a hybrid system that leverages OWL-based ontologies and temporal clustering approaches to detect high-level contexts from human activities. TAO can characterize sequential activities that happen one after the other as well as activities that are interleaved or occur in parallel, detecting a richer set of contexts more accurately than prior work. We evaluate TAO on real-world activity datasets (CASAS and ExtraSensory) and show that our system achieves, on average, 87% and 80% accuracy for context detection, respectively. We deploy and evaluate TAO in a real-world setting with eight participants using our system for three hours each, demonstrating TAO's ability to capture semantically meaningful contexts in the real world. Finally, to showcase the usefulness of contexts, we prototype wellness applications that assess productivity and stress and show that the wellness metrics calculated using contexts provided by TAO are much closer to the ground truth (on average within 1.1%) than those of the baseline approach (on average within 30%).
{"title":"TAO","authors":"Sudershan Boovaraghavan, Prasoon Patidar, Yuvraj Agarwal","doi":"10.1145/3610896","DOIUrl":"https://doi.org/10.1145/3610896","url":null,"abstract":"Translating fine-grained activity detection (e.g., phone ring, talking interspersed with silence and walking) into semantically meaningful and richer contextual information (e.g., on a phone call for 20 minutes while exercising) is essential towards enabling a range of healthcare and human-computer interaction applications. Prior work has proposed building ontologies or temporal analysis of activity patterns with limited success in capturing complex real-world context patterns. We present TAO, a hybrid system that leverages OWL-based ontologies and temporal clustering approaches to detect high-level contexts from human activities. TAO can characterize sequential activities that happen one after the other and activities that are interleaved or occur in parallel to detect a richer set of contexts more accurately than prior work. We evaluate TAO on real-world activity datasets (Casas and Extrasensory) and show that our system achieves, on average, 87% and 80% accuracy for context detection, respectively. We deploy and evaluate TAO in a real-world setting with eight participants using our system for three hours each, demonstrating TAO's ability to capture semantically meaningful contexts in the real world. Finally, to showcase the usefulness of contexts, we prototype wellness applications that assess productivity and stress and show that the wellness metrics calculated using contexts provided by TAO are much closer to the ground truth (on average within 1.1%), as compared to the baseline approach (on average within 30%).","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The proliferation of distributed embedded systems has enabled pervasive sensing, actuation, and information displays across buildings and surrounding environments, yet it also entails large expenditures of energy and human labor for maintenance. Our daily interactions, from opening a window to closing a drawer to twisting a doorknob, are great potential sources of energy but are often neglected. Existing commercial devices that harvest energy from these ambient sources are unaffordable, and DIY solutions remain inaccessible to non-experts, preventing end-users from fully realizing everyday innovations. We present E3D, an end-to-end fabrication toolkit for customizing self-powered smart devices at low cost. We contribute a taxonomy of everyday kinetic activities that are potential sources of energy, a library of parametric mechanisms to harvest energy from manual operations of kinetic objects, and a holistic design system that lets end-user developers capture design requirements by demonstration and then customize augmentation devices to harvest energy in ways that fit their unique lifestyles.
{"title":"E3D","authors":"Abul Al Arabi, Xue Wang, Yang Zhang, Jeeeun Kim","doi":"10.1145/3610897","DOIUrl":"https://doi.org/10.1145/3610897","url":null,"abstract":"The increase of distributed embedded systems has enabled pervasive sensing, actuation, and information displays across buildings and surrounding environments, yet also entreats huge cost expenditure for energy and human labor for maintenance. Our daily interactions, from opening a window to closing a drawer to twisting a doorknob, are great potential sources of energy but are often neglected. Existing commercial devices to harvest energy from these ambient sources are unaffordable, and DIY solutions are left with inaccessibility for non-experts preventing fully imbuing daily innovations in end-users. We present E3D, an end-to-end fabrication toolkit to customize self-powered smart devices at low cost. We contribute to a taxonomy of everyday kinetic activities that are potential sources of energy, a library of parametric mechanisms to harvest energy from manual operations of kinetic objects, and a holistic design system for end-user developers to capture design requirements by demonstrations then customize augmentation devices to harvest energy that meets unique lifestyle.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}