Ge Wang, Lubing Han, Yuance Chang, Yuting Shi, Chen Qian, Cong Zhao, Han Ding, Wei Xi, Cui Zhao, Jizhong Zhao
The ubiquity of illumination facilities enables the versatile development of Visible Light Communication (VLC). VLC-based research has achieved high-speed wireless access and decimeter-level indoor localization, albeit with complex equipment. However, it remains unclear whether VLC is applicable to widely used battery-free Internet-of-Things nodes, e.g., passive RFID tags. This paper proposes LightSign, the first cross-technology system that enables passive RFID tags to receive visible light messages. LightSign is compatible with commercial protocols, transparent to routine RFID communications, and invisible to human eyes. We propose a pseudo-timing instruction to achieve microsecond-level light switching to modulate the VLC message. To make it perceptible to passive RFIDs, we design an augmented RFID tag and prove its effectiveness theoretically and experimentally. With only one reply from an augmented tag, LightSign can decode 100-bit-long VLC messages. We evaluate LightSign in real industrial environments and test its performance with two use cases. The results show that LightSign achieves up to 99.2% decoding accuracy across varying scenarios.
{"title":"Cross-technology Communication between Visible Light and Battery-free RFIDs","authors":"Ge Wang, Lubing Han, Yuance Chang, Yuting Shi, Chen Qian, Cong Zhao, Han Ding, Wei Xi, Cui Zhao, Jizhong Zhao","doi":"10.1145/3610883","DOIUrl":"https://doi.org/10.1145/3610883","url":null,"abstract":"The ubiquity of illumination facilities enables the versatile development of Visible Light Communication (VLC). VLC-based research achieved high-speed wireless access and decimeter-level indoor localization with complex equipment. However, it is still unclear whether the VLC is applicable for widely-used battery-free Internet-of-Things nodes, e.g., passive RFIDs. This paper proposes LightSign, the first cross-technology system that enables passive RFID tags to receive visible light messages. LightSign is compatible with commercial protocols, transparent to routine RFID communications, and invisible to human eyes. We propose a pseudo-timing instruction to achieve microsecond-level light switching to modulate the VLC message. To make it perceptible to passive RFIDs, we design an augmented RFID tag and prove its effectiveness theoretically and experimentally. With only one reply from an augmented tag, LightSign can decode 100-bit-long VLC messages. We evaluate LightSign in real industry environments and test its performance with two use cases. The results show that LightSign achieves up to 99.2% decoding accuracy in varying scenarios.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inside-out tracking of human body poses using wearable sensors holds significant potential for AR/VR applications, such as remote communication through 3D avatars with expressive body language. Current inside-out systems often rely on vision-based methods with handheld controllers or on densely distributed body-worn IMU sensors. The former limits hands-free and occlusion-robust interaction, while the latter suffers from inadequate accuracy and jitter. We introduce MI-Poser, a body tracking system that employs AR glasses and two wrist-worn electromagnetic field (EMF) sensors to achieve high-fidelity upper-body pose estimation while mitigating metal interference. Our lightweight system achieves a low mean joint position error of 6.6 cm on real-world data collected from 10 participants. It remains robust against various upper-body movements and operates efficiently at 60 Hz. Furthermore, by incorporating an IMU sensor co-located with the EMF sensor, MI-Poser counteracts the effects of metal interference, which inherently disrupts the EMF signal during tracking. Our evaluation demonstrates successful detection and correction of interference using our EMF-IMU fusion approach across environments with diverse metal profiles. Ultimately, MI-Poser offers a practical pose tracking system, particularly suited for body-centric AR applications.
{"title":"MI-Poser","authors":"Riku Arakawa, Bing Zhou, Gurunandan Krishnan, Mayank Goel, Shree K. Nayar","doi":"10.1145/3610891","DOIUrl":"https://doi.org/10.1145/3610891","url":null,"abstract":"Inside-out tracking of human body poses using wearable sensors holds significant potential for AR/VR applications, such as remote communication through 3D avatars with expressive body language. Current inside-out systems often rely on vision-based methods utilizing handheld controllers or incorporating densely distributed body-worn IMU sensors. The former limits hands-free and occlusion-robust interactions, while the latter is plagued by inadequate accuracy and jittering. We introduce a novel body tracking system, MI-Poser, which employs AR glasses and two wrist-worn electromagnetic field (EMF) sensors to achieve high-fidelity upper-body pose estimation while mitigating metal interference. Our lightweight system demonstrates a minimal error (6.6 cm mean joint position error) with real-world data collected from 10 participants. It remains robust against various upper-body movements and operates efficiently at 60 Hz. Furthermore, by incorporating an IMU sensor co-located with the EMF sensor, MI-Poser presents solutions to counteract the effects of metal interference, which inherently disrupts the EMF signal during tracking. Our evaluation effectively showcases the successful detection and correction of interference using our EMF-IMU fusion approach across environments with diverse metal profiles. Ultimately, MI-Poser offers a practical pose tracking system, particularly suited for body-centric AR applications.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiyang Li, Lin Huang, Siddharth Shah, Sean J. Jones, Yincheng Jin, Dingran Wang, Adam Russell, Seokmin Choi, Yang Gao, Junsong Yuan, Zhanpeng Jin
Sign language is a natural language widely used by Deaf and hard of hearing (DHH) individuals. Advanced wearables have been developed to recognize sign language automatically, but they are limited by the lack of labeled data, which leads to small vocabularies and unsatisfactory performance despite laborious data collection. Here we propose SignRing, an IMU-based system that goes beyond traditional data augmentation by using online videos to generate virtual IMU (v-IMU) data, pushing the boundary of wearable-based systems to a vocabulary size of 934 with sentences of up to 16 glosses. The v-IMU data is generated by reconstructing 3D hand movements from two-view videos and calculating 3-axis acceleration data. With this data, we achieve a word error rate (WER) of 6.3% with a mix of half v-IMU and half IMU training data (2339 samples each) and a WER of 14.7% with 100% v-IMU training data (6048 samples), compared with a baseline WER of 8.3% (trained with 2339 samples of IMU data). We have conducted comparisons between v-IMU and IMU data to demonstrate the reliability and generalizability of the v-IMU data. This interdisciplinary work spans wearable sensor development, computer vision, deep learning, and linguistics, and can provide valuable insights to researchers with similar objectives.
{"title":"SignRing","authors":"Jiyang Li, Lin Huang, Siddharth Shah, Sean J. Jones, Yincheng Jin, Dingran Wang, Adam Russell, Seokmin Choi, Yang Gao, Junsong Yuan, Zhanpeng Jin","doi":"10.1145/3610881","DOIUrl":"https://doi.org/10.1145/3610881","url":null,"abstract":"Sign language is a natural language widely used by Deaf and hard of hearing (DHH) individuals. Advanced wearables are developed to recognize sign language automatically. However, they are limited by the lack of labeled data, which leads to a small vocabulary and unsatisfactory performance even though laborious efforts are put into data collection. Here we propose SignRing, an IMU-based system that breaks through the traditional data augmentation method, makes use of online videos to generate the virtual IMU (v-IMU) data, and pushes the boundary of wearable-based systems by reaching the vocabulary size of 934 with sentences up to 16 glosses. The v-IMU data is generated by reconstructing 3D hand movements from two-view videos and calculating 3-axis acceleration data, by which we are able to achieve a word error rate (WER) of 6.3% with a mix of half v-IMU and half IMU training data (2339 samples for each), and a WER of 14.7% with 100% v-IMU training data (6048 samples), compared with the baseline performance of the 8.3% WER (trained with 2339 samples of IMU data). We have conducted comparisons between v-IMU and IMU data to demonstrate the reliability and generalizability of the v-IMU data. This interdisciplinary work covers various areas such as wearable sensor development, computer vision techniques, deep learning, and linguistics, which can provide valuable insights to researchers with similar research objectives.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sejal Bhalla, Salaar Liaqat, Robert Wu, Andrea S. Gershon, Eyal de Lara, Alex Mariakakis
Prior work has shown the utility of acoustic analysis in controlled settings for assessing chronic obstructive pulmonary disease (COPD), one of the most common respiratory diseases, affecting millions of people worldwide. However, such assessments require active user input and may not represent the true characteristics of a patient's voice. We propose PulmoListener, an end-to-end speech processing pipeline that identifies segments of the patient's speech from smartwatch audio collected during daily living and analyzes them to classify COPD symptom severity. To evaluate our approach, we conducted a study with 8 COPD patients monitored for 164 ± 92 days on average. We found that PulmoListener achieved an average sensitivity of 0.79 ± 0.03 and a specificity of 0.83 ± 0.05 per patient when classifying their symptom severity on the same day. PulmoListener can also predict the severity level up to 4 days in advance with an average sensitivity of 0.75 ± 0.02 and a specificity of 0.74 ± 0.07. The results of our study demonstrate the feasibility of leveraging natural speech for monitoring COPD in real-world settings, offering a promising solution for disease management and even diagnosis.
{"title":"PulmoListener","authors":"Sejal Bhalla, Salaar Liaqat, Robert Wu, Andrea S. Gershon, Eyal de Lara, Alex Mariakakis","doi":"10.1145/3610889","DOIUrl":"https://doi.org/10.1145/3610889","url":null,"abstract":"Prior work has shown the utility of acoustic analysis in controlled settings for assessing chronic obstructive pulmonary disease (COPD) --- one of the most common respiratory diseases that impacts millions of people worldwide. However, such assessments require active user input and may not represent the true characteristics of a patient's voice. We propose PulmoListener, an end-to-end speech processing pipeline that identifies segments of the patient's speech from smartwatch audio collected during daily living and analyzes them to classify COPD symptom severity. To evaluate our approach, we conducted a study with 8 COPD patients over 164 ± 92 days on average. We found that PulmoListener achieved an average sensitivity of 0.79 ± 0.03 and a specificity of 0.83 ± 0.05 per patient when classifying their symptom severity on the same day. PulmoListener can also predict the severity level up to 4 days in advance with an average sensitivity of 0.75 ± 0.02 and a specificity of 0.74 ± 0.07. The results of our study demonstrate the feasibility of leveraging natural speech for monitoring COPD in real-world settings, offering a promising solution for disease management and even diagnosis.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The accuracy of wireless fingerprint-based indoor localization largely depends on the precision and density of radio maps. Although many research efforts have been devoted to incrementally updating radio maps, few consider the laborious initial construction at a new site. In this work, we propose an accurate and generalizable framework for efficient radio map construction, which takes advantage of readily available fine-grained radio maps and constructs a fine-grained radio map of a new site from only a small proportion of measurements in it. Specifically, we regard radio maps as domains and propose a Radio Map construction approach based on Domain Adaptation (RMDA). We first employ a domain disentanglement feature extractor to learn domain-invariant features for aligning the source domains (available radio maps) with the target domain (initial radio map) in a domain-invariant latent space. Furthermore, we propose a dynamic weighting strategy, which learns the relevancy of each source domain to the target domain during domain adaptation. Then, we extract domain-specific features based on the site's floorplan and use them to constrain the super-resolution of the domain-invariant features. Experimental results demonstrate that RMDA constructs a fine-grained initial radio map of a target site efficiently with a limited number of measurements. Meanwhile, localization accuracy with the refined radio map improves by about 41.35% after construction and is comparable with that of a densely surveyed radio map (the reduction is less than 8%).
{"title":"Fast Radio Map Construction with Domain Disentangled Learning for Wireless Localization","authors":"Weina Jiang, Lin Shi, Qun Niu, Ning Liu","doi":"10.1145/3610922","DOIUrl":"https://doi.org/10.1145/3610922","url":null,"abstract":"The accuracy of wireless fingerprint-based indoor localization largely depends on the precision and density of radio maps. Although many research efforts have been devoted to incremental updating of radio maps, few consider the laborious initial construction of a new site. In this work, we propose an accurate and generalizable framework for efficient radio map construction, which takes advantage of readily-available fine-grained radio maps and constructs fine-grained radio maps of a new site with a small proportion of measurements in it. Specifically, we regard radio maps as domains and propose a Radio Map construction approach based on Domain Adaptation (RMDA). We first employ the domain disentanglement feature extractor to learn domain-invariant features for aligning the source domains (available radio maps) with the target domain (initial radio map) in the domain-invariant latent space. Furthermore, we propose a dynamic weighting strategy, which learns the relevancy of the source and target domain in the domain adaptation. Then, we extract the domain-specific features based on the site's floorplan and use them to constrain the super-resolution of the domain-invariant features. Experimental results demonstrate that RMDA constructs a fine-grained initial radio map of a target site efficiently with a limited number of measurements. Meanwhile, the localization accuracy of the refined radio map with RMDA significantly improved by about 41.35% after construction and is comparable with the dense surveyed radio map (the reduction is less than 8%).","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John Mamish, Amy Guo, Thomas Cohen, Julian Richey, Yang Zhang, Josiah Hester
Whenever a user interacts with a device, mechanical work is performed to actuate the user interface elements; the resulting energy is typically wasted, dissipated as sound and heat. Previous work has shown that many devices can be powered entirely from this otherwise wasted user interface energy. For these devices, wires and batteries, along with the related hassles of replacement and charging, become unnecessary. So far, such work has been restricted to proof-of-concept demonstrations, where a bespoke harvesting and sensing circuit is constructed for the application at hand. The challenge of harvesting energy while simultaneously sensing fine-grained input signals from diverse modalities makes prototyping new devices difficult. To fill this gap, we present a hardware toolkit which provides a common electrical interface for harvesting energy from user interface elements, facilitating exploration of the composability, utility, and breadth of applications enabled by interaction-powered smart devices. We design a set of "energy as input" harvesting circuits, a standard connective interface with 3D-printed enclosures, and software libraries to enable the exploration of devices where the user action generates the energy needed to perform the device's primary function. This exploration culminated in a demonstration campaign in which we prototyped several exemplar toys and gadgets, including a battery-free Bop It (a popular '90s rhythm game), an electronic Etch A Sketch, a "Simon Says"-style memory game, and a service rating device. We ran exploratory user studies to understand how generativity, creativity, and composability are hampered or facilitated by these devices. These demonstrations, user study takeaways, and the toolkit itself provide a foundation for building interactive and user-focused gadgets whose usability is not affected by battery charge and whose service lifetime is not limited by battery wear.
{"title":"Interaction Harvesting","authors":"John Mamish, Amy Guo, Thomas Cohen, Julian Richey, Yang Zhang, Josiah Hester","doi":"10.1145/3610880","DOIUrl":"https://doi.org/10.1145/3610880","url":null,"abstract":"Whenever a user interacts with a device, mechanical work is performed to actuate the user interface elements; the resulting energy is typically wasted, dissipated as sound and heat. Previous work has shown that many devices can be powered entirely from this otherwise wasted user interface energy. For these devices, wires and batteries, along with the related hassles of replacement and charging, become unnecessary and onerous. So far, these works have been restricted to proof-of-concept demonstrations; a specific bespoke harvesting and sensing circuit is constructed for the application at hand. The challenge of harvesting energy while simultaneously sensing fine-grained input signals from diverse modalities makes prototyping new devices difficult. To fill this gap, we present a hardware toolkit which provides a common electrical interface for harvesting energy from user interface elements. This facilitates exploring the composability, utility, and breadth of enabled applications of interaction-powered smart devices. We design a set of \"energy as input\" harvesting circuits, a standard connective interface with 3D printed enclosures, and software libraries to enable the exploration of devices where the user action generates the energy needed to perform the device's primary function. This exploration culminated in a demonstration campaign where we prototype several exemplar popular toys and gadgets, including battery-free Bop-It--- a popular 90s rhythm game, an electronic Etch-a-sketch, a \"Simon-Says\"-style memory game, and a service rating device. We run exploratory user studies to understand how generativity, creativity, and composability are hampered or facilitated by these devices. These demonstrations, user study takeaways, and the toolkit itself provide a foundation for building interactive and user-focused gadgets whose usability is not affected by battery charge and whose service lifetime is not limited by battery wear.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jingyu Xiao, Qingsong Zou, Qing Li, Dan Zhao, Kang Li, Zixuan Weng, Ruoyu Li, Yong Jiang
With the booming smart home market, intelligent Internet of Things (IoT) devices are increasingly involved in home life. To improve the user experience of smart homes, prior work has explored how to use machine learning to predict interactions between users and devices. However, existing solutions suffer from inferior User Device Interaction (UDI) prediction accuracy because they ignore three key factors of human behavior: routine, intent, and multi-level periodicity. In this paper, we present SmartUDI, a novel and accurate UDI prediction approach for smart homes. First, we propose a Message-Passing-based Routine Extraction (MPRE) algorithm to mine routine behaviors, and apply a contrastive loss to pull together representations of behaviors from the same routine and push apart representations of behaviors from different routines. Second, we propose an Intent-aware Capsule Graph Attention Network (ICGAT) to encode users' multiple intents while considering complex transitions between behaviors. Third, we design a Cluster-based Historical Attention Mechanism (CHAM) to capture multi-level periodicity by aggregating the current sequence and the semantically nearest historical sequence representations through attention. SmartUDI can be seamlessly deployed on the cloud infrastructures of IoT device vendors and on edge nodes, enabling the delivery of personalized device service recommendations to users. Comprehensive experiments on four real-world datasets show that SmartUDI consistently outperforms state-of-the-art baselines with more accurate and highly interpretable results.
{"title":"I Know Your Intent","authors":"Jingyu Xiao, Qingsong Zou, Qing Li, Dan Zhao, Kang Li, Zixuan Weng, Ruoyu Li, Yong Jiang","doi":"10.1145/3610906","DOIUrl":"https://doi.org/10.1145/3610906","url":null,"abstract":"With the booming of smart home market, intelligent Internet of Things (IoT) devices have been increasingly involved in home life. To improve the user experience of smart homes, some prior works have explored how to use machine learning for predicting interactions between users and devices. However, the existing solutions have inferior User Device Interaction (UDI) prediction accuracy, as they ignore three key factors: routine, intent and multi-level periodicity of human behaviors. In this paper, we present SmartUDI, a novel accurate UDI prediction approach for smart homes. First, we propose a Message-Passing-based Routine Extraction (MPRE) algorithm to mine routine behaviors, then the contrastive loss is applied to narrow representations among behaviors from the same routines and alienate representations among behaviors from different routines. Second, we propose an Intent-aware Capsule Graph Attention Network (ICGAT) to encode multiple intents of users while considering complex transitions between different behaviors. Third, we design a Cluster-based Historical Attention Mechanism (CHAM) to capture the multi-level periodicity by aggregating the current sequence and the semantically nearest historical sequence representations through the attention mechanism. SmartUDI can be seamlessly deployed on cloud infrastructures of IoT device vendors and edge nodes, enabling the delivery of personalized device service recommendations to users. Comprehensive experiments on four real-world datasets show that SmartUDI consistently outperforms the state-of-the-art baselines with more accurate and highly interpretable results.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning human-mobility interaction (HMI) in interactive scenes (e.g., how a vehicle turns at an intersection in response to traffic lights and other oncoming vehicles) can enhance the safety, efficiency, and resilience of smart mobility systems (e.g., autonomous vehicles) and many other ubiquitous computing applications. Toward ubiquitous and understandable HMI learning, this paper considers both "spoken language" (e.g., human textual annotations) and "unspoken language" (e.g., visual and sensor-based behavioral mobility information related to the HMI scenes) as the information modalities drawn from real-world HMI scenarios. We aim to extract the important but possibly implicit HMI concepts (as named entities) from the textual annotations (provided by human annotators) through a novel human language and sensor data co-learning design. To this end, we propose CG-HMI, a novel Cross-modality Graph fusion approach for extracting important human-mobility interaction concepts from the co-learning of textual annotations as well as visual and behavioral sensor data. To fuse the unspoken and spoken "languages", we design a unified representation called the human-mobility interaction graph (HMIG) for each modality related to the HMI scenes, i.e., textual annotations, visual video frames, and behavioral sensor time series (e.g., from on-board or smartphone inertial measurement units). The nodes of the HMIG in these modalities correspond to the textual words (tokenized for ease of processing) related to HMI concepts, the detected traffic participant/environment categories, and the vehicle maneuver behavior types determined from the behavioral sensor time series. To extract the inter- and intra-modality semantic correspondences and interactions in the HMIG, we design a novel graph interaction fusion approach with differentiable pooling-based graph attention. The resulting graph embeddings are then processed to identify and retrieve the HMI concepts within the annotations, which can benefit downstream human-computer interaction and ubiquitous computing applications. We have developed and implemented CG-HMI in a system prototype and performed extensive studies on three real-world HMI datasets (two on car driving and one on e-scooter riding). We corroborate the excellent performance of CG-HMI (on average 13.11% higher than the other baselines in precision, recall, and F1 measure) and its effectiveness in recognizing and extracting the important HMI concepts through cross-modality learning. Our CG-HMI studies also provide real-world implications (e.g., for road safety and driving behaviors) regarding the interactions between drivers and other traffic participants.
{"title":"Cross-Modality Graph-based Language and Sensor Data Co-Learning of Human-Mobility Interaction","authors":"Mahan Tabatabaie, Suining He, Kang G. Shin","doi":"10.1145/3610904","DOIUrl":"https://doi.org/10.1145/3610904","url":null,"abstract":"Learning the human--mobility interaction (HMI) on interactive scenes (e.g., how a vehicle turns at an intersection in response to traffic lights and other oncoming vehicles) can enhance the safety, efficiency, and resilience of smart mobility systems (e.g., autonomous vehicles) and many other ubiquitous computing applications. Towards the ubiquitous and understandable HMI learning, this paper considers both \"spoken language\" (e.g., human textual annotations) and \"unspoken language\" (e.g., visual and sensor-based behavioral mobility information related to the HMI scenes) in terms of information modalities from the real-world HMI scenarios. We aim to extract the important but possibly implicit HMI concepts (as the named entities) from the textual annotations (provided by human annotators) through a novel human language and sensor data co-learning design. To this end, we propose CG-HMI, a novel Cross-modality Graph fusion approach for extracting important Human-Mobility Interaction concepts from co-learning of textual annotations as well as the visual and behavioral sensor data. In order to fuse both unspoken and spoken \"languages\", we have designed a unified representation called the human--mobility interaction graph (HMIG) for each modality related to the HMI scenes, i.e., textual annotations, visual video frames, and behavioral sensor time-series (e.g., from the on-board or smartphone inertial measurement units). The nodes of the HMIG in these modalities correspond to the textual words (tokenized for ease of processing) related to HMI concepts, the detected traffic participant/environment categories, and the vehicle maneuver behavior types determined from the behavioral sensor time-series. To extract the inter- and intra-modality semantic correspondences and interactions in the HMIG, we have designed a novel graph interaction fusion approach with differentiable pooling-based graph attention. The resulting graph embeddings are then processed to identify and retrieve the HMI concepts within the annotations, which can benefit the downstream human-computer interaction and ubiquitous computing applications. We have developed and implemented CG-HMI into a system prototype, and performed extensive studies upon three real-world HMI datasets (two on car driving and the third one on e-scooter riding). We have corroborated the excellent performance (on average 13.11% higher accuracy than the other baselines in terms of precision, recall, and F1 measure) and effectiveness of CG-HMI in recognizing and extracting the important HMI concepts through cross-modality learning. 
Our CG-HMI studies also provide real-world implications (e.g., road safety and driving behaviors) about the interactions between the drivers and other traffic participants.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
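As a rough illustration of graph attention over HMIG-style nodes, the sketch below implements a generic single-head GAT-like layer in numpy; the differentiable-pooling fusion described in the abstract is more involved, and the projection sizes, tanh scoring, and toy graph here are assumptions.

```python
import numpy as np

def graph_attention_layer(node_feats, adjacency, w, a):
    """One single-head graph-attention pass over an interaction graph.

    node_feats: (N, D) features of HMIG nodes (tokens, detected objects, maneuvers).
    adjacency:  (N, N) 0/1 matrix of edges, self-loops included.
    w: (D, H) projection matrix; a: (2*H,) attention vector.
    A generic GAT-style layer for illustration only.
    """
    h = node_feats @ w                                    # project node features
    n = h.shape[0]
    scores = np.full((n, n), -np.inf)
    for i in range(n):
        for j in range(n):
            if adjacency[i, j]:
                scores[i, j] = np.tanh(np.concatenate([h[i], h[j]]) @ a)
    scores -= scores.max(axis=1, keepdims=True)           # stabilize the softmax
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)             # attention over neighbors
    return alpha @ h                                       # attention-weighted update

# Toy usage: 3 nodes with 4-D features projected to a 2-D hidden size.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
adj = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
print(graph_attention_layer(x, adj, rng.normal(size=(4, 2)), rng.normal(size=(4,))))
```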
Translating fine-grained activity detections (e.g., phone ringing, talking interspersed with silence and walking) into semantically meaningful and richer contextual information (e.g., on a phone call for 20 minutes while exercising) is essential for enabling a range of healthcare and human-computer interaction applications. Prior work has proposed building ontologies or applying temporal analysis of activity patterns, with limited success in capturing complex real-world context patterns. We present TAO, a hybrid system that leverages OWL-based ontologies and temporal clustering approaches to detect high-level contexts from human activities. TAO can characterize sequential activities that happen one after the other as well as activities that are interleaved or occur in parallel, detecting a richer set of contexts more accurately than prior work. We evaluate TAO on real-world activity datasets (CASAS and ExtraSensory) and show that our system achieves, on average, 87% and 80% accuracy for context detection, respectively. We deploy and evaluate TAO in a real-world setting with eight participants using our system for three hours each, demonstrating TAO's ability to capture semantically meaningful contexts in the real world. Finally, to showcase the usefulness of contexts, we prototype wellness applications that assess productivity and stress and show that the wellness metrics calculated using contexts provided by TAO are much closer to the ground truth (on average within 1.1%) than those of the baseline approach (on average within 30%).
{"title":"TAO","authors":"Sudershan Boovaraghavan, Prasoon Patidar, Yuvraj Agarwal","doi":"10.1145/3610896","DOIUrl":"https://doi.org/10.1145/3610896","url":null,"abstract":"Translating fine-grained activity detection (e.g., phone ring, talking interspersed with silence and walking) into semantically meaningful and richer contextual information (e.g., on a phone call for 20 minutes while exercising) is essential towards enabling a range of healthcare and human-computer interaction applications. Prior work has proposed building ontologies or temporal analysis of activity patterns with limited success in capturing complex real-world context patterns. We present TAO, a hybrid system that leverages OWL-based ontologies and temporal clustering approaches to detect high-level contexts from human activities. TAO can characterize sequential activities that happen one after the other and activities that are interleaved or occur in parallel to detect a richer set of contexts more accurately than prior work. We evaluate TAO on real-world activity datasets (Casas and Extrasensory) and show that our system achieves, on average, 87% and 80% accuracy for context detection, respectively. We deploy and evaluate TAO in a real-world setting with eight participants using our system for three hours each, demonstrating TAO's ability to capture semantically meaningful contexts in the real world. Finally, to showcase the usefulness of contexts, we prototype wellness applications that assess productivity and stress and show that the wellness metrics calculated using contexts provided by TAO are much closer to the ground truth (on average within 1.1%), as compared to the baseline approach (on average within 30%).","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The proliferation of distributed embedded systems has enabled pervasive sensing, actuation, and information displays across buildings and surrounding environments, yet it also entails large expenditures of energy and human labor for maintenance. Our daily interactions, from opening a window to closing a drawer to twisting a doorknob, are great potential sources of energy but are often neglected. Existing commercial devices that harvest energy from these ambient sources are unaffordable, and DIY solutions remain inaccessible to non-experts, preventing end-users from fully realizing everyday innovations. We present E3D, an end-to-end fabrication toolkit for customizing self-powered smart devices at low cost. We contribute a taxonomy of everyday kinetic activities that are potential sources of energy, a library of parametric mechanisms to harvest energy from manual operations of kinetic objects, and a holistic design system that lets end-user developers capture design requirements by demonstration and then customize augmentation devices to harvest energy in ways that fit their unique lifestyles.
{"title":"E3D","authors":"Abul Al Arabi, Xue Wang, Yang Zhang, Jeeeun Kim","doi":"10.1145/3610897","DOIUrl":"https://doi.org/10.1145/3610897","url":null,"abstract":"The increase of distributed embedded systems has enabled pervasive sensing, actuation, and information displays across buildings and surrounding environments, yet also entreats huge cost expenditure for energy and human labor for maintenance. Our daily interactions, from opening a window to closing a drawer to twisting a doorknob, are great potential sources of energy but are often neglected. Existing commercial devices to harvest energy from these ambient sources are unaffordable, and DIY solutions are left with inaccessibility for non-experts preventing fully imbuing daily innovations in end-users. We present E3D, an end-to-end fabrication toolkit to customize self-powered smart devices at low cost. We contribute to a taxonomy of everyday kinetic activities that are potential sources of energy, a library of parametric mechanisms to harvest energy from manual operations of kinetic objects, and a holistic design system for end-user developers to capture design requirements by demonstrations then customize augmentation devices to harvest energy that meets unique lifestyle.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}