Sejal Bhalla, Salaar Liaqat, Robert Wu, Andrea S. Gershon, Eyal de Lara, Alex Mariakakis
Prior work has shown the utility of acoustic analysis in controlled settings for assessing chronic obstructive pulmonary disease (COPD) --- one of the most common respiratory diseases that impacts millions of people worldwide. However, such assessments require active user input and may not represent the true characteristics of a patient's voice. We propose PulmoListener, an end-to-end speech processing pipeline that identifies segments of the patient's speech from smartwatch audio collected during daily living and analyzes them to classify COPD symptom severity. To evaluate our approach, we conducted a study with 8 COPD patients over 164 ± 92 days on average. We found that PulmoListener achieved an average sensitivity of 0.79 ± 0.03 and a specificity of 0.83 ± 0.05 per patient when classifying their symptom severity on the same day. PulmoListener can also predict the severity level up to 4 days in advance with an average sensitivity of 0.75 ± 0.02 and a specificity of 0.74 ± 0.07. The results of our study demonstrate the feasibility of leveraging natural speech for monitoring COPD in real-world settings, offering a promising solution for disease management and even diagnosis.
{"title":"PulmoListener","authors":"Sejal Bhalla, Salaar Liaqat, Robert Wu, Andrea S. Gershon, Eyal de Lara, Alex Mariakakis","doi":"10.1145/3610889","DOIUrl":"https://doi.org/10.1145/3610889","url":null,"abstract":"Prior work has shown the utility of acoustic analysis in controlled settings for assessing chronic obstructive pulmonary disease (COPD) --- one of the most common respiratory diseases that impacts millions of people worldwide. However, such assessments require active user input and may not represent the true characteristics of a patient's voice. We propose PulmoListener, an end-to-end speech processing pipeline that identifies segments of the patient's speech from smartwatch audio collected during daily living and analyzes them to classify COPD symptom severity. To evaluate our approach, we conducted a study with 8 COPD patients over 164 ± 92 days on average. We found that PulmoListener achieved an average sensitivity of 0.79 ± 0.03 and a specificity of 0.83 ± 0.05 per patient when classifying their symptom severity on the same day. PulmoListener can also predict the severity level up to 4 days in advance with an average sensitivity of 0.75 ± 0.02 and a specificity of 0.74 ± 0.07. 
The results of our study demonstrate the feasibility of leveraging natural speech for monitoring COPD in real-world settings, offering a promising solution for disease management and even diagnosis.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
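The sensitivity and specificity figures reported above are standard confusion-matrix rates. As a reminder of how they are computed, here is a minimal sketch; the binary severity labels are illustrative, not the paper's data:

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity (true-positive rate) and specificity
    (true-negative rate) from binary labels, where 1 marks a
    severe-symptom day and 0 a non-severe day."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec
```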
The accuracy of wireless fingerprint-based indoor localization largely depends on the precision and density of radio maps. Although many research efforts have been devoted to incremental updating of radio maps, few consider the laborious initial construction of a new site. In this work, we propose an accurate and generalizable framework for efficient radio map construction, which takes advantage of readily available fine-grained radio maps and constructs a fine-grained radio map of a new site from a small proportion of measurements in it. Specifically, we regard radio maps as domains and propose a Radio Map construction approach based on Domain Adaptation (RMDA). We first employ the domain disentanglement feature extractor to learn domain-invariant features for aligning the source domains (available radio maps) with the target domain (initial radio map) in the domain-invariant latent space. Furthermore, we propose a dynamic weighting strategy, which learns the relevancy of the source and target domains during domain adaptation. Then, we extract the domain-specific features based on the site's floorplan and use them to constrain the super-resolution of the domain-invariant features. Experimental results demonstrate that RMDA constructs a fine-grained initial radio map of a target site efficiently with a limited number of measurements. Moreover, the localization accuracy of the RMDA-refined radio map improves by about 41.35% after construction and is comparable to that of a densely surveyed radio map (less than 8% lower).
{"title":"Fast Radio Map Construction with Domain Disentangled Learning for Wireless Localization","authors":"Weina Jiang, Lin Shi, Qun Niu, Ning Liu","doi":"10.1145/3610922","DOIUrl":"https://doi.org/10.1145/3610922","url":null,"abstract":"The accuracy of wireless fingerprint-based indoor localization largely depends on the precision and density of radio maps. Although many research efforts have been devoted to incremental updating of radio maps, few consider the laborious initial construction of a new site. In this work, we propose an accurate and generalizable framework for efficient radio map construction, which takes advantage of readily-available fine-grained radio maps and constructs fine-grained radio maps of a new site with a small proportion of measurements in it. Specifically, we regard radio maps as domains and propose a Radio Map construction approach based on Domain Adaptation (RMDA). We first employ the domain disentanglement feature extractor to learn domain-invariant features for aligning the source domains (available radio maps) with the target domain (initial radio map) in the domain-invariant latent space. Furthermore, we propose a dynamic weighting strategy, which learns the relevancy of the source and target domain in the domain adaptation. Then, we extract the domain-specific features based on the site's floorplan and use them to constrain the super-resolution of the domain-invariant features. Experimental results demonstrate that RMDA constructs a fine-grained initial radio map of a target site efficiently with a limited number of measurements. 
Meanwhile, the localization accuracy of the refined radio map with RMDA significantly improved by about 41.35% after construction and is comparable with the dense surveyed radio map (the reduction is less than 8%).","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
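The paper's domain-disentanglement machinery is not reproduced here. Purely to illustrate the idea of aligning source-domain features (from existing radio maps) with target-domain features (from the new site), below is a toy squared mean-discrepancy loss; all names are ours, not RMDA's:

```python
def mean_discrepancy(source_feats, target_feats):
    """Toy domain-alignment loss: squared distance between the
    per-dimension means of source- and target-domain feature
    vectors. Minimizing it pushes the two feature distributions
    toward each other in the shared latent space."""
    dims = len(source_feats[0])
    loss = 0.0
    for d in range(dims):
        mu_s = sum(f[d] for f in source_feats) / len(source_feats)
        mu_t = sum(f[d] for f in target_feats) / len(target_feats)
        loss += (mu_s - mu_t) ** 2
    return loss
```

Real domain-adaptation objectives (e.g., adversarial or MMD-based) are richer, but share this "make the domains indistinguishable" structure.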
John Mamish, Amy Guo, Thomas Cohen, Julian Richey, Yang Zhang, Josiah Hester
Whenever a user interacts with a device, mechanical work is performed to actuate the user interface elements; the resulting energy is typically wasted, dissipated as sound and heat. Previous work has shown that many devices can be powered entirely from this otherwise wasted user interface energy. For these devices, wires and batteries, along with the related hassles of replacement and charging, become unnecessary and onerous. So far, these works have been restricted to proof-of-concept demonstrations; a specific bespoke harvesting and sensing circuit is constructed for the application at hand. The challenge of harvesting energy while simultaneously sensing fine-grained input signals from diverse modalities makes prototyping new devices difficult. To fill this gap, we present a hardware toolkit which provides a common electrical interface for harvesting energy from user interface elements. This facilitates exploring the composability, utility, and breadth of enabled applications of interaction-powered smart devices. We design a set of "energy as input" harvesting circuits, a standard connective interface with 3D printed enclosures, and software libraries to enable the exploration of devices where the user action generates the energy needed to perform the device's primary function. This exploration culminated in a demonstration campaign where we prototype several exemplar popular toys and gadgets, including a battery-free Bop-It (a popular '90s rhythm game), an electronic Etch A Sketch, a "Simon Says"-style memory game, and a service rating device. We run exploratory user studies to understand how generativity, creativity, and composability are hampered or facilitated by these devices. These demonstrations, user study takeaways, and the toolkit itself provide a foundation for building interactive and user-focused gadgets whose usability is not affected by battery charge and whose service lifetime is not limited by battery wear.
{"title":"Interaction Harvesting","authors":"John Mamish, Amy Guo, Thomas Cohen, Julian Richey, Yang Zhang, Josiah Hester","doi":"10.1145/3610880","DOIUrl":"https://doi.org/10.1145/3610880","url":null,"abstract":"Whenever a user interacts with a device, mechanical work is performed to actuate the user interface elements; the resulting energy is typically wasted, dissipated as sound and heat. Previous work has shown that many devices can be powered entirely from this otherwise wasted user interface energy. For these devices, wires and batteries, along with the related hassles of replacement and charging, become unnecessary and onerous. So far, these works have been restricted to proof-of-concept demonstrations; a specific bespoke harvesting and sensing circuit is constructed for the application at hand. The challenge of harvesting energy while simultaneously sensing fine-grained input signals from diverse modalities makes prototyping new devices difficult. To fill this gap, we present a hardware toolkit which provides a common electrical interface for harvesting energy from user interface elements. This facilitates exploring the composability, utility, and breadth of enabled applications of interaction-powered smart devices. We design a set of \"energy as input\" harvesting circuits, a standard connective interface with 3D printed enclosures, and software libraries to enable the exploration of devices where the user action generates the energy needed to perform the device's primary function. This exploration culminated in a demonstration campaign where we prototype several exemplar popular toys and gadgets, including battery-free Bop-It--- a popular 90s rhythm game, an electronic Etch-a-sketch, a \"Simon-Says\"-style memory game, and a service rating device. We run exploratory user studies to understand how generativity, creativity, and composability are hampered or facilitated by these devices. 
These demonstrations, user study takeaways, and the toolkit itself provide a foundation for building interactive and user-focused gadgets whose usability is not affected by battery charge and whose service lifetime is not limited by battery wear.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
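For intuition about the energy budget such interface elements can supply, the mechanical work of a single press is roughly force times travel distance. The sketch below uses illustrative numbers, not measurements from the toolkit:

```python
def press_energy_mj(force_n, travel_m):
    """Mechanical work of one button press in millijoules:
    W = F * d, assuming a constant actuation force over the
    full travel (a simplification; real force curves vary)."""
    return force_n * travel_m * 1000.0

# e.g. a stiff 5 N button with 2 mm of travel yields about 10 mJ
# per press, enough for a brief burst of low-power computation.
```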
Jingyu Xiao, Qingsong Zou, Qing Li, Dan Zhao, Kang Li, Zixuan Weng, Ruoyu Li, Yong Jiang
With the booming smart home market, intelligent Internet of Things (IoT) devices have been increasingly involved in home life. To improve the user experience of smart homes, some prior works have explored how to use machine learning to predict interactions between users and devices. However, the existing solutions have inferior User Device Interaction (UDI) prediction accuracy, as they ignore three key factors: routine, intent, and multi-level periodicity of human behaviors. In this paper, we present SmartUDI, a novel accurate UDI prediction approach for smart homes. First, we propose a Message-Passing-based Routine Extraction (MPRE) algorithm to mine routine behaviors, and then apply a contrastive loss to pull together representations of behaviors from the same routine and push apart representations of behaviors from different routines. Second, we propose an Intent-aware Capsule Graph Attention Network (ICGAT) to encode multiple intents of users while considering complex transitions between different behaviors. Third, we design a Cluster-based Historical Attention Mechanism (CHAM) to capture the multi-level periodicity by aggregating the current sequence and the semantically nearest historical sequence representations through the attention mechanism. SmartUDI can be seamlessly deployed on cloud infrastructures of IoT device vendors and edge nodes, enabling the delivery of personalized device service recommendations to users. Comprehensive experiments on four real-world datasets show that SmartUDI consistently outperforms the state-of-the-art baselines with more accurate and highly interpretable results.
{"title":"I Know Your Intent","authors":"Jingyu Xiao, Qingsong Zou, Qing Li, Dan Zhao, Kang Li, Zixuan Weng, Ruoyu Li, Yong Jiang","doi":"10.1145/3610906","DOIUrl":"https://doi.org/10.1145/3610906","url":null,"abstract":"With the booming of smart home market, intelligent Internet of Things (IoT) devices have been increasingly involved in home life. To improve the user experience of smart homes, some prior works have explored how to use machine learning for predicting interactions between users and devices. However, the existing solutions have inferior User Device Interaction (UDI) prediction accuracy, as they ignore three key factors: routine, intent and multi-level periodicity of human behaviors. In this paper, we present SmartUDI, a novel accurate UDI prediction approach for smart homes. First, we propose a Message-Passing-based Routine Extraction (MPRE) algorithm to mine routine behaviors, then the contrastive loss is applied to narrow representations among behaviors from the same routines and alienate representations among behaviors from different routines. Second, we propose an Intent-aware Capsule Graph Attention Network (ICGAT) to encode multiple intents of users while considering complex transitions between different behaviors. Third, we design a Cluster-based Historical Attention Mechanism (CHAM) to capture the multi-level periodicity by aggregating the current sequence and the semantically nearest historical sequence representations through the attention mechanism. SmartUDI can be seamlessly deployed on cloud infrastructures of IoT device vendors and edge nodes, enabling the delivery of personalized device service recommendations to users. 
Comprehensive experiments on four real-world datasets show that SmartUDI consistently outperforms the state-of-the-art baselines with more accurate and highly interpretable results.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
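The routine-aware contrastive objective can be pictured as shrinking distances between behaviors within a routine relative to distances across routines. A dependency-light diagnostic in that spirit (embeddings and routine labels are hypothetical, and this is our simplification, not MPRE itself):

```python
import math

def routine_contrast(embeddings, routines):
    """Return (mean intra-routine distance, mean inter-routine
    distance) over all pairs of behavior embeddings. A contrastive
    objective trains the encoder to make the first small relative
    to the second."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    intra, inter = [], []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            bucket = intra if routines[i] == routines[j] else inter
            bucket.append(dist(embeddings[i], embeddings[j]))

    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(intra), mean(inter)
```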
Learning human-mobility interaction (HMI) in interactive scenes (e.g., how a vehicle turns at an intersection in response to traffic lights and other oncoming vehicles) can enhance the safety, efficiency, and resilience of smart mobility systems (e.g., autonomous vehicles) and many other ubiquitous computing applications. Toward ubiquitous and understandable HMI learning, this paper considers both "spoken language" (e.g., human textual annotations) and "unspoken language" (e.g., visual and sensor-based behavioral mobility information related to the HMI scenes) as information modalities from real-world HMI scenarios. We aim to extract the important but possibly implicit HMI concepts (as named entities) from the textual annotations (provided by human annotators) through a novel human-language and sensor-data co-learning design. To this end, we propose CG-HMI, a novel Cross-modality Graph fusion approach for extracting important human-mobility interaction concepts by co-learning from textual annotations as well as visual and behavioral sensor data. To fuse the unspoken and spoken "languages", we design a unified representation called the human-mobility interaction graph (HMIG) for each modality related to the HMI scenes, i.e., textual annotations, visual video frames, and behavioral sensor time-series (e.g., from on-board or smartphone inertial measurement units). The nodes of the HMIG in these modalities correspond to the textual words (tokenized for ease of processing) related to HMI concepts, the detected traffic participant/environment categories, and the vehicle maneuver behavior types determined from the behavioral sensor time-series. To extract the inter- and intra-modality semantic correspondences and interactions in the HMIG, we design a novel graph interaction fusion approach with differentiable pooling-based graph attention. The resulting graph embeddings are then processed to identify and retrieve the HMI concepts within the annotations, benefiting downstream human-computer interaction and ubiquitous computing applications. We have implemented CG-HMI in a system prototype and performed extensive studies on three real-world HMI datasets (two on car driving and one on e-scooter riding). We have corroborated the excellent performance (on average 13.11% higher than the other baselines in terms of precision, recall, and F1 measure) and effectiveness of CG-HMI in recognizing and extracting important HMI concepts through cross-modality learning. Our studies also provide real-world implications (e.g., for road safety and driving behaviors) about the interactions between drivers and other traffic participants.
{"title":"Cross-Modality Graph-based Language and Sensor Data Co-Learning of Human-Mobility Interaction","authors":"Mahan Tabatabaie, Suining He, Kang G. Shin","doi":"10.1145/3610904","DOIUrl":"https://doi.org/10.1145/3610904","url":null,"abstract":"Learning the human--mobility interaction (HMI) on interactive scenes (e.g., how a vehicle turns at an intersection in response to traffic lights and other oncoming vehicles) can enhance the safety, efficiency, and resilience of smart mobility systems (e.g., autonomous vehicles) and many other ubiquitous computing applications. Towards the ubiquitous and understandable HMI learning, this paper considers both \"spoken language\" (e.g., human textual annotations) and \"unspoken language\" (e.g., visual and sensor-based behavioral mobility information related to the HMI scenes) in terms of information modalities from the real-world HMI scenarios. We aim to extract the important but possibly implicit HMI concepts (as the named entities) from the textual annotations (provided by human annotators) through a novel human language and sensor data co-learning design. To this end, we propose CG-HMI, a novel Cross-modality Graph fusion approach for extracting important Human-Mobility Interaction concepts from co-learning of textual annotations as well as the visual and behavioral sensor data. In order to fuse both unspoken and spoken \"languages\", we have designed a unified representation called the human--mobility interaction graph (HMIG) for each modality related to the HMI scenes, i.e., textual annotations, visual video frames, and behavioral sensor time-series (e.g., from the on-board or smartphone inertial measurement units). The nodes of the HMIG in these modalities correspond to the textual words (tokenized for ease of processing) related to HMI concepts, the detected traffic participant/environment categories, and the vehicle maneuver behavior types determined from the behavioral sensor time-series. 
To extract the inter- and intra-modality semantic correspondences and interactions in the HMIG, we have designed a novel graph interaction fusion approach with differentiable pooling-based graph attention. The resulting graph embeddings are then processed to identify and retrieve the HMI concepts within the annotations, which can benefit the downstream human-computer interaction and ubiquitous computing applications. We have developed and implemented CG-HMI into a system prototype, and performed extensive studies upon three real-world HMI datasets (two on car driving and the third one on e-scooter riding). We have corroborated the excellent performance (on average 13.11% higher accuracy than the other baselines in terms of precision, recall, and F1 measure) and effectiveness of CG-HMI in recognizing and extracting the important HMI concepts through cross-modality learning. Our CG-HMI studies also provide real-world implications (e.g., road safety and driving behaviors) about the interactions between the drivers and other traffic participants.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
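The full graph-attention fusion is beyond the scope of an abstract. As a toy illustration of the underlying operation, attention-weighted aggregation over a set of graph nodes looks like this (our simplification, not the paper's architecture; all names are ours):

```python
import math

def attention_pool(node_feats, scores):
    """Toy attention pooling over graph nodes: softmax the
    per-node relevance scores, then return the weighted sum of
    node feature vectors as a single graph-level embedding."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dims = len(node_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, node_feats))
            for d in range(dims)]
```

Graph attention networks compute the scores themselves from learned node interactions; here they are simply given as inputs.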
Translating fine-grained activity detections (e.g., phone ringing, talking interspersed with silence and walking) into semantically meaningful, richer contextual information (e.g., on a phone call for 20 minutes while exercising) is essential to enabling a range of healthcare and human-computer interaction applications. Prior work has proposed building ontologies or temporal analyses of activity patterns, with limited success in capturing complex real-world context patterns. We present TAO, a hybrid system that leverages OWL-based ontologies and temporal clustering approaches to detect high-level contexts from human activities. TAO can characterize sequential activities that happen one after the other, as well as activities that are interleaved or occur in parallel, to detect a richer set of contexts more accurately than prior work. We evaluate TAO on real-world activity datasets (CASAS and ExtraSensory) and show that our system achieves, on average, 87% and 80% accuracy for context detection, respectively. We deploy and evaluate TAO in a real-world setting with eight participants using our system for three hours each, demonstrating TAO's ability to capture semantically meaningful contexts in the real world. Finally, to showcase the usefulness of contexts, we prototype wellness applications that assess productivity and stress and show that the wellness metrics calculated using contexts provided by TAO are much closer to the ground truth (on average within 1.1%) than the baseline approach (on average within 30%).
{"title":"TAO","authors":"Sudershan Boovaraghavan, Prasoon Patidar, Yuvraj Agarwal","doi":"10.1145/3610896","DOIUrl":"https://doi.org/10.1145/3610896","url":null,"abstract":"Translating fine-grained activity detection (e.g., phone ring, talking interspersed with silence and walking) into semantically meaningful and richer contextual information (e.g., on a phone call for 20 minutes while exercising) is essential towards enabling a range of healthcare and human-computer interaction applications. Prior work has proposed building ontologies or temporal analysis of activity patterns with limited success in capturing complex real-world context patterns. We present TAO, a hybrid system that leverages OWL-based ontologies and temporal clustering approaches to detect high-level contexts from human activities. TAO can characterize sequential activities that happen one after the other and activities that are interleaved or occur in parallel to detect a richer set of contexts more accurately than prior work. We evaluate TAO on real-world activity datasets (Casas and Extrasensory) and show that our system achieves, on average, 87% and 80% accuracy for context detection, respectively. We deploy and evaluate TAO in a real-world setting with eight participants using our system for three hours each, demonstrating TAO's ability to capture semantically meaningful contexts in the real world. 
Finally, to showcase the usefulness of contexts, we prototype wellness applications that assess productivity and stress and show that the wellness metrics calculated using contexts provided by TAO are much closer to the ground truth (on average within 1.1%), as compared to the baseline approach (on average within 30%).","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
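One ingredient of turning fine-grained activities into higher-level contexts is temporal grouping. The sketch below merges timestamped activity events into sessions whenever they occur close together in time; the 300-second gap is an assumed threshold for illustration, not TAO's actual parameter:

```python
def sessionize(events, gap_s=300):
    """Group (timestamp_seconds, activity) events into sessions:
    an event joins the current session if it follows the previous
    event within gap_s seconds, otherwise it starts a new one."""
    sessions = []
    for t, act in sorted(events):
        if sessions and t - sessions[-1][-1][0] <= gap_s:
            sessions[-1].append((t, act))
        else:
            sessions.append([(t, act)])
    return sessions
```

A context detector could then label each session (e.g., "on a phone call while exercising") using an ontology over the activities it contains.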
The proliferation of distributed embedded systems has enabled pervasive sensing, actuation, and information display across buildings and surrounding environments, yet it also incurs substantial energy costs and human labor for maintenance. Our daily interactions, from opening a window to closing a drawer to twisting a doorknob, are great potential sources of energy but are often neglected. Existing commercial devices to harvest energy from these ambient sources are unaffordable, and DIY solutions remain inaccessible to non-experts, keeping everyday innovation out of end-users' hands. We present E3D, an end-to-end fabrication toolkit to customize self-powered smart devices at low cost. We contribute a taxonomy of everyday kinetic activities that are potential sources of energy, a library of parametric mechanisms to harvest energy from manual operations of kinetic objects, and a holistic design system for end-user developers to capture design requirements through demonstration and then customize augmentation devices that harvest energy in ways that fit their unique lifestyles.
{"title":"E3D","authors":"Abul Al Arabi, Xue Wang, Yang Zhang, Jeeeun Kim","doi":"10.1145/3610897","DOIUrl":"https://doi.org/10.1145/3610897","url":null,"abstract":"The increase of distributed embedded systems has enabled pervasive sensing, actuation, and information displays across buildings and surrounding environments, yet also entreats huge cost expenditure for energy and human labor for maintenance. Our daily interactions, from opening a window to closing a drawer to twisting a doorknob, are great potential sources of energy but are often neglected. Existing commercial devices to harvest energy from these ambient sources are unaffordable, and DIY solutions are left with inaccessibility for non-experts preventing fully imbuing daily innovations in end-users. We present E3D, an end-to-end fabrication toolkit to customize self-powered smart devices at low cost. We contribute to a taxonomy of everyday kinetic activities that are potential sources of energy, a library of parametric mechanisms to harvest energy from manual operations of kinetic objects, and a holistic design system for end-user developers to capture design requirements by demonstrations then customize augmentation devices to harvest energy that meets unique lifestyle.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Josh Urban Davis, Hongwei Wang, Parmit K. Chilana, Xing-Dong Yang
As video conferencing (VC) has become necessary for many professional, educational, and social tasks, people who are d/Deaf and hard of hearing (DHH) face distinct accessibility barriers. We conducted studies to understand the challenges faced by DHH people during VCs and found that they struggled to easily present or communicate effectively due to accessibility limitations of VC platforms. These limitations include the lack of tools for DHH speakers to discreetly communicate their accommodation needs to the group. Based on these findings, we prototyped a suite of tools, called Erato, that enables DHH speakers to be aware of their performance while speaking and reminds participants of proper etiquette. We evaluated Erato by running a mock classroom case study over VC for three sessions. All participants felt more confident in their speaking ability and paid closer attention to making the classroom more inclusive while using our tool. We share implications of these results for the design of VC interfaces and human-in-the-loop assistive systems that can support users who are DHH to communicate effectively and advocate for their accessibility needs.
"It's Not an Issue of Malice, but of Ignorance." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, published 2023-09-27. https://doi.org/10.1145/3610901
Use of virtual reality while seated is common, but studies on seated interaction beyond the use of controllers or hand gestures have been sparse. This work presents LapTouch, which uses the lap as a touch interface, and includes two user studies to inform the design of direct and indirect touch interaction on the lap with visual feedback that guides the user's touch, as well as eyes-free interaction in which users are not provided with such visual feedback. The first study suggests that direct interaction supports layouts of up to 4×4 with 95% accuracy and a shorter completion time, while indirect interaction supports layouts of up to 4×5 but with a longer completion time. Because user-experience feedback revealed that 4-row and 5-column layouts are not preferred, it is recommended to use both direct and indirect interaction with at most a 3×4 layout. According to the second study, augmenting eyes-free interaction with a support vector machine (SVM) classifier allows for a 2×2 layout with a generalized model and 2×2, 2×3, and 3×2 layouts with personalized models.
{"title":"LapTouch","authors":"Tzu-Wei Mi, Jia-Jun Wang, Liwei Chan","doi":"10.1145/3610878","DOIUrl":"https://doi.org/10.1145/3610878","url":null,"abstract":"Use of virtual reality while seated is common, but studies on seated interaction beyond the use of controllers or hand gestures have been sparse. This work presents LapTouch, which uses the lap as a touch interface, and includes two user studies to inform the design of direct and indirect touch interaction on the lap with visual feedback that guides the user's touch, as well as eyes-free interaction in which users are not provided with such visual feedback. The first study suggests that direct interaction supports layouts of up to 4×4 with 95% accuracy and a shorter completion time, while indirect interaction supports layouts of up to 4×5 but with a longer completion time. Because user-experience feedback revealed that 4-row and 5-column layouts are not preferred, it is recommended to use both direct and indirect interaction with at most a 3×4 layout. According to the second study, augmenting eyes-free interaction with a support vector machine (SVM) classifier allows for a 2×2 layout with a generalized model and 2×2, 2×3, and 3×2 layouts with personalized models.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
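The generalized-versus-personalized distinction in LapTouch's second study can be illustrated with a toy classifier. The sketch below is not the paper's SVM; it substitutes a simple nearest-centroid rule over hypothetical, normalized lap-touch coordinates to show how per-user calibration samples yield a personalized 2×2 model.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Tuple

@dataclass
class CentroidClassifier:
    """Nearest-centroid stand-in for an SVM touch-target classifier."""
    centroids: Dict[str, Tuple[float, float]]

    @classmethod
    def fit(cls, samples: Dict[str, List[Tuple[float, float]]]) -> "CentroidClassifier":
        # One centroid per target: the mean of that user's calibration touches.
        centroids = {
            label: (sum(x for x, _ in pts) / len(pts),
                    sum(y for _, y in pts) / len(pts))
            for label, pts in samples.items()
        }
        return cls(centroids)

    def predict(self, point: Tuple[float, float]) -> str:
        # Assign the touch to the closest learned centroid.
        x, y = point
        return min(self.centroids,
                   key=lambda lbl: hypot(x - self.centroids[lbl][0],
                                         y - self.centroids[lbl][1]))

# Hypothetical per-user calibration touches for a 2x2 lap layout,
# with coordinates normalized to [0, 1] in each axis.
calibration = {
    "top-left":     [(0.20, 0.20), (0.25, 0.30)],
    "top-right":    [(0.80, 0.20), (0.75, 0.25)],
    "bottom-left":  [(0.20, 0.80), (0.30, 0.75)],
    "bottom-right": [(0.80, 0.80), (0.70, 0.85)],
}
model = CentroidClassifier.fit(calibration)
print(model.predict((0.78, 0.22)))  # → top-right
```

A generalized model would be fit the same way but on calibration touches pooled across users, which is why (per the abstract) it supports only the coarser 2×2 layout.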
Jannis Strecker, Khakim Akhunov, Federico Carbone, Kimberly García, Kenan Bektaş, Andres Gomez, Simon Mayer, Kasim Sinan Yildirim
The increasing number of objects in ubiquitous computing environments creates a need for effective object detection and identification mechanisms that permit users to intuitively initiate interactions with these objects. While multiple approaches to such object detection -- including through visual object detection, fiducial markers, relative localization, or absolute spatial referencing -- are available, each of these suffers from drawbacks that limit their applicability. In this paper, we propose ODIF, an architecture that permits the fusion of object situation information from such heterogeneous sources and that remains vertically and horizontally modular to allow extending and upgrading systems that are constructed accordingly. We furthermore present BLEARVIS, a prototype system that builds on the proposed architecture and integrates computer-vision (CV) based object detection with radio-frequency (RF) angle of arrival (AoA) estimation to identify BLE-tagged objects. In our system, the front camera of a Mixed Reality (MR) head-mounted display (HMD) provides a live image stream to a vision-based object detection module, while an antenna array that is mounted on the HMD collects AoA information from ambient devices. In this way, BLEARVIS is able to differentiate between visually identical objects in the same environment and can provide an MR overlay of information (data and controls) that relates to them. We include experimental evaluations of both the CV-based object detection and the RF-based AoA estimation, and discuss the applicability of the combined RF and CV pipelines in different ubiquitous computing scenarios. This research can serve as a starting point for integrating diverse object detection, identification, and interaction approaches that function across the electromagnetic spectrum, and beyond.
{"title":"MR Object Identification and Interaction","authors":"Jannis Strecker, Khakim Akhunov, Federico Carbone, Kimberly García, Kenan Bektaş, Andres Gomez, Simon Mayer, Kasim Sinan Yildirim","doi":"10.1145/3610879","DOIUrl":"https://doi.org/10.1145/3610879","url":null,"abstract":"The increasing number of objects in ubiquitous computing environments creates a need for effective object detection and identification mechanisms that permit users to intuitively initiate interactions with these objects. While multiple approaches to such object detection -- including through visual object detection, fiducial markers, relative localization, or absolute spatial referencing -- are available, each of these suffers from drawbacks that limit their applicability. In this paper, we propose ODIF, an architecture that permits the fusion of object situation information from such heterogeneous sources and that remains vertically and horizontally modular to allow extending and upgrading systems that are constructed accordingly. We furthermore present BLEARVIS, a prototype system that builds on the proposed architecture and integrates computer-vision (CV) based object detection with radio-frequency (RF) angle of arrival (AoA) estimation to identify BLE-tagged objects. In our system, the front camera of a Mixed Reality (MR) head-mounted display (HMD) provides a live image stream to a vision-based object detection module, while an antenna array that is mounted on the HMD collects AoA information from ambient devices. In this way, BLEARVIS is able to differentiate between visually identical objects in the same environment and can provide an MR overlay of information (data and controls) that relates to them. We include experimental evaluations of both the CV-based object detection and the RF-based AoA estimation, and discuss the applicability of the combined RF and CV pipelines in different ubiquitous computing scenarios. This research can serve as a starting point for integrating diverse object detection, identification, and interaction approaches that function across the electromagnetic spectrum, and beyond.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
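The fusion idea behind BLEARVIS, matching visual detections to BLE tags by comparing bearings, can be sketched in a few lines. This is an illustrative toy, not the paper's pipeline: the camera field of view, image width, and greedy closest-bearing assignment are all assumptions invented for the example.

```python
from typing import Dict, List, Tuple

H_FOV_DEG = 90.0     # assumed horizontal field of view of the HMD camera
IMAGE_WIDTH = 1920   # assumed image width in pixels

def bbox_bearing(bbox_center_x: float) -> float:
    """Map a bounding-box centre (pixels) to a bearing (degrees) off the camera axis."""
    return (bbox_center_x / IMAGE_WIDTH - 0.5) * H_FOV_DEG

def match_detections_to_tags(
    detections: List[Tuple[str, float]],   # (detection label, bbox centre x in pixels)
    tag_aoa: Dict[str, float],             # BLE tag id -> measured AoA in degrees
) -> Dict[str, str]:
    """Greedily assign each visual detection to the BLE tag with the closest bearing."""
    assignments: Dict[str, str] = {}
    free_tags = dict(tag_aoa)
    for label, cx in detections:
        bearing = bbox_bearing(cx)
        tag = min(free_tags, key=lambda t: abs(free_tags[t] - bearing))
        assignments[label] = tag
        del free_tags[tag]  # each tag identifies exactly one physical object
    return assignments

# Two visually identical lamps that CV alone cannot tell apart;
# the AoA of their BLE tags disambiguates them.
detections = [("lamp", 480.0), ("lamp#2", 1440.0)]
tags = {"ble-lamp-A": -20.0, "ble-lamp-B": 25.0}
print(match_detections_to_tags(detections, tags))
# → {'lamp': 'ble-lamp-A', 'lamp#2': 'ble-lamp-B'}
```

A real system would also have to handle depth, multipath-distorted AoA readings, and detections without a corresponding tag, which is part of what the modular ODIF architecture is meant to accommodate.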