Juexing Wang, Guangjing Wang, Xiao Zhang, Li Liu, Huacheng Zeng, Li Xiao, Zhichao Cao, Lin Gu, Tianxing Li
Recent advancements in deep learning have shown that multimodal inference can be particularly useful in tasks like autonomous driving, human health, and production line monitoring. However, deploying state-of-the-art multimodal models in distributed IoT systems poses unique challenges since the sensor data from low-cost edge devices can get corrupted, lost, or delayed before reaching the cloud. These problems are magnified in the presence of asymmetric data generation rates from different sensor modalities, wireless network dynamics, or unpredictable sensor behavior, leading to either increased latency or degradation in inference accuracy, which could affect the normal operation of the system with severe consequences like human injury or car accidents. In this paper, we propose PATCH, a speculative inference framework that adapts to these complex scenarios. PATCH serves as a plug-in module for existing multimodal models and enables speculative inference with these off-the-shelf deep learning models. PATCH consists of 1) a Masked-AutoEncoder-based cross-modality imputation module to impute missing data using partially available sensor data, 2) a lightweight feature pair ranking module that effectively limits the search space for the optimal imputation configuration with low computation overhead, and 3) a data alignment module that aligns multimodal heterogeneous data streams without using accurate timestamps or external synchronization mechanisms. We implement PATCH in nine popular multimodal models using five public datasets and one self-collected dataset. The experimental results show that PATCH achieves up to 13% mean accuracy improvement over the state-of-the-art method while using only 10% of the training data and reducing the training overhead by 73% compared to the original cost of retraining the model.
{"title":"PATCH","authors":"Juexing Wang, Guangjing Wang, Xiao Zhang, Li Liu, Huacheng Zeng, Li Xiao, Zhichao Cao, Lin Gu, Tianxing Li","doi":"10.1145/3610885","DOIUrl":"https://doi.org/10.1145/3610885","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},"publicationDate":"2023-09-27","publicationTypes":"Journal Article"}
Dennis Stanke, Tim Duente, Kerem Can Demir, Michael Rohs
The earlobe is a well-known location for wearing jewelry, but might also be promising for electronic output, such as presenting notifications. This work elaborates on the pros and cons of different notification channels for the earlobe. Notifications on the earlobe can be private (only noticeable by the wearer) as well as public (noticeable in the immediate vicinity in a given social situation). A user study with 18 participants showed that the reaction times for the private channels (Poke, Vibration, Private Sound, Electrotactile) were on average less than 1 s with an error rate (missed notifications) of less than 1 %. Thermal Warm and Cold took significantly longer, and Cold was the least reliable (26 % error rate). The participants preferred Electrotactile and Vibration. Among the public channels, the recognition time did not differ significantly between Sound (738 ms) and LED (828 ms), but Display took much longer (3175 ms). Display also had the highest error rate, at 22 %. The participants generally felt comfortable wearing notification devices on their earlobe. The results show that the earlobe is indeed a suitable location for wearable technology if the devices are properly miniaturized, which is possible for Electrotactile and LED. We present application scenarios and discuss design considerations. A small field study in a fitness center demonstrates the suitability of the earlobe notification concept in a sports context.
{"title":"Can You Ear Me?","authors":"Dennis Stanke, Tim Duente, Kerem Can Demir, Michael Rohs","doi":"10.1145/3610925","DOIUrl":"https://doi.org/10.1145/3610925","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},"publicationDate":"2023-09-27","publicationTypes":"Journal Article"}
Nodding and shaking one's head are intuitive and universal gestures in communication. As smartwatches become increasingly intelligent through advances in user activity sensing technologies, many smartwatch use scenarios demand quick responses from users in confirmation dialogs, to accept or dismiss proposed actions. Such proposed actions include making emergency calls, taking service recommendations, and starting or stopping exercise timers. Head gestures in these scenarios could be preferable to touch interactions because they are hands-free and easy to perform. We propose Headar to recognize these gestures on smartwatches using wearable millimeter wave sensing. We first surveyed head gestures to understand how they are performed in conversational settings. We then investigated the positions and orientations to which users raise their smartwatches. Insights from these studies guided the implementation of Headar. Additionally, we conducted modeling and simulation to verify our sensing principle. We developed a real-time sensing and inference pipeline using contemporary deep learning techniques, and demonstrated the feasibility of our proposed approach with a user study (n=15) and a live test (n=8). Our evaluation yielded an average accuracy of 84.0% in the user study across 9 classes, including nod and shake as well as seven other signals: still, speech, touch interaction, and four non-gestural head motions (i.e., head up, left, right, and down). Furthermore, we obtained an accuracy of 72.6% in the live test, which reveals rich insights into the performance of our approach in various realistic conditions.
{"title":"Headar","authors":"Xiaoying Yang, Xue Wang, Gaofeng Dong, Zihan Yan, Mani Srivastava, Eiji Hayashi, Yang Zhang","doi":"10.1145/3610900","DOIUrl":"https://doi.org/10.1145/3610900","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},"publicationDate":"2023-09-27","publicationTypes":"Journal Article"}
Tieqi Shou, Zhuohan Ye, Yayao Hong, Zhiyuan Wang, Hang Zhu, Zhihan Jiang, Dingqi Yang, Binbin Zhou, Cheng Wang, Longbiao Chen
Hospital Emergency Departments (EDs) are essential for providing emergency medical services, yet are often overwhelmed due to increasing healthcare demand. Current methods for monitoring ED queue states, such as manual monitoring, video surveillance, and front-desk registration, are inefficient, invasive, and too slow to provide real-time updates. To address these challenges, this paper proposes a novel framework, CrowdQ, which harnesses spatiotemporal crowdsensing data for real-time ED demand sensing, queue state modeling, and prediction. By utilizing vehicle trajectory and urban geographic environment data, CrowdQ can accurately estimate emergency visits from noisy traffic flows. Furthermore, it employs queueing theory to model the complex emergency service process with medical service data, effectively considering spatiotemporal dependencies and the impact of event context on ED queue states. Experiments conducted on large-scale crowdsensing urban traffic datasets and hospital information system datasets from Xiamen City demonstrate the framework's effectiveness. It achieves an F1 score of 0.93 in ED demand identification, effectively models the ED queue state of key hospitals, and reduces the error in queue state prediction by 18.5%-71.3% compared to baseline methods. CrowdQ, therefore, offers valuable alternatives for public emergency treatment information disclosure and maximized medical resource allocation.
{"title":"CrowdQ","authors":"Tieqi Shou, Zhuohan Ye, Yayao Hong, Zhiyuan Wang, Hang Zhu, Zhihan Jiang, Dingqi Yang, Binbin Zhou, Cheng Wang, Longbiao Chen","doi":"10.1145/3610875","DOIUrl":"https://doi.org/10.1145/3610875","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},"publicationDate":"2023-09-27","publicationTypes":"Journal Article"}
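The CrowdQ abstract above says the framework uses queueing theory to model the emergency service process, but it does not state which queue model. As a rough illustration only, the sketch below assumes a textbook M/M/c queue (Poisson arrivals, exponential service times, `servers` identical service points) and computes the Erlang C waiting probability and the mean queueing delay. The function names and the example rates are hypothetical and are not taken from the paper.

```python
import math


def erlang_c(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Probability that an arrival has to wait in an M/M/c queue (Erlang C)."""
    offered_load = arrival_rate / service_rate   # a = lambda / mu
    utilization = offered_load / servers         # rho = a / c
    if utilization >= 1:
        raise ValueError("queue is unstable: utilization must be below 1")
    # Sum over the states in which at least one server is free.
    free_terms = sum(offered_load ** k / math.factorial(k) for k in range(servers))
    # Aggregated term for the states in which all servers are busy.
    busy_term = offered_load ** servers / (math.factorial(servers) * (1 - utilization))
    return busy_term / (free_terms + busy_term)


def mean_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean time spent waiting in the queue, in the time unit of the rates."""
    return erlang_c(arrival_rate, service_rate, servers) / (
        servers * service_rate - arrival_rate
    )


# Hypothetical rates: 2 arrivals per hour, 2 patients served per hour per server.
print(erlang_c(2.0, 2.0, 2))
print(mean_wait(2.0, 2.0, 2))
```

With a single server the formula reduces to the M/M/1 result, where the waiting probability equals the utilization, which gives a quick sanity check on the implementation.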
Device-free indoor localization and tracking using commercial millimeter wave radars have attracted much interest lately due to their non-intrusive nature and high spatial resolution. However, it is challenging to achieve high tracking accuracy due to rich multipath reflection and occlusion in indoor environments. Static objects with non-negligible reflectance of mmWave signals interact with moving human subjects and generate time-varying multipath ghosts and shadow ghosts, which can easily be mistaken for real subjects. To characterize these complex interactions, we first develop a geometric model that estimates the location of multipath ghosts given the locations of humans and static reflectors. Based on this model, the locations of static reflectors that form a reflection map are automatically estimated from received radar signals as a single person traverses the environment along arbitrary trajectories. The reflection map allows for the elimination of multipath and shadow ghost interference as well as the augmentation of weakly reflected human subjects in occluded areas. The proposed environment-aware multi-person tracking system can generate reflection maps with a mean error of 15.5 cm and a 90th-percentile error of 30.3 cm, and achieve multi-person tracking accuracy with a mean error of 8.6 cm and a 90th-percentile error of 17.5 cm, in four representative indoor spaces with diverse subjects using a single mmWave radar.
{"title":"Environment-aware Multi-person Tracking in Indoor Environments with MmWave Radars","authors":"Weiyan Chen, Hongliu Yang, Xiaoyang Bi, Rong Zheng, Fusang Zhang, Peng Bao, Zhaoxin Chang, Xujun Ma, Daqing Zhang","doi":"10.1145/3610902","DOIUrl":"https://doi.org/10.1145/3610902","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},"publicationDate":"2023-09-27","publicationTypes":"Journal Article"}
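The tracking abstract above reports a mean error and a 90th-percentile error. The sketch below shows one common way such a summary might be computed from per-sample localization errors. It assumes the nearest-rank percentile definition (the abstract does not say which definition was used), and the sample values are made up for illustration.

```python
from typing import Sequence, Tuple


def error_summary(errors_cm: Sequence[float], percent: int = 90) -> Tuple[float, float]:
    """Return (mean error, nearest-rank percentile error) for a list of errors."""
    if not errors_cm:
        raise ValueError("need at least one error value")
    ordered = sorted(errors_cm)
    # Nearest-rank percentile: the smallest value such that at least
    # `percent` percent of the samples are at or below it.
    rank = (len(ordered) * percent + 99) // 100  # integer ceiling of n * percent / 100
    return sum(ordered) / len(ordered), ordered[max(rank, 1) - 1]


# Hypothetical per-sample tracking errors in centimetres (not from the paper).
sample_errors = [4.0, 6.0, 7.0, 8.0, 9.0, 9.0, 10.0, 11.0, 12.0, 14.0]
mean_error, p90_error = error_summary(sample_errors)
print(mean_error, p90_error)
```

Interpolating percentile definitions, as used by some numerical libraries, can give slightly different values on small samples, so the definition should be stated whenever such numbers are compared.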
Matias Laporte, Martin Gjoreski, Marc Langheinrich
The latest developments in wearable sensors have resulted in a wide range of devices available to consumers, allowing users to monitor and improve their physical activity, sleep patterns, cognitive load, and stress levels. However, the lack of out-of-the-lab labelled data hinders the development of advanced machine learning models for predicting affective states. Furthermore, to the best of our knowledge, there are no publicly available datasets in the area of Human Memory Augmentation. This paper presents a dataset we collected during a 13-week study in a university setting. The dataset, named LAUREATE, contains the physiological data of 42 students during 26 classes (including exams), daily self-reports asking the students about their lifestyle habits (e.g., studying hours, physical activity, and sleep quality), and their performance across multiple examinations. In addition to the raw data, we provide expert features from the physiological data, and baseline machine learning models for estimating self-reported affect, models for recognising classes vs breaks, and models for user identification. Besides the use cases presented in this paper, which include Human Memory Augmentation, the dataset represents a rich resource for the UbiComp community in various domains, including affect recognition, behaviour modelling, user privacy, and activity and context recognition.
{"title":"LAUREATE","authors":"Matias Laporte, Martin Gjoreski, Marc Langheinrich","doi":"10.1145/3610892","DOIUrl":"https://doi.org/10.1145/3610892","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},"publicationDate":"2023-09-27","publicationTypes":"Journal Article"}
The use of audio and video modalities for Human Activity Recognition (HAR) is common, given the richness of the data and the availability of pre-trained ML models using a large corpus of labeled training data. However, audio and video sensors also lead to significant consumer privacy concerns. Researchers have thus explored alternate modalities that are less privacy-invasive, such as mmWave Doppler radars, IMUs, and motion sensors. However, the key limitation of these approaches is that most of them do not readily generalize across environments and require significant in-situ training data. Recent work has proposed cross-modality transfer learning approaches to alleviate the lack of labeled training data, with some success. In this paper, we generalize this concept to create a novel system called VAX (Video/Audio to 'X'), where training labels acquired from existing Video/Audio ML models are used to train ML models for a wide range of 'X' privacy-sensitive sensors. Notably, in VAX, once the ML models for the privacy-sensitive sensors are trained, with little to no user involvement, the Audio/Video sensors can be removed altogether to protect the user's privacy better. We built and deployed VAX in ten participants' homes while they performed 17 common activities of daily living. Our evaluation results show that after training, VAX can use its onboard camera and microphone to detect approximately 15 out of 17 activities with an average accuracy of 90%. For these activities that can be detected using a camera and a microphone, VAX trains a per-home model for the privacy-preserving sensors. These models (average accuracy = 84%) require no in-situ user input. In addition, when VAX is augmented with just one labeled instance for the activities not detected by the VAX A/V pipeline (~2 out of 17), it can detect all 17 activities with an average accuracy of 84%. Our results show that VAX is significantly better than a baseline supervised-learning approach of using one labeled instance per activity in each home (average accuracy of 79%) since VAX reduces the user burden of providing activity labels by 8x (~2 labels vs. 17 labels).
{"title":"VAX","authors":"Prasoon Patidar, Mayank Goel, Yuvraj Agarwal","doi":"10.1145/3610907","DOIUrl":"https://doi.org/10.1145/3610907","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},"publicationDate":"2023-09-27","publicationTypes":"Journal Article"}
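The VAX abstract above describes using labels produced by existing audio/video models to train models for other sensors. The following is a minimal sketch of that general idea, not the authors' pipeline: it assumes the teacher emits a label with a confidence score, keeps only confident labels, and fits a simple nearest-centroid classifier on the other sensors' feature vectors. The threshold, the classifier choice, the feature values, and all function names are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], str]


def keep_confident(
    teacher_outputs: Sequence[Tuple[List[float], str, float]], threshold: float = 0.8
) -> List[Sample]:
    """Keep (sensor_features, label) pairs whose teacher confidence meets the threshold."""
    return [(x, label) for x, label, conf in teacher_outputs if conf >= threshold]


def fit_centroids(samples: Sequence[Sample]) -> Dict[str, List[float]]:
    """Average the feature vectors of each label to get one centroid per activity."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the label of the centroid closest to the given feature vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical data: (features from the 'X' sensors, A/V teacher label, teacher confidence).
teacher_outputs = [
    ([0.9, 0.1], "cooking", 0.95),
    ([1.1, 0.0], "cooking", 0.90),
    ([0.0, 1.0], "vacuuming", 0.92),
    ([0.5, 0.5], "cooking", 0.40),  # low confidence, dropped before training
]
training_set = keep_confident(teacher_outputs)
centroids = fit_centroids(training_set)
print(predict(centroids, [0.8, 0.2]))
print(predict(centroids, [0.1, 0.9]))
```

Once a student model like this is trained, inference needs only the sensor features, which is what allows the audio/video sensors to be removed afterwards.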
Saif Mahmud, Ke Li, Guilin Hu, Hao Chen, Richard Jin, Ruidong Zhang, François Guimbretière, Cheng Zhang
In this paper, we introduce PoseSonic, an intelligent acoustic sensing solution for smartglasses that estimates upper body poses. Our system only requires two pairs of microphones and speakers on the hinges of the eyeglasses to emit FMCW-encoded inaudible acoustic signals and receive reflected signals for body pose estimation. Using a customized deep learning model, PoseSonic estimates the 3D positions of 9 body joints including the shoulders, elbows, wrists, hips, and nose. We adopt a cross-modal supervision strategy to train our model using synchronized RGB video frames as ground truth. We conducted in-lab and semi-in-the-wild user studies with 22 participants to evaluate PoseSonic, and our user-independent model achieved a mean per joint position error of 6.17 cm in the lab setting and 14.12 cm in the semi-in-the-wild setting when predicting the 9 body joint positions in 3D. Our further studies show that the performance was not significantly impacted by different surroundings, device remounting, or real-world environmental noise. Finally, we discuss the opportunities, challenges, and limitations of deploying PoseSonic in real-world applications.
{"title":"PoseSonic","authors":"Saif Mahmud, Ke Li, Guilin Hu, Hao Chen, Richard Jin, Ruidong Zhang, François Guimbretière, Cheng Zhang","doi":"10.1145/3610895","DOIUrl":"https://doi.org/10.1145/3610895","journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},"publicationDate":"2023-09-27","publicationTypes":"Journal Article"}
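The PoseSonic abstract above reports accuracy as a mean per joint position error (MPJPE). As a small sketch of how that metric is usually defined, the function below averages the Euclidean distance between each predicted joint and its reference position. The two-joint example and its coordinates are hypothetical and only illustrate the arithmetic.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def mpjpe(predicted: Sequence[Point3D], reference: Sequence[Point3D]) -> float:
    """Mean Euclidean distance between matching joints, in the unit of the inputs."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty joint lists of equal length")
    return sum(math.dist(p, r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical poses with two joints, coordinates in centimetres.
pred = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ref = [(3.0, 4.0, 0.0), (10.0, 0.0, 5.0)]
print(mpjpe(pred, ref))  # both joints are 5 cm off, so this prints 5.0
```

For a full evaluation, the same function would be applied per frame to all 9 joints and the results averaged over frames.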
We present a fully-printable method to embed interactive information inside 3D printed objects. The information is invisible to the human eye and can be read using thermal imaging after temperature transfer through interaction with the objects. Prior methods either modify the surface appearance, require customized devices or uncommon materials, or embed components that are not fully 3D printable. Such limitations restrict the design space for 3D prints or prevent these methods from being readily applied to already deployed 3D printing setups. In this paper, we present an information embedding technique using low-cost off-the-shelf dual extruder FDM (Fused Deposition Modeling) 3D printers, common materials (e.g., generic PLA), and a mobile thermal device (e.g., a thermal smartphone), by leveraging the thermal properties of common 3D print materials. In addition, we show that our method can also be generalized to conventional near-infrared imaging scenarios. We evaluate our technique against multiple design and fabrication parameters and propose a design guideline for different use cases. Finally, we demonstrate various everyday applications enabled by our method, such as interactive thermal displays, user-activated augmented reality, automation of thermally triggered events, and hidden tokens for social activities.
{"title":"InfoPrint","authors":"Weiwei Jiang, Chaofan Wang, Zhanna Sarsenbayeva, Andrew Irlitti, Jing Wei, Jarrod Knibbe, Tilman Dingler, Jorge Goncalves, Vassilis Kostakos","doi":"10.1145/3610933","DOIUrl":"https://doi.org/10.1145/3610933","url":null,"abstract":"We present a fully-printable method to embed interactive information inside 3D printed objects. The information is invisible to the human eye and can be read using thermal imaging after temperature transfer through interaction with the objects. Prior methods either modify the surface appearance, require customized devices or uncommon materials, or embed components that are not fully 3D printable. Such limitations restrict the design space for 3D prints or prevent these methods from being readily applied to already deployed 3D printing setups. In this paper, we present an information embedding technique using low-cost off-the-shelf dual extruder FDM (Fused Deposition Modeling) 3D printers, common materials (e.g., generic PLA), and a mobile thermal device (e.g., a thermal smartphone), by leveraging the thermal properties of common 3D print materials. In addition, we show that our method can also be generalized to conventional near-infrared imaging scenarios. We evaluate our technique against multiple design and fabrication parameters and propose a design guideline for different use cases. 
Finally, we demonstrate various everyday applications enabled by our method, such as interactive thermal displays, user-activated augmented reality, automation of thermally triggered events, and hidden tokens for social activities.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
JaeYeon Park, Kichang Lee, Sungmin Lee, Mi Zhang, JeongGil Ko
This work presents AttFL, a federated learning framework designed to continuously improve a personalized deep neural network for efficiently analyzing time-series data generated from mobile and embedded sensing applications. To better characterize time-series data features and efficiently abstract model parameters, AttFL appends a set of attention modules to the baseline deep learning model and exchanges their feature map information to gather collective knowledge across distributed local devices at the server. The server groups devices with similar contextual goals using cosine similarity, and redistributes updated model parameters for improved inference performance at each local device. Unlike previously proposed federated learning frameworks, AttFL is designed specifically to perform well for various recurrent neural network (RNN) baseline models, making it suitable for many mobile and embedded sensing applications producing time-series sensing data. We evaluate the performance of AttFL and compare it with five state-of-the-art federated learning frameworks using three popular mobile/embedded sensing applications (e.g., physiological signal analysis, human activity recognition, and audio processing). Our results, obtained from CPU core-based emulations and a 12-node embedded platform testbed, show that AttFL outperforms all alternative approaches in terms of model accuracy and communication/computational overhead, and is flexible enough to be applied in various application scenarios exploiting different baseline deep learning model architectures.
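The abstract above states only that the server groups devices using cosine similarity; it does not describe the grouping procedure. As a minimal sketch under that caveat, the code below assumes each device is summarized by a flat feature vector and uses a greedy threshold rule. The names `cosine_similarity` and `group_devices`, the threshold of 0.9, and the sample vectors are all hypothetical and are not drawn from AttFL.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def group_devices(features, threshold=0.9):
    """Greedy grouping (an assumed rule, not AttFL's): a device joins
    the first group whose first member is at least `threshold` similar
    to it; otherwise it starts a new group."""
    groups = []
    for device_id, vector in features.items():
        for group in groups:
            representative = features[group[0]]
            if cosine_similarity(vector, representative) >= threshold:
                group.append(device_id)
                break
        else:
            # No existing group was similar enough.
            groups.append([device_id])
    return groups


# Hypothetical per-device feature summaries.
features = {
    "dev_a": [1.0, 0.0, 0.1],
    "dev_b": [0.9, 0.1, 0.1],
    "dev_c": [0.0, 1.0, 0.0],
    "dev_d": [0.1, 0.9, 0.0],
}
groups = group_devices(features)
# groups == [["dev_a", "dev_b"], ["dev_c", "dev_d"]]
```

Because cosine similarity ignores vector magnitude, two devices whose feature summaries point in the same direction land in the same group even when their scales differ, which is one reason it is a common choice for comparing learned representations.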
{"title":"AttFL","authors":"JaeYeon Park, Kichang Lee, Sungmin Lee, Mi Zhang, JeongGil Ko","doi":"10.1145/3610917","DOIUrl":"https://doi.org/10.1145/3610917","url":null,"abstract":"This work presents AttFL, a federated learning framework designed to continuously improve a personalized deep neural network for efficiently analyzing time-series data generated from mobile and embedded sensing applications. To better characterize time-series data features and efficiently abstract model parameters, AttFL appends a set of attention modules to the baseline deep learning model and exchanges their feature map information to gather collective knowledge across distributed local devices at the server. The server groups devices with similar contextual goals using cosine similarity, and redistributes updated model parameters for improved inference performance at each local device. Unlike previously proposed federated learning frameworks, AttFL is designed specifically to perform well for various recurrent neural network (RNN) baseline models, making it suitable for many mobile and embedded sensing applications producing time-series sensing data. We evaluate the performance of AttFL and compare it with five state-of-the-art federated learning frameworks using three popular mobile/embedded sensing applications (e.g., physiological signal analysis, human activity recognition, and audio processing). 
Our results, obtained from CPU core-based emulations and a 12-node embedded platform testbed, show that AttFL outperforms all alternative approaches in terms of model accuracy and communication/computational overhead, and is flexible enough to be applied in various application scenarios exploiting different baseline deep learning model architectures.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}