Tieqi Shou, Zhuohan Ye, Yayao Hong, Zhiyuan Wang, Hang Zhu, Zhihan Jiang, Dingqi Yang, Binbin Zhou, Cheng Wang, Longbiao Chen
Hospital Emergency Departments (EDs) are essential for providing emergency medical services, yet often overwhelmed due to increasing healthcare demand. Current methods for monitoring ED queue states, such as manual monitoring, video surveillance, and front-desk registration are inefficient, invasive, and delayed to provide real-time updates. To address these challenges, this paper proposes a novel framework, CrowdQ, which harnesses spatiotemporal crowdsensing data for real-time ED demand sensing, queue state modeling, and prediction. By utilizing vehicle trajectory and urban geographic environment data, CrowdQ can accurately estimate emergency visits from noisy traffic flows. Furthermore, it employs queueing theory to model the complex emergency service process with medical service data, effectively considering spatiotemporal dependencies and event context impact on ED queue states. Experiments conducted on large-scale crowdsensing urban traffic datasets and hospital information system datasets from Xiamen City demonstrate the framework's effectiveness. It achieves an F1 score of 0.93 in ED demand identification, effectively models the ED queue state of key hospitals, and reduces the error in queue state prediction by 18.5%-71.3% compared to baseline methods. CrowdQ, therefore, offers valuable alternatives for public emergency treatment information disclosure and maximized medical resource allocation.
{"title":"CrowdQ","authors":"Tieqi Shou, Zhuohan Ye, Yayao Hong, Zhiyuan Wang, Hang Zhu, Zhihan Jiang, Dingqi Yang, Binbin Zhou, Cheng Wang, Longbiao Chen","doi":"10.1145/3610875","DOIUrl":"https://doi.org/10.1145/3610875","url":null,"abstract":"Hospital Emergency Departments (EDs) are essential for providing emergency medical services, yet often overwhelmed due to increasing healthcare demand. Current methods for monitoring ED queue states, such as manual monitoring, video surveillance, and front-desk registration are inefficient, invasive, and delayed to provide real-time updates. To address these challenges, this paper proposes a novel framework, CrowdQ, which harnesses spatiotemporal crowdsensing data for real-time ED demand sensing, queue state modeling, and prediction. By utilizing vehicle trajectory and urban geographic environment data, CrowdQ can accurately estimate emergency visits from noisy traffic flows. Furthermore, it employs queueing theory to model the complex emergency service process with medical service data, effectively considering spatiotemporal dependencies and event context impact on ED queue states. Experiments conducted on large-scale crowdsensing urban traffic datasets and hospital information system datasets from Xiamen City demonstrate the framework's effectiveness. It achieves an F1 score of 0.93 in ED demand identification, effectively models the ED queue state of key hospitals, and reduces the error in queue state prediction by 18.5%-71.3% compared to baseline methods. CrowdQ, therefore, offers valuable alternatives for public emergency treatment information disclosure and maximized medical resource allocation.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Device-free indoor localization and tracking using commercial millimeter wave radars have attracted much interest lately due to their non-intrusive nature and high spatial resolution. However, it is challenging to achieve high tracking accuracy due to rich multipath reflection and occlusion in indoor environments. Static objects with non-negligible reflectance of mmWave signals interact with moving human subjects and generate time-varying multipath ghosts and shadow ghosts, which can be easily confused as real subjects. To characterize the complex interactions, we first develop a geometric model that estimates the location of multipath ghosts given the locations of humans and static reflectors. Based on this model, the locations of static reflectors that form a reflection map are automatically estimated from received radar signals as a single person traverses the environment along arbitrary trajectories. The reflection map allows for the elimination of multipath and shadow ghost interference as well as the augmentation of weakly reflected human subjects in occluded areas. The proposed environment-aware multi-person tracking system can generate reflection maps with a mean error of 15.5cm and a 90-percentile error of 30.3cm, and achieve multi-person tracking accuracy with a mean error of 8.6cm and a 90-percentile error of 17.5cm, in four representative indoor spaces with diverse subjects using a single mmWave radar.
{"title":"Environment-aware Multi-person Tracking in Indoor Environments with MmWave Radars","authors":"Weiyan Chen, Hongliu Yang, Xiaoyang Bi, Rong Zheng, Fusang Zhang, Peng Bao, Zhaoxin Chang, Xujun Ma, Daqing Zhang","doi":"10.1145/3610902","DOIUrl":"https://doi.org/10.1145/3610902","url":null,"abstract":"Device-free indoor localization and tracking using commercial millimeter wave radars have attracted much interest lately due to their non-intrusive nature and high spatial resolution. However, it is challenging to achieve high tracking accuracy due to rich multipath reflection and occlusion in indoor environments. Static objects with non-negligible reflectance of mmWave signals interact with moving human subjects and generate time-varying multipath ghosts and shadow ghosts, which can be easily confused as real subjects. To characterize the complex interactions, we first develop a geometric model that estimates the location of multipath ghosts given the locations of humans and static reflectors. Based on this model, the locations of static reflectors that form a reflection map are automatically estimated from received radar signals as a single person traverses the environment along arbitrary trajectories. The reflection map allows for the elimination of multipath and shadow ghost interference as well as the augmentation of weakly reflected human subjects in occluded areas. The proposed environment-aware multi-person tracking system can generate reflection maps with a mean error of 15.5cm and a 90-percentile error of 30.3cm, and achieve multi-person tracking accuracy with a mean error of 8.6cm and a 90-percentile error of 17.5cm, in four representative indoor spaces with diverse subjects using a single mmWave radar.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matias Laporte, Martin Gjoreski, Marc Langheinrich
The latest developments in wearable sensors have resulted in a wide range of devices available to consumers, allowing users to monitor and improve their physical activity, sleep patterns, cognitive load, and stress levels. However, the lack of out-of-the-lab labelled data hinders the development of advanced machine learning models for predicting affective states. Furthermore, to the best of our knowledge, there are no publicly available datasets in the area of Human Memory Augmentation. This paper presents a dataset we collected during a 13-week study in a university setting. The dataset, named LAUREATE, contains the physiological data of 42 students during 26 classes (including exams), daily self-reports asking the students about their lifestyle habits (e.g. studying hours, physical activity, and sleep quality) and their performance across multiple examinations. In addition to the raw data, we provide expert features from the physiological data, and baseline machine learning models for estimating self-reported affect, models for recognising classes vs breaks, and models for user identification. Besides the use cases presented in this paper, among which Human Memory Augmentation, the dataset represents a rich resource for the UbiComp community in various domains, including affect recognition, behaviour modelling, user privacy, and activity and context recognition.
{"title":"LAUREATE","authors":"Matias Laporte, Martin Gjoreski, Marc Langheinrich","doi":"10.1145/3610892","DOIUrl":"https://doi.org/10.1145/3610892","url":null,"abstract":"The latest developments in wearable sensors have resulted in a wide range of devices available to consumers, allowing users to monitor and improve their physical activity, sleep patterns, cognitive load, and stress levels. However, the lack of out-of-the-lab labelled data hinders the development of advanced machine learning models for predicting affective states. Furthermore, to the best of our knowledge, there are no publicly available datasets in the area of Human Memory Augmentation. This paper presents a dataset we collected during a 13-week study in a university setting. The dataset, named LAUREATE, contains the physiological data of 42 students during 26 classes (including exams), daily self-reports asking the students about their lifestyle habits (e.g. studying hours, physical activity, and sleep quality) and their performance across multiple examinations. In addition to the raw data, we provide expert features from the physiological data, and baseline machine learning models for estimating self-reported affect, models for recognising classes vs breaks, and models for user identification. Besides the use cases presented in this paper, among which Human Memory Augmentation, the dataset represents a rich resource for the UbiComp community in various domains, including affect recognition, behaviour modelling, user privacy, and activity and context recognition.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The use of audio and video modalities for Human Activity Recognition (HAR) is common, given the richness of the data and the availability of pre-trained ML models using a large corpus of labeled training data. However, audio and video sensors also lead to significant consumer privacy concerns. Researchers have thus explored alternate modalities that are less privacy-invasive such as mmWave doppler radars, IMUs, motion sensors. However, the key limitation of these approaches is that most of them do not readily generalize across environments and require significant in-situ training data. Recent work has proposed cross-modality transfer learning approaches to alleviate the lack of trained labeled data with some success. In this paper, we generalize this concept to create a novel system called VAX (Video/Audio to 'X'), where training labels acquired from existing Video/Audio ML models are used to train ML models for a wide range of 'X' privacy-sensitive sensors. Notably, in VAX, once the ML models for the privacy-sensitive sensors are trained, with little to no user involvement, the Audio/Video sensors can be removed altogether to protect the user's privacy better. We built and deployed VAX in ten participants' homes while they performed 17 common activities of daily living. Our evaluation results show that after training, VAX can use its onboard camera and microphone to detect approximately 15 out of 17 activities with an average accuracy of 90%. For these activities that can be detected using a camera and a microphone, VAX trains a per-home model for the privacy-preserving sensors. These models (average accuracy = 84%) require no in-situ user input. In addition, when VAX is augmented with just one labeled instance for the activities not detected by the VAX A/V pipeline (~2 out of 17), it can detect all 17 activities with an average accuracy of 84%. Our results show that VAX is significantly better than a baseline supervised-learning approach of using one labeled instance per activity in each home (average accuracy of 79%) since VAX reduces the user burden of providing activity labels by 8x (~2 labels vs. 17 labels).
{"title":"VAX","authors":"Prasoon Patidar, Mayank Goel, Yuvraj Agarwal","doi":"10.1145/3610907","DOIUrl":"https://doi.org/10.1145/3610907","url":null,"abstract":"The use of audio and video modalities for Human Activity Recognition (HAR) is common, given the richness of the data and the availability of pre-trained ML models using a large corpus of labeled training data. However, audio and video sensors also lead to significant consumer privacy concerns. Researchers have thus explored alternate modalities that are less privacy-invasive such as mmWave doppler radars, IMUs, motion sensors. However, the key limitation of these approaches is that most of them do not readily generalize across environments and require significant in-situ training data. Recent work has proposed cross-modality transfer learning approaches to alleviate the lack of trained labeled data with some success. In this paper, we generalize this concept to create a novel system called VAX (Video/Audio to 'X'), where training labels acquired from existing Video/Audio ML models are used to train ML models for a wide range of 'X' privacy-sensitive sensors. Notably, in VAX, once the ML models for the privacy-sensitive sensors are trained, with little to no user involvement, the Audio/Video sensors can be removed altogether to protect the user's privacy better. We built and deployed VAX in ten participants' homes while they performed 17 common activities of daily living. Our evaluation results show that after training, VAX can use its onboard camera and microphone to detect approximately 15 out of 17 activities with an average accuracy of 90%. For these activities that can be detected using a camera and a microphone, VAX trains a per-home model for the privacy-preserving sensors. These models (average accuracy = 84%) require no in-situ user input. In addition, when VAX is augmented with just one labeled instance for the activities not detected by the VAX A/V pipeline (~2 out of 17), it can detect all 17 activities with an average accuracy of 84%. Our results show that VAX is significantly better than a baseline supervised-learning approach of using one labeled instance per activity in each home (average accuracy of 79%) since VAX reduces the user burden of providing activity labels by 8x (~2 labels vs. 17 labels).","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saif Mahmud, Ke Li, Guilin Hu, Hao Chen, Richard Jin, Ruidong Zhang, François Guimbretière, Cheng Zhang
In this paper, we introduce PoseSonic, an intelligent acoustic sensing solution for smartglasses that estimates upper body poses. Our system only requires two pairs of microphones and speakers on the hinges of the eyeglasses to emit FMCW-encoded inaudible acoustic signals and receive reflected signals for body pose estimation. Using a customized deep learning model, PoseSonic estimates the 3D positions of 9 body joints including the shoulders, elbows, wrists, hips, and nose. We adopt a cross-modal supervision strategy to train our model using synchronized RGB video frames as ground truth. We conducted in-lab and semi-in-the-wild user studies with 22 participants to evaluate PoseSonic, and our user-independent model achieved a mean per joint position error of 6.17 cm in the lab setting and 14.12 cm in semi-in-the-wild setting when predicting the 9 body joint positions in 3D. Our further studies show that the performance was not significantly impacted by different surroundings or when the devices were remounted or by real-world environmental noise. Finally, we discuss the opportunities, challenges, and limitations of deploying PoseSonic in real-world applications.
{"title":"PoseSonic","authors":"Saif Mahmud, Ke Li, Guilin Hu, Hao Chen, Richard Jin, Ruidong Zhang, François Guimbretière, Cheng Zhang","doi":"10.1145/3610895","DOIUrl":"https://doi.org/10.1145/3610895","url":null,"abstract":"In this paper, we introduce PoseSonic, an intelligent acoustic sensing solution for smartglasses that estimates upper body poses. Our system only requires two pairs of microphones and speakers on the hinges of the eyeglasses to emit FMCW-encoded inaudible acoustic signals and receive reflected signals for body pose estimation. Using a customized deep learning model, PoseSonic estimates the 3D positions of 9 body joints including the shoulders, elbows, wrists, hips, and nose. We adopt a cross-modal supervision strategy to train our model using synchronized RGB video frames as ground truth. We conducted in-lab and semi-in-the-wild user studies with 22 participants to evaluate PoseSonic, and our user-independent model achieved a mean per joint position error of 6.17 cm in the lab setting and 14.12 cm in semi-in-the-wild setting when predicting the 9 body joint positions in 3D. Our further studies show that the performance was not significantly impacted by different surroundings or when the devices were remounted or by real-world environmental noise. Finally, we discuss the opportunities, challenges, and limitations of deploying PoseSonic in real-world applications.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a fully-printable method to embed interactive information inside 3D printed objects. The information is invisible to the human eye and can be read using thermal imaging after temperature transfer through interaction with the objects. Prior methods either modify the surface appearance, require customized devices or not commonly used materials, or embed components that are not fully 3D printable. Such limitations restrict the design space for 3D prints, or cannot be readily applied to the already deployed 3D printing setups. In this paper, we present an information embedding technique using low-cost off-the-shelf dual extruder FDM (Fused Deposition Modeling) 3D printers, common materials (e.g., generic PLA), and a mobile thermal device (e.g., a thermal smartphone), by leveraging the thermal properties of common 3D print materials. In addition, we show our method can also be generalized to conventional near-infrared imaging scenarios. We evaluate our technique against multiple design and fabrication parameters and propose a design guideline for different use cases. Finally, we demonstrate various everyday applications enabled by our method, such as interactive thermal displays, user-activated augmented reality, automating thermal triggered events, and hidden tokens for social activities.
{"title":"InfoPrint","authors":"Weiwei Jiang, Chaofan Wang, Zhanna Sarsenbayeva, Andrew Irlitti, Jing Wei, Jarrod Knibbe, Tilman Dingler, Jorge Goncalves, Vassilis Kostakos","doi":"10.1145/3610933","DOIUrl":"https://doi.org/10.1145/3610933","url":null,"abstract":"We present a fully-printable method to embed interactive information inside 3D printed objects. The information is invisible to the human eye and can be read using thermal imaging after temperature transfer through interaction with the objects. Prior methods either modify the surface appearance, require customized devices or not commonly used materials, or embed components that are not fully 3D printable. Such limitations restrict the design space for 3D prints, or cannot be readily applied to the already deployed 3D printing setups. In this paper, we present an information embedding technique using low-cost off-the-shelf dual extruder FDM (Fused Deposition Modeling) 3D printers, common materials (e.g., generic PLA), and a mobile thermal device (e.g., a thermal smartphone), by leveraging the thermal properties of common 3D print materials. In addition, we show our method can also be generalized to conventional near-infrared imaging scenarios. We evaluate our technique against multiple design and fabrication parameters and propose a design guideline for different use cases. Finally, we demonstrate various everyday applications enabled by our method, such as interactive thermal displays, user-activated augmented reality, automating thermal triggered events, and hidden tokens for social activities.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
JaeYeon Park, Kichang Lee, Sungmin Lee, Mi Zhang, JeongGil Ko
This work presents AttFL, a federated learning framework designed to continuously improve a personalized deep neural network for efficiently analyzing time-series data generated from mobile and embedded sensing applications. To better characterize time-series data features and efficiently abstract model parameters, AttFL appends a set of attention modules to the baseline deep learning model and exchanges their feature map information to gather collective knowledge across distributed local devices at the server. The server groups devices with similar contextual goals using cosine similarity, and redistributes updated model parameters for improved inference performance at each local device. Specifically, unlike previously proposed federated learning frameworks, AttFL is designed specifically to perform well for various recurrent neural network (RNN) baseline models, making it suitable for many mobile and embedded sensing applications producing time-series sensing data. We evaluate the performance of AttFL and compare with five state-of-the-art federated learning frameworks using three popular mobile/embedded sensing applications (e.g., physiological signal analysis, human activity recognition, and audio processing). Our results obtained from CPU core-based emulations and a 12-node embedded platform testbed shows that AttFL outperforms all alternative approaches in terms of model accuracy and communication/computational overhead, and is flexible enough to be applied in various application scenarios exploiting different baseline deep learning model architectures.
{"title":"AttFL","authors":"JaeYeon Park, Kichang Lee, Sungmin Lee, Mi Zhang, JeongGil Ko","doi":"10.1145/3610917","DOIUrl":"https://doi.org/10.1145/3610917","url":null,"abstract":"This work presents AttFL, a federated learning framework designed to continuously improve a personalized deep neural network for efficiently analyzing time-series data generated from mobile and embedded sensing applications. To better characterize time-series data features and efficiently abstract model parameters, AttFL appends a set of attention modules to the baseline deep learning model and exchanges their feature map information to gather collective knowledge across distributed local devices at the server. The server groups devices with similar contextual goals using cosine similarity, and redistributes updated model parameters for improved inference performance at each local device. Specifically, unlike previously proposed federated learning frameworks, AttFL is designed specifically to perform well for various recurrent neural network (RNN) baseline models, making it suitable for many mobile and embedded sensing applications producing time-series sensing data. We evaluate the performance of AttFL and compare with five state-of-the-art federated learning frameworks using three popular mobile/embedded sensing applications (e.g., physiological signal analysis, human activity recognition, and audio processing). Our results obtained from CPU core-based emulations and a 12-node embedded platform testbed shows that AttFL outperforms all alternative approaches in terms of model accuracy and communication/computational overhead, and is flexible enough to be applied in various application scenarios exploiting different baseline deep learning model architectures.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ge Wang, Lubing Han, Yuance Chang, Yuting Shi, Chen Qian, Cong Zhao, Han Ding, Wei Xi, Cui Zhao, Jizhong Zhao
The ubiquity of illumination facilities enables the versatile development of Visible Light Communication (VLC). VLC-based research achieved high-speed wireless access and decimeter-level indoor localization with complex equipment. However, it is still unclear whether the VLC is applicable for widely-used battery-free Internet-of-Things nodes, e.g., passive RFIDs. This paper proposes LightSign, the first cross-technology system that enables passive RFID tags to receive visible light messages. LightSign is compatible with commercial protocols, transparent to routine RFID communications, and invisible to human eyes. We propose a pseudo-timing instruction to achieve microsecond-level light switching to modulate the VLC message. To make it perceptible to passive RFIDs, we design an augmented RFID tag and prove its effectiveness theoretically and experimentally. With only one reply from an augmented tag, LightSign can decode 100-bit-long VLC messages. We evaluate LightSign in real industry environments and test its performance with two use cases. The results show that LightSign achieves up to 99.2% decoding accuracy in varying scenarios.
{"title":"Cross-technology Communication between Visible Light and Battery-free RFIDs","authors":"Ge Wang, Lubing Han, Yuance Chang, Yuting Shi, Chen Qian, Cong Zhao, Han Ding, Wei Xi, Cui Zhao, Jizhong Zhao","doi":"10.1145/3610883","DOIUrl":"https://doi.org/10.1145/3610883","url":null,"abstract":"The ubiquity of illumination facilities enables the versatile development of Visible Light Communication (VLC). VLC-based research achieved high-speed wireless access and decimeter-level indoor localization with complex equipment. However, it is still unclear whether the VLC is applicable for widely-used battery-free Internet-of-Things nodes, e.g., passive RFIDs. This paper proposes LightSign, the first cross-technology system that enables passive RFID tags to receive visible light messages. LightSign is compatible with commercial protocols, transparent to routine RFID communications, and invisible to human eyes. We propose a pseudo-timing instruction to achieve microsecond-level light switching to modulate the VLC message. To make it perceptible to passive RFIDs, we design an augmented RFID tag and prove its effectiveness theoretically and experimentally. With only one reply from an augmented tag, LightSign can decode 100-bit-long VLC messages. We evaluate LightSign in real industry environments and test its performance with two use cases. The results show that LightSign achieves up to 99.2% decoding accuracy in varying scenarios.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135536451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inside-out tracking of human body poses using wearable sensors holds significant potential for AR/VR applications, such as remote communication through 3D avatars with expressive body language. Current inside-out systems often rely on vision-based methods utilizing handheld controllers or incorporating densely distributed body-worn IMU sensors. The former limits hands-free and occlusion-robust interactions, while the latter is plagued by inadequate accuracy and jittering. We introduce a novel body tracking system, MI-Poser, which employs AR glasses and two wrist-worn electromagnetic field (EMF) sensors to achieve high-fidelity upper-body pose estimation while mitigating metal interference. Our lightweight system demonstrates a minimal error (6.6 cm mean joint position error) with real-world data collected from 10 participants. It remains robust against various upper-body movements and operates efficiently at 60 Hz. Furthermore, by incorporating an IMU sensor co-located with the EMF sensor, MI-Poser presents solutions to counteract the effects of metal interference, which inherently disrupts the EMF signal during tracking. Our evaluation effectively showcases the successful detection and correction of interference using our EMF-IMU fusion approach across environments with diverse metal profiles. Ultimately, MI-Poser offers a practical pose tracking system, particularly suited for body-centric AR applications.
{"title":"MI-Poser","authors":"Riku Arakawa, Bing Zhou, Gurunandan Krishnan, Mayank Goel, Shree K. Nayar","doi":"10.1145/3610891","DOIUrl":"https://doi.org/10.1145/3610891","url":null,"abstract":"Inside-out tracking of human body poses using wearable sensors holds significant potential for AR/VR applications, such as remote communication through 3D avatars with expressive body language. Current inside-out systems often rely on vision-based methods utilizing handheld controllers or incorporating densely distributed body-worn IMU sensors. The former limits hands-free and occlusion-robust interactions, while the latter is plagued by inadequate accuracy and jittering. We introduce a novel body tracking system, MI-Poser, which employs AR glasses and two wrist-worn electromagnetic field (EMF) sensors to achieve high-fidelity upper-body pose estimation while mitigating metal interference. Our lightweight system demonstrates a minimal error (6.6 cm mean joint position error) with real-world data collected from 10 participants. It remains robust against various upper-body movements and operates efficiently at 60 Hz. Furthermore, by incorporating an IMU sensor co-located with the EMF sensor, MI-Poser presents solutions to counteract the effects of metal interference, which inherently disrupts the EMF signal during tracking. Our evaluation effectively showcases the successful detection and correction of interference using our EMF-IMU fusion approach across environments with diverse metal profiles. Ultimately, MI-Poser offers a practical pose tracking system, particularly suited for body-centric AR applications.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiyang Li, Lin Huang, Siddharth Shah, Sean J. Jones, Yincheng Jin, Dingran Wang, Adam Russell, Seokmin Choi, Yang Gao, Junsong Yuan, Zhanpeng Jin
Sign language is a natural language widely used by Deaf and hard of hearing (DHH) individuals. Advanced wearables are developed to recognize sign language automatically. However, they are limited by the lack of labeled data, which leads to a small vocabulary and unsatisfactory performance even though laborious efforts are put into data collection. Here we propose SignRing, an IMU-based system that breaks through the traditional data augmentation method, makes use of online videos to generate the virtual IMU (v-IMU) data, and pushes the boundary of wearable-based systems by reaching the vocabulary size of 934 with sentences up to 16 glosses. The v-IMU data is generated by reconstructing 3D hand movements from two-view videos and calculating 3-axis acceleration data, by which we are able to achieve a word error rate (WER) of 6.3% with a mix of half v-IMU and half IMU training data (2339 samples for each), and a WER of 14.7% with 100% v-IMU training data (6048 samples), compared with the baseline performance of the 8.3% WER (trained with 2339 samples of IMU data). We have conducted comparisons between v-IMU and IMU data to demonstrate the reliability and generalizability of the v-IMU data. This interdisciplinary work covers various areas such as wearable sensor development, computer vision techniques, deep learning, and linguistics, which can provide valuable insights to researchers with similar research objectives.
{"title":"SignRing","authors":"Jiyang Li, Lin Huang, Siddharth Shah, Sean J. Jones, Yincheng Jin, Dingran Wang, Adam Russell, Seokmin Choi, Yang Gao, Junsong Yuan, Zhanpeng Jin","doi":"10.1145/3610881","DOIUrl":"https://doi.org/10.1145/3610881","url":null,"abstract":"Sign language is a natural language widely used by Deaf and hard of hearing (DHH) individuals. Advanced wearables are developed to recognize sign language automatically. However, they are limited by the lack of labeled data, which leads to a small vocabulary and unsatisfactory performance even though laborious efforts are put into data collection. Here we propose SignRing, an IMU-based system that breaks through the traditional data augmentation method, makes use of online videos to generate the virtual IMU (v-IMU) data, and pushes the boundary of wearable-based systems by reaching the vocabulary size of 934 with sentences up to 16 glosses. The v-IMU data is generated by reconstructing 3D hand movements from two-view videos and calculating 3-axis acceleration data, by which we are able to achieve a word error rate (WER) of 6.3% with a mix of half v-IMU and half IMU training data (2339 samples for each), and a WER of 14.7% with 100% v-IMU training data (6048 samples), compared with the baseline performance of the 8.3% WER (trained with 2339 samples of IMU data). We have conducted comparisons between v-IMU and IMU data to demonstrate the reliability and generalizability of the v-IMU data. This interdisciplinary work covers various areas such as wearable sensor development, computer vision techniques, deep learning, and linguistics, which can provide valuable insights to researchers with similar research objectives.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135535235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}