Shuhan Zhong, Sizhe Song, Tianhao Tang, Fei Nie, Xinrui Zhou, Yankun Zhao, Yizhe Zhao, Kuen Fung Sin, S.-H. Gary Chan
Early identification of a person with dyslexia, a learning disorder that affects reading and writing, is critical for effective treatment. As accredited specialists for the clinical diagnosis of dyslexia are costly and undersupplied, we research and develop a computer-assisted approach to efficiently prescreen dyslexic Chinese children so that timely resources can be channelled to those at higher risk. Previous works in this area are mostly for English and other alphabetic languages, tailored narrowly to the reading disorder, or require costly specialized equipment. To overcome these limitations, we present DYPA, a novel DYslexia Prescreening mobile Application for Chinese children. DYPA collects multimodal data from children through a set of specially designed interactive reading and writing tests in Chinese, and comprehensively analyzes their cognitive-linguistic skills with machine learning. To better account for dyslexia-associated features in handwritten characters, DYPA employs a deep-learning-based multilevel Chinese handwriting analysis framework to extract features across the stroke, radical and character levels. We have implemented and installed DYPA on tablets, and our extensive trials with more than 200 pupils in Hong Kong validate its high predictive accuracy (81.14%), sensitivity (74.27%) and specificity (82.71%).
DYPA. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, September 27, 2023. https://doi.org/10.1145/3610908
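The abstract above describes a multilevel handwriting analysis that extracts features at the stroke, radical and character levels. As a hedged illustration only, a fusion classifier over such per-level features might look like the following PyTorch sketch; the encoders, dimensions and two-class head are our assumptions, not DYPA's published architecture.

```python
import torch
import torch.nn as nn

class MultilevelHandwritingClassifier(nn.Module):
    """Fuses stroke-, radical- and character-level features into one risk prediction.
    All layer sizes here are illustrative placeholders, not DYPA's actual design."""
    def __init__(self, stroke_dim=64, radical_dim=64, char_dim=128, hidden=128):
        super().__init__()
        # Each branch is assumed to receive pre-extracted features for its level.
        self.stroke_enc = nn.Sequential(nn.Linear(stroke_dim, hidden), nn.ReLU())
        self.radical_enc = nn.Sequential(nn.Linear(radical_dim, hidden), nn.ReLU())
        self.char_enc = nn.Sequential(nn.Linear(char_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # at-risk vs. not-at-risk
        )

    def forward(self, stroke_feat, radical_feat, char_feat):
        fused = torch.cat([
            self.stroke_enc(stroke_feat),
            self.radical_enc(radical_feat),
            self.char_enc(char_feat),
        ], dim=-1)
        return self.head(fused)

# Example forward pass on a batch of 8 handwriting samples with random features.
model = MultilevelHandwritingClassifier()
logits = model(torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 128))
print(logits.shape)  # torch.Size([8, 2])
```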
Zhiyuan Wang, Maria A. Larrazabal, Mark Rucker, Emma R. Toner, Katharine E. Daniel, Shashwat Kumar, Mehdi Boukhechba, Bethany A. Teachman, Laura E. Barnes
Mobile sensing is a ubiquitous and useful tool for making inferences about individuals' mental health based on physiology and behavior patterns. Along with sensing features directly associated with mental health, it can be valuable to detect different features of social contexts to learn about social interaction patterns over time and across different environments. This can provide insight into diverse communities' academic, work and social lives, and their social networks. We posit that passively detecting social contexts can be particularly useful for social anxiety research, as it may ultimately help identify changes in social anxiety status and patterns of social avoidance and withdrawal. To this end, we recruited a sample of highly socially anxious undergraduate students (N=46) to examine whether we could detect the presence of experimentally manipulated virtual social contexts via wristband sensors. Using a multitask machine learning pipeline, we leveraged passively sensed biobehavioral streams to detect contexts relevant to social anxiety, including (1) whether people were in a social situation, (2) the size of the social group, (3) the degree of social evaluation, and (4) the phase of the social situation (anticipating, actively experiencing, or having just participated in an experience). Results demonstrated the feasibility of detecting most virtual social contexts, with stronger predictive accuracy when detecting whether individuals were in a social situation and the phase of the situation, and weaker predictive accuracy when detecting the level of social evaluation. They also indicated that the sensing streams differ in importance depending on the context being predicted. Our findings also provide useful information regarding design elements relevant to passive context detection, including optimal sensing duration, the utility of different sensing modalities, and the need for personalization. We discuss implications of these findings for future work on context detection (e.g., just-in-time adaptive intervention development).
Detecting Social Contexts from Mobile Sensing Indicators in Virtual Interactions with Socially Anxious Individuals. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, September 27, 2023. https://doi.org/10.1145/3610916
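One simple way to realize the multitask setup described above is to train one classifier per context label on windowed wristband features. The sketch below uses scikit-learn with synthetic data; the feature count, task labels and random-forest baseline are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical windowed biobehavioral features (e.g., heart rate, EDA, accelerometer
# statistics) and one label vector per context-detection task named in the abstract.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 24))                     # 500 windows x 24 placeholder features
tasks = {
    "in_social_situation": rng.integers(0, 2, 500),
    "group_size":          rng.integers(0, 3, 500),  # e.g., none / small / large
    "evaluation_level":    rng.integers(0, 3, 500),
    "situation_phase":     rng.integers(0, 3, 500),  # anticipate / experience / post
}

# One model per task; a shared-representation multitask network would be the
# deep-learning analogue of this simple per-task baseline.
for name, y in tasks.items():
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")
    print(f"{name}: macro-F1 = {scores.mean():.2f}")
```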
Chinmaey Shende, Soumyashree Sahoo, Stephen Sam, Parit Patel, Reynaldo Morillo, Xinyu Wang, Shweta Ware, Jinbo Bi, Jayesh Kamath, Alexander Russell, Dongjin Song, Bing Wang
Depression is a serious mental illness. The current best guideline in depression treatment is to closely monitor patients and adjust treatment as needed. Close monitoring of patients through physician-administered follow-ups or self-administered questionnaires, however, is difficult in clinical settings due to high cost, a lack of trained professionals, and the burden on patients. Sensory data collected from mobile devices has been shown to provide a promising direction for long-term monitoring of depression symptoms. Most existing studies in this direction, however, focus on depression detection; the few studies on predicting changes in depression are not conducted in clinical settings. In this paper, we investigate using one type of sensory data, sleep data collected from wearables, to predict the improvement of depression symptoms over time after a patient initiates a new pharmacological treatment. We apply sleep trend filtering to noisy sleep sensory data to extract high-level sleep characteristics and develop a family of machine learning models that use simple sleep features (mean and variation of sleep duration) to predict symptom improvement. Our results show that using such simple sleep features can already lead to a validation F1 score of up to 0.68, indicating that using sensory data to predict depression improvement during treatment is a promising direction.
Predicting Symptom Improvement During Depression Treatment Using Sleep Sensory Data. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, September 27, 2023. https://doi.org/10.1145/3610932
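Since the predictors named in the abstract are just the mean and variation of nightly sleep duration, the core modeling step can be sketched compactly. The example below (synthetic data, logistic regression, and no sleep trend filtering) is a minimal illustration of that idea, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def sleep_features(nightly_minutes):
    """Summarize one patient's nightly sleep durations (minutes) after treatment start."""
    arr = np.asarray(nightly_minutes, dtype=float)
    return [arr.mean(), arr.std()]          # mean and variation of sleep duration

# Hypothetical cohort: per-patient sleep series and a binary "symptoms improved" label.
patients = [
    ([420, 400, 455, 430, 410, 445], 1),
    ([310, 520, 280, 600, 350, 470], 0),
    ([465, 440, 450, 470, 455, 460], 1),
    ([380, 300, 510, 290, 430, 360], 0),
]
X = np.array([sleep_features(series) for series, _ in patients])
y = np.array([label for _, label in patients])

clf = LogisticRegression()
print(cross_val_score(clf, X, y, cv=2, scoring="f1").mean())
```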
Likun Fang, Timo Müller, Erik Pescara, Nikola Fischer, Yiran Huang, Michael Beigl
Passive Haptic Learning (PHL) is a method by which users are able to learn motor skills without paying active attention. In past research, vibration has been widely applied in PHL as the signal delivered to the participant's skin. The human somatosensory system provides not only discriminative input to the brain (the perception of pressure, vibration, slip, texture, etc.) but also affective input (sliding, tapping, stroking, etc.). The former is often described as being mediated by low-threshold mechanosensitive (LTM) units with rapidly conducting large myelinated (Aβ) afferents, while the latter is mediated by a class of LTM afferents called C-tactile afferents (CTs). In this work, we investigated whether different tactile sensations (tapping, light stroking, and vibration) influence the learning effect of PHL. We built three wearable systems, one for each sensation. 17 participants were invited to learn to play three different note sequences passively via the three systems. The subjects were then tested on the remembered note sequences after each learning session. Our results indicate that the sensations of tapping or stroking are as effective as vibration in passive haptic learning of piano songs, providing viable alternatives to the vibration sensations that have been used so far. We also found that participants made, on average, up to 1.06 fewer errors when using the affective inputs, namely tapping or stroking. As the first work exploring the differences between multiple types of tactile sensations in PHL, we offer our designs to readers in the hope that they may be employed in further PHL research.
Investigating Passive Haptic Learning of Piano Songs Using Three Tactile Sensations of Vibration, Stroking and Tapping. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, September 27, 2023. https://doi.org/10.1145/3610899
Guanzhou Zhu, Dong Zhao, Kuo Tian, Zhengyuan Zhang, Rui Yuan, Huadong Ma
Energy disaggregation is a key enabling technology for residential power usage monitoring, which benefits various applications such as carbon emission monitoring and human activity recognition. However, existing methods struggle to balance accuracy against usage burden (device costs, data labeling and prior knowledge). As the high penetration of smart speakers offers a low-cost way for sound-assisted residential power usage monitoring, this work aims to combine a smart speaker and a smart meter in a house to liberate the system from a high usage burden. However, it is still challenging to extract and leverage the consistent/complementary information (two types of relationships between acoustic and power features) from acoustic and power data without data labeling or prior knowledge. To this end, we design COMFORT, a cross-modality system for self-supervised power usage monitoring, including (i) a cross-modality learning component to automatically learn the consistent and complementary information, and (ii) a cross-modality inference component to utilize the consistent and complementary information. We implement and evaluate COMFORT with a self-collected dataset from six houses over 14 days, demonstrating that COMFORT finds the most appliances (98%), improves appliance recognition performance in F-measure by at least 41.1%, and reduces the Mean Absolute Error (MAE) of energy disaggregation by at least 30.4% over alternative solutions.
Combining Smart Speaker and Smart Meter to Infer Your Residential Power Usage by Self-supervised Cross-modal Learning. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, September 27, 2023. https://doi.org/10.1145/3610905
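The abstract does not spell out the training objective, but a common way to learn the "consistent" information shared by two modalities without labels is a contrastive loss between time-aligned acoustic and power embeddings. The sketch below shows that generic idea under our own assumptions; it is not COMFORT's actual learning component.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(acoustic_emb, power_emb, temperature=0.1):
    """InfoNCE-style loss: embeddings from the same time window (same row index)
    are pulled together; embeddings from different windows are pushed apart."""
    a = F.normalize(acoustic_emb, dim=-1)
    p = F.normalize(power_emb, dim=-1)
    logits = a @ p.t() / temperature                 # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random 32-dimensional embeddings for 16 time-aligned windows.
loss = cross_modal_contrastive_loss(torch.randn(16, 32), torch.randn(16, 32))
print(float(loss))
```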
Designing an extensive set of mid-air gestures that are both easy to learn and quick to perform presents a significant challenge. Further complicating this challenge is achieving high-accuracy detection of such gestures using commonly available hardware, like a 2D commodity camera. Previous work often proposed smaller, application-specific gesture sets, requiring specialized hardware and struggling with adaptability across diverse environments. Addressing these limitations, this paper introduces Abacus Gestures, a comprehensive collection of 100 mid-air gestures. Drawing on the metaphor of Finger Abacus counting, the gestures are formed from various combinations of open and closed fingers, each assigned a different value. We developed an algorithm using an off-the-shelf computer vision library capable of detecting these gestures from a 2D commodity camera feed with an accuracy exceeding 98% for palms facing the camera and 95% for palms facing the body. We assessed the detection accuracy, ease of learning, and usability of these gestures in a user study involving 20 participants. The study found that participants could learn Abacus Gestures within five minutes after executing just 15 gestures and could recall them after a four-month interval. Additionally, most participants developed motor memory for these gestures after performing 100 gestures. Most of the gestures were easy to execute with the designated finger combinations, and the flexibility of executing gestures using multiple finger combinations further enhanced usability. Based on these findings, we created a taxonomy that categorizes Abacus Gestures into five groups based on motor memory development and three difficulty levels according to ease of execution. Finally, we provided design guidelines and proposed potential use cases for Abacus Gestures in the realm of mid-air interaction.
Abacus Gestures, by Md Ehtesham-Ul-Haque and Syed Masum Billah. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, September 27, 2023. https://doi.org/10.1145/3610898
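The abstract mentions detecting open/closed finger combinations from a 2D commodity camera with an off-the-shelf computer vision library. A minimal sketch of that idea is shown below using MediaPipe Hands and a crude tip-above-joint heuristic; both the library choice and the heuristic are our assumptions, and the paper's detector and gesture encoding may differ.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
# Tip / PIP landmark index pairs for index, middle, ring, pinky (thumb omitted for brevity).
FINGERS = [(8, 6), (12, 10), (16, 14), (20, 18)]

def open_fingers(hand_landmarks):
    """Crude heuristic: a finger counts as 'open' when its tip is above its PIP joint
    in image coordinates (assumes an upright hand facing the camera)."""
    lm = hand_landmarks.landmark
    return [lm[tip].y < lm[pip].y for tip, pip in FINGERS]

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.7) as hands:
    ok, frame = cap.read()
    if ok:
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        for hand in results.multi_hand_landmarks or []:
            states = open_fingers(hand)
            # Map the open/closed pattern to a hypothetical abacus-style value,
            # here simply the count of open fingers.
            print("open fingers:", states, "count:", sum(states))
cap.release()
```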
The primary focus of this research is on the discreet and subtle everyday contact interactions between mobile phones and their surrounding surfaces. Such interactions are anticipated to facilitate mobile context awareness, encompassing aspects such as dispensing medication updates, intelligently switching modes (e.g., silent mode), or initiating commands (e.g., deactivating an alarm). We introduce MicroCam, a contact-based sensing system that employs smartphone IMU data to detect the routine state of phone placement and utilizes a built-in microscope camera to capture intricate surface details. In particular, a natural dataset is collected to acquire authentic surface textures in situ for training and testing. Moreover, we optimize the deep neural network component of the algorithm, based on continual learning, to accurately discriminate between object categories (e.g., tables) and material constituents (e.g., wood). Experimental results highlight the superior accuracy, robustness and generalization of the proposed method. Lastly, we present a comprehensive discussion of our prototype, covering system performance as well as potential applications and scenarios.
MicroCam, by Yongquan Hu, Hui-Shyong Yeo, Mingyue Yuan, Haoran Fan, Don Samitha Elvitigala, Wen Hu, Aaron Quigley. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, September 27, 2023. https://doi.org/10.1145/3610921
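MicroCam first uses IMU data to detect the routine state of phone placement before the microscope camera samples the surface. A minimal sketch of such a stationarity check over accelerometer windows is given below; the window length and threshold are illustrative guesses, not values from the paper.

```python
import numpy as np

def is_resting_on_surface(accel_window, std_threshold=0.05):
    """Return True when a window of 3-axis accelerometer samples (m/s^2) shows
    almost no motion, i.e., the phone is plausibly lying still on a surface.
    The 0.05 m/s^2 threshold is an illustrative guess, not a value from the paper."""
    accel = np.asarray(accel_window)                 # shape (n_samples, 3)
    return bool(np.all(accel.std(axis=0) < std_threshold))

# Toy example: a still phone (tiny sensor noise) vs. a phone being handled.
still = 9.81 * np.tile([0, 0, 1], (100, 1)) + np.random.default_rng(1).normal(0, 0.01, (100, 3))
moving = still + np.random.default_rng(2).normal(0, 0.5, (100, 3))
print(is_resting_on_surface(still), is_resting_on_surface(moving))   # True False
```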
Juexing Wang, Guangjing Wang, Xiao Zhang, Li Liu, Huacheng Zeng, Li Xiao, Zhichao Cao, Lin Gu, Tianxing Li
Recent advancements in deep learning have shown that multimodal inference can be particularly useful in tasks like autonomous driving, human health, and production line monitoring. However, deploying state-of-the-art multimodal models in distributed IoT systems poses unique challenges, since the sensor data from low-cost edge devices can get corrupted, lost, or delayed before reaching the cloud. These problems are magnified in the presence of asymmetric data generation rates from different sensor modalities, wireless network dynamics, or unpredictable sensor behavior, leading to either increased latency or degraded inference accuracy, which could affect the normal operation of the system with severe consequences like human injury or car accidents. In this paper, we propose PATCH, a framework of speculative inference that adapts to these complex scenarios. PATCH serves as a plug-in module for existing multimodal models and enables speculative inference with these off-the-shelf deep learning models. PATCH consists of 1) a Masked-AutoEncoder-based cross-modality imputation module that imputes missing data using partially available sensor data, 2) a lightweight feature pair ranking module that effectively limits the search space for the optimal imputation configuration with low computation overhead, and 3) a data alignment module that aligns multimodal heterogeneous data streams without using accurate timestamps or external synchronization mechanisms. We implement PATCH in nine popular multimodal models using five public datasets and one self-collected dataset. The experimental results show that PATCH achieves up to 13% mean accuracy improvement over the state-of-the-art method while using only 10% of the training data and reducing the training overhead by 73% compared to the original cost of retraining the model.
PATCH. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, September 27, 2023. https://doi.org/10.1145/3610885
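The first PATCH component is described as a Masked-AutoEncoder-based cross-modality imputation module. The PyTorch sketch below illustrates that general reconstruct-the-missing-modality objective; the layer sizes, masking scheme and concatenated-feature layout are our assumptions, not PATCH's implementation.

```python
import torch
import torch.nn as nn

class CrossModalImputer(nn.Module):
    """Reconstructs a concatenated multimodal feature vector from a masked copy,
    so missing modalities can be imputed from the partially available ones."""
    def __init__(self, feat_dim=96, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, feat_dim)

    def forward(self, feats, present_mask):
        # Zero out modalities that were lost or delayed, then reconstruct everything.
        return self.decoder(self.encoder(feats * present_mask))

model = CrossModalImputer()
feats = torch.randn(32, 96)        # 32 samples, assumed 3 modalities x 32 dims each
mask = torch.ones(32, 96)
mask[:, 64:] = 0                   # pretend the third modality is missing
recon = model(feats, mask)
# Train only on the masked (missing) positions, as in masked-autoencoder objectives.
loss = ((recon - feats) ** 2 * (1 - mask)).mean()
loss.backward()
print(float(loss))
```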
Dennis Stanke, Tim Duente, Kerem Can Demir, Michael Rohs
The earlobe is a well-known location for wearing jewelry, but it might also be promising for electronic output, such as presenting notifications. This work elaborates on the pros and cons of different notification channels for the earlobe. Notifications on the earlobe can be private (only noticeable by the wearer) as well as public (noticeable in the immediate vicinity in a given social situation). A user study with 18 participants showed that the reaction times for the private channels (Poke, Vibration, Private Sound, Electrotactile) were on average less than 1 s, with an error rate (missed notifications) of less than 1 %. Thermal Warm and Cold took significantly longer, and Cold was the least reliable (26 % error rate). The participants preferred Electrotactile and Vibration. Among the public channels, the recognition time did not differ significantly between Sound (738 ms) and LED (828 ms), but Display took much longer (3175 ms). At 22 %, Display also had the highest error rate. The participants generally felt comfortable wearing notification devices on their earlobe. The results show that the earlobe is indeed a suitable location for wearable technology, if properly miniaturized, which is possible for Electrotactile and LED. We present application scenarios and discuss design considerations. A small field study in a fitness center demonstrates the suitability of the earlobe notification concept in a sports context.
Can You Ear Me? Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, September 27, 2023. https://doi.org/10.1145/3610925
Nodding and shaking one's head are intuitive and universal gestures in communication. As smartwatches become increasingly intelligent through advances in user activity sensing technologies, many smartwatch use scenarios demand quick responses from users in confirmation dialogs to accept or dismiss proposed actions. Such proposed actions include making emergency calls, taking service recommendations, and starting or stopping exercise timers. Head gestures in these scenarios could be preferable to touch interactions for being hands-free and easy to perform. We propose Headar to recognize these gestures on smartwatches using wearable millimeter wave sensing. We first surveyed head gestures to understand how they are performed in conversational settings. We then investigated the positions and orientations to which users raise their smartwatches. Insights from these studies guided the implementation of Headar. Additionally, we conducted modeling and simulation to verify our sensing principle. We developed a real-time sensing and inference pipeline using contemporary deep learning techniques, and demonstrated the feasibility of our proposed approach with a user study (n=15) and a live test (n=8). Our evaluation yielded an average accuracy of 84.0% in the user study across 9 classes, including nod and shake as well as seven other signals -- still, speech, touch interaction, and four non-gestural head motions (i.e., head up, left, right, and down). Furthermore, we obtained an accuracy of 72.6% in the live test, which reveals rich insights into the performance of our approach in various realistic conditions.
Headar, by Xiaoying Yang, Xue Wang, Gaofeng Dong, Zihan Yan, Mani Srivastava, Eiji Hayashi, Yang Zhang. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, September 27, 2023. https://doi.org/10.1145/3610900
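Headar's real-time pipeline classifies nine head-motion classes from wearable millimeter-wave sensing with deep learning. The sketch below shows a toy classifier over a short stack of radar frames; the input shape and the small 3D-CNN architecture are placeholders of ours, not Headar's model.

```python
import torch
import torch.nn as nn

class HeadGestureNet(nn.Module):
    """Toy classifier over a clip of radar frames: 9 classes as in the paper
    (nod, shake, still, speech, touch, and head up/left/right/down)."""
    def __init__(self, n_classes=9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(16, n_classes)

    def forward(self, clip):                  # clip: (batch, 1, frames, range, doppler)
        return self.classifier(self.features(clip).flatten(1))

# Example: batch of 4 clips, 16 frames of 32x32 range-Doppler maps (assumed shape).
logits = HeadGestureNet()(torch.randn(4, 1, 16, 32, 32))
print(logits.shape)                           # torch.Size([4, 9])
```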