Human Activity Recognition (HAR) based on embedded sensor data has become a popular research topic in ubiquitous computing, with a wide range of practical applications in fields such as human-computer interaction, healthcare, and motion tracking. Due to the difficulty of annotating sensing data, unsupervised and semi-supervised HAR methods have been extensively studied, but their performance gap relative to fully-supervised methods remains notable. In this paper, we propose a novel cross-modal co-learning approach called TS2ACT to achieve few-shot HAR. It introduces a cross-modal dataset augmentation method that uses semantically rich label text to search for human activity images, forming an augmented dataset consisting of partially-labeled time series and fully-labeled images. It then jointly trains a pre-trained CLIP image encoder with a time series encoder using contrastive learning, bringing time series and images closer in feature space when they belong to the same activity class. For inference, the feature extracted from the input time series is compared with the embeddings of a pre-trained CLIP text encoder using prompt learning, and the best match is output as the HAR classification result. We conducted extensive experiments on four public datasets to evaluate the performance of the proposed method. The numerical results show that TS2ACT significantly outperforms state-of-the-art HAR methods, and it achieves performance close to or better than fully supervised methods even when using as few as 1% labeled data for model training. The source code of TS2ACT is publicly available on GitHub.
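As a rough illustration of the inference step described above, the sketch below matches a time-series embedding against per-class CLIP text embeddings by cosine similarity; the variable names and function are hypothetical stand-ins under that reading of the abstract, not the authors' code.

```python
import numpy as np

def classify_activity(ts_embedding, text_embeddings, class_names):
    """Pick the activity whose text embedding is closest (cosine similarity)
    to the embedding of the input time-series window.
    ts_embedding: (d,) vector from the time-series encoder (assumed given).
    text_embeddings: (num_classes, d) CLIP text embeddings, one per activity."""
    z = ts_embedding / np.linalg.norm(ts_embedding)
    t = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    scores = t @ z  # one cosine similarity per activity class
    return class_names[int(np.argmax(scores))]
```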
{"title":"TS2ACT","authors":"Kang Xia, Wenzhong Li, Shiwei Gan, Sanglu Lu","doi":"10.1145/3631445","DOIUrl":"https://doi.org/10.1145/3631445","url":null,"abstract":"Human Activity Recognition (HAR) based on embedded sensor data has become a popular research topic in ubiquitous computing, which has a wide range of practical applications in various fields such as human-computer interaction, healthcare, and motion tracking. Due to the difficulties of annotating sensing data, unsupervised and semi-supervised HAR methods are extensively studied, but their performance gap to the fully-supervised methods is notable. In this paper, we proposed a novel cross-modal co-learning approach called TS2ACT to achieve few-shot HAR. It introduces a cross-modal dataset augmentation method that uses the semantic-rich label text to search for human activity images to form an augmented dataset consisting of partially-labeled time series and fully-labeled images. Then it adopts a pre-trained CLIP image encoder to jointly train with a time series encoder using contrastive learning, where the time series and images are brought closer in feature space if they belong to the same activity class. For inference, the feature extracted from the input time series is compared with the embedding of a pre-trained CLIP text encoder using prompt learning, and the best match is output as the HAR classification results. We conducted extensive experiments on four public datasets to evaluate the performance of the proposed method. The numerical results show that TS2ACT significantly outperforms the state-of-the-art HAR methods, and it achieves performance close to or better than the fully supervised methods even using as few as 1% labeled data for model training. The source codes of TS2ACT are publicly available on GitHub1.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"1 10","pages":"1 - 22"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mayara Costa Figueiredo, Elizabeth A. Ankrah, Jacquelyn E. Powell, Daniel A. Epstein, Yunan Chen
Recently, there has been a proliferation of personal health applications that describe using Artificial Intelligence (AI) to assist health consumers in making health decisions based on their data and algorithmic outputs. However, it is still unclear how such descriptions influence individuals' perceptions of these apps and their recommendations. We therefore investigate how current AI descriptions influence individuals' attitudes towards algorithmic recommendations in fertility self-tracking through a simulated study using three versions of a fertility app. We found that participants preferred AI descriptions with explanation, which they perceived as more accurate and trustworthy. Nevertheless, they were unwilling to rely on these apps for high-stakes goals because of the potential consequences of a failure. We then discuss the importance of health goals for AI acceptance, how literacy and assumptions influence perceptions of AI descriptions and explanations, and the limitations of transparency in the context of algorithmic decision-making for personal health.
{"title":"Powered by AI","authors":"Mayara Costa Figueiredo, Elizabeth A. Ankrah, Jacquelyn E. Powell, Daniel A. Epstein, Yunan Chen","doi":"10.1145/3631414","DOIUrl":"https://doi.org/10.1145/3631414","url":null,"abstract":"Recently, there has been a proliferation of personal health applications describing to use Artificial Intelligence (AI) to assist health consumers in making health decisions based on their data and algorithmic outputs. However, it is still unclear how such descriptions influence individuals' perceptions of such apps and their recommendations. We therefore investigate how current AI descriptions influence individuals' attitudes towards algorithmic recommendations in fertility self-tracking through a simulated study using three versions of a fertility app. We found that participants preferred AI descriptions with explanation, which they perceived as more accurate and trustworthy. Nevertheless, they were unwilling to rely on these apps for high-stakes goals because of the potential consequences of a failure. We then discuss the importance of health goals for AI acceptance, how literacy and assumptions influence perceptions of AI descriptions and explanations, and the limitations of transparency in the context of algorithmic decision-making for personal health.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"9 11","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Early screening for dry eye disease (DED) is crucial to identify and provide timely intervention to high-risk susceptible populations. Currently, clinical methods for diagnosing DED include the tear break-up time test, meibomian gland analysis, tear osmolarity test, and tear meniscus height test, all of which require in-hospital examination. Unfortunately, there is no convenient way to screen for DED yet. In this paper, we propose SDE, a contactless, convenient, and ubiquitous DED screening system based on RF signals. To extract biomarkers for early screening of DED from RF signals, we construct the frame-chirp variance and extract fine-grained spontaneous blinking actions. SDE is carefully designed to remove interference in RF signals and refine the characterization of biomarkers that denote the symptoms of DED. To endow SDE with the ability to adapt to new users, we develop a deep learning-based unsupervised domain adaptation model that removes the influence of different users and environments in local and global two-level feature spaces. We conduct extensive experiments to evaluate SDE with 54 volunteers in 4 scenes. The experimental results confirm that SDE can accurately screen for DED in new users in real environments such as eye examination rooms, clinics, offices, and homes.
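The abstract does not define the frame-chirp variance precisely; one plausible reading, sketched below under that assumption, is the per-range-bin variance across the chirps of a single FMCW radar frame, which would highlight small motions such as blinks.

```python
import numpy as np

def frame_chirp_variance(frame):
    """frame: complex IF samples of shape (num_chirps, num_samples) for one
    radar frame. Returns the variance of the range-profile magnitude across
    chirps, per range bin (an assumed interpretation, not the paper's code)."""
    range_profiles = np.abs(np.fft.fft(frame, axis=1))  # range FFT per chirp
    return np.var(range_profiles, axis=0)
```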
{"title":"SDE","authors":"Meng Xue, Yuyang Zeng, Shengkang Gu, Qian Zhang, Bowei Tian, Changzheng Chen","doi":"10.1145/3631438","DOIUrl":"https://doi.org/10.1145/3631438","url":null,"abstract":"Early screening for dry eye disease (DED) is crucial to identify and provide timely intervention to high-risk susceptible populations. Currently, clinical methods for diagnosing DED include the tear break-up time test, meibomian gland analysis, tear osmolarity test, and tear river height test, which require in-hospital detection. Unfortunately, there is no convenient way to screen for DED yet. In this paper, we propose SDE, a contactless, convenient, and ubiquitous DED screening system based on RF signals. To extract biomarkers for early screening of DED from RF signals, we construct frame chirps variance and extract fine-grained spontaneous blinking action. SDE is carefully designed to remove interference in RF signals and refine the characterization of biomarkers that denote the symptoms of DED. To endow SDE with the ability to adapt to new users, we develop a deep learning-based model of unsupervised domain adaptation to remove the influence of different users and environments in local and global two-level feature spaces. We conduct extensive experiments to evaluate SDE with 54 volunteers in 4 scenes. The experimental results confirm that SDE can accurately screen for DED in a new user in real environments such as eye examination rooms, clinics, offices, and homes.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"1 2","pages":"1 - 23"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tracking the angular movement of body joints has been a critical enabler for various applications, such as virtual and augmented reality, sports monitoring, and medical rehabilitation. Despite the strong demand for accurate joint tracking, existing techniques, such as cameras, IMUs, and flex sensors, suffer from major limitations including occlusion, cumulative error, and high cost. These issues collectively undermine the practicality of joint tracking. We introduce MagDot, a new magnetic-based joint tracking method that enables high-accuracy, drift-free, and wearable joint angle tracking. To overcome the limitations of existing techniques, MagDot employs a novel tracking scheme that compensates for various real-world impacts, achieving high tracking accuracy. We tested MagDot on eight participants against a professional motion capture system, a Qualisys system with nine Arqus A12 cameras. The results indicate that MagDot can accurately track major body joints; for example, it achieves tracking accuracies of 2.72°, 4.14°, and 4.61° for the elbow, knee, and shoulder, respectively. With a power consumption of only 98 mW, MagDot supports one day of usage with a small battery pack.
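The accuracy figures above are per-joint angular errors against the mocap reference; a minimal sketch of that kind of metric, assuming time-aligned joint-angle series in degrees, is:

```python
import numpy as np

def mean_angle_error_deg(estimated, reference):
    """Mean absolute joint-angle error in degrees against a mocap reference
    (an illustrative metric; the paper's exact evaluation protocol may differ)."""
    estimated, reference = np.asarray(estimated), np.asarray(reference)
    return float(np.mean(np.abs(estimated - reference)))
```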
{"title":"MagDot","authors":"Dongyao Chen, Qing Luo, Xiaomeng Chen, Xinbing Wang, Chenghui Zhou","doi":"10.1145/3631423","DOIUrl":"https://doi.org/10.1145/3631423","url":null,"abstract":"Tracking the angular movement of body joints has been a critical enabler for various applications, such as virtual and augmented reality, sports monitoring, and medical rehabilitation. Despite the strong demand for accurate joint tracking, existing techniques, such as cameras, IMUs, and flex sensors, suffer from major limitations that include occlusion, cumulative error, and high cost. These issues collectively undermine the practicality of joint tracking. We introduce MagDot, a new magnetic-based joint tracking method that enables high-accuracy, drift-free, and wearable joint angle tracking. To overcome the limitations of existing techniques, MagDot employs a novel tracking scheme that compensates for various real-world impacts, achieving high tracking accuracy. We tested MagDot on eight participants with a professional motion capture system, i.e., Qualisys motion capture system with nine Arqus A12 cameras. The results indicate MagDot can accurately track major body joints. For example, MagDot can achieve tracking accuracy of 2.72°, 4.14°, and 4.61° for elbow, knee, and shoulder, respectively. With a power consumption of only 98 mW, MagDot can support one-day usage with a small battery pack.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"8 8","pages":"1 - 25"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139438005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qiushi Zhou, B. V. Syiem, Beier Li, Eduardo Velloso
We propose Reflected Reality: a new dimension for augmented reality that expands the augmented physical space into mirror reflections. By synchronously tracking the physical space in front of the mirror and the reflection behind it using an AR headset and an optional smart mirror component, reflected reality enables novel AR interactions that allow users to use their physical and reflected bodies to find and interact with virtual objects. We propose a design space for AR interaction with mirror reflections and instantiate it using a prototype system featuring a HoloLens 2 and a smart mirror. We explore the design space along the following dimensions: the user's perspective of input, the spatial frame of reference, and the direction of the mirror space relative to the physical space. Using our prototype, we visualise a use case scenario that traverses the design space to demonstrate its interaction affordances in a practical context. To understand how users perceive the intuitiveness and ease of reflected reality interaction, we conducted an exploratory study and a formal user evaluation to characterise user performance on AR interaction tasks in reflected reality. We discuss the unique interaction affordances that reflected reality offers and outline possibilities for its future applications.
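A core geometric ingredient of mapping between the physical space and its mirror image is the standard reflection of a 3D point across the mirror plane; the sketch below shows that operation, assuming the plane parameters come from headset/smart-mirror calibration, which is not reproduced here.

```python
import numpy as np

def reflect_across_mirror(point, plane_point, plane_normal):
    """Reflect a 3D point across the mirror plane defined by a point on the
    plane and its (not necessarily unit) normal vector."""
    p = np.asarray(point, dtype=float)
    q = np.asarray(plane_point, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)                 # unit normal of the mirror plane
    return p - 2.0 * np.dot(p - q, n) * n     # mirror image of the point
```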
我们提出了 "反射现实"(Reflected Reality):增强现实的一个新维度,它将增强物理空间扩展到镜面反射中。通过使用 AR 头显和可选的智能镜子组件同步跟踪镜子前的物理空间和镜子后的反射,反射现实可以实现新颖的 AR 互动,让用户可以使用他们的物理和反射身体来找到虚拟对象并与之互动。我们提出了利用镜面反射进行 AR 互动的设计空间,并利用 HoloLens 2 和智能镜子的原型系统将其实例化。我们沿着以下维度探索设计空间:用户的输入视角、空间参照系以及镜像空间相对于物理空间的方向。利用我们的原型,我们可视化了一个穿越设计空间的用例场景,以展示其在实际环境中的交互能力。为了了解用户如何感知反射现实交互的直观性和易用性,我们进行了一项探索性和正式的用户评估研究,以描述用户在反射现实中执行 AR 交互任务的表现。我们讨论了反射现实所提供的独特交互能力,并概述了其未来应用的可能性。
{"title":"Reflected Reality","authors":"Qiushi Zhou, B. V. Syiem, Beier Li, Eduardo Velloso","doi":"10.1145/3631431","DOIUrl":"https://doi.org/10.1145/3631431","url":null,"abstract":"We propose Reflected Reality: a new dimension for augmented reality that expands the augmented physical space into mirror reflections. By synchronously tracking the physical space in front of the mirror and the reflection behind it using an AR headset and an optional smart mirror component, reflected reality enables novel AR interactions that allow users to use their physical and reflected bodies to find and interact with virtual objects. We propose a design space for AR interaction with mirror reflections, and instantiate it using a prototype system featuring a HoloLens 2 and a smart mirror. We explore the design space along the following dimensions: the user's perspective of input, the spatial frame of reference, and the direction of the mirror space relative to the physical space. Using our prototype, we visualise a use case scenario that traverses the design space to demonstrate its interaction affordances in a practical context. To understand how users perceive the intuitiveness and ease of reflected reality interaction, we conducted an exploratory and a formal user evaluation studies to characterise user performance of AR interaction tasks in reflected reality. We discuss the unique interaction affordances that reflected reality offers, and outline possibilities of its future applications.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"2 4","pages":"1 - 28"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139438023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present PyroSense, a first-of-its-kind system that enables fine-grained 3D posture reconstruction using ubiquitous COTS passive infrared (PIR) sensors. PyroSense senses heat signals generated by the human body and the airflow due to body movement to reconstruct the corresponding human postures in real time. PyroSense greatly advances prior PIR-based sensing designs by improving the sensitivity of COTS PIR sensors to body movement, increasing spatial resolution without additional deployment overhead, and designing intelligent algorithms to adapt to diverse environmental factors. We build a low-cost PyroSense prototype using off-the-shelf hardware components. The experimental findings indicate that PyroSense not only attains a classification accuracy of 99.46% across 15 classes, but also registers a mean joint distance error of less than 16 cm across 14 body joints for posture reconstruction in challenging environments.
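The reconstruction metric quoted above (mean joint distance error over 14 body joints) can be computed as in the sketch below, assuming predicted and ground-truth 3D joint positions are available in the same units; the array layout is an assumption for illustration.

```python
import numpy as np

def mean_joint_distance_error(pred, gt):
    """pred, gt: arrays of shape (num_frames, 14, 3) holding 3D joint positions.
    Returns the mean Euclidean distance over all frames and joints."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))
```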
{"title":"PyroSense","authors":"Huaili Zeng, Gen Li, Tianxing Li","doi":"10.1145/3631435","DOIUrl":"https://doi.org/10.1145/3631435","url":null,"abstract":"We present PyroSense, the first-of-its-kind system that enables fine-grained 3D posture reconstruction using ubiquitous COTS passive infrared sensor (PIR sensor). PyroSense senses heat signals generated by the human body and airflow due to body movement to reconstruct the corresponding human postures in real time. PyroSense greatly advances the prior PIR-based sensing design by improving the sensitivity of COTS PIR sensor to body movement, increasing spatial resolution without additional deployment overhead, and designing intellectual algorithms to adapt to diverse environmental factors. We build a low-cost PyroSense prototype using off-the-shelf hardware components. The experimental findings indicate that PyroSense not only attains a classification accuracy of 99.46% across 15 classes, but it also registers a mean joint distance error of less than 16 cm for 14 body joints for posture reconstruction in challenging environments.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"13 5","pages":"1 - 32"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wentao Xie, Huangxun Chen, Jing Wei, Jin Zhang, Qian Zhang
Smart eyewear's interaction mode has attracted significant research attention. While most commercial devices have adopted touch panels situated on the temple front of eyeglasses for interaction, this paper identifies a drawback stemming from the non-parallel planes of the touch panel and the display, which disrupt the direct mapping between gestures and the manipulated objects on the display. Therefore, this paper proposes RimSense, a proof-of-concept design for smart eyewear, to introduce an alternative realm for interaction: touch gestures on the eyewear rim. RimSense leverages piezoelectric (PZT) transducers to convert the eyeglass rim into a touch-sensitive surface. When users touch the rim, the alteration in the eyeglass's structural signal manifests in the channel frequency response (CFR). This allows RimSense to recognize the executed touch gestures based on the collected CFR patterns. Technically, we employ a buffered chirp as the probe signal to fulfil the sensing granularity and noise resistance requirements. Additionally, we present a deep learning-based gesture recognition framework tailored for fine-grained time sequence prediction, further integrated with a Finite-State Machine (FSM) algorithm for event-level prediction to suit the interaction experience for gestures of varying durations. We implement a functional eyewear prototype with two commercial PZT transducers. RimSense can recognize eight touch gestures on the eyeglass rim and estimate gesture durations simultaneously, allowing gestures of varying lengths to serve as distinct inputs. We evaluate the performance of RimSense on 30 subjects and show that it can sense eight gestures and an additional negative class with an F1-score of 0.95 and a relative duration estimation error of 11%. We further run the system in real time and conduct a user study with 14 subjects to assess the practicability of RimSense through interactions with two demo applications. The user study demonstrates RimSense's good performance, high usability, learnability, and enjoyability. Additionally, we conduct interviews with the subjects, and their comments provide valuable insights for future eyewear design.
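As an illustration of the event-level step, the sketch below collapses per-frame gesture predictions into duration-aware events with a small state machine; it is a simplified stand-in for the paper's FSM, and the label conventions (a single negative class, a minimum run length) are assumptions.

```python
def frames_to_events(frame_labels, negative_label=0, min_frames=3):
    """Collapse a list of per-frame gesture labels into (label, start, length)
    events: an event opens when a non-negative label appears and closes when
    the label changes; runs shorter than min_frames are discarded as noise."""
    events, current, start = [], negative_label, 0
    for i, lab in enumerate(list(frame_labels) + [negative_label]):  # sentinel flushes the last run
        if lab != current:
            if current != negative_label and i - start >= min_frames:
                events.append((current, start, i - start))
            current, start = lab, i
    return events

# Example: frames_to_events([0, 0, 1, 1, 1, 1, 0]) -> [(1, 2, 4)]
```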
{"title":"RimSense","authors":"Wentao Xie, Huangxun Chen, Jing Wei, Jin Zhang, Qian Zhang","doi":"10.1145/3631456","DOIUrl":"https://doi.org/10.1145/3631456","url":null,"abstract":"Smart eyewear's interaction mode has attracted significant research attention. While most commercial devices have adopted touch panels situated on the temple front of eyeglasses for interaction, this paper identifies a drawback stemming from the unparalleled plane between the touch panel and the display, which disrupts the direct mapping between gestures and the manipulated objects on display. Therefore, this paper proposes RimSense, a proof-of-concept design for smart eyewear, to introduce an alternative realm for interaction - touch gestures on eyewear rim. RimSense leverages piezoelectric (PZT) transducers to convert the eyeglass rim into a touch-sensitive surface. When users touch the rim, the alteration in the eyeglass's structural signal manifests its effect into a channel frequency response (CFR). This allows RimSense to recognize the executed touch gestures based on the collected CFR patterns. Technically, we employ a buffered chirp as the probe signal to fulfil the sensing granularity and noise resistance requirements. Additionally, we present a deep learning-based gesture recognition framework tailored for fine-grained time sequence prediction and further integrated with a Finite-State Machine (FSM) algorithm for event-level prediction to suit the interaction experience for gestures of varying durations. We implement a functional eyewear prototype with two commercial PZT transducers. RimSense can recognize eight touch gestures on the eyeglass rim and estimate gesture durations simultaneously, allowing gestures of varying lengths to serve as distinct inputs. We evaluate the performance of RimSense on 30 subjects and show that it can sense eight gestures and an additional negative class with an F1-score of 0.95 and a relative duration estimation error of 11%. We further make the system work in real-time and conduct a user study on 14 subjects to assess the practicability of RimSense through interactions with two demo applications. The user study demonstrates RimSense's good performance, high usability, learnability and enjoyability. Additionally, we conduct interviews with the subjects, and their comments provide valuable insight for future eyewear design.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"12 9","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dong-Sig Kang, Eunsu Baek, S. Son, Youngki Lee, Taesik Gong, Hyung-Sin Kim
We present MIRROR, an on-device video virtual try-on (VTO) system that provides realistic, private, and rapid experiences for mobile clothes shopping. Despite recent advancements in generative adversarial networks (GANs) for VTO, designing MIRROR involves two challenges: (1) data discrepancy due to restricted training data that lack diverse poses, body sizes, and backgrounds, and (2) local computation overhead that consumes 24% of the battery to convert only a single video. To alleviate these problems, we propose a generalizable VTO GAN that not only discerns intricate human body semantics but also captures domain-invariant features without requiring additional training data. In addition, we craft lightweight, reliable clothes/pose tracking that generates refined pixel-wise warping flow without neural-network computation. As a holistic system, MIRROR integrates the new VTO GAN and tracking method with meticulous pre/post-processing, operating in two distinct phases (online/offline). Our results on Android smartphones and real-world user videos show that, compared to a cutting-edge VTO GAN, MIRROR achieves 6.5× better accuracy with 20.1× faster video conversion and 16.9× less energy consumption.
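To illustrate the flow-based warping mentioned above, the sketch below applies a dense pixel-wise flow to an image with bilinear sampling; the flow itself would come from MIRROR's clothes/pose tracking, which is not reproduced here, and the (dy, dx) offset convention is an assumption.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def apply_warping_flow(image, flow):
    """image: H x W x C array; flow: H x W x 2 array of (dy, dx) offsets.
    Samples each output pixel from image[y + dy, x + dx] with bilinear
    interpolation, clamping lookups at the image border."""
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys + flow[..., 0], xs + flow[..., 1]])
    return np.stack(
        [map_coordinates(image[..., c], coords, order=1, mode="nearest")
         for c in range(image.shape[2])],
        axis=-1,
    )
```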
{"title":"MIRROR","authors":"Dong-Sig Kang, Eunsu Baek, S. Son, Youngki Lee, Taesik Gong, Hyung-Sin Kim","doi":"10.1145/3631420","DOIUrl":"https://doi.org/10.1145/3631420","url":null,"abstract":"We present MIRROR, an on-device video virtual try-on (VTO) system that provides realistic, private, and rapid experiences in mobile clothes shopping. Despite recent advancements in generative adversarial networks (GANs) for VTO, designing MIRROR involves two challenges: (1) data discrepancy due to restricted training data that miss various poses, body sizes, and backgrounds and (2) local computation overhead that uses up 24% of battery for converting only a single video. To alleviate the problems, we propose a generalizable VTO GAN that not only discerns intricate human body semantics but also captures domain-invariant features without requiring additional training data. In addition, we craft lightweight, reliable clothes/pose-tracking that generates refined pixel-wise warping flow without neural-net computation. As a holistic system, MIRROR integrates the new VTO GAN and tracking method with meticulous pre/post-processing, operating in two distinct phases (on/offline). Our results on Android smartphones and real-world user videos show that compared to a cutting-edge VTO GAN, MIRROR achieves 6.5× better accuracy with 20.1× faster video conversion and 16.9× less energy consumption.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"11 51","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meagan B. Loerakker, Jasmin Niess, Marit Bentvelzen, Paweł W. Woźniak
Wearable personal trackers offer exciting opportunities to contribute to one's well-being, but they can also foster negative experiences. It remains a challenge to understand how we can design personal informatics experiences that help users frame their data in a positive manner and foster self-compassion. To explore this, we conducted a study comparing different visualisations of user-generated screen time data. We examined positive, neutral, and negative framings of the data and whether or not a point of reference was provided in a visualisation. The results show that framing techniques have a significant effect on reflection, rumination, and self-compassion. We contribute insights into which design features of data representations can support positive experiences in personal informatics.
{"title":"Designing Data Visualisations for Self-Compassion in Personal Informatics","authors":"Meagan B. Loerakker, Jasmin Niess, Marit Bentvelzen, Paweł W. Woźniak","doi":"10.1145/3631448","DOIUrl":"https://doi.org/10.1145/3631448","url":null,"abstract":"Wearable personal trackers offer exciting opportunities to contribute to one's well-being, but they also can foster negative experiences. It remains a challenge to understand how we can design personal informatics experiences that help users frame their data in a positive manner and foster self-compassion. To explore this, we conducted a study where we compared different visualisations for user-generated screen time data. We examined positive, neutral and negative framings of the data and whether or not a point of reference was provided in a visualisation. The results show that framing techniques have a significant effect on reflection, rumination and self-compassion. We contribute insights into what design features of data representations can support positive experiences in personal informatics.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"3 8","pages":"1 - 22"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yunpeng Song, Yiheng Bian, Xiaorui Wang, Zhongmin Cai
Enabling smart devices to learn to automate actions as expected is a crucial yet challenging task. The traditional Trigger-Action rule approach for device automation is prone to ambiguity in complex scenarios. To address this issue, we propose a data-driven approach that leverages recorded user-driven event sequences to predict potential actions users may take and to generate fine-grained device automation sequences. Our key intuition is that user-driven event sequences, like human-written articles and programs, are governed by consistent semantic contexts and contain regularities that can be modeled to generate sequences expressing the user's preferences. We introduce ASGen, a deep learning framework that combines sequential information, event attributes, and external knowledge to form the event representation and to output sequences of arbitrary length to facilitate automation. To evaluate our approach from both quantitative and qualitative perspectives, we conduct two studies using a realistic dataset containing over 4.4 million events. Our results show that our approach surpasses other methods by providing more accurate recommendations, and the automation sequences generated by our model are perceived as equally or even more rational and useful compared to those generated by humans.
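A minimal sketch of how a model like the one described could emit an automation sequence of arbitrary length is shown below; the `next_action_scores` interface is a hypothetical stand-in for the trained model, not ASGen's actual API.

```python
def generate_automation_sequence(model, context_events, end_token="<END>", max_len=20):
    """Greedy autoregressive decoding: condition on the recorded user-driven
    event context and repeatedly append the highest-scoring next action until
    the model emits the end token or the length limit is reached."""
    sequence = []
    while len(sequence) < max_len:
        # Hypothetical model call returning a dict: candidate action -> score.
        scores = model.next_action_scores(context_events + sequence)
        action = max(scores, key=scores.get)
        if action == end_token:
            break
        sequence.append(action)
    return sequence
```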
{"title":"Learning from User-driven Events to Generate Automation Sequences","authors":"Yunpeng Song, Yiheng Bian, Xiaorui Wang, Zhongmin Cai","doi":"10.1145/3631427","DOIUrl":"https://doi.org/10.1145/3631427","url":null,"abstract":"Enabling smart devices to learn automating actions as expected is a crucial yet challenging task. The traditional Trigger-Action rule approach for device automation is prone to ambiguity in complex scenarios. To address this issue, we propose a data-driven approach that leverages recorded user-driven event sequences to predict potential actions users may take and generate fine-grained device automation sequences. Our key intuition is that user-driven event sequences, like human-written articles and programs, are governed by consistent semantic contexts and contain regularities that can be modeled to generate sequences that express the user's preferences. We introduce ASGen, a deep learning framework that combines sequential information, event attributes, and external knowledge to form the event representation and output sequences of arbitrary length to facilitate automation. To evaluate our approach from both quantitative and qualitative perspectives, we conduct two studies using a realistic dataset containing over 4.4 million events. Our results show that our approach surpasses other methods by providing more accurate recommendations. And the automation sequences generated by our model are perceived as equally or even more rational and useful compared to those generated by humans.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"11 4","pages":"1 - 22"},"PeriodicalIF":0.0,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139437872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}