Human-Centered Deferred Inference: Measuring User Interactions and Setting Deferral Criteria for Human-AI Teams
Stephan J. Lemmer, Anhong Guo, Jason J. Corso
DOI: 10.1145/3581641.3584092
Although deep learning holds the promise of novel and impactful interfaces, realizing such promise in practice remains a challenge: since dataset-driven deep-learned models assume a one-time human input, there is no recourse when they do not understand the input provided by the user. Works that address this via deferred inference—soliciting additional human input when uncertain—show meaningful improvement, but ignore key aspects of how users and models interact. In this work, we focus on the role of users in deferred inference and argue that the deferral criteria should be a function of the user and model as a team, not simply the model itself. In support of this, we introduce a novel mathematical formulation, validate it via an experiment analyzing the interactions of 25 individuals with a deep learning-based visiolinguistic model, and identify user-specific dependencies that are under-explored in prior work. We conclude by demonstrating two human-centered procedures for setting deferral criteria that are simple to implement, applicable to a wide variety of tasks, and perform equal to or better than equivalent procedures that use much larger datasets.
Gaze Speedup: Eye Gaze Assisted Gesture Typing in Virtual Reality
Maozheng Zhao, Alec M Pierce, Ran Tan, Ting Zhang, Tianyi Wang, Tanya R. Jonker, Hrvoje Benko, Aakar Gupta
DOI: 10.1145/3581641.3584072
Mid-air text input in augmented or virtual reality (AR/VR) is an open problem. One proposed solution is gesture typing, where the user performs a gesture trace over the keyboard. However, this requires the user to move their hands precisely and continuously, potentially causing arm fatigue. With eye tracking available on AR/VR devices, multiple works have proposed gaze-driven gesture typing techniques. However, such techniques require the explicit use of gaze, which is prone to the Midas touch problem and conflicts with other gaze activities occurring at the same moment. In this work, the user is not made aware that their gaze is being used to improve the interaction, making the use of gaze completely implicit. We observed that a user's implicit gaze fixation location during gesture typing is usually the gesture cursor's target location if the gesture cursor is moving toward it. Based on this observation, we propose the Speedup method, in which we speed up the gesture cursor toward the user's gaze fixation location; the speedup rate depends on how well the gesture cursor's moving direction aligns with the gaze fixation. To reduce overshooting near the target in the Speedup method, we further propose the Gaussian Speedup method, in which the speedup rate is dynamically reduced with a Gaussian function as the gesture cursor nears the gaze fixation. Using a wrist IMU as input, a 12-person study demonstrated that the Speedup and Gaussian Speedup methods each reduced users' hand movement without any loss of typing speed or accuracy.
{"title":"Gaze Speedup: Eye Gaze Assisted Gesture Typing in Virtual Reality","authors":"Maozheng Zhao, Alec M Pierce, Ran Tan, Ting Zhang, Tianyi Wang, Tanya R. Jonker, Hrvoje Benko, Aakar Gupta","doi":"10.1145/3581641.3584072","DOIUrl":"https://doi.org/10.1145/3581641.3584072","url":null,"abstract":"Mid-air text input in augmented or virtual reality (AR/VR) is an open problem. One proposed solution is gesture typing where the user performs a gesture trace over the keyboard. However, this requires the user to move their hands precisely and continuously, potentially causing arm fatigue. With eye tracking available on AR/VR devices, multiple works have proposed gaze-driven gesture typing techniques. However, such techniques require the explicit use of gaze which are prone to Midas touch problems, conflicting with other gaze activities in the same moment. In this work, the user is not made aware that their gaze is being used to improve the interaction, making the use of gaze completely implicit. We observed that a user’s implicit gaze fixation location during gesture typing is usually the gesture cursor’s target location if the gesture cursor is moving toward it. Based on this observation, we propose the Speedup method in which we speed up the gesture cursor toward the user’s gaze fixation location, the speedup rate depends on how well the gesture cursor’s moving direction aligns with the gaze fixation. To reduce the overshooting near the target in the Speedup method, we further proposed the Gaussian Speedup method in which the speedup rate is dynamically reduced with a Gaussian function when the gesture cursor gets nearer to the gaze fixation. Using a wrist IMU as input, a 12-person study demonstrated that the Speedup method and Gaussian Speedup method reduced users’ hand movement by and respectively without any loss of typing speed or accuracy.","PeriodicalId":118159,"journal":{"name":"Proceedings of the 28th International Conference on Intelligent User Interfaces","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129284108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FlexType: Flexible Text Input with a Small Set of Input Gestures
Dylan Gaines, Mackenzie M Baker, K. Vertanen
DOI: 10.1145/3581641.3584077
In many situations, it may be impractical or impossible to enter text by selecting precise locations on a physical or touchscreen keyboard. We present an ambiguous keyboard with four character groups that has potential applications for eyes-free text entry, as well as text entry using a single switch or a brain-computer interface. We develop a procedure for optimizing these character groupings based on a disambiguation algorithm that leverages a long-span language model. We produce both alphabetically-constrained and unconstrained character groups in an offline optimization experiment and compare them in a longitudinal user study. Our results did not show a significant difference between the constrained and unconstrained character groups after four hours of practice. As expected, participants had significantly more errors with the unconstrained groups in the first session, suggesting a higher barrier to learning the technique. We therefore recommend the alphabetically-constrained character groups, where participants were able to achieve an average entry rate of 12.0 words per minute with a 2.03% character error rate using a single hand and with no visual feedback.
Drawing with Reframer: Emergence and Control in Co-Creative AI
Tomas Lawton, F. Ibarrola, Dan Ventura, Kazjon Grace
DOI: 10.1145/3581641.3584095
Over the past few years, rapid developments in AI have resulted in new models capable of generating high-quality images and creative artefacts, most of which seek to fully automate the process of creation. In stark contrast, creative professionals rely on iteration—to change their mind, to modify their sketches, and to re-imagine. For that reason, end-to-end generative approaches have limited application to real-world design workflows. We present a novel human-AI drawing interface called Reframer, along with a new survey instrument for evaluating co-creative systems. Based on a co-creative drawing model called the Collaborative, Interactive Context-Aware Design Agent (CICADA), Reframer uses CLIP-guided synthesis-by-optimisation to support real-time synchronous drawing with AI. We present two versions of Reframer's interface, one that prioritises emergence and system agency, the other control and user agency. To begin exploring how these different interaction models might influence the user experience, we also propose the Mixed-Initiative Creativity Support Index (MICSI). MICSI rates co-creative systems along experiential axes relevant to AI co-creation. We administered MICSI and a short qualitative interview to users who engaged with the Reframer variants on two distinct creative tasks. The results show the overall broad efficacy of Reframer as a creativity support tool, but MICSI also allows us to begin unpacking the complex interactions between learning effects, task type, visibility, control, and emergent behaviour. We conclude with a discussion of how these findings highlight challenges for future co-creative systems design.
{"title":"Drawing with Reframer: Emergence and Control in Co-Creative AI","authors":"Tomas Lawton, F. Ibarrola, Dan Ventura, Kazjon Grace","doi":"10.1145/3581641.3584095","DOIUrl":"https://doi.org/10.1145/3581641.3584095","url":null,"abstract":"Over the past few years, rapid developments in AI have resulted in new models capable of generating high-quality images and creative artefacts, most of which seek to fully automate the process of creation. In stark contrast, creative professionals rely on iteration—to change their mind, to modify their sketches, and to re-imagine. For that reason, end-to-end generative approaches limit application to real-world design workflows. We present a novel human-AI drawing interface called Reframer, along with a new survey instrument for evaluating co-creative systems. Based on a co-creative drawing model called the Collaborative, Interactive Context-Aware Design Agent (CICADA), Reframer uses CLIP-guided synthesis-by-optimisation to support real-time synchronous drawing with AI. We present two versions of Reframer’s interface, one that prioritises emergence and system agency and the other control and user agency. To begin exploring how these different interaction models might influence the user experience, we also propose the Mixed-Initiative Creativity Support Index (MICSI). MICSI rates co-creative systems along experiential axes relevant to AI co-creation. We administer MICSI and a short qualitative interview to users who engaged with the Reframer variants on two distinct creative tasks. The results show overall broad efficacy of Reframer as a creativity support tool, but MICSI also allows us to begin unpacking the complex interactions between learning effects, task type, visibility, control, and emergent behaviour. We conclude with a discussion of how these findings highlight challenges for future co-creative systems design.","PeriodicalId":118159,"journal":{"name":"Proceedings of the 28th International Conference on Intelligent User Interfaces","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115355726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Masktrap: Designing and Identifying Gestures to Transform Mask Strap into an Input Interface
Takumi Yamamoto, Katsutoshi Masai, A. Withana, Yuta Sugiura
DOI: 10.1145/3581641.3584062
Embedding technology into day-to-day wearables and creating smart devices such as smartwatches and smart-glasses has been a growing area of interest. In this paper, we explore the interaction around face masks, a common accessory worn by many to prevent the spread of infectious diseases. Particularly, we propose a method of using the straps of a face mask as an input medium. We identified a set of plausible gestures on mask straps through an elicitation study (N = 20), in which the participants proposed different gestures for a given referent. We then developed a prototype to identify the gestures performed on the mask straps and present the recognition accuracy from a user study with eight participants. Our results show the system achieves 93.07% classification accuracy for 12 gestures.
{"title":"Masktrap: Designing and Identifying Gestures to Transform Mask Strap into an Input Interface","authors":"Takumi Yamamoto, Katsutoshi Masai, A. Withana, Yuta Sugiura","doi":"10.1145/3581641.3584062","DOIUrl":"https://doi.org/10.1145/3581641.3584062","url":null,"abstract":"Embedding technology into day-to-day wearables and creating smart devices such as smartwatches and smart-glasses has been a growing area of interest. In this paper, we explore the interaction around face masks, a common accessory worn by many to prevent the spread of infectious diseases. Particularly, we propose a method of using the straps of a face mask as an input medium. We identified a set of plausible gestures on mask straps through an elicitation study (N = 20), in which the participants proposed different gestures for a given referent. We then developed a prototype to identify the gestures performed on the mask straps and present the recognition accuracy from a user study with eight participants. Our results show the system achieves 93.07% classification accuracy for 12 gestures.","PeriodicalId":118159,"journal":{"name":"Proceedings of the 28th International Conference on Intelligent User Interfaces","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124270454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CoColor: Interactive Exploration of Color Designs
Lena Hegemann, N. Dayama, Abhishek Iyer, Erfan Farhadi, Ekaterina Marchenko, A. Oulasvirta
DOI: 10.1145/3581641.3584089
Choosing colors is a pivotal but challenging component of graphic design. This paper presents an intelligent interaction technique supporting designers' creativity in color design. It fills a gap in the literature by proposing an integrated technique for color exploration, assignment, and refinement: CoColor. Our design goals were 1) to let designers focus on color choice by freeing them from pixel-level editing and 2) to support rapid flow between low- and high-level decisions. Our interaction technique proceeds in three steps – choice of focus, choice of suitable colors, and application of the colors to designs – wherein the choices are interlinked and computer-assisted, thus supporting divergent and convergent thinking. It considers color harmony, visual saliency, and elementary accessibility requirements. The technique was incorporated into the popular design tool Figma and evaluated in a study with 16 designers. Participants explored the coloring options more easily with CoColor and considered it helpful.
{"title":"CoColor: Interactive Exploration of Color Designs","authors":"Lena Hegemann, N. Dayama, Abhishek Iyer, Erfan Farhadi, Ekaterina Marchenko, A. Oulasvirta","doi":"10.1145/3581641.3584089","DOIUrl":"https://doi.org/10.1145/3581641.3584089","url":null,"abstract":"Choosing colors is a pivotal but challenging component of graphic design. The paper presents an intelligent interaction technique supporting designers’ creativity in color design. It fills a gap in the literature by proposing an integrated technique for color exploration, assignment, and refinement: CoColor. Our design goals were 1) let designers focus on color choice by freeing them from pixel-level editing and 2) support rapid flow between low- and high-level decisions. Our interaction technique utilizes three steps – choice of focus, choice of suitable colors, and the colors’ application to designs – wherein the choices are interlinked and computer-assisted, thus supporting divergent and convergent thinking. It considers color harmony, visual saliency, and elementary accessibility requirements. The technique was incorporated into the popular design tool Figma and evaluated in a study with 16 designers. Participants explored the coloring options more easily with CoColor and considered it helpful.","PeriodicalId":118159,"journal":{"name":"Proceedings of the 28th International Conference on Intelligent User Interfaces","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128617786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interviewing the Interviewer: AI-generated Insights to Help Conduct Candidate-centric Interviews
Kuldeep Yadav, Animesh Seemendra, A. Singhania, Sagar Bora, Pratyaksh Dubey, Varun Aggarwal
DOI: 10.1145/3581641.3584051
Interviews are the most popular way to assess talent around the world, and interviewers contribute substantially to the candidate experience in many organizations' hiring strategies. However, there is no comprehensive understanding of what makes for a good interview experience or of how interviewers can conduct candidate-centric interviews. An exploratory study with 123 candidates revealed critical metrics of interviewer behavior that affect candidate experience. These metrics informed the design of our AI-driven SmartView system, which provides automated post-interview feedback to interviewers. We deployed the system in the real world for three weeks with 35 interviewers. Most interviewers found that SmartView insights helped them identify areas for improvement and could assist them in improving their interviewing skills.
The Importance of Multimodal Emotion Conditioning and Affect Consistency for Embodied Conversational Agents
Che-Jui Chang, Samuel S. Sohn, Sen Zhang, R. Jayashankar, Muhammad Usman, M. Kapadia
DOI: 10.1145/3581641.3584045
Previous studies regarding the perception of emotions for embodied virtual agents have shown the effectiveness of using virtual characters in conveying emotions through interactions with humans. However, creating an autonomous embodied conversational agent with expressive behaviors presents two major challenges. The first challenge is the difficulty of synthesizing the conversational behaviors for each modality that are as expressive as real human behaviors. The second challenge is that the affects are modeled independently, which makes it difficult to generate multimodal responses with consistent emotions across all modalities. In this work, we propose a conceptual framework, ACTOR (Affect-Consistent mulTimodal behaviOR generation), that aims to increase the perception of affects by generating multimodal behaviors conditioned on a consistent driving affect. We have conducted a user study with 199 participants to assess how the average person judges the affects perceived from multimodal behaviors that are consistent and inconsistent with respect to a driving affect. The result shows that among all model conditions, our affect-consistent framework receives the highest Likert scores for the perception of driving affects. Our statistical analysis suggests that making a modality affect-inconsistent significantly decreases the perception of driving affects. We also observe that multimodal behaviors conditioned on consistent affects are more expressive compared to behaviors with inconsistent affects. Therefore, we conclude that multimodal emotion conditioning and affect consistency are vital to enhancing the perception of affects for embodied conversational agents.
{"title":"The Importance of Multimodal Emotion Conditioning and Affect Consistency for Embodied Conversational Agents","authors":"Che-Jui Chang, Samuel S. Sohn, Sen Zhang, R. Jayashankar, Muhammad Usman, M. Kapadia","doi":"10.1145/3581641.3584045","DOIUrl":"https://doi.org/10.1145/3581641.3584045","url":null,"abstract":"Previous studies regarding the perception of emotions for embodied virtual agents have shown the effectiveness of using virtual characters in conveying emotions through interactions with humans. However, creating an autonomous embodied conversational agent with expressive behaviors presents two major challenges. The first challenge is the difficulty of synthesizing the conversational behaviors for each modality that are as expressive as real human behaviors. The second challenge is that the affects are modeled independently, which makes it difficult to generate multimodal responses with consistent emotions across all modalities. In this work, we propose a conceptual framework, ACTOR (Affect-Consistent mulTimodal behaviOR generation), that aims to increase the perception of affects by generating multimodal behaviors conditioned on a consistent driving affect. We have conducted a user study with 199 participants to assess how the average person judges the affects perceived from multimodal behaviors that are consistent and inconsistent with respect to a driving affect. The result shows that among all model conditions, our affect-consistent framework receives the highest Likert scores for the perception of driving affects. Our statistical analysis suggests that making a modality affect-inconsistent significantly decreases the perception of driving affects. We also observe that multimodal behaviors conditioned on consistent affects are more expressive compared to behaviors with inconsistent affects. Therefore, we conclude that multimodal emotion conditioning and affect consistency are vital to enhancing the perception of affects for embodied conversational agents.","PeriodicalId":118159,"journal":{"name":"Proceedings of the 28th International Conference on Intelligent User Interfaces","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130810619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SmartRecorder: An IMU-based Video Tutorial Creation by Demonstration System for Smartphone Interaction Tasks
Xiaozhu Hu, Yanwen Huang, Bo Liu, Ruolan Wu, Yongquan Hu, A. Quigley, Mingming Fan, Chun Yu, Yuanchun Shi
DOI: 10.1145/3581641.3584069
This work focuses on an active topic in the HCI community: tutorial creation by demonstration. We present a novel tool named SmartRecorder that enables people without video-editing skills to create video tutorials for smartphone interaction tasks. As automatic interaction-trace extraction is a key component of tutorial generation, we seek to tackle the challenges of automatically extracting user interaction traces on smartphones from screencasts. Uniquely with respect to prior research in this field, we combine computer vision techniques with IMU-based sensing algorithms, and our technical evaluation shows the importance of smartphone IMU data in improving system performance. With the key information extracted for each step, SmartRecorder generates initial instructional content and provides tutorial creators with a refinement editor, designed around its high recall (99.38%) of key steps, for revising that content. Finally, SmartRecorder generates video tutorials from the refined instructional content. The results of the user study demonstrate that SmartRecorder allows non-experts to create smartphone usage video tutorials in less time and with higher satisfaction from recipients.
{"title":"SmartRecorder: An IMU-based Video Tutorial Creation by Demonstration System for Smartphone Interaction Tasks","authors":"Xiaozhu Hu, Yanwen Huang, Bo Liu, Ruolan Wu, Yongquan Hu, A. Quigley, Mingming Fan, Chun Yu, Yuanchun Shi","doi":"10.1145/3581641.3584069","DOIUrl":"https://doi.org/10.1145/3581641.3584069","url":null,"abstract":"This work focuses on an active topic in the HCI community, namely tutorial creation by demonstration. We present a novel tool named SmartRecorder that facilitates people, without video editing skills, creating video tutorials for smartphone interaction tasks. As automatic interaction trace extraction is a key component to tutorial generation, we seek to tackle the challenges of automatically extracting user interaction traces on smartphones from screencasts. Uniquely, with respect to prior research in this field, we combine computer vision techniques with IMU-based sensing algorithms, and the technical evaluation results show the importance of smartphone IMU data in improving system performance. With the extracted key information of each step, SmartRecorder generates instructional content initially and provides tutorial creators with a tutorial refinement editor designed based on a high recall (99.38%) of key steps to revise the initial instructional content. Finally, SmartRecorder generates video tutorials based on refined instructional content. The results of the user study demonstrate that SmartRecorder allows non-experts to create smartphone usage video tutorials with less time and higher satisfaction from recipients.","PeriodicalId":118159,"journal":{"name":"Proceedings of the 28th International Conference on Intelligent User Interfaces","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133008310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding Adoption Barriers to Dwell-Free Eye-Typing: Design Implications from a Qualitative Deployment Study and Computational Simulations
P. Kristensson, Morten Mjelde, K. Vertanen
DOI: 10.1145/3581641.3584093
Eye-typing is a slow and cumbersome text entry method typically used by individuals with no other practical means of communication. As an alternative, prior HCI research has proposed dwell-free eye-typing as a potential improvement that eliminates time-consuming and distracting dwell-timeouts. However, it is rare that such research ideas are translated into working products. This paper reports on a qualitative deployment study of a product that was developed to allow users access to a dwell-free eye-typing research solution. This allowed us to understand how such a research solution would work in practice, as part of users’ current communication solutions in their own homes. Based on interviews and observations, we discuss a number of design issues that currently act as barriers preventing widespread adoption of dwell-free eye-typing. The study findings are complemented with computational simulations in a range of conditions that were inspired by the findings in the deployment study. These simulations serve to both contextualize the qualitative findings and to explore quantitative implications of possible interface redesigns. The combined analysis gives rise to a set of design implications for enabling wider adoption of dwell-free eye-typing in practice.