Human-Centered Deferred Inference: Measuring User Interactions and Setting Deferral Criteria for Human-AI Teams
Stephan J. Lemmer, Anhong Guo, Jason J. Corso
DOI: 10.1145/3581641.3584092
Although deep learning holds the promise of novel and impactful interfaces, realizing such promise in practice remains a challenge: since dataset-driven deep-learned models assume a one-time human input, there is no recourse when they do not understand the input provided by the user. Works that address this via deferred inference—soliciting additional human input when uncertain—show meaningful improvement, but ignore key aspects of how users and models interact. In this work, we focus on the role of users in deferred inference and argue that the deferral criteria should be a function of the user and model as a team, not simply the model itself. In support of this, we introduce a novel mathematical formulation, validate it via an experiment analyzing the interactions of 25 individuals with a deep learning-based visiolinguistic model, and identify user-specific dependencies that are under-explored in prior work. We conclude by demonstrating two human-centered procedures for setting deferral criteria that are simple to implement, applicable to a wide variety of tasks, and perform equal to or better than equivalent procedures that use much larger datasets.
Gaze Speedup: Eye Gaze Assisted Gesture Typing in Virtual Reality
Maozheng Zhao, Alec M Pierce, Ran Tan, Ting Zhang, Tianyi Wang, Tanya R. Jonker, Hrvoje Benko, Aakar Gupta
DOI: 10.1145/3581641.3584072
Mid-air text input in augmented or virtual reality (AR/VR) is an open problem. One proposed solution is gesture typing, where the user performs a gesture trace over the keyboard. However, this requires the user to move their hands precisely and continuously, potentially causing arm fatigue. With eye tracking available on AR/VR devices, multiple works have proposed gaze-driven gesture typing techniques. However, such techniques require the explicit use of gaze, which is prone to the Midas touch problem and conflicts with other gaze activities occurring at the same moment. In this work, the user is not made aware that their gaze is being used to improve the interaction, making the use of gaze completely implicit. We observed that a user's implicit gaze fixation location during gesture typing is usually the gesture cursor's target location if the gesture cursor is moving toward it. Based on this observation, we propose the Speedup method, in which we speed up the gesture cursor toward the user's gaze fixation location; the speedup rate depends on how well the gesture cursor's moving direction aligns with the gaze fixation. To reduce overshooting near the target in the Speedup method, we further propose the Gaussian Speedup method, in which the speedup rate is dynamically reduced with a Gaussian function as the gesture cursor nears the gaze fixation. Using a wrist IMU as input, a 12-person study demonstrated that the Speedup and Gaussian Speedup methods each reduced users' hand movement without any loss of typing speed or accuracy.
{"title":"Gaze Speedup: Eye Gaze Assisted Gesture Typing in Virtual Reality","authors":"Maozheng Zhao, Alec M Pierce, Ran Tan, Ting Zhang, Tianyi Wang, Tanya R. Jonker, Hrvoje Benko, Aakar Gupta","doi":"10.1145/3581641.3584072","DOIUrl":"https://doi.org/10.1145/3581641.3584072","url":null,"abstract":"Mid-air text input in augmented or virtual reality (AR/VR) is an open problem. One proposed solution is gesture typing where the user performs a gesture trace over the keyboard. However, this requires the user to move their hands precisely and continuously, potentially causing arm fatigue. With eye tracking available on AR/VR devices, multiple works have proposed gaze-driven gesture typing techniques. However, such techniques require the explicit use of gaze which are prone to Midas touch problems, conflicting with other gaze activities in the same moment. In this work, the user is not made aware that their gaze is being used to improve the interaction, making the use of gaze completely implicit. We observed that a user’s implicit gaze fixation location during gesture typing is usually the gesture cursor’s target location if the gesture cursor is moving toward it. Based on this observation, we propose the Speedup method in which we speed up the gesture cursor toward the user’s gaze fixation location, the speedup rate depends on how well the gesture cursor’s moving direction aligns with the gaze fixation. To reduce the overshooting near the target in the Speedup method, we further proposed the Gaussian Speedup method in which the speedup rate is dynamically reduced with a Gaussian function when the gesture cursor gets nearer to the gaze fixation. Using a wrist IMU as input, a 12-person study demonstrated that the Speedup method and Gaussian Speedup method reduced users’ hand movement by and respectively without any loss of typing speed or accuracy.","PeriodicalId":118159,"journal":{"name":"Proceedings of the 28th International Conference on Intelligent User Interfaces","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129284108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FlexType: Flexible Text Input with a Small Set of Input Gestures
Dylan Gaines, Mackenzie M Baker, K. Vertanen
DOI: 10.1145/3581641.3584077
In many situations, it may be impractical or impossible to enter text by selecting precise locations on a physical or touchscreen keyboard. We present an ambiguous keyboard with four character groups that has potential applications for eyes-free text entry, as well as text entry using a single switch or a brain-computer interface. We develop a procedure for optimizing these character groupings based on a disambiguation algorithm that leverages a long-span language model. We produce both alphabetically-constrained and unconstrained character groups in an offline optimization experiment and compare them in a longitudinal user study. Our results did not show a significant difference between the constrained and unconstrained character groups after four hours of practice. As expected, participants had significantly more errors with the unconstrained groups in the first session, suggesting a higher barrier to learning the technique. We therefore recommend the alphabetically-constrained character groups, where participants were able to achieve an average entry rate of 12.0 words per minute with a 2.03% character error rate using a single hand and with no visual feedback.
Drawing with Reframer: Emergence and Control in Co-Creative AI
Tomas Lawton, F. Ibarrola, Dan Ventura, Kazjon Grace
DOI: 10.1145/3581641.3584095
Over the past few years, rapid developments in AI have resulted in new models capable of generating high-quality images and creative artefacts, most of which seek to fully automate the process of creation. In stark contrast, creative professionals rely on iteration—to change their mind, to modify their sketches, and to re-imagine. For that reason, end-to-end generative approaches have limited application to real-world design workflows. We present a novel human-AI drawing interface called Reframer, along with a new survey instrument for evaluating co-creative systems. Based on a co-creative drawing model called the Collaborative, Interactive Context-Aware Design Agent (CICADA), Reframer uses CLIP-guided synthesis-by-optimisation to support real-time synchronous drawing with AI. We present two versions of Reframer's interface, one that prioritises emergence and system agency, the other control and user agency. To begin exploring how these different interaction models might influence the user experience, we also propose the Mixed-Initiative Creativity Support Index (MICSI). MICSI rates co-creative systems along experiential axes relevant to AI co-creation. We administered MICSI and a short qualitative interview to users who engaged with the Reframer variants on two distinct creative tasks. The results show the overall broad efficacy of Reframer as a creativity support tool, but MICSI also allows us to begin unpacking the complex interactions between learning effects, task type, visibility, control, and emergent behaviour. We conclude with a discussion of how these findings highlight challenges for future co-creative systems design.
{"title":"Drawing with Reframer: Emergence and Control in Co-Creative AI","authors":"Tomas Lawton, F. Ibarrola, Dan Ventura, Kazjon Grace","doi":"10.1145/3581641.3584095","DOIUrl":"https://doi.org/10.1145/3581641.3584095","url":null,"abstract":"Over the past few years, rapid developments in AI have resulted in new models capable of generating high-quality images and creative artefacts, most of which seek to fully automate the process of creation. In stark contrast, creative professionals rely on iteration—to change their mind, to modify their sketches, and to re-imagine. For that reason, end-to-end generative approaches limit application to real-world design workflows. We present a novel human-AI drawing interface called Reframer, along with a new survey instrument for evaluating co-creative systems. Based on a co-creative drawing model called the Collaborative, Interactive Context-Aware Design Agent (CICADA), Reframer uses CLIP-guided synthesis-by-optimisation to support real-time synchronous drawing with AI. We present two versions of Reframer’s interface, one that prioritises emergence and system agency and the other control and user agency. To begin exploring how these different interaction models might influence the user experience, we also propose the Mixed-Initiative Creativity Support Index (MICSI). MICSI rates co-creative systems along experiential axes relevant to AI co-creation. We administer MICSI and a short qualitative interview to users who engaged with the Reframer variants on two distinct creative tasks. The results show overall broad efficacy of Reframer as a creativity support tool, but MICSI also allows us to begin unpacking the complex interactions between learning effects, task type, visibility, control, and emergent behaviour. We conclude with a discussion of how these findings highlight challenges for future co-creative systems design.","PeriodicalId":118159,"journal":{"name":"Proceedings of the 28th International Conference on Intelligent User Interfaces","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115355726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Masktrap: Designing and Identifying Gestures to Transform Mask Strap into an Input Interface
Takumi Yamamoto, Katsutoshi Masai, A. Withana, Yuta Sugiura
DOI: 10.1145/3581641.3584062
Embedding technology into day-to-day wearables and creating smart devices such as smartwatches and smart-glasses has been a growing area of interest. In this paper, we explore the interaction around face masks, a common accessory worn by many to prevent the spread of infectious diseases. Particularly, we propose a method of using the straps of a face mask as an input medium. We identified a set of plausible gestures on mask straps through an elicitation study (N = 20), in which the participants proposed different gestures for a given referent. We then developed a prototype to identify the gestures performed on the mask straps and present the recognition accuracy from a user study with eight participants. Our results show the system achieves 93.07% classification accuracy for 12 gestures.
{"title":"Masktrap: Designing and Identifying Gestures to Transform Mask Strap into an Input Interface","authors":"Takumi Yamamoto, Katsutoshi Masai, A. Withana, Yuta Sugiura","doi":"10.1145/3581641.3584062","DOIUrl":"https://doi.org/10.1145/3581641.3584062","url":null,"abstract":"Embedding technology into day-to-day wearables and creating smart devices such as smartwatches and smart-glasses has been a growing area of interest. In this paper, we explore the interaction around face masks, a common accessory worn by many to prevent the spread of infectious diseases. Particularly, we propose a method of using the straps of a face mask as an input medium. We identified a set of plausible gestures on mask straps through an elicitation study (N = 20), in which the participants proposed different gestures for a given referent. We then developed a prototype to identify the gestures performed on the mask straps and present the recognition accuracy from a user study with eight participants. Our results show the system achieves 93.07% classification accuracy for 12 gestures.","PeriodicalId":118159,"journal":{"name":"Proceedings of the 28th International Conference on Intelligent User Interfaces","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124270454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CoColor: Interactive Exploration of Color Designs
Lena Hegemann, N. Dayama, Abhishek Iyer, Erfan Farhadi, Ekaterina Marchenko, A. Oulasvirta
DOI: 10.1145/3581641.3584089
Choosing colors is a pivotal but challenging component of graphic design. This paper presents an intelligent interaction technique supporting designers' creativity in color design. It fills a gap in the literature by proposing an integrated technique for color exploration, assignment, and refinement: CoColor. Our design goals were 1) to let designers focus on color choice by freeing them from pixel-level editing and 2) to support rapid flow between low- and high-level decisions. Our interaction technique proceeds in three steps – choice of focus, choice of suitable colors, and application of the colors to designs – wherein the choices are interlinked and computer-assisted, thus supporting divergent and convergent thinking. It considers color harmony, visual saliency, and elementary accessibility requirements. The technique was incorporated into the popular design tool Figma and evaluated in a study with 16 designers. Participants explored the coloring options more easily with CoColor and considered it helpful.
{"title":"CoColor: Interactive Exploration of Color Designs","authors":"Lena Hegemann, N. Dayama, Abhishek Iyer, Erfan Farhadi, Ekaterina Marchenko, A. Oulasvirta","doi":"10.1145/3581641.3584089","DOIUrl":"https://doi.org/10.1145/3581641.3584089","url":null,"abstract":"Choosing colors is a pivotal but challenging component of graphic design. The paper presents an intelligent interaction technique supporting designers’ creativity in color design. It fills a gap in the literature by proposing an integrated technique for color exploration, assignment, and refinement: CoColor. Our design goals were 1) let designers focus on color choice by freeing them from pixel-level editing and 2) support rapid flow between low- and high-level decisions. Our interaction technique utilizes three steps – choice of focus, choice of suitable colors, and the colors’ application to designs – wherein the choices are interlinked and computer-assisted, thus supporting divergent and convergent thinking. It considers color harmony, visual saliency, and elementary accessibility requirements. The technique was incorporated into the popular design tool Figma and evaluated in a study with 16 designers. Participants explored the coloring options more easily with CoColor and considered it helpful.","PeriodicalId":118159,"journal":{"name":"Proceedings of the 28th International Conference on Intelligent User Interfaces","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128617786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interviewing the Interviewer: AI-generated Insights to Help Conduct Candidate-centric Interviews
Kuldeep Yadav, Animesh Seemendra, A. Singhania, Sagar Bora, Pratyaksh Dubey, Varun Aggarwal
DOI: 10.1145/3581641.3584051
Interviews are the most popular way to assess talent around the world, and interviewers contribute substantially to the candidate experience in many organizations' hiring strategies. However, there is no comprehensive understanding of what makes for a good interview experience or of how interviewers can conduct candidate-centric interviews. An exploratory study with 123 candidates revealed critical metrics of interviewer behavior that affect candidate experience. These metrics informed the design of our AI-driven SmartView system, which provides automated post-interview feedback to interviewers. We deployed the system in the real world for three weeks with 35 interviewers. Most interviewers found that SmartView insights helped them identify areas for improvement and could assist them in improving their interviewing skills.
The Importance of Multimodal Emotion Conditioning and Affect Consistency for Embodied Conversational Agents
Che-Jui Chang, Samuel S. Sohn, Sen Zhang, R. Jayashankar, Muhammad Usman, M. Kapadia
DOI: 10.1145/3581641.3584045
Previous studies regarding the perception of emotions for embodied virtual agents have shown the effectiveness of using virtual characters in conveying emotions through interactions with humans. However, creating an autonomous embodied conversational agent with expressive behaviors presents two major challenges. The first challenge is the difficulty of synthesizing the conversational behaviors for each modality that are as expressive as real human behaviors. The second challenge is that the affects are modeled independently, which makes it difficult to generate multimodal responses with consistent emotions across all modalities. In this work, we propose a conceptual framework, ACTOR (Affect-Consistent mulTimodal behaviOR generation), that aims to increase the perception of affects by generating multimodal behaviors conditioned on a consistent driving affect. We have conducted a user study with 199 participants to assess how the average person judges the affects perceived from multimodal behaviors that are consistent and inconsistent with respect to a driving affect. The result shows that among all model conditions, our affect-consistent framework receives the highest Likert scores for the perception of driving affects. Our statistical analysis suggests that making a modality affect-inconsistent significantly decreases the perception of driving affects. We also observe that multimodal behaviors conditioned on consistent affects are more expressive compared to behaviors with inconsistent affects. Therefore, we conclude that multimodal emotion conditioning and affect consistency are vital to enhancing the perception of affects for embodied conversational agents.
{"title":"The Importance of Multimodal Emotion Conditioning and Affect Consistency for Embodied Conversational Agents","authors":"Che-Jui Chang, Samuel S. Sohn, Sen Zhang, R. Jayashankar, Muhammad Usman, M. Kapadia","doi":"10.1145/3581641.3584045","DOIUrl":"https://doi.org/10.1145/3581641.3584045","url":null,"abstract":"Previous studies regarding the perception of emotions for embodied virtual agents have shown the effectiveness of using virtual characters in conveying emotions through interactions with humans. However, creating an autonomous embodied conversational agent with expressive behaviors presents two major challenges. The first challenge is the difficulty of synthesizing the conversational behaviors for each modality that are as expressive as real human behaviors. The second challenge is that the affects are modeled independently, which makes it difficult to generate multimodal responses with consistent emotions across all modalities. In this work, we propose a conceptual framework, ACTOR (Affect-Consistent mulTimodal behaviOR generation), that aims to increase the perception of affects by generating multimodal behaviors conditioned on a consistent driving affect. We have conducted a user study with 199 participants to assess how the average person judges the affects perceived from multimodal behaviors that are consistent and inconsistent with respect to a driving affect. The result shows that among all model conditions, our affect-consistent framework receives the highest Likert scores for the perception of driving affects. Our statistical analysis suggests that making a modality affect-inconsistent significantly decreases the perception of driving affects. We also observe that multimodal behaviors conditioned on consistent affects are more expressive compared to behaviors with inconsistent affects. Therefore, we conclude that multimodal emotion conditioning and affect consistency are vital to enhancing the perception of affects for embodied conversational agents.","PeriodicalId":118159,"journal":{"name":"Proceedings of the 28th International Conference on Intelligent User Interfaces","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130810619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SmartRecorder: An IMU-based Video Tutorial Creation by Demonstration System for Smartphone Interaction Tasks
Xiaozhu Hu, Yanwen Huang, Bo Liu, Ruolan Wu, Yongquan Hu, A. Quigley, Mingming Fan, Chun Yu, Yuanchun Shi
DOI: 10.1145/3581641.3584069
This work focuses on an active topic in the HCI community: tutorial creation by demonstration. We present a novel tool named SmartRecorder that enables people without video-editing skills to create video tutorials for smartphone interaction tasks. As automatic interaction-trace extraction is a key component of tutorial generation, we seek to tackle the challenges of automatically extracting user interaction traces on smartphones from screencasts. Uniquely with respect to prior research in this field, we combine computer vision techniques with IMU-based sensing algorithms, and our technical evaluation shows the importance of smartphone IMU data in improving system performance. With the key information extracted for each step, SmartRecorder generates initial instructional content and provides tutorial creators with a refinement editor, designed around its high recall (99.38%) of key steps, for revising that content. Finally, SmartRecorder generates video tutorials from the refined instructional content. The results of the user study demonstrate that SmartRecorder allows non-experts to create smartphone usage video tutorials in less time and with higher satisfaction from recipients.
{"title":"SmartRecorder: An IMU-based Video Tutorial Creation by Demonstration System for Smartphone Interaction Tasks","authors":"Xiaozhu Hu, Yanwen Huang, Bo Liu, Ruolan Wu, Yongquan Hu, A. Quigley, Mingming Fan, Chun Yu, Yuanchun Shi","doi":"10.1145/3581641.3584069","DOIUrl":"https://doi.org/10.1145/3581641.3584069","url":null,"abstract":"This work focuses on an active topic in the HCI community, namely tutorial creation by demonstration. We present a novel tool named SmartRecorder that facilitates people, without video editing skills, creating video tutorials for smartphone interaction tasks. As automatic interaction trace extraction is a key component to tutorial generation, we seek to tackle the challenges of automatically extracting user interaction traces on smartphones from screencasts. Uniquely, with respect to prior research in this field, we combine computer vision techniques with IMU-based sensing algorithms, and the technical evaluation results show the importance of smartphone IMU data in improving system performance. With the extracted key information of each step, SmartRecorder generates instructional content initially and provides tutorial creators with a tutorial refinement editor designed based on a high recall (99.38%) of key steps to revise the initial instructional content. Finally, SmartRecorder generates video tutorials based on refined instructional content. The results of the user study demonstrate that SmartRecorder allows non-experts to create smartphone usage video tutorials with less time and higher satisfaction from recipients.","PeriodicalId":118159,"journal":{"name":"Proceedings of the 28th International Conference on Intelligent User Interfaces","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133008310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding Adoption Barriers to Dwell-Free Eye-Typing: Design Implications from a Qualitative Deployment Study and Computational Simulations
P. Kristensson, Morten Mjelde, K. Vertanen
DOI: 10.1145/3581641.3584093
Eye-typing is a slow and cumbersome text entry method typically used by individuals with no other practical means of communication. As an alternative, prior HCI research has proposed dwell-free eye-typing as a potential improvement that eliminates time-consuming and distracting dwell-timeouts. However, it is rare that such research ideas are translated into working products. This paper reports on a qualitative deployment study of a product that was developed to allow users access to a dwell-free eye-typing research solution. This allowed us to understand how such a research solution would work in practice, as part of users’ current communication solutions in their own homes. Based on interviews and observations, we discuss a number of design issues that currently act as barriers preventing widespread adoption of dwell-free eye-typing. The study findings are complemented with computational simulations in a range of conditions that were inspired by the findings in the deployment study. These simulations serve to both contextualize the qualitative findings and to explore quantitative implications of possible interface redesigns. The combined analysis gives rise to a set of design implications for enabling wider adoption of dwell-free eye-typing in practice.