
Companion Publication of the 2020 International Conference on Multimodal Interaction: Latest Publications

Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
Anna Deichler, Shivam Mehta, Simon Alexanderson, Jonas Beskow
This paper describes a system developed for the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our solution builds on an existing diffusion-based motion synthesis model. We propose a contrastive speech and motion pretraining (CSMP) module, which learns a joint embedding for speech and gesture with the aim of capturing the semantic coupling between these modalities. The output of the CSMP module is used as a conditioning signal in the diffusion-based gesture synthesis model to achieve semantically aware co-speech gesture generation. Our entry achieved the highest human-likeness and the highest speech-appropriateness ratings among the submitted entries, indicating that our system is a promising approach for generating human-like co-speech gestures that carry semantic meaning in embodied agents.
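To make the conditioning idea concrete, here is a minimal PyTorch sketch of a contrastive speech-motion embedding whose speech branch feeds a denoiser as a conditioning signal. The module names, feature dimensions, and the toy MLP denoiser are illustrative assumptions, not the authors' CSMP module or their diffusion model.

```python
# Minimal sketch: a CSMP-style contrastive embedding used to condition a denoiser.
# All module names, sizes, and the toy MLP denoiser are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechMotionEmbedder(nn.Module):
    """Maps speech features and motion features into a shared embedding space."""
    def __init__(self, speech_dim=128, motion_dim=165, embed_dim=64):
        super().__init__()
        self.speech_enc = nn.Sequential(nn.Linear(speech_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))
        self.motion_enc = nn.Sequential(nn.Linear(motion_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))

    def forward(self, speech, motion):
        return F.normalize(self.speech_enc(speech), dim=-1), F.normalize(self.motion_enc(motion), dim=-1)

def contrastive_loss(z_speech, z_motion, temperature=0.07):
    """Symmetric InfoNCE: matching speech/motion pairs attract, mismatched pairs repel."""
    logits = z_speech @ z_motion.t() / temperature
    targets = torch.arange(len(z_speech))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

class ConditionedDenoiser(nn.Module):
    """Toy denoiser: predicts noise for a noisy pose given the speech embedding and a timestep."""
    def __init__(self, motion_dim=165, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(motion_dim + embed_dim + 1, 512), nn.ReLU(), nn.Linear(512, motion_dim))

    def forward(self, noisy_motion, speech_embedding, t):
        return self.net(torch.cat([noisy_motion, speech_embedding, t], dim=-1))

# Usage with random stand-in tensors (a batch of 8 frames).
embedder, denoiser = SpeechMotionEmbedder(), ConditionedDenoiser()
speech, motion = torch.randn(8, 128), torch.randn(8, 165)
z_s, z_m = embedder(speech, motion)
pretrain_loss = contrastive_loss(z_s, z_m)                  # contrastive pretraining objective
noise_pred = denoiser(motion + torch.randn_like(motion),    # conditioning on the speech embedding
                      z_s.detach(), torch.rand(8, 1))
```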
{"title":"Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation","authors":"Deichler, Anna, Mehta, Shivam, Alexanderson, Simon, Beskow, Jonas","doi":"10.1145/3577190.3616117","DOIUrl":"https://doi.org/10.1145/3577190.3616117","url":null,"abstract":"This paper describes a system developed for the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our solution builds on an existing diffusion-based motion synthesis model. We propose a contrastive speech and motion pretraining (CSMP) module, which learns a joint embedding for speech and gesture with the aim to learn a semantic coupling between these modalities. The output of the CSMP module is used as a conditioning signal in the diffusion-based gesture synthesis model in order to achieve semantically-aware co-speech gesture generation. Our entry achieved highest human-likeness and highest speech appropriateness rating among the submitted entries. This indicates that our system is a promising approach to achieve human-like co-speech gestures in agents that carry semantic meaning.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135043457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
4th International Workshop on Multimodal Affect and Aesthetic Experience
Michal Muszynski, Theodoros Kostoulas, Leimin Tian, Edgar Roman-Rangel, Theodora Chaspari, Panos Amelidis
“Aesthetic experience” corresponds to the inner state of a person exposed to the form and content of artistic objects. Quantifying and interpreting the aesthetic experience of people in various contexts contribute towards a) creating context, and b) better understanding people’s affective reactions to aesthetic stimuli. Focusing on different types of artistic content, such as movie, music, literature, urban art, ancient artwork, and modern interactive technology, the 4th international workshop on Multimodal Affect and Aesthetic Experience (MAAE) aims to enhance interdisciplinary collaboration among researchers from affective computing, aesthetics, human-robot/computer interaction, digital archaeology and art, culture, ethics, and addictive games.
{"title":"4th International Workshop on Multimodal Affect and Aesthetic Experience","authors":"Michal Muszynski, Theodoros Kostoulas, Leimin Tian, Edgar Roman-Rangel, Theodora Chaspari, Panos Amelidis","doi":"10.1145/3577190.3616886","DOIUrl":"https://doi.org/10.1145/3577190.3616886","url":null,"abstract":"“Aesthetic experience” corresponds to the inner state of a person exposed to the form and content of artistic objects. Quantifying and interpreting the aesthetic experience of people in various contexts contribute towards a) creating context, and b) better understanding people’s affective reactions to aesthetic stimuli. Focusing on different types of artistic content, such as movie, music, literature, urban art, ancient artwork, and modern interactive technology, the 4th international workshop on Multimodal Affect and Aesthetic Experience (MAAE) aims to enhance interdisciplinary collaboration among researchers from affective computing, aesthetics, human-robot/computer interaction, digital archaeology and art, culture, ethics, and addictive games.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"273 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Robot Duck Debugging: Can Attentive Listening Improve Problem Solving?
Maria Teresa Parreira, Sarah Gillet, Iolanda Leite
While thinking aloud has been reported to positively affect problem-solving, the effects of the presence of an embodied entity (e.g., a social robot) to whom words can be directed remain mostly unexplored. In this work, we investigated the role of a robot in a “rubber duck debugging” setting, by analyzing how a robot’s listening behaviors could support a thinking-aloud problem-solving session. Participants completed two different tasks while speaking their thoughts aloud to either a robot or an inanimate object (a giant rubber duck). We implemented and tested two types of listener behavior in the robot: a rule-based heuristic and a deep-learning-based model. In a between-subject user study with 101 participants, we evaluated how the presence of a robot affected users’ engagement in thinking aloud, behavior during the task, and self-reported user experience. In addition, we explored the impact of the two robot listening behaviors on those measures. In contrast to prior work, our results indicate that neither the rule-based heuristic nor the deep learning robot conditions improved performance or perception of the task, compared to an inanimate object. We discuss potential explanations and shed light on the feasibility of designing social robots as assistive tools in thinking-aloud problem-solving tasks.
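The abstract does not spell out the rule-based listener heuristic, so purely as a hypothetical illustration of what such a rule could look like, here is a pause-triggered backchannel sketch; the threshold and the behaviour names are assumptions, not the rules used in the paper.

```python
# Hypothetical illustration of a rule-based listening heuristic: trigger a nod or a
# verbal backchannel when the speaker pauses long enough. The threshold and the
# behaviour names are assumptions, not the rules implemented in the paper.
import random

PAUSE_THRESHOLD_S = 1.2           # silence length that counts as a pause (assumed)
BACKCHANNELS = ["nod", "mm-hmm"]  # candidate listener behaviours (assumed)

def listener_behaviour(silence_duration_s: float, speaker_is_talking: bool):
    """Return a listening behaviour to perform, or None to stay still."""
    if speaker_is_talking:
        return None                         # keep gazing at the speaker while they talk
    if silence_duration_s >= PAUSE_THRESHOLD_S:
        return random.choice(BACKCHANNELS)  # acknowledge the pause
    return None

print(listener_behaviour(1.5, speaker_is_talking=False))  # e.g. "nod"
```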
{"title":"Robot Duck Debugging: Can Attentive Listening Improve Problem Solving?","authors":"Maria Teresa Parreira, Sarah Gillet, Iolanda Leite","doi":"10.1145/3577190.3614160","DOIUrl":"https://doi.org/10.1145/3577190.3614160","url":null,"abstract":"While thinking aloud has been reported to positively affect problem-solving, the effects of the presence of an embodied entity (e.g., a social robot) to whom words can be directed remain mostly unexplored. In this work, we investigated the role of a robot in a “rubber duck debugging” setting, by analyzing how a robot’s listening behaviors could support a thinking-aloud problem-solving session. Participants completed two different tasks while speaking their thoughts aloud to either a robot or an inanimate object (a giant rubber duck). We implemented and tested two types of listener behavior in the robot: a rule-based heuristic and a deep-learning-based model. In a between-subject user study with 101 participants, we evaluated how the presence of a robot affected users’ engagement in thinking aloud, behavior during the task, and self-reported user experience. In addition, we explored the impact of the two robot listening behaviors on those measures. In contrast to prior work, our results indicate that neither the rule-based heuristic nor the deep learning robot conditions improved performance or perception of the task, compared to an inanimate object. We discuss potential explanations and shed light on the feasibility of designing social robots as assistive tools in thinking-aloud problem-solving tasks.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Implicit Search Intent Recognition using EEG and Eye Tracking: Novel Dataset and Cross-User Prediction
Mansi Sharma, Shuang Chen, Philipp Müller, Maurice Rekrut, Antonio Krüger
For machines to effectively assist humans in challenging visual search tasks, they must differentiate whether a human is simply glancing into a scene (navigational intent) or searching for a target object (informational intent). Previous research proposed combining electroencephalography (EEG) and eye-tracking measurements to recognize such search intents implicitly, i.e., without explicit user input. However, the applicability of these approaches to real-world scenarios suffers from two key limitations. First, previous work used fixed search times in the informational intent condition, a stark contrast to visual search, which naturally terminates when the target is found. Second, methods incorporating EEG measurements addressed prediction scenarios that require ground-truth training data from the target user, which is impractical in many use cases. We address these limitations by making the first publicly available EEG and eye-tracking dataset for navigational vs. informational intent recognition, in which the user determines the search time. We present the first method for cross-user prediction of search intents from EEG and eye-tracking recordings; in leave-one-user-out evaluations it reaches accuracy comparable to within-user prediction while offering much greater flexibility.
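A minimal sketch of the leave-one-user-out protocol referred to above, using scikit-learn's LeaveOneGroupOut: train on all users but one, test on the held-out user. The random stand-in features, the feature dimensionality, and the SVM classifier are assumptions, not the paper's actual pipeline.

```python
# Leave-one-user-out evaluation sketch with synthetic stand-in data.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_trials, n_features, n_users = 600, 64, 15
X = rng.normal(size=(n_trials, n_features))       # fused EEG + eye-tracking features (stand-in)
y = rng.integers(0, 2, size=n_trials)             # 0 = navigational intent, 1 = informational intent
users = rng.integers(0, n_users, size=n_trials)   # which participant produced each trial

accuracies = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=users):
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X[train_idx], y[train_idx])            # train on all users except the held-out one
    accuracies.append(clf.score(X[test_idx], y[test_idx]))

print(f"mean leave-one-user-out accuracy: {np.mean(accuracies):.2f}")
```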
{"title":"Implicit Search Intent Recognition using EEG and Eye Tracking: Novel Dataset and Cross-User Prediction","authors":"Mansi Sharma, Shuang Chen, Philipp Müller, Maurice Rekrut, Antonio Krüger","doi":"10.1145/3577190.3614166","DOIUrl":"https://doi.org/10.1145/3577190.3614166","url":null,"abstract":"For machines to effectively assist humans in challenging visual search tasks, they must differentiate whether a human is simply glancing into a scene (navigational intent) or searching for a target object (informational intent). Previous research proposed combining electroencephalography (EEG) and eye-tracking measurements to recognize such search intents implicitly, i.e., without explicit user input. However, the applicability of these approaches to real-world scenarios suffers from two key limitations. First, previous work used fixed search times in the informational intent condition - a stark contrast to visual search, which naturally terminates when the target is found. Second, methods incorporating EEG measurements addressed prediction scenarios that require ground truth training data from the target user, which is impractical in many use cases. We address these limitations by making the first publicly available EEG and eye-tracking dataset for navigational vs. informational intent recognition, where the user determines search times. We present the first method for cross-user prediction of search intents from EEG and eye-tracking recordings and reach accuracy in leave-one-user-out evaluations - comparable to within-user prediction accuracy () but offering much greater flexibility.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
TongueTap: Multimodal Tongue Gesture Recognition with Head-Worn Devices
Tan Gemicioglu, R. Michael Winters, Yu-Te Wang, Thomas M. Gable, Ivan J. Tashev
Mouth-based interfaces are a promising new approach enabling silent, hands-free and eyes-free interaction with wearable devices. However, interfaces sensing mouth movements are traditionally custom-designed and placed near or within the mouth. TongueTap synchronizes multimodal EEG, PPG, IMU, eye tracking and head tracking data from two commercial headsets to facilitate tongue gesture recognition using only off-the-shelf devices on the upper face. We classified eight closed-mouth tongue gestures with 94% accuracy, offering an invisible and inaudible method for discreet control of head-worn devices. Moreover, we found that the IMU alone differentiates eight gestures with 80% accuracy and a subset of four gestures with 92% accuracy. We built a dataset of 48,000 gesture trials across 16 participants, allowing TongueTap to perform user-independent classification. Our findings suggest tongue gestures can be a viable interaction technique for VR/AR headsets and earables without requiring novel hardware.
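As a rough illustration of user-independent gesture classification from IMU windows, in the spirit of the IMU-only result above, here is a sketch with synthetic data; the window length, the hand-crafted per-axis statistics, and the random-forest classifier are assumptions rather than TongueTap's method.

```python
# User-independent classification sketch: group-aware cross-validation over participants.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(1)
n_trials, window_len, n_axes = 2000, 50, 6            # 6-axis IMU (accel + gyro), stand-in data
imu = rng.normal(size=(n_trials, window_len, n_axes))
labels = rng.integers(0, 8, size=n_trials)            # eight closed-mouth tongue gestures
participants = rng.integers(0, 16, size=n_trials)     # 16 participants, as in the dataset above

# Simple per-axis statistics as features for each window.
features = np.concatenate([imu.mean(axis=1), imu.std(axis=1),
                           imu.min(axis=1), imu.max(axis=1)], axis=1)

scores = []
for tr, te in GroupKFold(n_splits=5).split(features, labels, groups=participants):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(features[tr], labels[tr])                  # no test participant appears in training
    scores.append(clf.score(features[te], labels[te]))

print(f"user-independent accuracy (chance is 1/8): {np.mean(scores):.2f}")
```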
{"title":"TongueTap: Multimodal Tongue Gesture Recognition with Head-Worn Devices","authors":"Tan Gemicioglu, R. Michael Winters, Yu-Te Wang, Thomas M. Gable, Ivan J. Tashev","doi":"10.1145/3577190.3614120","DOIUrl":"https://doi.org/10.1145/3577190.3614120","url":null,"abstract":"Mouth-based interfaces are a promising new approach enabling silent, hands-free and eyes-free interaction with wearable devices. However, interfaces sensing mouth movements are traditionally custom-designed and placed near or within the mouth. TongueTap synchronizes multimodal EEG, PPG, IMU, eye tracking and head tracking data from two commercial headsets to facilitate tongue gesture recognition using only off-the-shelf devices on the upper face. We classified eight closed-mouth tongue gestures with 94% accuracy, offering an invisible and inaudible method for discreet control of head-worn devices. Moreover, we found that the IMU alone differentiates eight gestures with 80% accuracy and a subset of four gestures with 92% accuracy. We built a dataset of 48,000 gesture trials across 16 participants, allowing TongueTap to perform user-independent classification. Our findings suggest tongue gestures can be a viable interaction technique for VR/AR headsets and earables without requiring novel hardware.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Ether-Mark: An Off-Screen Marking Menu For Mobile Devices
Hanae Rateau, Yosra Rekik, Edward Lank
Given the computing power of mobile devices, porting feature-rich applications to these devices is increasingly feasible. However, feature-rich applications include large command sets, and providing access to these commands through screen-based widgets results in issues of occlusion and layering. To address this issue, we introduce Ether-Mark, a hierarchical, gesture-based, marking-menu-inspired, around-device menu for mobile devices that enables both on- and near-device interaction. We investigate the design of such menus and their learnability through three experiments. We first design and contrast three variants of Ether-Mark, yielding a zigzag menu design. We then refine input accuracy via a deformation model of the menu, and we evaluate the learnability of the menus and the accuracy of the deformation model, revealing an accuracy rate of up to 98.28%. Finally, we compare in-air Ether-Mark with marking menus. Our results argue for Ether-Mark as a promising and effective mechanism for leveraging the proximal around-device space.
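For intuition, here is a hypothetical sketch of how a stroke direction can be mapped to an item of an eight-way radial marking menu; the item labels and the simple angle binning are assumptions, and Ether-Mark's zigzag layout and deformation model are considerably more elaborate than this.

```python
# Hypothetical radial-menu selection from a stroke direction (not Ether-Mark's actual design).
import math

MENU_ITEMS = ["copy", "paste", "cut", "undo", "redo", "save", "open", "close"]  # assumed labels

def select_item(dx: float, dy: float) -> str:
    """Map a stroke vector (dx, dy) to one of eight 45-degree sectors."""
    angle = math.degrees(math.atan2(dy, dx)) % 360
    sector = int(((angle + 22.5) % 360) // 45)   # centre each sector on a compass direction
    return MENU_ITEMS[sector]

print(select_item(1.0, 0.1))   # roughly rightward stroke -> MENU_ITEMS[0]
```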
{"title":"Ether-Mark: An Off-Screen Marking Menu For Mobile Devices","authors":"Hanae Rateau, Yosra Rekik, Edward Lank","doi":"10.1145/3577190.3614150","DOIUrl":"https://doi.org/10.1145/3577190.3614150","url":null,"abstract":"Given the computing power of mobile devices, porting feature-rich applications to these devices is increasingly feasible. However, feature-rich applications include large command sets, and providing access to these commands through screen-based widgets results in issues of occlusion and layering. To address this issue, we introduce Ether-Mark, a hierarchical, gesture-based, marking menu inspired, around-device menu for mobile devices enabling both on- and near-device interaction. We investigate the design of such menus and their learnability through three experiments. We first design and contrast three variants of Ether-Mark, yielding a zigzag menu design. We then refine input accuracy via a deformation model of the menu. And, we evaluate the learnability of the menus and the accuracy of the deformation model, revealing an accuracy rate up to 98.28%. We finally, compare in-air Ether-Mark with marking menus.Our results argue for Ether-Mark as a promising effective mechanism to leverage proximal around-device space.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"185 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
4th Workshop on Social Affective Multimodal Interaction for Health (SAMIH)
Hiroki Tanaka, Satoshi Nakamura, Jean-Claude Martin, Catherine Pelachaud
This workshop discusses how interactive, multimodal technology, such as virtual agents, can measure and train social-affective interactions. Sensing technology now enables analyzing users’ behaviors and physiological signals. Various signal processing and machine learning methods can be used for prediction tasks. Such social signal processing and tools can be applied to measure and reduce social stress in everyday situations, including public speaking at schools and workplaces.
{"title":"4th Workshop on Social Affective Multimodal Interaction for Health (SAMIH)","authors":"Hiroki Tanaka, Satoshi Nakamura, Jean-Claude Martin, Catherine Pelachaud","doi":"10.1145/3577190.3616878","DOIUrl":"https://doi.org/10.1145/3577190.3616878","url":null,"abstract":"This workshop discusses how interactive, multimodal technology, such as virtual agents, can measure and train social-affective interactions. Sensing technology now enables analyzing users’ behaviors and physiological signals. Various signal processing and machine learning methods can be used for prediction tasks. Such social signal processing and tools can be applied to measure and reduce social stress in everyday situations, including public speaking at schools and workplaces.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"2020 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135045203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Recording multimodal pair-programming dialogue for reference resolution by conversational agents
Cecilia Domingo
Pair programming is a collaborative technique which has proven highly beneficial in terms of the code produced and the learning gains for programmers. With recent advances in Programming Language Processing (PLP), numerous tools have been created that assist programmers in non-collaborative settings (i.e., where the technology provides users with a solution, instead of discussing the problem to develop a solution together). How can we develop AI that can assist in pair programming, a collaborative setting? To tackle this task, we begin by gathering multimodal dialogue data which can be used to train systems in a basic subtask of dialogue understanding: multimodal reference resolution, i.e., understanding which parts of a program are being mentioned by users through speech or by using the mouse and keyboard.
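One kind of processing such a corpus enables is aligning mouse and keyboard events with the utterance during which they occurred, so that a spoken reference can later be grounded in what was clicked or typed. The data structures and field names below are assumptions about a recording format, not the actual corpus schema.

```python
# Hypothetical alignment of timestamped UI events with utterance spans for reference resolution.
from dataclasses import dataclass

@dataclass
class Utterance:
    start: float   # seconds from session start
    end: float
    text: str

@dataclass
class UIEvent:
    time: float
    kind: str      # e.g. "click", "keypress"
    target: str    # e.g. "editor:line 12"

def align(utterances: list[Utterance], events: list[UIEvent]) -> list[tuple[Utterance, list[UIEvent]]]:
    """Pair each utterance with the UI events that occurred while it was spoken."""
    return [(u, [e for e in events if u.start <= e.time <= u.end]) for u in utterances]

utts = [Utterance(0.0, 2.1, "what does this line do?")]
evts = [UIEvent(1.3, "click", "editor:line 12")]
print(align(utts, evts))   # the click is a candidate referent for "this line"
```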
{"title":"Recording multimodal pair-programming dialogue for reference resolution by conversational agents","authors":"Cecilia Domingo","doi":"10.1145/3577190.3614231","DOIUrl":"https://doi.org/10.1145/3577190.3614231","url":null,"abstract":"Pair programming is a collaborative technique which has proven highly beneficial in terms of the code produced and the learning gains for programmers. With recent advances in Programming Language Processing (PLP), numerous tools have been created that assist programmers in non-collaborative settings (i.e., where the technology provides users with a solution, instead of discussing the problem to develop a solution together). How can we develop AI that can assist in pair programming, a collaborative setting? To tackle this task, we begin by gathering multimodal dialogue data which can be used to train systems in a basic subtask of dialogue understanding: multimodal reference resolution, i.e., understanding which parts of a program are being mentioned by users through speech or by using the mouse and keyboard.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Modeling Social Cognition and its Neurologic Deficits with Artificial Neural Networks
Laurent P. Mertens
Artificial Neural Networks (ANNs) are computer models loosely inspired by the functioning of the human brain. They are the state-of-the-art method for tackling a variety of Artificial Intelligence (AI) problems, and an increasingly popular tool in neuroscientific studies. However, both domains pursue different goals: in AI, performance is key and brain resemblance is incidental, while in neuroscience the aim is chiefly to better understand the brain. This PhD is situated at the intersection of both disciplines. Its goal is to develop ANNs that model social cognition in neurotypical individuals, and that can be altered in a controlled way to exhibit behavior consistent with individuals with one of two clinical conditions, Autism Spectrum Disorder and Frontotemporal Dementia.
{"title":"Modeling Social Cognition and its Neurologic Deficits with Artificial Neural Networks","authors":"Laurent P. Mertens","doi":"10.1145/3577190.3614232","DOIUrl":"https://doi.org/10.1145/3577190.3614232","url":null,"abstract":"Artificial Neural Networks (ANNs) are computer models loosely inspired by the functioning of the human brain. They are the state-of-the-art method for tackling a variety of Artificial Intelligence (AI) problems, and an increasingly popular tool in neuroscientific studies. However, both domains pursue different goals: in AI, performance is key and brain resemblance is incidental, while in neuroscience the aim is chiefly to better understand the brain. This PhD is situated at the intersection of both disciplines. Its goal is to develop ANNs that model social cognition in neurotypical individuals, and that can be altered in a controlled way to exhibit behavior consistent with individuals with one of two clinical conditions, Autism Spectrum Disorder and Frontotemporal Dementia.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The Role of Audiovisual Feedback Delays and Bimodal Congruency for Visuomotor Performance in Human-Machine Interaction
Annika Dix, Clarissa Sabrina Arlinghaus, A. Marie Harkin, Sebastian Pannasch
Despite incredible technological progress in the last decades, latency is still an issue for today's technologies and their applications. To better understand how latency and the resulting feedback delays affect the interaction between humans and cyber-physical systems (CPS), the present study examines separate and joint effects of visual and auditory feedback delays on performance and the motor control strategy in a complex visuomotor task. Thirty-six participants played the Wire Loop Game, a fine motor skill task, while going through four different delay conditions: no delay, visual only, auditory only, and audiovisual (length: 200 ms). Participants' speed and accuracy in completing the task, as well as their movement kinematics, were assessed. Visual feedback delays slowed down movement execution and impaired precision compared to a condition without feedback delays. In contrast, delayed auditory feedback improved precision. Descriptively, the latter finding mainly appeared when congruent visual and auditory feedback delays were provided. We discuss the role of temporal congruency of audiovisual information as well as potential compensatory mechanisms that can inform the design of multisensory feedback in human-CPS interaction faced with latency.
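A minimal sketch of how a fixed feedback delay can be injected into a real-time loop with a ring buffer: samples are released only after the delay has elapsed. The 200 ms value matches the delay length reported above, while the frame rate and the buffer structure are assumptions made for illustration.

```python
# Inject a fixed ~200 ms delay into a feedback stream using a bounded deque.
from collections import deque

DELAY_S = 0.200        # feedback delay used in the delayed conditions above
FRAME_RATE = 60        # assumed update rate of the feedback loop (Hz)
DELAY_FRAMES = round(DELAY_S * FRAME_RATE)   # 12 frames at 60 Hz

buffer = deque(maxlen=DELAY_FRAMES + 1)

def delayed_feedback(current_sample: float):
    """Push the newest sample and return the one from ~200 ms ago (None until the buffer fills)."""
    buffer.append(current_sample)
    return buffer[0] if len(buffer) > DELAY_FRAMES else None

for sample in [0.1 * i for i in range(20)]:
    out = delayed_feedback(sample)   # 'out' lags 'sample' by 12 frames once the buffer has filled
    print(sample, out)
```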
{"title":"The Role of Audiovisual Feedback Delays and Bimodal Congruency for Visuomotor Performance in Human-Machine Interaction","authors":"Annika Dix, Clarissa Sabrina Arlinghaus, A. Marie Harkin, Sebastian Pannasch","doi":"10.1145/3577190.3614111","DOIUrl":"https://doi.org/10.1145/3577190.3614111","url":null,"abstract":"Despite incredible technological progress in the last decades, latency is still an issue for today's technologies and their applications. To better understand how latency and resulting feedback delays affect the interaction between humans and cyber-physical systems (CPS), the present study examines separate and joint effects of visual and auditory feedback delays on performance and the motor control strategy in a complex visuomotor task. Thirty-six participants played the Wire Loop Game, a fine motor skill task, while going through four different delay conditions: no delay, visual only, auditory only, and audiovisual (length: 200 ms). Participants’ speed and accuracy for completing the task and movement kinematic were assessed. Visual feedback delays slowed down movement execution and impaired precision compared to a condition without feedback delays. In contrast, delayed auditory feedback improved precision. Descriptively, the latter finding mainly appeared when congruent visual and auditory feedback delays were provided. We discuss the role of temporal congruency of audiovisual information as well as potential compensatory mechanisms that can inform the design of multisensory feedback in human-CPS interaction faced with latency.","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135044386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0