
Proceedings of the 2020 International Conference on Multimodal Interaction: Latest Publications

FilterJoint: Toward an Understanding of Whole-Body Gesture Articulation
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418822
Aishat Aloba, Julia Woodward, Lisa Anthony
Classification accuracy of whole-body gestures can be improved by selecting gestures that have few conflicts (i.e., confusions or misclassifications). To identify such gestures, an understanding of the nuances of how users articulate whole-body gestures can help, especially when conflicts may be due to confusion among seemingly dissimilar gestures. To the best of our knowledge, such an understanding is currently missing in the literature. As a first step to enable this understanding, we designed a method that facilitates investigation of variations in how users move their body parts as they perform a motion. This method, which we call filterJoint, selects the key body parts that are actively moving during the performance of a motion. The paths along which these body parts move in space over time can then be analyzed to make inferences about how users articulate whole-body gestures. We present two case studies to show how the filterJoint method enables a deeper understanding of whole-body gesture articulation, and we highlight implications for the selection of whole-body gesture sets as a result of these insights.
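The abstract does not specify the selection criterion; as a rough illustration of the kind of joint filtering described, the sketch below (an assumption, not the authors' published implementation) keeps only joints whose total path length over a recording exceeds a threshold.

```python
import numpy as np

def filter_active_joints(joint_positions, threshold=0.15):
    """Keep joints that move substantially during a gesture recording.

    joint_positions: array of shape (frames, joints, 3) with 3D joint coordinates.
    threshold: minimum total path length for a joint to count as "actively
               moving"; both the criterion and the value are assumptions.
    """
    # Per-frame displacement of each joint: shape (frames - 1, joints)
    step_lengths = np.linalg.norm(np.diff(joint_positions, axis=0), axis=2)
    # Total path length travelled by each joint over the whole recording
    path_lengths = step_lengths.sum(axis=0)
    active = np.where(path_lengths > threshold)[0]
    return active, path_lengths

# Example with random motion standing in for real skeleton data
rng = np.random.default_rng(0)
positions = np.cumsum(rng.normal(scale=0.01, size=(120, 20, 3)), axis=0)
active_joints, paths = filter_active_joints(positions)
print(active_joints)
```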
Citations: 2
Eliciting Emotion with Vibrotactile Stimuli Evocative of Real-World Sensations
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418812
S. Macdonald, S. Brewster, F. Pollick
This paper describes a novel category of affective vibrotactile stimuli which evoke real-world sensations and details a study into emotional responses to them. The affective properties of short and abstract vibrotactile waveforms have previously been studied and shown to have a narrow emotional range. By contrast, this paper investigated emotional responses to longer waveforms and to emotionally resonant vibrotactile stimuli, i.e., stimuli that are evocative of real-world sensations such as animal purring or running water. Two studies were conducted. The first recorded emotional responses to Tactons with a duration of 20 seconds. The second investigated emotional responses to novel emotionally resonant stimuli. Stimuli that users found more emotionally resonant were more pleasant, particularly if they had prior emotional connections to the sensation represented. Results suggest that future designers could use emotional resonance to expand the affective response range of vibrotactile cues by utilising stimuli with which users bear an emotional association.
Citations: 11
Attention Sensing through Multimodal User Modeling in an Augmented Reality Guessing Game
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418865
F. Putze, Dennis Küster, Timo Urban, Alexander Zastrow, Marvin Kampen
We developed an attention-sensitive system that is capable of playing the children's guessing game "I spy with my little eye" with a human user. In this game, the user selects an object from a given scene and provides the system with a single-sentence clue about it. For each trial, the system tries to guess the target object. Our approach combines top-down and bottom-up machine learning for object and color detection, automatic speech recognition, natural language processing, a semantic database, eye tracking, and augmented reality. Our evaluation demonstrates performance significantly above chance level, and results for most of the individual machine learning components are encouraging. Participants reported very high levels of satisfaction and curiosity about the system. The collected data shows that our guessing game generates a complex and rich data set. We discuss the capabilities and challenges of our system and its components with respect to multimodal attention sensing.
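The abstract describes the guessing pipeline only at a high level; the toy sketch below (purely illustrative, with assumed data structures such as the 'label'/'attributes' dictionaries) shows one naive way a single-sentence clue could be scored against the detected objects and their colors.

```python
def guess_target(clue, detected_objects):
    """Toy sketch of matching a spoken clue to detected scene objects.

    clue: single-sentence hint, e.g. "I spy something small and red".
    detected_objects: list of dicts with 'label' and 'attributes' keys, standing
        in for the output of the object and color detectors (assumed format,
        not the paper's actual data structures).
    Returns (score, label) pairs sorted by a naive word-overlap score.
    """
    clue_words = set(clue.lower().replace(",", " ").split())
    scored = []
    for obj in detected_objects:
        candidate_words = {obj["label"].lower()} | {a.lower() for a in obj["attributes"]}
        score = len(clue_words & candidate_words)
        scored.append((score, obj["label"]))
    return sorted(scored, reverse=True)

objects = [
    {"label": "apple", "attributes": ["red", "small", "round"]},
    {"label": "book", "attributes": ["blue", "rectangular"]},
]
print(guess_target("I spy something small and red", objects))
```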
Citations: 3
First Workshop on Multimodal e-Coaches
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3420056
Leonardo Angelini, Mira El Kamali, E. Mugellini, Omar Abou Khaled, Yordan Dimitrov, V. Veleva, Zlatka Gospodinova, Nadejda Miteva, Richard Wheeler, Zoraida Callejas Carrión, D. Griol, Kawtar Benghazi Akhlaki, Manuel Noguera, P. Bamidis, E. Konstantinidis, D. Petsani, A. Beristain, D. Fotiadis, G. Chollet, M. I. Torres, A. Esposito, H. Schlieter
e-Coaches are promising intelligent systems that aim to support human everyday life by dispatching advice through different interfaces, such as apps, conversational interfaces, and augmented reality interfaces. This workshop aims at exploring how e-coaches might benefit from spatially and time-multiplexed interfaces and from different communication modalities (e.g., text, visual, audio) according to the context of the interaction.
Citations: 2
"Was that successful?" On Integrating Proactive Meta-Dialogue in a DIY-Assistant using Multimodal Cues “成功了吗?”运用多模态线索在diy助手中整合主动元对话
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418818
Matthias Kraus, Marvin R. G. Schiller, G. Behnke, P. Bercher, Michael Dorna, M. Dambier, Birte Glimm, Susanne Biundo-Stephan, W. Minker
Effectively supporting novices during the performance of complex tasks, e.g., do-it-yourself (DIY) projects, requires intelligent assistants to be more than mere instructors. In order to be accepted as a competent and trustworthy cooperation partner, they need to be able to actively participate in the project and engage in helpful conversations with users when assistance is necessary. Therefore, a new proactive version of the DIY-assistant Robert is presented in this paper. It extends the previous prototype by including the capability to initiate reflective meta-dialogues using multimodal cues. Two different strategies for reflective dialogue are implemented. A progress-based strategy initiates a reflective dialogue about previous experience with the assistance, encouraging the user's self-appraisal. An activity-based strategy provides timely, task-dependent support: user activities with a connected drill driver are tracked and trigger dialogues that reflect on the current task and help prevent task failure. An experimental study comparing the proactive assistant against the baseline version shows that proactive meta-dialogue is able to build user trust significantly better than a solely reactive system. In addition, the results provide interesting insights for the development of proactive dialogue assistants.
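As a minimal sketch of what such an activity-based trigger could look like, the code below assumes drill-driver activity arrives as timestamped on/off events and proposes a reflective prompt once the tool has been idle for a few seconds; both the rule and the threshold are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class DrillEvent:
    """Assumed shape of a tracked tool event; not the paper's data model."""
    timestamp: float   # seconds since task start
    active: bool       # True while the drill driver is running

def activity_based_triggers(events, idle_gap=5.0):
    """Propose a reflective meta-dialogue ("Was that successful?") once the
    drill driver has stopped and stayed idle for `idle_gap` seconds."""
    prompts = []
    last_stop = None
    for prev, curr in zip(events, events[1:]):
        if prev.active and not curr.active:
            last_stop = curr.timestamp
        if last_stop is not None and not curr.active and curr.timestamp - last_stop >= idle_gap:
            prompts.append((curr.timestamp, "Was that successful?"))
            last_stop = None
    return prompts

events = [DrillEvent(0.0, False), DrillEvent(2.0, True), DrillEvent(6.0, True),
          DrillEvent(7.5, False), DrillEvent(13.0, False)]
print(activity_based_triggers(events))
```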
Citations: 14
Predicting the Effectiveness of Systematic Desensitization Through Virtual Reality for Mitigating Public Speaking Anxiety
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418883
M. V. Ebers, E. Nirjhar, A. Behzadan, Theodora Chaspari
Public speaking is central to socialization in casual, professional, or academic settings. Yet, public speaking anxiety (PSA) is known to impact a considerable portion of the general population. This paper utilizes bio-behavioral indices captured from wearable devices to quantify the effectiveness of systematic exposure to virtual reality (VR) audiences for mitigating PSA. The effect of separate bio-behavioral features and demographic factors is studied, as well as the amount of necessary data from the VR sessions that can yield a reliable predictive model of the VR training effectiveness. Results indicate that acoustic and physiological reactivity during the VR exposure can reliably predict change in PSA before and after the training. With the addition of demographic features, both acoustic and physiological feature sets achieve improvements in performance. Finally, using bio-behavioral data from six to eight VR sessions can yield reliable prediction of PSA change. Findings of this study will enable researchers to better understand how bio-behavioral factors indicate improvements in PSA with VR training.
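A minimal sketch of the modeling setup follows, with synthetic placeholder features standing in for the acoustic, physiological, and demographic measurements: it simply compares cross-validated fits of PSA change with and without demographic features. The pipeline and feature dimensions are assumptions, not the paper's actual method.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic placeholder data; the study used acoustic and physiological
# reactivity measured with wearables plus demographic factors.
rng = np.random.default_rng(42)
n_participants = 40
acoustic = rng.normal(size=(n_participants, 5))        # e.g. pitch/energy reactivity
physiological = rng.normal(size=(n_participants, 4))   # e.g. heart-rate/EDA reactivity
demographics = rng.normal(size=(n_participants, 2))    # e.g. age, prior experience
psa_change = rng.normal(size=n_participants)           # pre/post change in anxiety score

def evaluate(feature_blocks):
    """Cross-validated R^2 of a ridge regression on the chosen feature blocks."""
    X = np.hstack(feature_blocks)
    return cross_val_score(Ridge(alpha=1.0), X, psa_change, cv=5, scoring="r2").mean()

print("acoustic + physiological:", evaluate([acoustic, physiological]))
print("with demographics:       ", evaluate([acoustic, physiological, demographics]))
```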
Citations: 2
Modeling Socio-Emotional and Cognitive Processes from Multimodal Data in the Wild
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3420053
Dennis Küster, F. Putze, Patrícia Alves-Oliveira, Maike Paetzel, T. Schultz
Detecting, modeling, and making sense of multimodal data from human users in the wild still poses numerous challenges. Starting from aspects of data quality and reliability of our measurement instruments, the multidisciplinary endeavor of developing intelligent adaptive systems in human-computer or human-robot interaction (HCI, HRI) requires a broad range of expertise and more integrative efforts to make such systems reliable, engaging, and user-friendly. At the same time, the spectrum of applications for machine learning and modeling of multimodal data in the wild keeps expanding. From the classroom to the robot-assisted operation theatre, our workshop aims to support a vibrant exchange about current trends and methods in the field of modeling multimodal data in the wild.
Citations: 4
Exploring Personal Memories and Video Content as Context for Facial Behavior in Predictions of Video-Induced Emotions
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418814
Bernd Dudzik, J. Broekens, Mark Antonius Neerincx, H. Hung
Empirical evidence suggests that the emotional meaning of facial behavior in isolation is often ambiguous in real-world conditions. While humans complement interpretations of others' faces with additional reasoning about context, automated approaches rarely display such context-sensitivity. Empirical findings indicate that the personal memories triggered by videos are crucial for predicting viewers' emotional response to such videos, in some cases even more so than the video's audiovisual content. In this article, we explore the benefits of personal memories as context for facial behavior analysis. We conduct a series of multimodal machine learning experiments combining the automatic analysis of video-viewers' faces with that of two types of context information for affective predictions: (1) self-reported free-text descriptions of triggered memories and (2) a video's audiovisual content. Our results demonstrate that both sources of context provide models with information about variation in viewers' affective responses that complement facial analysis and each other.
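A toy sketch of such a fusion, assuming facial features arrive as numeric vectors and the memory descriptions are encoded with a simple TF-IDF bag of words (the paper does not prescribe this encoding); the two blocks are concatenated and fed to a linear predictor of self-reported affect.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

# Toy stand-ins: automatically extracted facial features per viewer-video pair
# and the viewer's free-text description of the memory the video triggered.
facial_features = np.array([[0.2, 0.1, 0.7],
                            [0.5, 0.4, 0.1],
                            [0.3, 0.9, 0.2]])
memory_texts = ["my childhood dog running on the beach",
                "a stressful exam I barely passed",
                "family dinner at my grandparents' house"]
valence = np.array([0.8, -0.6, 0.7])  # self-reported affect labels (placeholder)

# Memory descriptions as a bag-of-words context representation (an assumption).
text_features = TfidfVectorizer().fit_transform(memory_texts).toarray()

# Early fusion of facial and memory-context features, then a linear predictor.
X = np.hstack([facial_features, text_features])
model = Ridge(alpha=1.0).fit(X, valence)
print(model.predict(X))
```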
Citations: 6
Purring Wheel: Thermal and Vibrotactile Notifications on the Steering Wheel
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418825
Patrizia Di Campli San Vito, S. Brewster, F. Pollick, Simon Thompson, L. Skrypchuk, A. Mouzakitis
Haptic feedback can improve safety and driving behaviour. While vibration has been widely studied, other haptic modalities have been neglected. To address this, we present two studies investigating the use of uni- and bimodal vibrotactile and thermal cues on the steering wheel. First, notifications with three levels of urgency were subjectively rated and then identified during simulated driving. Bimodal feedback showed an increased identification time over unimodal vibrotactile cues. Thermal feedback was consistently rated less urgent, showing its suitability for less time critical notifications, where vibration would be unnecessarily attention-grabbing. The second study investigated more complex thermal and bimodal haptic notifications comprised of two different types of information (Nature and Importance of incoming message). Results showed that both modalities could be identified with high recognition rates of up to 92% for both and up to 99% for a single type, opening up a novel design space for haptic in-car feedback.
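A hypothetical cue mapping for the second study's bimodal notifications, assuming message nature selects the thermal cue and importance selects the vibration pattern; the concrete categories and parameter values are illustrative, not taken from the paper.

```python
# Illustrative mapping from message properties to steering-wheel cues.
THERMAL_BY_NATURE = {"navigation": "warm (+3 C)", "communication": "cool (-3 C)"}
VIBRATION_BY_IMPORTANCE = {"low": "single short pulse", "high": "three long pulses"}

def encode_notification(nature, importance):
    """Return the (thermal, vibrotactile) cue pair for an incoming message."""
    return THERMAL_BY_NATURE[nature], VIBRATION_BY_IMPORTANCE[importance]

print(encode_notification("navigation", "high"))
```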
Citations: 7
A Multi-Modal Approach for Driver Gaze Prediction to Remove Identity Bias
Pub Date : 2020-10-21 DOI: 10.1145/3382507.3417961
Zehui Yu, Xiehe Huang, Xiubao Zhang, Haifeng Shen, Qun Li, Weihong Deng, Jian-Bo Tang, Yi Yang, Jieping Ye
Driver gaze prediction is an important task in Advanced Driver Assistance Systems (ADAS). Although convolutional neural networks (CNNs) can greatly improve recognition ability, several problems remain unsolved due to the challenges of illumination, pose, and camera placement. To address these difficulties, we propose an effective multi-model fusion method for driver gaze estimation. Rich appearance representations (i.e., holistic and eye regions) and geometric representations (i.e., landmarks and Delaunay angles) are learned separately to predict the gaze, followed by a score-level fusion system. Moreover, pseudo-3D appearance supervision and identity-adaptive geometric normalization are proposed to further enhance the prediction accuracy. Finally, the proposed method achieves state-of-the-art accuracy of 82.5288% on the test data, which ranks 1st in the EmotiW2020 driver gaze prediction sub-challenge.
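A minimal sketch of score-level fusion, assuming each sub-model (holistic appearance, eye region, geometric) outputs a matrix of per-class scores; the equal weighting is an assumption, since the abstract does not give the fusion coefficients.

```python
import numpy as np

def score_level_fusion(score_matrices, weights=None):
    """Fuse per-model class scores and return the predicted gaze zone per sample.

    score_matrices: list of arrays, each of shape (samples, gaze_zones), holding
        one model's class scores. Equal weights are used unless specified.
    """
    if weights is None:
        weights = np.ones(len(score_matrices)) / len(score_matrices)
    fused = sum(w * s for w, s in zip(weights, score_matrices))
    return fused.argmax(axis=1)

appearance_scores = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]])
eye_scores        = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
geometric_scores  = np.array([[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]])
print(score_level_fusion([appearance_scores, eye_scores, geometric_scores]))
```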
Citations: 11