X-AWARE: ConteXt-AWARE Human-Environment Attention Fusion for Driver Gaze Prediction in the Wild

Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI:10.1145/3382507.3417967

Lukas Stappen, Georgios Rizos, Björn Schuller

{"title":"X-AWARE: ConteXt-AWARE Human-Environment Attention Fusion for Driver Gaze Prediction in the Wild","authors":"Lukas Stappen, Georgios Rizos, Björn Schuller","doi":"10.1145/3382507.3417967","DOIUrl":null,"url":null,"abstract":"Reliable systems for automatic estimation of the driver's gaze are crucial for reducing the number of traffic fatalities and for many emerging research areas aimed at developing intelligent vehicle-passenger systems. Gaze estimation is a challenging task, especially in environments with varying illumination and reflection properties. Furthermore, there is wide diversity with respect to the appearance of drivers' faces, both in terms of occlusions (e.g. vision aids) and cultural/ethnic backgrounds. For this reason, analysing the face along with contextual information - for example, the vehicle cabin environment - adds another, less subjective signal towards the design of robust systems for passenger gaze estimation. In this paper, we present an integrated approach to jointly model different features for this task. In particular, to improve the fusion of the visually captured environment with the driver's face, we have developed a contextual attention mechanism, X-AWARE, attached directly to the output convolutional layers of InceptionResNetV2 networks. In order to showcase the effectiveness of our approach, we use the Driver Gaze in the Wild dataset, recently released as part of the Eighth Emotion Recognition in the Wild Challenge (EmotiW) challenge. Our best model outperforms the baseline by an absolute of 15.03% in accuracy on the validation set, and improves the previously best reported result by an absolute of 8.72% on the test set.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3382507.3417967","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 13

Abstract

Reliable systems for automatic estimation of the driver's gaze are crucial for reducing the number of traffic fatalities and for many emerging research areas aimed at developing intelligent vehicle-passenger systems. Gaze estimation is a challenging task, especially in environments with varying illumination and reflection properties. Furthermore, there is wide diversity with respect to the appearance of drivers' faces, both in terms of occlusions (e.g. vision aids) and cultural/ethnic backgrounds. For this reason, analysing the face along with contextual information - for example, the vehicle cabin environment - adds another, less subjective signal towards the design of robust systems for passenger gaze estimation. In this paper, we present an integrated approach to jointly model different features for this task. In particular, to improve the fusion of the visually captured environment with the driver's face, we have developed a contextual attention mechanism, X-AWARE, attached directly to the output convolutional layers of InceptionResNetV2 networks. In order to showcase the effectiveness of our approach, we use the Driver Gaze in the Wild dataset, recently released as part of the Eighth Emotion Recognition in the Wild Challenge (EmotiW) challenge. Our best model outperforms the baseline by an absolute of 15.03% in accuracy on the validation set, and improves the previously best reported result by an absolute of 8.72% on the test set.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

X-AWARE:基于上下文感知的人类-环境注意力融合，用于驾驶员注视预测

可靠的自动估计驾驶员目光的系统对于减少交通死亡人数和许多旨在开发智能车乘系统的新兴研究领域至关重要。注视估计是一项具有挑战性的任务，特别是在具有不同照明和反射特性的环境中。此外，司机的面部外观也存在很大的差异，无论是在遮挡(例如视力辅助)还是在文化/种族背景方面。出于这个原因，将人脸与上下文信息(例如，车辆舱室环境)一起分析，为设计稳健的乘客凝视估计系统增加了另一个不那么主观的信号。在本文中，我们提出了一种集成的方法来联合建模不同的特征。特别是，为了改善视觉捕捉环境与驾驶员面部的融合，我们开发了一种上下文注意机制，X-AWARE，直接附加到InceptionResNetV2网络的输出卷积层。为了展示我们方法的有效性，我们使用了驾驶员注视野生数据集，该数据集最近作为第八届野生挑战情感识别(EmotiW)挑战的一部分发布。我们的最佳模型在验证集上的准确度比基线高出15.03%，在测试集上的准确度比之前报告的最佳结果高出8.72%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Proceedings of the 2020 International Conference on Multimodal Interaction

自引率

0.00%

发文量

期刊最新文献

OpenSense: A Platform for Multimodal Data Acquisition and Behavior Perception Human-centered Multimodal Machine Intelligence Touch Recognition with Attentive End-to-End Model MORSE: MultimOdal sentiment analysis for Real-life SEttings Temporal Attention and Consistency Measuring for Video Question Answering