{"title":"The acoustics of eye contact: detecting visual attention from conversational audio cues","authors":"F. Eyben, F. Weninger, L. Paletta, Björn Schuller","doi":"10.1145/2535948.2535949","DOIUrl":null,"url":null,"abstract":"An important aspect in short dialogues is attention as is manifested by eye-contact between subjects. In this study we provide a first analysis whether such visual attention is evident in the acoustic properties of a speaker's voice. We thereby introduce the multi-modal GRAS2 corpus, which was recorded for analysing attention in human-to-human interactions of short daily-life interactions with strangers in public places in Graz, Austria. Recordings of four test subjects equipped with eye tracking glasses, three audio recording devices, and motion sensors are contained in the corpus. We describe how we robustly identify speech segments from the subjects and other people in an unsupervised manner from multi-channel recordings. We then discuss correlations between the acoustics of the voice in these segments and the point of visual attention of the subjects. A significant relation between the acoustic features and the distance between the point of view and the eye region of the dialogue partner is found. Further, we show that automatic classification of binary decision eye-contact vs. no eye-contact from acoustic features alone is feasible with an Unweighted Average Recall of up to 70%.","PeriodicalId":403097,"journal":{"name":"GazeIn '13","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"GazeIn '13","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2535948.2535949","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 11
Abstract
An important aspect of short dialogues is attention, as manifested by eye contact between subjects. In this study we provide a first analysis of whether such visual attention is evident in the acoustic properties of a speaker's voice. To this end we introduce the multi-modal GRAS2 corpus, which was recorded to analyse attention in short daily-life human-to-human interactions with strangers in public places in Graz, Austria. The corpus contains recordings of four test subjects equipped with eye-tracking glasses, three audio recording devices, and motion sensors. We describe how we robustly identify speech segments from the subjects and other people in an unsupervised manner from the multi-channel recordings. We then discuss correlations between the acoustics of the voice in these segments and the subjects' point of visual attention. A significant relation is found between the acoustic features and the distance between the point of gaze and the eye region of the dialogue partner. Further, we show that automatic classification of the binary decision eye contact vs. no eye contact from acoustic features alone is feasible, with an Unweighted Average Recall of up to 70%.
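To make the evaluation metric concrete: Unweighted Average Recall (UAR) is the mean of the per-class recalls, i.e. macro-averaged recall, which is robust to class imbalance between eye-contact and no-eye-contact segments. The sketch below is not the authors' pipeline; it is a minimal, hedged illustration of the reported task setup, assuming per-segment acoustic feature vectors have already been extracted (e.g. with a toolkit such as openSMILE) and using a generic linear classifier with subject-independent folds. All data, labels, and parameters here are placeholders.

```python
# Minimal sketch (not the authors' method): binary eye-contact vs. no-eye-contact
# classification from per-segment acoustic features, scored with UAR.
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))         # dummy acoustic feature vectors, one per speech segment
y = rng.integers(0, 2, size=200)       # dummy labels: 1 = eye contact, 0 = no eye contact
groups = rng.integers(0, 4, size=200)  # dummy subject ids, for speaker-independent folds

uars = []
for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups):
    clf = make_pipeline(StandardScaler(), LinearSVC(C=0.1, max_iter=10000))
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    # UAR = macro-averaged recall (mean of per-class recalls)
    uars.append(recall_score(y[test_idx], pred, average="macro"))

print(f"Mean UAR over subject-independent folds: {np.mean(uars):.2f}")
```

On random placeholder data the score hovers around 0.5 (chance level for two classes); the paper's result of up to 70% UAR indicates that the acoustic features carry information about eye contact well above this chance baseline.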