Analyzing and Recognizing Interlocutors' Gaze Functions from Multimodal Nonverbal Cues

Companion Publication of the 2020 International Conference on Multimodal Interaction. Pub Date: 2023-10-09. DOI: 10.1145/3577190.3614152

Ayane Tashiro, Mai Imamura, Shiro Kumano, Kazuhiro Otsuka

Abstract

A novel framework is presented for analyzing and recognizing the functions of gaze in group conversations. Considering the multiplicity and ambiguity of gaze functions, we first define 43 nonexclusive gaze functions that play essential roles in conversations, such as monitoring, regulation, and expressiveness. Based on these definitions, we create a functional gaze corpus, and a corpus analysis reveals several frequent functions, such as addressing and thinking while speaking, and attending by listeners. Next, targeting the ten most frequent functions, we build convolutional neural networks (CNNs) to recognize the frame-based presence or absence of each gaze function from multimodal inputs, including head pose, utterance status, gaze/avert status, eyeball direction, and facial expression. Comparing different input sets, our experiments confirm that the proposed CNN using all modality inputs achieves the best performance, with an F value of 0.839 for listening while looking.
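
To make the reported metric concrete, the following is a minimal sketch of how an F value can be computed from frame-based presence/absence labels for a single gaze function. It assumes the standard F-measure (harmonic mean of precision and recall); the abstract does not state the exact evaluation protocol, and the function name `frame_f_score` and the label sequences are hypothetical illustrations, not data from the paper.

```python
from typing import Sequence


def frame_f_score(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Return the F-measure over per-frame binary labels (1 = present, 0 = absent)."""
    if len(predicted) != len(actual):
        raise ValueError("label sequences must have the same length")

    # Count true positives, false positives, and false negatives frame by frame.
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)

    # With no true positives, precision and recall are zero or undefined;
    # report 0.0 by convention.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight frames of one gaze function (assumed data).
actual = [1, 1, 0, 0, 1, 0, 1, 1]
predicted = [1, 0, 0, 1, 1, 0, 1, 1]
score = frame_f_score(predicted, actual)
```

Because the 43 gaze functions are nonexclusive, a score of this kind would be computed separately for each function rather than once over a single multi-class label.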
