Ayane Tashiro, Mai Imamura, Shiro Kumano, Kazuhiro Otsuka
Companion Publication of the 2020 International Conference on Multimodal Interaction
DOI: 10.1145/3577190.3614152
Published: 2023-10-09
Analyzing and Recognizing Interlocutors' Gaze Functions from Multimodal Nonverbal Cues
A novel framework is presented for analyzing and recognizing the functions of gaze in group conversations. Considering the multiplicity and ambiguity of gaze functions, we first define 43 nonexclusive gaze functions that play essential roles in conversations, such as monitoring, regulation, and expressiveness. Based on the defined functions, a functional gaze corpus is created, and a corpus analysis reveals several frequent functions, such as speakers addressing or thinking while speaking and listeners attending to the speaker. Next, targeting the ten most frequent functions, we build convolutional neural networks (CNNs) that recognize the frame-based presence/absence of each gaze function from multimodal inputs, including head pose, utterance status, gaze/avert status, eyeball direction, and facial expression. Comparing different input sets, our experiments confirm that the proposed CNN using all modality inputs achieves the best performance, reaching an F value of 0.839 for listening while looking.
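The recognition pipeline the abstract describes (per-frame multimodal features in, per-frame presence/absence of each of the ten gaze functions out) can be sketched as a multi-label temporal CNN. The sketch below is an illustration only, assuming PyTorch: the feature dimension, channel counts, and kernel sizes are invented for the example and are not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class GazeFunctionCNN(nn.Module):
    """Illustrative per-frame multi-label classifier (not the paper's exact model).

    Input:  (batch, feat_dim, frames) -- concatenated per-frame multimodal
            features (head pose, utterance status, gaze/avert status, eyeball
            direction, facial-expression features); feat_dim=32 is assumed.
    Output: (batch, n_functions, frames) -- one logit per gaze function per frame.
    """
    def __init__(self, feat_dim=32, n_functions=10):
        super().__init__()
        self.net = nn.Sequential(
            # padded temporal convolutions keep the frame axis length unchanged
            nn.Conv1d(feat_dim, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(64, n_functions, kernel_size=1),  # per-frame logits
        )

    def forward(self, x):
        return self.net(x)

model = GazeFunctionCNN()
x = torch.randn(2, 32, 100)      # 2 clips, 32-dim features, 100 frames
logits = model(x)
probs = torch.sigmoid(logits)    # independent probability per function
present = probs > 0.5            # frame-based presence/absence per function
print(logits.shape)              # torch.Size([2, 10, 100])
```

Because the 43 functions are defined as nonexclusive, each output uses an independent sigmoid rather than a softmax, so several functions can be marked present in the same frame.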