Analyzing Synergetic Functional Spectrum from Head Movements and Facial Expressions in Conversations

Mai Imamura, Ayane Tashiro, Shiro Kumano, Kazuhiro Otsuka

Companion Publication of the 2020 International Conference on Multimodal Interaction. Published 2023-10-09. DOI: 10.1145/3577190.3614153

Abstract: A framework, synergetic functional spectrum analysis (sFSA), is proposed to reveal how multimodal nonverbal behaviors, such as head movements and facial expressions, cooperatively perform communicative functions in conversations. We first introduce a functional spectrum to represent the functional multiplicity and ambiguity of nonverbal behaviors; e.g., a nod could imply listening, agreement, or both. More specifically, the functional spectrum is defined as the distribution of perceptual intensities of multiple functions across multiple modalities, based on multiple raters' judgments. Next, the functional spectrum is decomposed into a small basis set, called the synergetic functional basis, which characterizes primary and distinctive multimodal functionalities and spans a synergetic functional space. Using these bases, the input spectrum is approximated as a linear combination of the bases and corresponding coefficients, which represent its coordinates in the functional space. To that end, this paper proposes semi-orthogonal nonnegative matrix factorization (SO-NMF) and discovers essential multimodal synergies in the listener's back-channeling, thinking, and positive responses, and in the speaker's thinking and addressing. Furthermore, we propose regression models based on convolutional neural networks (CNNs) to estimate the functional-space coordinates from head movements and facial action units, confirming the potential of the sFSA.