The semantic space for emotional speech and the influence of different methods for prosody isolation on its perception

Martin Schorradt, Susana Castillo, D. Cunningham
DOI: 10.1145/3225153.3225156
Published in: Proceedings of the 15th ACM Symposium on Applied Perception
Publication date: 2018-08-10
Cited by: 2

Abstract

Normally, when people talk to other people, they communicate not only with specific words, but also with intentional changes in their voice melody, facial expressions, and gestures. Human communication is not only inherently multimodal, it is also multi-layered. That is, it conveys not only semantic information but also a wide variety of social, emotional, and functional (e.g., conversation-control) information. Previous work has examined the perception of socio-emotional information conveyed by words and facial expressions. Here, we build on that work and examine the perception of socio-emotional information based solely on prosody (e.g., speech melody, rate, tempo, intensity). To examine the perception of affective prosody, it is necessary to remove all semantics from the speech signal without changing the prosody. In this paper, we compare several different state-of-the-art methods for removing semantics. We started by recording an audio database containing a German sentence spoken by 11 people in 62 different emotional states. We then removed or masked the semantics using three different techniques. We also recorded the same 62 states for a pseudo-language phrase. Each of these five sets of stimuli was subjected to a semantic differential rating task to derive and compare the semantic spaces for emotions. The results show that each of the methods successfully removed the semantic component, but also changed the perception of the emotional content. Interestingly, the pseudo-word stimuli diverged most from the normal sentences. Furthermore, although each of the filters affected the perception of the sentence in some manner, they did so in different ways.
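The abstract does not name the three semantics-masking techniques, but low-pass filtering is one classic approach: removing the spectral detail above the pitch range destroys phonetic content (formants, consonants) while the pitch contour and intensity envelope, i.e. much of the prosody, survive. The sketch below is a minimal, hypothetical illustration of that idea using a windowed-sinc FIR filter in NumPy; the cutoff of 400 Hz and the filter length are illustrative choices, not values from the paper.

```python
import numpy as np

def lowpass_fir(signal: np.ndarray, sr: int,
                cutoff_hz: float = 400.0, numtaps: int = 513) -> np.ndarray:
    """Low-pass filter a speech signal so that spectral detail above the
    pitch range is attenuated, masking phonetic (semantic) content while
    preserving the F0 contour and intensity envelope."""
    # Windowed-sinc low-pass kernel (Hamming window), normalized cutoff
    fc = cutoff_hz / sr
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = np.sinc(2 * fc * n) * np.hamming(numtaps)
    h /= h.sum()  # unity gain at DC
    # Zero-phase-ish filtering via symmetric kernel and "same" convolution
    return np.convolve(signal, h, mode="same")

# Illustration: a 150 Hz component (pitch range) passes, a 2500 Hz
# component (formant range) is strongly attenuated.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 150 * t) + np.sin(2 * np.pi * 2500 * t)
y = lowpass_fir(x, sr)

def amp_at(sig, f):
    # Amplitude of the component at frequency f (single-bin DFT projection)
    return 2 * abs(np.mean(sig * np.exp(-2j * np.pi * f * t)))
```

A real study would apply such a filter to the recorded sentences and verify in a pilot test that listeners can no longer identify the words, which is exactly the kind of validation the paper's comparison of methods addresses.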