The semantic space for emotional speech and the influence of different methods for prosody isolation on its perception

Martin Schorradt, Susana Castillo, D. Cunningham
DOI: 10.1145/3225153.3225156
Published in: Proceedings of the 15th ACM Symposium on Applied Perception
Publication date: 2018-08-10
Cited by: 2

Abstract

Normally, when people talk to other people, they communicate not only with specific words, but also with intentional changes in their voice melody, facial expressions, and gestures. Human communication is not only inherently multimodal, it is also multi-layered. That is, it conveys not only semantic information but also a wide variety of social, emotional, and functional (e.g., conversation-control) information. Previous work has examined the perception of socio-emotional information conveyed by words and facial expressions. Here, we build on that work and examine the perception of socio-emotional information based solely on prosody (e.g., speech melody, rate, tempo, intensity). To examine the perception of affective prosody, it is necessary to remove all semantics from the speech signal without changing the prosody. In this paper, we compare several different state-of-the-art methods for removing semantics. We started by recording an audio database containing a German sentence spoken by 11 people in 62 different emotional states. We then removed or masked the semantics using three different techniques. We also recorded the same 62 states for a pseudo-language phrase. Each of these five sets of stimuli was subjected to a semantic differential rating task to derive and compare the semantic spaces for emotions. The results show that each of the methods successfully removed the semantic component, but also changed the perception of the emotional content. Interestingly, the pseudo-word stimuli diverged most from the normal sentences. Furthermore, although each of the filters affected the perception of the sentence in some manner, they did so in different ways.
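The abstract does not name the three semantics-masking techniques, but low-pass filtering is one classic approach: removing the spectral detail above the pitch range destroys phonetic content (formants, consonants) while the pitch contour and intensity envelope, i.e. much of the prosody, survive. The sketch below is a minimal, hypothetical illustration of that idea using a windowed-sinc FIR filter in NumPy; the cutoff of 400 Hz and the filter length are illustrative choices, not values from the paper.

```python
import numpy as np

def lowpass_fir(signal: np.ndarray, sr: int,
                cutoff_hz: float = 400.0, numtaps: int = 513) -> np.ndarray:
    """Low-pass filter a speech signal so that spectral detail above the
    pitch range is attenuated, masking phonetic (semantic) content while
    preserving the F0 contour and intensity envelope."""
    # Windowed-sinc low-pass kernel (Hamming window), normalized cutoff
    fc = cutoff_hz / sr
    n = np.arange(numtaps) - (numtaps - 1) / 2
    h = np.sinc(2 * fc * n) * np.hamming(numtaps)
    h /= h.sum()  # unity gain at DC
    # Zero-phase-ish filtering via symmetric kernel and "same" convolution
    return np.convolve(signal, h, mode="same")

# Illustration: a 150 Hz component (pitch range) passes, a 2500 Hz
# component (formant range) is strongly attenuated.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 150 * t) + np.sin(2 * np.pi * 2500 * t)
y = lowpass_fir(x, sr)

def amp_at(sig, f):
    # Amplitude of the component at frequency f (single-bin DFT projection)
    return 2 * abs(np.mean(sig * np.exp(-2j * np.pi * f * t)))
```

A real study would apply such a filter to the recorded sentences and verify in a pilot test that listeners can no longer identify the words, which is exactly the kind of validation the paper's comparison of methods addresses.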