利用语言信息改进多维语音情感识别中的价态预测

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2020-11-05 DOI:10.1109/O-COCOSDA50338.2020.9295032

Bagus Tris Atmaja, M. Akagi

{"title":"利用语言信息改进多维语音情感识别中的价态预测","authors":"Bagus Tris Atmaja, M. Akagi","doi":"10.1109/O-COCOSDA50338.2020.9295032","DOIUrl":null,"url":null,"abstract":"In dimensional emotion recognition, a model called valence, arousal, and dominance is widely used. The current research in dimensional speech emotion recognition has shown a problem that the performance of valence prediction is lower than arousal and dominance. This paper presents an approach to tackle this problem: improving the low score of valence prediction by utilizing linguistic information. Our approach fuses acoustic features with linguistic features, which is a conversion from words to vectors. The results doubled the performance of valence prediction on both single-task learning single-output (predicting valence only) and multitask learning multi-output (predicting valence, arousal, and dominance). Using a proper combination of acoustic and linguistic features not only improved valence prediction, but also improved arousal and dominance predictions in multitask learning.","PeriodicalId":385266,"journal":{"name":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Improving Valence Prediction in Dimensional Speech Emotion Recognition Using Linguistic Information\",\"authors\":\"Bagus Tris Atmaja, M. Akagi\",\"doi\":\"10.1109/O-COCOSDA50338.2020.9295032\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In dimensional emotion recognition, a model called valence, arousal, and dominance is widely used. The current research in dimensional speech emotion recognition has shown a problem that the performance of valence prediction is lower than arousal and dominance. This paper presents an approach to tackle this problem: improving the low score of valence prediction by utilizing linguistic information. Our approach fuses acoustic features with linguistic features, which is a conversion from words to vectors. The results doubled the performance of valence prediction on both single-task learning single-output (predicting valence only) and multitask learning multi-output (predicting valence, arousal, and dominance). Using a proper combination of acoustic and linguistic features not only improved valence prediction, but also improved arousal and dominance predictions in multitask learning.\",\"PeriodicalId\":385266,\"journal\":{\"name\":\"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/O-COCOSDA50338.2020.9295032\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/O-COCOSDA50338.2020.9295032","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

在维度情绪识别中，一种被称为效价-唤醒-支配的模型被广泛使用。目前在多维语音情感识别研究中存在着效价预测低于唤醒和优势预测的问题。本文提出了一种解决这一问题的方法:利用语言信息改善价态预测的低分。我们的方法融合了声学特征和语言特征，这是一个从单词到向量的转换。结果表明，效价预测在单任务学习单输出(仅预测效价)和多任务学习多输出(预测效价、唤醒和支配)上的表现都提高了一倍。在多任务学习中，适当地结合声学和语言特征不仅可以提高效价预测，而且可以提高唤醒和优势预测。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Improving Valence Prediction in Dimensional Speech Emotion Recognition Using Linguistic Information

In dimensional emotion recognition, a model called valence, arousal, and dominance is widely used. The current research in dimensional speech emotion recognition has shown a problem that the performance of valence prediction is lower than arousal and dominance. This paper presents an approach to tackle this problem: improving the low score of valence prediction by utilizing linguistic information. Our approach fuses acoustic features with linguistic features, which is a conversion from words to vectors. The results doubled the performance of valence prediction on both single-task learning single-output (predicting valence only) and multitask learning multi-output (predicting valence, arousal, and dominance). Using a proper combination of acoustic and linguistic features not only improved valence prediction, but also improved arousal and dominance predictions in multitask learning.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)

自引率

0.00%

发文量