CNN+LSTM Architecture for Speech Emotion Recognition with Data Augmentation

Caroline Etienne, Guillaume Fidanza, Andrei Petrovskii, L. Devillers, B. Schmauch
{"title":"基于CNN+LSTM架构的数据增强语音情感识别","authors":"Caroline Etienne, Guillaume Fidanza, Andrei Petrovskii, L. Devillers, B. Schmauch","doi":"10.21437/SMM.2018-5","DOIUrl":null,"url":null,"abstract":"In this work we design a neural network for recognizing emotions in speech, using the IEMOCAP dataset. Following the latest advances in audio analysis, we use an architecture involving both convolutional layers, for extracting high-level features from raw spectrograms, and recurrent ones for aggregating long-term dependencies. We examine the techniques of data augmentation with vocal track length perturbation, layer-wise optimizer adjustment, batch normalization of recurrent layers and obtain highly competitive results of 64.5% for weighted accuracy and 61.7% for unweighted accuracy on four emotions.","PeriodicalId":158743,"journal":{"name":"Workshop on Speech, Music and Mind (SMM 2018)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"78","resultStr":"{\"title\":\"CNN+LSTM Architecture for Speech Emotion Recognition with Data Augmentation\",\"authors\":\"Caroline Etienne, Guillaume Fidanza, Andrei Petrovskii, L. Devillers, B. Schmauch\",\"doi\":\"10.21437/SMM.2018-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this work we design a neural network for recognizing emotions in speech, using the IEMOCAP dataset. Following the latest advances in audio analysis, we use an architecture involving both convolutional layers, for extracting high-level features from raw spectrograms, and recurrent ones for aggregating long-term dependencies. We examine the techniques of data augmentation with vocal track length perturbation, layer-wise optimizer adjustment, batch normalization of recurrent layers and obtain highly competitive results of 64.5% for weighted accuracy and 61.7% for unweighted accuracy on four emotions.\",\"PeriodicalId\":158743,\"journal\":{\"name\":\"Workshop on Speech, Music and Mind (SMM 2018)\",\"volume\":\"49 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-02-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"78\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Workshop on Speech, Music and Mind (SMM 2018)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21437/SMM.2018-5\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Speech, Music and Mind (SMM 2018)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/SMM.2018-5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 78

Abstract

In this work, we design a neural network for recognizing emotions in speech, using the IEMOCAP dataset. Following the latest advances in audio analysis, we use an architecture involving both convolutional layers, for extracting high-level features from raw spectrograms, and recurrent layers for aggregating long-term dependencies. We examine the techniques of data augmentation with vocal tract length perturbation, layer-wise optimizer adjustment, and batch normalization of recurrent layers, and obtain highly competitive results of 64.5% weighted accuracy and 61.7% unweighted accuracy on four emotions.
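To make the architecture described in the abstract more concrete, the following is a minimal PyTorch sketch of a generic CNN+LSTM classifier over spectrograms: convolutional layers extract local time-frequency features, an LSTM aggregates them over time, and a linear layer predicts one of four emotion classes. The layer sizes and names (CnnLstmEmotionNet, n_mels, hidden_size) are illustrative assumptions, not the configuration reported in the paper, which the abstract does not specify.

```python
# Illustrative sketch only: a generic CNN+LSTM spectrogram classifier,
# not the authors' exact layer configuration.
import torch
import torch.nn as nn

class CnnLstmEmotionNet(nn.Module):
    def __init__(self, n_mels=128, n_classes=4, hidden_size=128):
        super().__init__()
        # Convolutional front end: extracts local time-frequency features
        # from the raw (single-channel) spectrogram.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                        # halve time and frequency
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Recurrent back end: aggregates long-term dependencies over time.
        self.lstm = nn.LSTM(
            input_size=64 * (n_mels // 4),          # channels x reduced frequency bins
            hidden_size=hidden_size,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * hidden_size, n_classes)

    def forward(self, spec):
        # spec: (batch, 1, n_mels, n_frames)
        feats = self.cnn(spec)                      # (batch, 64, n_mels/4, n_frames/4)
        b, c, f, t = feats.shape
        feats = feats.permute(0, 3, 1, 2).reshape(b, t, c * f)  # (batch, time, features)
        out, _ = self.lstm(feats)
        return self.classifier(out[:, -1])          # logits for the four emotion classes

# Example: a batch of 8 spectrograms with 128 mel bins and 300 frames.
logits = CnnLstmEmotionNet()(torch.randn(8, 1, 128, 300))
```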
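The data augmentation step mentioned in the abstract, vocal tract length perturbation (VTLP), warps the frequency axis of each utterance by a small random factor. The sketch below approximates this with a simple linear warp of an already-computed spectrogram; the standard VTLP recipe applies a piecewise-linear warp during feature extraction, and the paper's exact warping function and factor range are not given in the abstract, so vtlp_like_warp and alpha_range are illustrative assumptions.

```python
# Minimal sketch in the spirit of vocal tract length perturbation (VTLP):
# stretch or compress the frequency axis of a spectrogram by a random factor
# alpha, then crop or pad back to the original number of bins.
# The warp function and alpha range here are assumptions, not the paper's recipe.
import torch
import torch.nn.functional as F

def vtlp_like_warp(spec, alpha_range=(0.9, 1.1)):
    """spec: (1, n_mels, n_frames) spectrogram; returns a warped copy of the same shape."""
    alpha = torch.empty(1).uniform_(*alpha_range).item()
    _, n_mels, n_frames = spec.shape
    warped_bins = max(1, int(round(n_mels * alpha)))
    x = spec.unsqueeze(0)                            # (1, 1, n_mels, n_frames)
    # Resample the frequency axis so content near bin f moves to roughly bin alpha*f.
    x = F.interpolate(x, size=(warped_bins, n_frames),
                      mode="bilinear", align_corners=False)
    if warped_bins >= n_mels:
        x = x[:, :, :n_mels, :]                      # alpha > 1: drop the highest bins
    else:
        x = F.pad(x, (0, 0, 0, n_mels - warped_bins),
                  mode="replicate")                  # alpha < 1: pad the highest bins
    return x.squeeze(0)

# Drawing a fresh alpha per utterance each epoch effectively enlarges the training set.
augmented = vtlp_like_warp(torch.randn(1, 128, 300))
```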
Latest articles from this journal:
Time-frequency spectral error for analysis of high arousal speech
A component-based approach to study the effect of Indian music on emotions
Analysis of Speech Emotions in Realistic Environments
Emotional Speech Classifier Systems: For Sensitive Assistance to support Disabled Individuals
Discriminating between High-Arousal and Low-Arousal Emotional States of Mind using Acoustic Analysis