An experimental study of speech emotion recognition based on deep convolutional neural networks

2015 International Conference on Affective Computing and Intelligent Interaction (ACII) Pub Date : 2015-09-21 DOI:10.1109/ACII.2015.7344669

W. Zheng, Jian Yu, Yuexian Zou

Pages: 827-831

Citations: 145

Abstract

Speech emotion recognition (SER) is a challenging task because it is unclear which features reflect the characteristics of human emotion in speech. Traditional feature extraction methods perform inconsistently across emotion recognition tasks. Different spectrograms provide information that reflects different emotions. This paper proposes a systematic approach to implementing an effective emotion recognition system based on deep convolutional neural networks (DCNNs) trained on labeled audio data. Specifically, the log-spectrogram is computed, and principal component analysis (PCA) is used to reduce dimensionality and suppress interference. The PCA-whitened spectrogram is then split into non-overlapping segments. The DCNN is constructed to learn a representation of emotion from these segments using labeled training speech data. Preliminary experiments show that the proposed DCNN-based system (with 2 convolution and 2 pooling layers) achieves about 40% classification accuracy. It also outperforms SVM-based classification using hand-crafted acoustic features.
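The preprocessing steps named in the abstract (log-spectrogram, PCA whitening, non-overlapping segmentation) can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the frame length, hop size, number of principal components, segment length, and the 16 kHz synthetic input are all assumptions, since the abstract does not state them. In practice the PCA would presumably be fitted on training data rather than on a single utterance.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Log-power spectrogram of shape (n_frames, frame_len // 2 + 1).
    frame_len and hop are assumed values, not taken from the paper."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps)

def pca_whiten(features, n_components, eps=1e-8):
    """Project frames onto the top principal components and
    rescale each component to (approximately) unit variance."""
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered / (len(centered) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest ones
    return centered @ eigvecs[:, order] / np.sqrt(eigvals[order] + eps)

def split_segments(features, seg_len):
    """Cut (n_frames, dim) into non-overlapping (seg_len, dim) segments,
    dropping any leftover frames at the end."""
    n_segments = len(features) // seg_len
    return features[: n_segments * seg_len].reshape(n_segments, seg_len, -1)

# Synthetic stand-in for one second of speech at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)

spec = log_spectrogram(signal)                   # shape (124, 129)
whitened = pca_whiten(spec, n_components=40)     # shape (124, 40)
segments = split_segments(whitened, seg_len=40)  # shape (3, 40, 40)
print(spec.shape, whitened.shape, segments.shape)
```

Each resulting segment is a small 2-D array that could be fed to a convolutional network as a single-channel image, paired with the emotion label of the utterance it came from.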
