Unsupervised learning in cross-corpus acoustic emotion recognition

2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163986

Zixing Zhang, F. Weninger, M. Wöllmer, Björn Schuller

Citations: 111

Abstract

One of the ever-present bottlenecks in automatic emotion recognition is data sparseness. We therefore investigate the suitability of unsupervised learning in cross-corpus acoustic emotion recognition through a large-scale study with six commonly used databases that include acted and natural emotional speech and cover a variety of application scenarios and acoustic conditions. We show that adding unlabeled emotional speech to agglomerated multi-corpus training sets can enhance recognition performance even in a challenging cross-corpus setting. Furthermore, we show that in leave-one-corpus-out validation the expected gain from adding unlabeled data is, on average, approximately half of that achieved by adding manually labeled data.
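The leave-one-corpus-out protocol mentioned in the abstract can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the authors' implementation: the corpus names are hypothetical placeholders, and the choice to withhold labels from exactly one of the remaining corpora per split is an assumption, since the abstract does not specify how labeled and unlabeled data are partitioned.

```python
from itertools import combinations


def leave_one_corpus_out(corpus_names, n_unlabeled=1):
    """Yield (test, labeled_train, unlabeled_train) splits over corpus names.

    Each corpus is held out once as the test set. From the remaining corpora,
    every combination of `n_unlabeled` corpora is treated as unlabeled
    (labels withheld) and the rest as labeled training data.
    The partitioning scheme is an assumption made for illustration.
    """
    for test in corpus_names:
        remaining = [c for c in corpus_names if c != test]
        for unlabeled in combinations(remaining, n_unlabeled):
            labeled = [c for c in remaining if c not in unlabeled]
            yield test, labeled, list(unlabeled)


# Hypothetical placeholder names standing in for the six databases.
corpora = ["corpus_A", "corpus_B", "corpus_C", "corpus_D", "corpus_E", "corpus_F"]
splits = list(leave_one_corpus_out(corpora, n_unlabeled=1))

# 6 held-out test corpora x 5 choices of unlabeled corpus = 30 splits
print(len(splits))
```

Under this scheme, comparing a model trained on the labeled corpora alone against one that also uses the unlabeled corpus, and against one that uses that corpus with its labels, would give the kind of gain comparison the abstract describes.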


