Exploring dataset similarities using PCA-based feature selection

2015 International Conference on Affective Computing and Intelligent Interaction (ACII). Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344600

Ingo Siegert, Ronald Böck, A. Wendemuth, Bogdan Vlasenko

Citations: 6

Abstract

In emotion recognition from speech, several well-established corpora are currently used to develop classification engines. These corpora are annotated differently, and the community uses a variety of feature extraction schemes. The aim of this paper is to identify promising features for individual corpora and then compare the results across data sets in order to propose optimal cross-corpus features, introducing a new ranking method. This in turn enables us to present a method for automatically identifying groups of corpora with similar characteristics. It thereby answers an urgent question in classifier development: whether data from different corpora is similar enough to be used jointly as training material, overcoming the shortage of material in matching domains. We compare the results of this method with manual groupings of corpora. We consider the established emotional speech corpora AVIC, ABC, DES, EMO-DB, ENTERFACE, SAL, SMARTKOM, SUSAS and VAM; however, our approach is general.
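
The abstract does not spell out the ranking method, so the following is only an illustrative sketch of one common way to rank features with PCA and to compare two corpora by their rankings. It is not the authors' method. The function names `pca_feature_ranking` and `rank_overlap`, the variance-weighted loading score, and the Jaccard overlap of top-ranked features are all assumptions made for illustration, and the data is synthetic.

```python
import numpy as np


def pca_feature_ranking(X, n_components=2):
    """Rank features by variance-weighted absolute PCA loadings (assumed score)."""
    X = np.asarray(X, dtype=float)
    # Standardize each feature; guard against zero-variance columns.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    # PCA via SVD of the standardized data: rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = min(n_components, Vt.shape[0])
    # Score per feature: sum over the top-k components of
    # |loading| * explained-variance ratio.
    scores = np.abs(Vt[:k]).T @ explained[:k]
    order = np.argsort(scores)[::-1]  # best-ranked feature first
    return order, scores


def rank_overlap(rank_a, rank_b, top_n):
    """Jaccard overlap of the top-n features of two rankings (assumed similarity)."""
    a = set(np.asarray(rank_a)[:top_n].tolist())
    b = set(np.asarray(rank_b)[:top_n].tolist())
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Synthetic stand-ins for two corpora with the same 6 features.
    rng = np.random.default_rng(0)
    corpus_a = rng.normal(size=(200, 6))
    corpus_b = rng.normal(size=(150, 6))
    order_a, _ = pca_feature_ranking(corpus_a)
    order_b, _ = pca_feature_ranking(corpus_b)
    print("top-3 overlap:", rank_overlap(order_a, order_b, top_n=3))
```

Under these assumptions, a pairwise matrix of such overlaps could be fed to any standard clustering routine to form groups of corpora, which could then be compared against a manual grouping.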
