An Evolutionary-based Generative Approach for Audio Data Augmentation

2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP). Pub Date: 2020-09-21. DOI: 10.1109/MMSP48831.2020.9287156

Silvan Mertes, Alice Baird, Dominik Schiller, Björn Schuller, E. André

Citations: 12

Abstract

In this paper, we introduce a novel framework for augmenting raw audio data for machine learning classification tasks. In the first step, we employ a generative adversarial network (GAN) to create new variants of the audio samples that already exist in the source dataset for the classification task. In the second step, we use an evolutionary algorithm to search the input space of the previously trained GAN with respect to predefined characteristics of the generated audio. In this way, we can generate audio in a controlled manner that contributes to improved classification performance on the original task. To validate our approach, we test it on the task of soundscape classification. We show that our approach leads to a substantial improvement in classification results compared with both training without data augmentation and training with uncontrolled GAN-based data augmentation.
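The second step described in the abstract, searching a trained generator's input space with an evolutionary algorithm, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the generator is a fixed random projection standing in for a trained GAN, the "predefined characteristic" is a hypothetical RMS-energy target, and the search is a simple elitist scheme with Gaussian mutation. The names `make_toy_generator`, `fitness`, and `evolve_latents` are likewise hypothetical.

```python
import numpy as np

LATENT_DIM = 16   # assumed latent size (not from the paper)
N_SAMPLES = 256   # assumed length of each generated waveform


def make_toy_generator(seed=0):
    """Stand-in for a trained GAN generator: maps a latent vector to a waveform."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(LATENT_DIM, N_SAMPLES)) / np.sqrt(LATENT_DIM)
    return lambda z: np.tanh(z @ weights)


def fitness(waveform, target_rms=0.5):
    """Hypothetical characteristic: closeness of RMS energy to a target (higher is better)."""
    rms = np.sqrt(np.mean(waveform ** 2))
    return -abs(rms - target_rms)


def evolve_latents(generator, pop_size=32, n_parents=8, generations=40,
                   sigma=0.3, seed=1):
    """Elitist evolutionary search over latent vectors.

    Returns the best latent vector found and the best score per generation.
    """
    rng = np.random.default_rng(seed)
    population = rng.normal(size=(pop_size, LATENT_DIM))
    history = []
    for _ in range(generations):
        scores = np.array([fitness(generator(z)) for z in population])
        order = np.argsort(scores)[::-1]          # best first
        parents = population[order[:n_parents]]   # survivors are kept unchanged
        history.append(scores[order[0]])
        # Offspring: Gaussian mutation of randomly chosen parents.
        n_children = pop_size - n_parents
        picks = rng.integers(0, n_parents, size=n_children)
        children = parents[picks] + sigma * rng.normal(size=(n_children, LATENT_DIM))
        population = np.vstack([parents, children])
    scores = np.array([fitness(generator(z)) for z in population])
    history.append(scores.max())
    return population[np.argmax(scores)], history


if __name__ == "__main__":
    gen = make_toy_generator()
    best_z, history = evolve_latents(gen)
    print("first-generation best score:", history[0])
    print("final best score:", history[-1])
```

Because the parents survive each generation unchanged, the best score in this sketch never decreases. In a real pipeline the toy generator would be replaced by the trained GAN and the fitness by whatever audio characteristics the augmentation is meant to control.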


